Can I perform a dynamic cumulative sum in pandas?

The loop cannot be avoided, but it can be sped up by compiling it with numba's njit:

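    from numba import njit, prange

    @njit
    def dynamic_cumsum(seq, index, max_value):
        cumsum = []
        running = 0
        # Without parallel=True, prange behaves like a plain range.
        for i in prange(len(seq)):
            # Once the running total exceeds the limit, record it and reset.
            if running > max_value:
                cumsum.append([index[i], running])
                running = 0
            running += seq[i]
        # Record whatever remains under the last index label.
        cumsum.append([index[-1], running])

        return cumsum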

The index is required here, assuming your index is not numeric/monotonically
increasing.
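To make the reset logic concrete, here is a minimal pure-Python sketch of the same loop, without numba. The function name `dynamic_cumsum_py`, the sample values and the letter labels are made up for illustration and are not the data from the question.

```python
def dynamic_cumsum_py(seq, index, max_value):
    """Pure-Python reference for the reset-on-threshold running sum."""
    groups = []
    running = 0
    for i in range(len(seq)):
        # Once the running total exceeds the limit, record it against the
        # label of the current row and start over.
        if running > max_value:
            groups.append((index[i], running))
            running = 0
        running += seq[i]
    # Flush whatever is left under the last label.
    groups.append((index[-1], running))
    return groups


values = [1, 2, 3, 4, 2, 3, 1, 3, 2, 1]  # hypothetical data
labels = list("abcdefghij")              # hypothetical non-numeric index
result = dynamic_cumsum_py(values, labels, 5)
print(result)  # [('d', 6), ('f', 6), ('i', 7), ('j', 3)]
```

Each total is recorded against the label of the row where the reset happens, which is the row after the one that pushed the sum past the limit. This is why the labels have to be passed in when they cannot be derived from the loop counter.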

    %timeit foo(df, 5)
    1.24 ms ± 41.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

    %timeit dynamic_cumsum(df.iloc(axis=1)[0].values, df.index.values, 5)
    77.2 µs ± 4.01 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

If the index is of Int64Index type, you can shorten this to:

    @njit
    def dynamic_cumsum2(seq, max_value):
        cumsum = []
        running = 0
        for i in prange(len(seq)):
            # The loop counter doubles as the index label here.
            if running > max_value:
                cumsum.append([i, running])
                running = 0
            running += seq[i]
        cumsum.append([i, running])

        return cumsum

    lst = dynamic_cumsum2(df.iloc(axis=1)[0].values, 5)
    pd.DataFrame(lst, columns=['A', 'B']).set_index('A')

        B
    A
    3  10
    7   8
    9   4

    %timeit foo(df, 5)
    1.23 ms ± 30.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

    %timeit dynamic_cumsum2(df.iloc(axis=1)[0].values, 5)
    71.4 µs ± 1.4 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

njit Functions Performance

    import numpy as np
    import pandas as pd
    import perfplot

    # cumsum_limit_nb is the generator-based function being compared;
    # it is not defined in this answer.
    perfplot.show(
        setup=lambda n: pd.DataFrame(np.random.randint(0, 10, size=(n, 1))),
        kernels=[
            lambda df: list(cumsum_limit_nb(df.iloc[:, 0].values, 5)),
            lambda df: dynamic_cumsum2(df.iloc[:, 0].values, 5)
        ],
        labels=['cumsum_limit_nb', 'dynamic_cumsum2'],
        n_range=[2**k for k in range(0, 17)],
        xlabel='N',
        logx=True,
        logy=True,
        equality_check=None  # TODO - update when @jpp adds in the final `yield`
    )

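The log-log plot shows that the generator function is faster for larger inputs.

A possible explanation is that, as N increases, the overhead of appending to a growing list in `dynamic_cumsum2` becomes prominent, while `cumsum_limit_nb` only has to `yield`.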
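The difference between appending and yielding can be sketched in plain Python. The generator below is a hypothetical stand-in (the name `cumsum_limit_gen` and the sample values are made up here); it is not the actual `cumsum_limit_nb` implementation, which is not shown in this answer.

```python
def cumsum_limit_gen(seq, max_value):
    """Yield (position, total) pairs instead of building a list."""
    running = 0
    last = -1
    for i, value in enumerate(seq):
        # Hand each finished total to the caller straight away, so no
        # result list has to grow inside the function.
        if running > max_value:
            yield i, running
            running = 0
        running += value
        last = i
    # Emit the remainder, unless the input was empty.
    if last >= 0:
        yield last, running


values = [1, 2, 3, 4, 2, 3, 1, 3, 2, 1]  # hypothetical data
pairs = list(cumsum_limit_gen(values, 5))
print(pairs)  # [(3, 6), (5, 6), (8, 7), (9, 3)]
```

Wrapping the generator in `list(...)`, as the benchmark does, still collects every pair, so any speed difference would come from where and how the results are accumulated rather than from doing less work overall.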
