Using the concept of `strides`-based views on the dataframe, here's a vectorized approach -
get_sliding_window(df, 2).dot(X) # window size = 2
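The definition of `get_sliding_window` isn't shown in this section. The sketch below shows what a strided window helper could look like. It assumes the frame has only numeric columns and that the helper returns a view of shape `(rows - W + 1, W, cols)`. The name `sliding_windows` is hypothetical and is not the author's function. With that shape, `.dot(X)` contracts the last (column) axis, so each window yields one row of `row · X` values.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import as_strided


def sliding_windows(df: pd.DataFrame, W: int) -> np.ndarray:
    """Hypothetical stand-in for get_sliding_window: a (m - W + 1, W, n) view.

    Window i is rows i .. i + W - 1 of the frame's values; no data is copied.
    """
    a = df.to_numpy()  # assumes all-numeric columns
    m, n = a.shape
    if not 1 <= W <= m:
        raise ValueError("W must be between 1 and the number of rows")
    s0, s1 = a.strides
    # Axis 0 (next window) and axis 1 (next row inside a window) both advance
    # by one row (s0); axis 2 advances by one column (s1). Read-only because
    # the overlapping windows share memory.
    return as_strided(a, shape=(m - W + 1, W, n), strides=(s0, s0, s1), writeable=False)


df = pd.DataFrame([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], columns=["A", "B"])
X = np.array([2, 3])
# .dot contracts the column axis, so every row in every window becomes row . X
print(sliding_windows(df, 2).dot(X))  # windows [8, 18] and [18, 28]
```

The view is marked read-only because writing through overlapping windows would silently change several windows at once. On NumPy 1.20+, `numpy.lib.stride_tricks.sliding_window_view(a, W, axis=0)` builds a similar read-only view with bounds checking, although it places the window axis last.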
Runtime test -
In [101]: df = pd.DataFrame(np.random.rand(5, 2).round(2), columns=['A', 'B'])

In [102]: X = np.array([2, 3])

In [103]: rolled_df = roll(df, 2)

In [104]: %timeit rolled_df.apply(lambda df: pd.Series(df.values.dot(X)))
100 loops, best of 3: 5.51 ms per loop

In [105]: %timeit get_sliding_window(df, 2).dot(X)
10000 loops, best of 3: 43.7 µs per loop
Verify results -
In [106]: rolled_df.apply(lambda df: pd.Series(df.values.dot(X)))
Out[106]: 
      0     1
1  2.70  4.09
2  4.09  2.52
3  2.52  1.78
4  1.78  3.50

In [107]: get_sliding_window(df, 2).dot(X)
Out[107]: 
array([[ 2.7 ,  4.09],
       [ 4.09,  2.52],
       [ 2.52,  1.78],
       [ 1.78,  3.5 ]])
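The check above relies on `roll`, which comes from the question and isn't defined here. A self-contained alternative is to compare the strided result against a plain loop that slices each window explicitly. In the sketch below, `loop_windows_dot` and `strided_windows_dot` are hypothetical names, and the random data is an assumption chosen only to mirror the shape of the test frame.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import as_strided


def loop_windows_dot(df, W, X):
    """Reference: slice each window of W rows and dot it with X."""
    a = df.to_numpy()
    return np.array([a[i:i + W].dot(X) for i in range(len(a) - W + 1)])


def strided_windows_dot(df, W, X):
    """Vectorized: build all windows as one strided view, then a single dot."""
    a = df.to_numpy()
    m, n = a.shape
    s0, s1 = a.strides
    windows = as_strided(a, shape=(m - W + 1, W, n), strides=(s0, s0, s1), writeable=False)
    return windows.dot(X)


rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((5, 2)).round(2), columns=["A", "B"])
X = np.array([2, 3])
print(np.allclose(loop_windows_dot(df, 2, X), strided_windows_dot(df, 2, X)))  # True
```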
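That's a huge speedup there (roughly 126x: 5.51 ms vs 43.7 µs), and I'm hoping it would be even more noticeable on larger arrays!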
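The timings above use only a 5-row frame, so the scaling claim is untested here. One way to test it is a small `timeit` script that compares a per-window Python loop against the strided version at a few sizes. The sketch below is hedged: the loop baseline stands in for the question's `roll` + `apply` approach and is not the same code. The sizes and the helper names `loop_windows_dot`, `strided_windows_dot` and `benchmark` are hypothetical, and no timings are claimed.

```python
import timeit

import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import as_strided


def loop_windows_dot(a, W, X):
    # Baseline (assumption): one small dot per window, driven by a Python loop.
    return np.array([a[i:i + W].dot(X) for i in range(len(a) - W + 1)])


def strided_windows_dot(a, W, X):
    # Vectorized: one strided (m - W + 1, W, n) view, then a single dot.
    m, n = a.shape
    s0, s1 = a.strides
    return as_strided(a, shape=(m - W + 1, W, n), strides=(s0, s0, s1), writeable=False).dot(X)


def benchmark(rows, W=2, repeat=3, number=3):
    """Return (loop_seconds, strided_seconds): best of `repeat`, per call."""
    rng = np.random.default_rng(0)
    a = pd.DataFrame(rng.random((rows, 2)), columns=["A", "B"]).to_numpy()
    X = np.array([2, 3])
    t_loop = min(timeit.repeat(lambda: loop_windows_dot(a, W, X), repeat=repeat, number=number)) / number
    t_strided = min(timeit.repeat(lambda: strided_windows_dot(a, W, X), repeat=repeat, number=number)) / number
    return t_loop, t_strided


if __name__ == "__main__":
    for rows in (1_000, 10_000, 50_000):
        t_loop, t_strided = benchmark(rows)
        print(f"{rows:>6} rows: loop {t_loop * 1e3:.3f} ms, "
              f"strided {t_strided * 1e3:.3f} ms, speedup {t_loop / t_strided:.0f}x")
```

Taking the minimum over repeats, as IPython's `%timeit` also does, reduces noise from other processes. Actual ratios will depend on the machine and library versions.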



