df['D'].fillna(df.groupby(['A','B','C'])['D'].transform('mean')) will be faster than apply:
    In [2400]: df
    Out[2400]:
       A  B  C    D
    0  1  1  1  1.0
    1  1  1  1  NaN
    2  1  1  1  3.0
    3  3  3  3  5.0

    In [2401]: df['D'].fillna(df.groupby(['A','B','C'])['D'].transform('mean'))
    Out[2401]:
    0    1.0
    1    2.0
    2    3.0
    3    5.0
    Name: D, dtype: float64

    In [2402]: df['D'] = df['D'].fillna(df.groupby(['A','B','C'])['D'].transform('mean'))

    In [2403]: df
    Out[2403]:
       A  B  C    D
    0  1  1  1  1.0
    1  1  1  1  2.0
    2  1  1  1  3.0
    3  3  3  3  5.0

Details:
    In [2396]: df.shape
    Out[2396]: (10000, 4)

    In [2398]: %timeit df['D'].fillna(df.groupby(['A','B','C'])['D'].transform('mean'))
    100 loops, best of 3: 3.44 ms per loop

    In [2397]: %timeit df.groupby(['A','B','C'])['D'].apply(lambda x: x.fillna(x.mean()))
    100 loops, best of 3: 5.34 ms per loop
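For anyone who wants to reproduce this outside an IPython session, here is a minimal, self-contained sketch. It rebuilds the four-row frame shown above and checks that the transform-based fill and the apply-based fill produce the same values. The variable names (keys, group_means, filled, via_apply) are my own, and passing group_keys=False plus the reindex call are assumptions made so that the apply result lines up with the original index across pandas versions.

```python
import numpy as np
import pandas as pd

# Same four rows as the example frame in the answer.
df = pd.DataFrame({
    "A": [1, 1, 1, 3],
    "B": [1, 1, 1, 3],
    "C": [1, 1, 1, 3],
    "D": [1.0, np.nan, 3.0, 5.0],
})

keys = ["A", "B", "C"]

# transform('mean') returns a Series aligned with df's index,
# holding each row's group mean (NaN values are skipped).
group_means = df.groupby(keys)["D"].transform("mean")

# fillna with an index-aligned Series replaces only the missing entries.
filled = df["D"].fillna(group_means)

# The slower apply-based variant, for comparison.
# group_keys=False keeps the original row labels (assumption: avoids a
# MultiIndex result on newer pandas); reindex restores the row order.
via_apply = (
    df.groupby(keys, group_keys=False)["D"]
      .apply(lambda s: s.fillna(s.mean()))
      .reindex(df.index)
)

print(filled.tolist())     # [1.0, 2.0, 3.0, 5.0]
print(via_apply.tolist())  # [1.0, 2.0, 3.0, 5.0]
```

Note that if every D value in a group is NaN, the group mean is NaN as well, so those rows stay unfilled with either approach.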


