Filter rows after groupby in pandas

You can use value_counts together with boolean indexing and isin:

import pandas as pd

df = pd.DataFrame({
    'LeafID': [1, 1, 2, 1, 3, 3, 1, 6, 3, 5, 1],
    'pidx': [10, 10, 300, 10, 30, 40, 20, 10, 30, 45, 20],
    'pidy': [20, 20, 400, 20, 15, 20, 12, 43, 54, 112, 23],
    'count': [10, 20, 30, 40, 80, 10, 20, 50, 30, 10, 70],
    'score': [10, 10, 10, 22, 22, 3, 4, 5, 9, 0, 1]})
print(df)
    LeafID  count  pidx  pidy  score
0        1     10    10    20     10
1        1     20    10    20     10
2        2     30   300   400     10
3        1     40    10    20     22
4        3     80    30    15     22
5        3     10    40    20      3
6        1     20    20    12      4
7        6     50    10    43      5
8        3     30    30    54      9
9        5     10    45   112      0
10       1     70    20    23      1

# count how often each pidx value occurs
s = df.pidx.value_counts()
# keep only the pidx values that occur more than twice
idx = s[s > 2].index
print(df[df.pidx.isin(idx)])
   LeafID  count  pidx  pidy  score
0       1     10    10    20     10
1       1     20    10    20     10
3       1     40    10    20     22
7       6     50    10    43      5

Timings:

import numpy as np
import pandas as pd

np.random.seed(123)
N = 1000000
L1 = list('abcdefghijklmnopqrstu')
L2 = list('efghijklmnopqrstuvwxyz')
df = pd.DataFrame({'LeafId': np.random.randint(1000, size=N),
                   'pidx': np.random.randint(10000, size=N),
                   'pidy': np.random.choice(L2, N),
                   'count': np.random.randint(1000, size=N)})
print(df)

print(df.groupby('pidx').filter(lambda x: len(x) > 120))

def jez(df):
    # value_counts + isin instead of groupby.filter
    s = df.pidx.value_counts()
    return df[df.pidx.isin(s[s > 120].index)]

print(jez(df))

In [55]: %timeit (df.groupby('pidx').filter(lambda x: len(x) > 120))
1 loop, best of 3: 1.17 s per loop

In [56]: %timeit (jez(df))
10 loops, best of 3: 141 ms per loop

In [62]: %timeit (df[df.groupby('pidx').pidx.transform('size') > 120])
10 loops, best of 3: 102 ms per loop

In [63]: %timeit (df[df.groupby('pidx').pidx.transform(len) > 120])
1 loop, best of 3: 685 ms per loop

In [64]: %timeit (df[df.groupby('pidx').pidx.transform('count') > 120])
10 loops, best of 3: 104 ms per loop
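To see that the timed variants select the same rows, here is a minimal sketch on a small frame. The data reuses two columns of the first example; the `threshold` variable and the `by_*` names are only illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'LeafID': [1, 1, 2, 1, 3, 3, 1, 6, 3, 5, 1],
    'pidx': [10, 10, 300, 10, 30, 40, 20, 10, 30, 45, 20],
})

threshold = 2  # illustrative cutoff: keep groups with more than 2 rows

# Variant 1: groupby + filter (calls the Python lambda once per group)
by_filter = df.groupby('pidx').filter(lambda g: len(g) > threshold)

# Variant 2: value_counts + boolean indexing with isin
counts = df['pidx'].value_counts()
by_isin = df[df['pidx'].isin(counts[counts > threshold].index)]

# Variant 3: transform('size') broadcasts each group's size back to its rows
group_sizes = df.groupby('pidx')['pidx'].transform('size')
by_transform = df[group_sizes > threshold]

# All three keep the same rows of the original frame
print(by_filter.index.tolist())
print(by_isin.index.tolist())
print(by_transform.index.tolist())
```

Note that `transform('count')` counts only non-missing values, while `transform('size')` counts every row, so the two can differ if the column contains NaN.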

For final_score you can use:

df['final_score'] = df['count'].mul(.4).add(df.score.mul(.6))
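As a small worked check, the chained `mul`/`add` calls are just a weighted sum, 0.4 times `count` plus 0.6 times `score`. The three-row frame below is made up for illustration and the `operator_form` name is hypothetical.

```python
import pandas as pd

# Tiny illustrative frame with the two columns the formula needs
df = pd.DataFrame({'count': [10, 20, 30], 'score': [10, 10, 22]})

# Method-chaining form from the answer
df['final_score'] = df['count'].mul(.4).add(df['score'].mul(.6))

# Equivalent form with plain arithmetic operators
operator_form = 0.4 * df['count'] + 0.6 * df['score']

print(df)
print((df['final_score'] - operator_form).abs().max())
```

The column is accessed as `df['count']` rather than `df.count` because `count` is also a DataFrame method, so attribute access would return the method instead of the column.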
