I think you need transform:

    df['total difference'] = df.groupby('city')['difference'].transform(sum)
    print(df)
      city  difference  total difference
    0   NY           6                15
    1   SF           8                18
    2   LA           8                 8
    3   NY           9                15
    4   SF          10                18

And if you also need the rows sorted by the city column:
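
    df['total difference'] = df.groupby('city')['difference'].transform('sum')
    df = df.sort_values('city')
    print(df)
      city  difference  total difference
    2   LA           8                 8
    0   NY           6                15
    3   NY           9                15
    1   SF           8                18
    4   SF          10                18

I was curious whether it matters which form of the function is passed ('sum', sum, or np.sum); the timings are very similar: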

    import numpy as np
    import pandas as pd

    # [10000000 rows x 2 columns]
    np.random.seed(100)
    df = pd.DataFrame(np.random.randint(1000, size=(10000000, 2)),
                      columns=['city', 'difference'])
    # print(df)

    In [293]: %timeit (df.groupby('city')['difference'].transform('sum'))
    1 loop, best of 3: 570 ms per loop

    In [294]: %timeit (df.groupby('city')['difference'].transform(sum))
    1 loop, best of 3: 567 ms per loop

    In [295]: %timeit (df.groupby('city')['difference'].transform(np.sum))
    1 loop, best of 3: 561 ms per loop
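
For anyone who wants to reproduce the small example, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the printed output above, so its construction is an assumption about the original data. The names per_city and mapped are only illustrative. It also shows why transform fits here: a plain group sum gives one row per city, while transform returns a value for every original row.

```python
import pandas as pd

# Sample data reconstructed from the printed output above (assumed, not the asker's real frame).
df = pd.DataFrame({
    "city": ["NY", "SF", "LA", "NY", "SF"],
    "difference": [6, 8, 8, 9, 10],
})

# A plain group sum collapses each city to a single row.
per_city = df.groupby("city")["difference"].sum()

# transform keeps the original index, so the result can be assigned as a new column.
df["total difference"] = df.groupby("city")["difference"].transform("sum")

# Mapping the per-city totals back onto the city column gives the same values.
mapped = df["city"].map(per_city)
same = bool((df["total difference"] == mapped).all())

print(per_city)
print(df.sort_values("city"))
print(same)
```

The map variant is sometimes handy when the per-group totals are already computed, but transform does it in one step.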


