Grumpy numpy.add.at and pandas.factorize
This is meant to be fast. That said, I also tried to organize it so that it is easy to read.
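    import numpy as np
    import pandas as pd

    # Encode names and colors as integer codes (i, j) plus their unique labels (r, c).
    i, r = pd.factorize(df.name)
    j, c = pd.factorize(df.color)
    n, m = len(r), len(c)

    # Count each (name, color) pair; np.add.at accumulates over repeated index pairs.
    b = np.zeros((n, m), dtype=np.int64)
    np.add.at(b, (i, j), 1)

    # For each name, pick the color with the highest count.
    pd.Series(c[b.argmax(1)], r)

    John     White
    Tom       Blue
    Jerry    Black
    dtype: object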
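To make the intermediate steps visible, here is a minimal sketch of the same technique on a small frame. The sample data is hypothetical (the original `df` is not shown here), and the names `counts`, `naive`, `table`, and `most_common` are chosen for illustration only. It also shows why `np.add.at` is used rather than `b[i, j] += 1`: with repeated index pairs, plain fancy-index assignment does not accumulate.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not taken from the original post.
df = pd.DataFrame({
    "name":  ["John", "John", "Tom", "Tom", "Jerry", "Jerry", "John", "Tom", "Jerry"],
    "color": ["White", "White", "Blue", "Blue", "Black", "Black", "Blue", "White", "Blue"],
})

# factorize returns integer codes plus the unique labels in order of first appearance.
i, r = pd.factorize(df["name"])
j, c = pd.factorize(df["color"])

# Unbuffered accumulation: every (i, j) occurrence adds 1, even when repeated.
counts = np.zeros((len(r), len(c)), dtype=np.int64)
np.add.at(counts, (i, j), 1)

# Plain fancy-index += applies each repeated pair only once, so it undercounts.
naive = np.zeros_like(counts)
naive[i, j] += 1

# Labelled view of the count matrix, then the most frequent color per name.
table = pd.DataFrame(counts, index=r, columns=c)
most_common = pd.Series(c[counts.argmax(axis=1)], index=r)

print(table)
print(most_common)
```

Rows of `table` follow the order in which names first appear, which is why the result above is not sorted alphabetically.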
groupby, size, and idxmax
    df.groupby(['name', 'color']).size().unstack().idxmax(1)

    name
    Jerry    Black
    John     White
    Tom       Blue
    dtype: object

    name
    Jerry    Black
    John     White
    Tom       Blue
    Name: color, dtype: object
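As a rough illustration of what the chained call does, the sketch below splits it in two so the intermediate count table can be inspected. The sample frame is hypothetical, and `fill_value=0` is an addition of mine so that missing (name, color) pairs show up as 0 instead of NaN; the one-liner above works without it.

```python
import pandas as pd

# Hypothetical sample data, not taken from the original post.
df = pd.DataFrame({
    "name":  ["John", "John", "Tom", "Tom", "Jerry", "Jerry", "John", "Tom", "Jerry"],
    "color": ["White", "White", "Blue", "Blue", "Black", "Black", "Blue", "White", "Blue"],
})

# size() counts rows per (name, color); unstack() pivots colors into columns.
counts = df.groupby(["name", "color"]).size().unstack(fill_value=0)

# idxmax(axis=1) returns, for each row, the column label holding the largest count.
result = counts.idxmax(axis=1)

print(counts)
print(result)
```

Because `groupby` sorts its keys by default, the index comes out alphabetically (Jerry, John, Tom), unlike the factorize version.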
Counter
¯\_(ツ)_/¯
    from collections import Counter

    df.groupby('name').color.apply(lambda c: Counter(c).most_common(1)[0][0])

    name
    Jerry    Black
    John     White
    Tom       Blue
    Name: color, dtype: object
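The `[0][0]` at the end is easier to follow once you see what `most_common(1)` returns: a one-element list of `(value, count)` pairs. A small sketch on hypothetical data (the frame and the names `tally`, `top`, and `result` are assumptions for illustration):

```python
from collections import Counter

import pandas as pd

# Hypothetical sample data, not taken from the original post.
df = pd.DataFrame({
    "name":  ["John", "John", "Tom", "Tom", "Jerry", "Jerry", "John", "Tom", "Jerry"],
    "color": ["White", "White", "Blue", "Blue", "Black", "Black", "Blue", "White", "Blue"],
})

# Tally the colors for a single name to see the intermediate values.
tally = Counter(df.loc[df["name"] == "John", "color"])
top = tally.most_common(1)   # list holding one (color, count) pair
print(tally, top)

# Same idea applied per group: [0] takes the pair, the second [0] takes the color.
result = df.groupby("name")["color"].apply(lambda s: Counter(s).most_common(1)[0][0])
print(result)
```

One caveat worth checking on your own data: when two colors tie within a name, the approaches in this answer may pick different winners, since each one breaks ties according to its own ordering.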


