Here is how to use groupby in pandas. First, fix a few problems in your code:
import itertools
import numpy as np

np.random.seed(42)
n = 10000  # the sample size must be an int; 1e4 is a float
A = np.random.random_sample(n)
B = (np.random.random_sample(n) + 10) * 20
C = (np.random.random_sample(n) + 20) * 40
D = (np.random.random_sample(n) + 80) * 80

# make the edges of the bins
Bbins = np.linspace(B.min(), B.max(), 10)
Cbins = np.linspace(C.min(), C.max(), 12)  # note different number
Dbins = np.linspace(D.min(), D.max(), 24)  # note different number

B_Bidx = np.digitize(B, Bbins)
C_Cidx = np.digitize(C, Cbins)
D_Didx = np.digitize(D, Dbins)

a_bins = []
for bb, cc, dd in itertools.product(np.unique(B_Bidx), np.unique(C_Cidx), np.unique(D_Didx)):
    a_bins.append([(bb, cc, dd), A[(B_Bidx == bb) & (C_Cidx == cc) & (D_Didx == dd)]])

a_bins[1000]
Output:
[(4, 6, 17), array([ 0.70723863,  0.907611  ,  0.46214047])]
Here is pandas code that returns the same result:
import pandas as pd

cB = pd.cut(B, 9)
cC = pd.cut(C, 11)
cD = pd.cut(D, 23)

sA = pd.Series(A)
# .codes holds the integer bin index of each value (older pandas versions called it .labels)
g = sA.groupby([cB.codes, cC.codes, cD.codes])
g.get_group((3, 5, 16))
Output:
800     0.707239
2320    0.907611
9388    0.462140
dtype: float64
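As a rough cross-check of why the pandas group (3, 5, 16) lines up with the NumPy bin (4, 6, 17), the sketch below compares the two kinds of bin index for a single variable. It reuses the seed and sizes from above; `n_bins`, `edges`, `digitize_codes` and `cut_codes` are names introduced only for this illustration. It also assumes, as an observation about the two functions rather than something stated above, that `np.digitize` puts the maximum value into an extra bin of its own, so that one value is folded back into the last bin before comparing.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
n = 10000
A = np.random.random_sample(n)
B = (np.random.random_sample(n) + 10) * 20

n_bins = 9
edges = np.linspace(B.min(), B.max(), n_bins + 1)  # 10 edges -> 9 bins

# np.digitize is 1-based and puts B.max() into bin n_bins + 1,
# so shift to 0-based and fold that single value into the last bin.
digitize_codes = np.minimum(np.digitize(B, edges) - 1, n_bins - 1)

# pd.cut with an integer bin count returns 0-based codes.
cut_codes = pd.cut(B, n_bins).codes

same = np.array_equal(digitize_codes, cut_codes)
print(same)
```

If `same` is true, the pandas key is simply the NumPy key minus one in every position, apart from the single maximum value per variable.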
To compute statistics for each group, call a method of g, for example:
g.mean()
This returns:
0  0   0     0.343566
       1     0.410979
       2     0.700007
       3     0.189936
       4     0.452566
       5     0.565330
       6     0.539565
       7     0.530867
       8     0.568120
       9     0.587762
       11    0.352453
       12    0.484903
       13    0.477969
       14    0.484328
       15    0.467357
...
8  10  8     0.559859
       9     0.570652
       10    0.656718
       11    0.353938
       12    0.628980
       13    0.372350
       14    0.404543
       15    0.387920
       16    0.742292
       17    0.530866
       18    0.389236
       19    0.628461
       20    0.387384
       21    0.541831
       22    0.573023
Length: 2250, dtype: float64
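If more than one statistic per group is needed, `agg` can compute several in one pass. The following is a minimal sketch rather than part of the original answer: it groups on only two binned variables to keep the output small, takes the integer bin indices from `.codes` (the attribute that older pandas versions exposed as `.labels`), and the name `stats` is introduced just for this example.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
n = 10000
A = np.random.random_sample(n)
B = (np.random.random_sample(n) + 10) * 20
C = (np.random.random_sample(n) + 20) * 40

# Bin two of the variables and group A by the integer bin codes.
cB = pd.cut(B, 9)
cC = pd.cut(C, 11)
g = pd.Series(A).groupby([cB.codes, cC.codes])

# One row per non-empty (B bin, C bin) pair, one column per statistic.
stats = g.agg(["mean", "std", "count"])
print(stats.head())
```

The result is a DataFrame indexed by the bin pair, so a single group's statistics can be looked up with `stats.loc[(3, 5)]`, assuming that pair is non-empty.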



