You can use pandas.cut:
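
    import pandas as pd

    # sample data matching the printed output below
    df = pd.DataFrame({'percentage': [46.50, 44.20, 100.00, 42.12]})

    bins = [0, 1, 5, 10, 25, 50, 100]
    df['binned'] = pd.cut(df['percentage'], bins)
    print(df)
       percentage     binned
    0       46.50   (25, 50]
    1       44.20   (25, 50]
    2      100.00  (50, 100]
    3       42.12   (25, 50]

    # same bins, but with integer labels instead of intervals
    bins = [0, 1, 5, 10, 25, 50, 100]
    labels = [1, 2, 3, 4, 5, 6]
    df['binned'] = pd.cut(df['percentage'], bins=bins, labels=labels)
    print(df)
       percentage binned
    0       46.50      5
    1       44.20      5
    2      100.00      6
    3       42.12      5
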
Or numpy.searchsorted:

    import numpy as np

    bins = [0, 1, 5, 10, 25, 50, 100]
    # position of each value among the sorted bin edges
    df['binned'] = np.searchsorted(bins, df['percentage'].values)
    print(df)
       percentage  binned
    0       46.50       5
    1       44.20       5
    2      100.00       6
    3       42.12       5

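To see why the two approaches give the same numbers here, the following is a minimal sketch that compares them on the sample values from the output above. The names `cut_codes`, `ss_codes`, `edge_cut` and `edge_ss` are made up for illustration. It also checks one edge case where the two differ: a value exactly equal to the lowest bin edge.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the printed output above.
df = pd.DataFrame({'percentage': [46.50, 44.20, 100.00, 42.12]})
bins = [0, 1, 5, 10, 25, 50, 100]
labels = [1, 2, 3, 4, 5, 6]

# pd.cut with integer labels; bins are right-closed by default, e.g. (25, 50].
cut_codes = pd.cut(df['percentage'], bins=bins, labels=labels).astype(int)

# np.searchsorted with the default side='left' returns i such that
# bins[i-1] < value <= bins[i], which lines up with right-closed bins.
ss_codes = np.searchsorted(bins, df['percentage'].to_numpy())

same_in_range = (cut_codes.to_numpy() == ss_codes).all()

# Edge case: a value equal to the lowest edge lies outside (0, 1] for cut
# and becomes NaN, while searchsorted simply returns position 0.
edge_cut = pd.cut(pd.Series([0.0]), bins=bins, labels=labels)
edge_ss = np.searchsorted(bins, [0.0])

print(same_in_range, edge_cut.isna().iloc[0], edge_ss[0])
```

So for values strictly inside the bin range the two agree, but out-of-range or boundary values need separate handling if you switch from one to the other.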
...and then value_counts, or groupby and aggregate with size:

    s = pd.cut(df['percentage'], bins=bins).value_counts()
    print(s)
    (25, 50]     3
    (50, 100]    1
    (10, 25]     0
    (5, 10]      0
    (1, 5]       0
    (0, 1]       0
    Name: percentage, dtype: int64

    s = df.groupby(pd.cut(df['percentage'], bins=bins)).size()
    print(s)
    percentage
    (0, 1]       0
    (1, 5]       0
    (5, 10]      0
    (10, 25]     0
    (25, 50]     3
    (50, 100]    1
    dtype: int64

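The two counting approaches differ mainly in ordering: value_counts sorts by count, while groupby keeps the bin order. Below is a self-contained sketch, using the same sample values, that puts both in bin order. Passing `observed=False` explicitly is my own precaution, not part of the original answer: the default for categorical groupers has been changing across pandas versions, so being explicit should keep the empty bins in the result.

```python
import pandas as pd

# Sample data reconstructed from the printed output above.
df = pd.DataFrame({'percentage': [46.50, 44.20, 100.00, 42.12]})
bins = [0, 1, 5, 10, 25, 50, 100]
binned = pd.cut(df['percentage'], bins=bins)

# value_counts orders by count (descending); sort_index restores bin order.
by_value_counts = binned.value_counts().sort_index()

# observed=False keeps bins with no rows in the result (as zeros).
by_groupby = df.groupby(binned, observed=False).size()

print(by_value_counts.tolist())
print(by_groupby.tolist())
```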
By default, cut returns a categorical. Series methods such as Series.value_counts() use all categories, even if some categories are not present in the data; see operations in categorical.
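As a rough illustration of that last point, this sketch inspects the categorical result directly. Using `remove_unused_categories` to drop the empty bins is my suggestion for when the zero rows are unwanted, not something the answer above prescribes.

```python
import pandas as pd

# Sample data reconstructed from the printed output above.
df = pd.DataFrame({'percentage': [46.50, 44.20, 100.00, 42.12]})
bins = [0, 1, 5, 10, 25, 50, 100]
binned = pd.cut(df['percentage'], bins=bins)

# The result has a categorical dtype with one category per bin.
is_categorical = isinstance(binned.dtype, pd.CategoricalDtype)
n_categories = len(binned.cat.categories)

# Only two of the bins actually occur in the data...
n_observed = binned.nunique()

# ...but value_counts reports every category, with zeros for empty bins.
counts = binned.value_counts()

# Dropping unused categories leaves only the bins that occur.
trimmed = binned.cat.remove_unused_categories().value_counts()

print(is_categorical, n_categories, n_observed, len(counts), len(trimmed))
```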



