test['range'] = pd.cut(test.days, [0,30,60], include_lowest=True)
print(test)
   days           range
0     0  (-0.001, 30.0]
1    31    (30.0, 60.0]
2    45    (30.0, 60.0]
Compare how the different options treat the bin edges:
import pandas as pd

test = pd.DataFrame({'days': [0,20,30,31,45,60]})
test['range1'] = pd.cut(test.days, [0,30,60], include_lowest=True)
# the value 30 falls in the [30, 60) group
test['range2'] = pd.cut(test.days, [0,30,60], right=False)
# the value 30 falls in the (0, 30] group
test['range3'] = pd.cut(test.days, [0,30,60])
print(test)
   days          range1    range2    range3
0     0  (-0.001, 30.0]   [0, 30)       NaN
1    20  (-0.001, 30.0]   [0, 30)   (0, 30]
2    30  (-0.001, 30.0]  [30, 60)   (0, 30]
3    31    (30.0, 60.0]  [30, 60)  (30, 60]
4    45    (30.0, 60.0]  [30, 60)  (30, 60]
5    60    (30.0, 60.0]       NaN  (30, 60]

Or use numpy.searchsorted, which returns integer bin positions instead of intervals. Note that the array being searched (here the bin edges) must be sorted:
import numpy as np

test = pd.DataFrame({'days': [0,20,30,31,45,60]})
arr = np.array([0,30,60])
test['range1'] = arr.searchsorted(test.days)
test['range2'] = arr.searchsorted(test.days, side='right') - 1
print(test)
   days  range1  range2
0     0       0       0
1    20       1       0
2    30       1       1
3    31       2       1
4    45       2       1
5    60       2       2
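To relate the two approaches, here is a minimal sketch that compares the integer codes from pd.cut with right=False against searchsorted with side='right'. The names days, edges, cut_codes, ss_codes and comparison are made up for this illustration. It assumes the same bin edges as above and uses labels=False so that pd.cut returns bin numbers rather than intervals.

```python
import numpy as np
import pandas as pd

# Illustrative names; same sample values and bin edges as in the answer.
days = pd.Series([0, 20, 30, 31, 45, 60], name="days")
edges = np.array([0, 30, 60])  # the searched array must be sorted ascending

# Left-closed bins [0, 30) and [30, 60); labels=False yields integer codes.
# The value 60 lies outside the last bin, so its code is NaN.
cut_codes = pd.cut(days, edges, right=False, labels=False)

# searchsorted counts how many edges are <= each value; subtracting 1 gives
# the same bin number, except that 60 gets position 2 instead of NaN.
ss_codes = edges.searchsorted(days.to_numpy(), side="right") - 1

comparison = pd.DataFrame(
    {"days": days, "cut": cut_codes, "searchsorted": ss_codes}
)
print(comparison)
```

The two columns should agree for every value strictly below the last edge. Values on or beyond the last edge still get a position from searchsorted, whereas pd.cut marks them as missing, so it may be worth filtering out-of-range values first if that difference matters.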



