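import pandas as pd

# Load the ACS New York dataset
d = pd.read_csv('D:/pandas活用/pandas_for_everyone-master/data/acs_ny.csv')
print(d.columns)  # column names
print('@' * 66)   # visual separator between the two outputs
print(d.head())   # first five rows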
Index(['Acres', 'FamilyIncome', 'FamilyType', 'NumBedrooms', 'NumChildren',
       'NumPeople', 'NumRooms', 'NumUnits', 'NumVehicles', 'NumWorkers',
       'OwnRent', 'YearBuilt', 'HouseCosts', 'ElectricBill', 'FoodStamp',
       'HeatingFuel', 'Insurance', 'Language'],
      dtype='object')
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
   Acres  FamilyIncome   FamilyType  NumBedrooms  NumChildren  NumPeople
0   1-10           150      Married            4            1          3
1   1-10           180  Female Head            3            2          4
2   1-10           280  Female Head            4            0          2
3   1-10           330  Female Head            2            1          2
4   1-10           330    Male Head            3            1          2

   NumRooms         NumUnits  NumVehicles  NumWorkers   OwnRent    YearBuilt
0         9  Single detached            1           0  Mortgage    1950-1959
1         6  Single detached            2           0    Rented  Before 1939
2         8  Single detached            3           1  Mortgage    2000-2004
3         4  Single detached            1           0    Rented    1950-1959
4         5  Single attached            1           0  Mortgage  Before 1939

   HouseCosts  ElectricBill  FoodStamp  HeatingFuel  Insurance        Language
0        1800            90         No          Gas       2500         English
1         850            90         No          Oil          0         English
2        2600           260         No          Oil       6600  Other European
3        1800           140         No          Oil          0         English
4         860           150         No          Gas        660         Spanish
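Next, bin the FamilyIncome column:
# Pass the column to bin and the bin edges: incomes in (0, 150000] get label 0,
# and incomes in (150000, maximum income] get label 1.
# Labels are passed as a list; strings can also be used as labels.
d['income_15w'] = pd.cut(d['FamilyIncome'], [0, 150000, d['FamilyIncome'].max()], labels=[0, 1])
print(d.info())  # info() prints its report itself and returns None, so "None" is printed as well
print('@' * 66)
print(d['income_15w'].value_counts())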
RangeIndex: 22745 entries, 0 to 22744
Data columns (total 19 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   Acres         22745 non-null  object
 1   FamilyIncome  22745 non-null  int64
 2   FamilyType    22745 non-null  object
 3   NumBedrooms   22745 non-null  int64
 4   NumChildren   22745 non-null  int64
 5   NumPeople     22745 non-null  int64
 6   NumRooms      22745 non-null  int64
 7   NumUnits      22745 non-null  object
 8   NumVehicles   22745 non-null  int64
 9   NumWorkers    22745 non-null  int64
 10  OwnRent       22745 non-null  object
 11  YearBuilt     22745 non-null  object
 12  HouseCosts    22745 non-null  int64
 13  ElectricBill  22745 non-null  int64
 14  FoodStamp     22745 non-null  object
 15  HeatingFuel   22745 non-null  object
 16  Insurance     22745 non-null  int64
 17  Language      22745 non-null  object
 18  income_15w    22745 non-null  category
dtypes: category(1), int64(10), object(8)
memory usage: 3.1+ MB
None
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
0    18294
1     4451
Name: income_15w, dtype: int64
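Since string labels are also allowed, here is a minimal sketch of the same two-bin split with string labels. The sample incomes and the label names "<=150k" and ">150k" are made up for illustration and do not come from acs_ny.csv. By default pd.cut uses right-closed intervals, so a value equal to the lowest bin edge is left unbinned (NaN); the sketch also shows how include_lowest=True avoids that.

```python
import pandas as pd

# Hypothetical sample incomes standing in for d['FamilyIncome'] (not real data).
income = pd.Series([0, 150, 150000, 150001, 400000], name="FamilyIncome")

# Same bin edges as in the post: 0, 150000, and the maximum income.
bins = [0, 150000, income.max()]

# String labels instead of 0/1 (label names are illustrative).
# include_lowest=True closes the first interval on the left, so an income of exactly 0 is binned.
labeled = pd.cut(income, bins=bins, labels=["<=150k", ">150k"], include_lowest=True)

# With the defaults, the first interval is (0, 150000], so an income of 0 becomes NaN.
default = pd.cut(income, bins=bins, labels=["<=150k", ">150k"])

print(labeled.tolist())
print(default.isna().sum())
```

In the dataset above every row of income_15w is non-null, so the default behaviour did not drop anything there; include_lowest only matters when a value can equal the lowest edge.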



