pandas数据透视表字段排序_pandas 透视图?

数据透视表

import pandas as pd

example = pd.Dataframe({'Amount': [74., 235., 175., 100., 115., 245., 180., 90., 88., 129., 273., 300.],
              'Category': ['Transportation', 'Grocery', 'Household', 'Entertainment', 
                           'Transportation', 'Grocery', 'Household', 'Entertainment', 
                           'Transportation', 'Grocery', 'Household', 'Entertainment'],
              'Month': ['January', 'January', 'January', 'January', 
                        'February', 'February', 'February', 'February', 
                        'Marth', 'Marth', 'Marth', 'Marth']})
example

	Amount	Category	Month
0	74.0	Transportation	January
1	235.0	Grocery	January
2	175.0	Household	January
3	100.0	Entertainment	January
4	115.0	Transportation	February
5	245.0	Grocery	February
6	180.0	Household	February
7	90.0	Entertainment	February
8	88.0	Transportation	Marth
9	129.0	Grocery	Marth
10	273.0	Household	Marth
11	300.0	Entertainment	Marth

上面这个数据表感觉非常的杂乱，数据透视的意思就是按照我们的意思重新组织这张数据表

new_example = example.pivot(index='Category', columns='Month', values='Amount')

index 表示以原先数据的哪个属性的值为行坐标进行分类 columns 表示以原先数据的哪个属性的值为新的属性 values 表示以原先数据的哪个属性的值填充前面两个 index 和 columns 定义出的新表格

new_example

Month	February	January	Marth
Category
Entertainment	90.0	100.0	300.0
Grocery	245.0	235.0	129.0
Household	180.0	175.0	273.0
Transportation	115.0	74.0	88.0

new_example.sum(axis=1)

Category
Entertainment     490.0
Grocery           609.0
Household         628.0
Transportation    277.0
dtype: float64

new_example.sum(axis=0)

Month
February    630.0
January     584.0
Marth       790.0
dtype: float64

经过重新组织后，数据表的呈现方式更有意义

df = pd.read_csv('../../datasets/titanic/test.csv')
df.head(5)

	PassengerId	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked
0	892	3	Kelly, Mr. James	male	34.5	0	0	330911	7.8292	NaN	Q
1	893	3	Wilkes, Mrs. James (Ellen Needs)	female	47.0	1	0	363272	7.0000	NaN	S
2	894	2	Myles, Mr. Thomas Francis	male	62.0	0	0	240276	9.6875	NaN	Q
3	895	3	Wirz, Mr. Albert	male	27.0	0	0	315154	8.6625	NaN	S
4	896	3	Hirvonen, Mrs. Alexander (Helga E Lindqvist)	female	22.0	1	1	3101298	12.2875	NaN	S

需求：统计男女在不同船舱等级的票价

df.pivot_table(index='Sex', columns='Pclass', values='Fare')   # 默认就是求平均值

Pclass	1	2	3
Sex
female	115.591168	26.438750	13.735129
male	75.586551	20.184654	11.826350

df.pivot_table(index='Sex', columns='Pclass', values='Fare', aggfunc='min')  # 求最小值

Pclass	1	2	3
Sex
female	25.7	10.5000	6.9500
male	0.0	9.6875	3.1708

需求：统计不同船舱等级里男女的平均年龄

df.pivot_table(index='Pclass', columns='Sex', values='Age')

Sex	female	male
Pclass
1	41.333333	40.520000
2	24.376552	30.940678
3	23.073400	24.525104

pandas数据透视表字段排序_pandas 透视图?

Python相关栏目本月热门文章