多索引 | 自定义排序
最后修改日期:2021年11月17日
多索引
- 自定义排序
- 知识点
- 聚合
- 多索引
- 提取索引值
- 添加 / 删改
- 添加
- 删改
- 去重
- 指定列
- 示例数据
- 自定义排序
- 乱序
- 低维索引
- 现排序列表
- 自定义排序列表
- 自定义排序实现
- 高维索引
- 现排序列表
- 自定义排序列表
- 自定义排序实现
- 参考文章
- 添加
- 删改
- 自定义排序实现
###### raw data set, 'data' ###### columns labled with 'A' and 'B' ###### group function, e.g. mean(), sum(), std() data.groupby(['A', 'B'], as_index=False).mean()
多索引 提取索引值
###### raw data set, 'data' ###### multi-index data.index.get_level_values(level=0) # here, level could not exceed index levels
添加 / 删改 添加
###### raw data set, 'data' # 增加名为‘月份’的列,新列的值为int(month) data1['月份']=int(month)
删改 去重
###### raw data set, 'data' ###### subset, 用来指定特定的列,默认所有列 ###### first, 删除重复项并保留第一次出现的项 ###### inplace, 是直接在原来数据上修改还是保留一个副本 data.drop_duplicates(subset='A', keep='first', inplace=False) # keep='first'/'last'指定列
###### raw data set, 'data'
# drop columns with labels 'column_name'
# axis=0, rows; axis=1, columns
# inplace, manipulate raw data; otherwise, return a copy of dataframe
data.drop('column_name',axis=1, inplace=True)
示例数据
# example
# print(data)
'''
creator Num ... priority efforts group
0 Y 962 ... Highest 20.703253 P
1 R 979 ... Highest 15.027863 N
...
560 t 193 ... Low 44.173754 N
'''
MultIndex = ['group', 'priority']
groupdata = data.groupby(MultIndex)['Num'].count() #
# groupdata = pd.Dataframe(groupdata)
groupdata = pd.Dataframe(data.groupby(MultIndex)['Num'].count()) #
'''
group priority
A High 5
Highest 7
Medium 36
B High 8
Highest 18
Medium 11
N High 94
Highest 88
Low 3
Medium 217
...
Name: Num, dtype: int64
'''
自定义排序 乱序 低维索引 现排序列表
groupVal = groupdata.index.get_level_values(level=0) ''' Index(['A', 'A', 'A', 'B', 'B', 'B', 'N', 'N', 'N', 'N', ...], dtype='object', name='group') ''' groupOrder = groupdata.index.get_level_values(level=0).drop_duplicates() #自定义排序列表''' Index(['A', 'B', 'N', ...], dtype='object', name='group') ''' groupOrder = groupdata.index.get_level_values(level=0).drop_duplicates().to_list() # ''' ['A', 'B', 'N', ...] '''
groupReorder = groupOrder[:2] + groupOrder[3:] + groupOrder[2:3] #自定义排序实现''' ['A', 'B', ..., 'N'] '''
# groupdata['groupOrder'] = groupVal
groupdata['groupOrder'] = groupdata.index.get_level_values(level=0)
'''
Num groupOrder
group priority
A High 5 A
Highest 7 A
Medium 36 A
B High 8 B
Highest 18 B
Medium 11 B
N High 94 N
Highest 88 N
Low 3 N
Medium 217 N
...
'''
groupdata['groupOrder'] = groupdata['groupOrder'].astype('category')
groupdata['groupOrder'].cat.reorder_categories(groupReorder, ordered=True, inplace=True) #
groupdata.sort_values('groupOrder', inplace=True, ascending=True)
groupdata.drop(columns='groupOrder', inplace=True)
'''
Num groupOrder
group priority
A High 5 A
Highest 7 A
Medium 36 A
B High 8 B
Highest 18 B
Medium 11 B
...
N High 94 N
Highest 88 N
Low 3 N
Medium 217 N
'''
高维索引 现排序列表
### 此处index level有两级 groupVal = groupdata.index.get_level_values(level=0) ''' Index(['A', 'A', 'A', 'B', 'B', 'B', 'N', 'N', 'N', 'N', ...], dtype='object', name='group') ''' groupOrder = groupdata.index.get_level_values(level=0).drop_duplicates().to_list() ''' ['A', 'B', 'N', 'P', 'W', 'X'] ''' priorityVal = groupdata.index.get_level_values(level=1) ''' Index(['High', 'Highest', 'Medium', 'High', 'Highest', 'Medium', 'High', 'Highest', 'Low', 'Medium', 'High', 'Highest', 'Medium', 'High', 'Highest', 'Medium', 'High', 'Highest', 'Medium'], dtype='object', name='priority') ''' priorityOrder = groupdata.index.get_level_values(level=1).drop_duplicates().to_list() ''' ['High', 'Highest', 'Medium', 'Low'] '''自定义排序列表
groupReorder = groupOrder[:2] + groupOrder[3:] + groupOrder[2:3] ''' ['A', 'B', 'P', 'W', 'X', 'N'] ''' priorityReorder = priorityOrder[1:2] + priorityOrder[0:1] + priorityOrder[2:4] ''' ['Highest', 'High', 'Medium', 'Low'] '''自定义排序实现
# groupdata['groupOrder'], groupdata['priorityOrder'] = [groupVal, priorityVal]
# the same with
# groupdata['groupOrder'], groupdata['priorityOrder'] = [groupVal, priorityVal]
groupdata = groupdata.assign(groupOrder=groupVal, priorityOrder=priorityVal)
'''
Num groupOrder priorityOrder
group priority
A High 5 A High
Highest 7 A Highest
Medium 36 A Medium
B High 8 B High
Highest 18 B Highest
Medium 11 B Medium
N High 94 N High
Highest 88 N Highest
Low 3 N Low
Medium 217 N Medium
...
'''
# type(groupdata):
groupdata['groupOrder'] = pd.Categorical(groupdata['groupOrder'], categories=groupReorder)
groupdata['priorityOrder'] = pd.Categorical(groupdata['priorityOrder'], categories=priorityReorder)
groupdata.sort_values(['groupOrder', 'priorityOrder'], inplace=True, ascending=[True, True])
groupdata.drop(columns=['groupOrder', 'priorityOrder'], axis=1, inplace=True)
'''
Num
group priority
A Highest 7
High 5
Medium 36
B Highest 18
High 8
Medium 11
...
N Highest 88
High 94
Medium 217
Low 3
'''
参考文章
写此文时有借鉴以下资源
添加python pandas同时对多列进行赋值
替换 Pandas Dataframe 中的列值
pandas dataframe 新增单列和多列
pandas 之 groupby 聚合函数
Pandas之drop_duplicates:去除重复项
python进行数据处理——pandas的drop函数
Pandas Dataframe sort by categorical column but by specific class ordering



