Suppose your DataFrame is df:

    import numpy as np
    import pandas as pd

    nan = np.nan
    df = pd.DataFrame([
        (nan, nan, nan, 'Auto loan', nan),
        ('Branch Code', 'Branch Name', 'Region', 'No of accounts',
         'Portfolio Outstanding'),
        (3000, 'Name1', 'Central', 0, 0),
        (3001, 'Name2', 'Central', 0, 0)])

so that it looks like this:

                 0            1        2               3                      4
    0          NaN          NaN      NaN       Auto loan                    NaN
    1  Branch Code  Branch Name   Region  No of accounts  Portfolio Outstanding
    2         3000        Name1  Central               0                      0
    3         3001        Name2  Central               0                      0
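Then forward-fill the NaNs in the first two rows along each row (this propagates 'Auto loan' to the column on its right, for example):

    # ffill(axis=1) fills each NaN with the nearest non-NaN value to its left
    df.iloc[0:2] = df.iloc[0:2].ffill(axis=1)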
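Next, fill the remaining NaNs with empty strings:

    df.iloc[0:2] = df.iloc[0:2].fillna('')

Now join the two rows with a period (".") and assign the result as the column labels:

    # the "if y" filter skips empty strings, so columns without a group name get no leading "."
    df.columns = df.iloc[0:2].apply(lambda x: '.'.join([y for y in x if y]), axis=0)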
Finally, drop the first two rows:

    df = df.iloc[2:]
This yields

      Branch Code Branch Name   Region Auto loan.No of accounts  \
    2        3000       Name1  Central                        0
    3        3001       Name2  Central                        0

      Auto loan.Portfolio Outstanding
    2                               0
    3                               0
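The flat-label steps above can be bundled into one function. The following is only a sketch: the helper name `flatten_header_rows` and its parameters `n_rows` and `sep` are hypothetical and not part of pandas, and it assumes the header cells are strings or NaN.

```python
import numpy as np
import pandas as pd


def flatten_header_rows(df, n_rows=2, sep="."):
    """Turn the first n_rows rows into flat column labels joined by sep.

    Hypothetical helper for illustration; not a pandas API.
    """
    # Forward-fill along each header row, then blank out what is still missing.
    header = df.iloc[:n_rows].ffill(axis=1).fillna("")
    # Join the non-empty header cells of each column, top to bottom.
    labels = [
        sep.join(str(part) for part in header[col] if part != "")
        for col in header.columns
    ]
    # Keep only the data rows and attach the new labels.
    body = df.iloc[n_rows:].copy()
    body.columns = labels
    return body


raw = pd.DataFrame([
    (np.nan, np.nan, np.nan, "Auto loan", np.nan),
    ("Branch Code", "Branch Name", "Region", "No of accounts",
     "Portfolio Outstanding"),
    (3000, "Name1", "Central", 0, 0),
    (3001, "Name2", "Central", 0, 0),
])

flat = flatten_header_rows(raw)
print(list(flat.columns))
```

Returning a copy instead of modifying the input leaves the raw frame available in case the header needs to be parsed differently later.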
Alternatively, you could create MultiIndex columns instead of a flat column index:

    import numpy as np
    import pandas as pd

    nan = np.nan
    df = pd.DataFrame([
        (nan, nan, nan, 'Auto loan', nan),
        ('Branch Code', 'Branch Name', 'Region', 'No of accounts',
         'Portfolio Outstanding'),
        (3000, 'Name1', 'Central', 0, 0),
        (3001, 'Name2', 'Central', 0, 0)])

    df.iloc[0:2] = df.iloc[0:2].ffill(axis=1)
    # columns with no group name are placed under a placeholder group, 'Area'
    df.iloc[0:2] = df.iloc[0:2].fillna('Area')
    # transpose the two header rows into one (level 0, level 1) tuple per column
    df.columns = pd.MultiIndex.from_tuples(
        list(zip(*df.iloc[0:2].to_records(index=False).tolist())))
    df = df.iloc[2:]

Now df looks like this:

             Area                             Auto loan
      Branch Code Branch Name   Region No of accounts Portfolio Outstanding
    2        3000       Name1  Central              0                     0
    3        3001       Name2  Central              0                     0
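The same two-level columns can also be built with `pd.MultiIndex.from_arrays`, which takes one list per level and so avoids transposing the header rows into tuples. This is an alternative sketch, not the construction used above; the variable names `raw`, `header`, and `body` are illustrative.

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame([
    (np.nan, np.nan, np.nan, "Auto loan", np.nan),
    ("Branch Code", "Branch Name", "Region", "No of accounts",
     "Portfolio Outstanding"),
    (3000, "Name1", "Central", 0, 0),
    (3001, "Name2", "Central", 0, 0),
])

# Fill the header rows: propagate group names rightwards, then use
# "Area" as a placeholder group for columns that have none.
header = raw.iloc[0:2].ffill(axis=1).fillna("Area")

# Each header row becomes one level of the MultiIndex.
columns = pd.MultiIndex.from_arrays(header.values.tolist())

# Keep only the data rows and attach the two-level columns.
body = raw.iloc[2:].copy()
body.columns = columns
print(body.columns.tolist())
```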
The columns are a MultiIndex:

    In [275]: df.columns
    Out[275]:
    MultiIndex(levels=[[u'Area', u'Auto loan'], [u'Branch Code', u'Branch Name', u'No of accounts', u'Portfolio Outstanding', u'Region']],
               labels=[[0, 0, 0, 1, 1], [0, 1, 4, 2, 3]])
The columns have two levels. The first level has the values [u'Area', u'Auto loan'], and the second level has the values [u'Branch Code', u'Branch Name', u'No of accounts', u'Portfolio Outstanding', u'Region'].
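To inspect the levels programmatically rather than reading them off the repr, something like the following sketch may help. It rebuilds the same column index by hand with `pd.MultiIndex.from_tuples` so that it runs on its own. Note that recent pandas versions expose the integer positions as `codes`, where the older output above shows `labels`.

```python
import pandas as pd

# The same two-level column index as above, rebuilt by hand.
columns = pd.MultiIndex.from_tuples([
    ("Area", "Branch Code"),
    ("Area", "Branch Name"),
    ("Area", "Region"),
    ("Auto loan", "No of accounts"),
    ("Auto loan", "Portfolio Outstanding"),
])

# Number of levels in the index.
print(columns.nlevels)

# The sorted unique values stored for each level.
print(columns.levels[0].tolist())
print(columns.levels[1].tolist())

# The level-0 value of every column, in column order.
print(columns.get_level_values(0).tolist())

# Integer positions into levels[1], one per column.
print(list(columns.codes[1]))
```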
You can then access a column by specifying the values of both levels:

    print(df.loc[:, ('Area', 'Branch Name')])
    # 2    Name1
    # 3    Name2
    # Name: (Area, Branch Name), dtype: object

    print(df.loc[:, ('Auto loan', 'No of accounts')])
    # 2    0
    # 3    0
    # Name: (Auto loan, No of accounts), dtype: object

One advantage of using a MultiIndex is that you can easily select all columns that share a particular level value. For example, to select the sub-DataFrame of columns under 'Auto loan', you could use:

    In [279]: df.loc[:, 'Auto loan']
    Out[279]:
      No of accounts Portfolio Outstanding
    2              0                     0
    3              0                     0
For more information on selecting rows and columns from a MultiIndex, see "MultiIndexing using slicers" in the pandas documentation.
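As a brief illustration of slicer-style selection, the sketch below uses `pd.IndexSlice` to pick columns by their second-level value regardless of group, and `DataFrame.xs` to take a cross-section on one level. The frame is rebuilt by hand so the example is self-contained; the variable names are illustrative.

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples([
    ("Area", "Branch Code"),
    ("Area", "Branch Name"),
    ("Area", "Region"),
    ("Auto loan", "No of accounts"),
    ("Auto loan", "Portfolio Outstanding"),
])
df = pd.DataFrame(
    [(3000, "Name1", "Central", 0, 0), (3001, "Name2", "Central", 0, 0)],
    index=[2, 3],
    columns=columns,
)

idx = pd.IndexSlice

# All rows; any first-level value; only these second-level values.
subset = df.loc[:, idx[:, ["Branch Name", "No of accounts"]]]
print(subset.columns.tolist())

# Cross-section: every column whose second level is "Region".
# The matched level is dropped from the result's columns by default.
region = df.xs("Region", axis=1, level=1)
print(region)
```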



