You can run `.apply(pd.Series)` on each column, then `stack` the result and concatenate the pieces.
For a series

    import pandas as pd

    s = pd.Series([[0, 1], [2, 3, 4]], index=[2011, 2012])
    s
    Out[103]:
    2011       [0, 1]
    2012    [2, 3, 4]
    dtype: object
it works as follows:

    s.apply(pd.Series).stack()
    Out[104]:
    2011  0    0.0
          1    1.0
    2012  0    2.0
          1    3.0
          2    4.0
    dtype: float64
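The elements of the series have different lengths (this matters because 2012 was a leap year). The intermediate result (that is, before `stack`) contained a `NaN` value that was later dropped.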
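To make the intermediate step visible, here is a minimal sketch that splits the one-liner in two. The names `wide` and `stacked` are only illustrative, and the explicit `dropna()` is an addition that is not in the original one-liner: it is there in case the installed pandas version's `stack()` keeps missing values instead of discarding them.

```python
import pandas as pd

s = pd.Series([[0, 1], [2, 3, 4]], index=[2011, 2012])

# Each list becomes one row; the columns are the positions inside the list.
# The shorter list is padded with NaN in the last column.
wide = s.apply(pd.Series)

# stack() moves the list positions into a second index level.
# dropna() (an assumption added here) removes any padding that survives.
stacked = wide.stack().dropna()

print(wide)
print(stacked)
```

Note that `dropna()` would also remove a genuine `NaN` stored inside one of the lists, so this sketch assumes the lists contain no missing values.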
Now, let's build a DataFrame:

    a = list(range(14))
    b = list(range(20, 34))

    df = pd.DataFrame({'ID': [11111, 11111, 11112, 11112],
                       'Year': [2011, 2012, 2011, 2012],
                       'A': [a[:3], a[3:7], a[7:10], a[10:14]],
                       'B': [b[:3], b[3:7], b[7:10], b[10:14]]})
    df
    Out[108]:
                      A                 B     ID  Year
    0         [0, 1, 2]      [20, 21, 22]  11111  2011
    1      [3, 4, 5, 6]  [23, 24, 25, 26]  11111  2012
    2         [7, 8, 9]      [27, 28, 29]  11112  2011
    3  [10, 11, 12, 13]  [30, 31, 32, 33]  11112  2012

Then we can run:
    # set an index (each column will inherit it)
    df2 = df.set_index(['ID', 'Year'])

    # the trick
    unnested_lst = []
    for col in df2.columns:
        unnested_lst.append(df2[col].apply(pd.Series).stack())
    result = pd.concat(unnested_lst, axis=1, keys=df2.columns)
and get:

    result
    Out[115]:
                     A     B
    ID    Year
    11111 2011 0   0.0  20.0
               1   1.0  21.0
               2   2.0  22.0
          2012 0   3.0  23.0
               1   4.0  24.0
               2   5.0  25.0
               3   6.0  26.0
    11112 2011 0   7.0  27.0
               1   8.0  28.0
               2   9.0  29.0
          2012 0  10.0  30.0
               1  11.0  31.0
               2  12.0  32.0
               3  13.0  33.0
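The loop can also be wrapped in a small reusable function. This is only a sketch: `unnest_columns` and `index_cols` are hypothetical names (not part of pandas), and the `dropna()` call is an added safeguard for pandas versions whose `stack()` keeps `NaN` padding.

```python
import pandas as pd


def unnest_columns(df, index_cols):
    """Expand every list-valued column into one row per list element.

    Hypothetical helper: all columns not named in index_cols are
    assumed to hold lists.
    """
    indexed = df.set_index(index_cols)
    pieces = [
        # One stacked Series per column; dropna() removes NaN padding.
        indexed[col].apply(pd.Series).stack().dropna()
        for col in indexed.columns
    ]
    # Align the pieces side by side on the shared (index_cols, position) index.
    return pd.concat(pieces, axis=1, keys=indexed.columns)


a = list(range(14))
b = list(range(20, 34))
df = pd.DataFrame({'ID': [11111, 11111, 11112, 11112],
                   'Year': [2011, 2012, 2011, 2012],
                   'A': [a[:3], a[3:7], a[7:10], a[10:14]],
                   'B': [b[:3], b[3:7], b[7:10], b[10:14]]})

result = unnest_columns(df, ['ID', 'Year'])
print(result)
```

The sketch assumes the lists in a given row have the same length in every column. If they do not, `pd.concat` still aligns on the index, but the shorter column is filled with `NaN` for the missing positions.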
The rest (the datetime index) is more or less straightforward. For example:

    # DatetimeIndex
    years = pd.to_datetime(result.index.get_level_values(1).astype(str))
    # TimedeltaIndex
    days = pd.to_timedelta(result.index.get_level_values(2), unit='D')
    # If the above line doesn't work (a bug in pandas), try this:
    # days = result.index.get_level_values(2).astype('timedelta64[D]')

    # the sum is again a DatetimeIndex
    dates = years + days
    dates.name = 'Date'

    new_index = pd.MultiIndex.from_arrays([result.index.get_level_values(0), dates])

    result.index = new_index

    result
    Out[130]:
                         A     B
    ID    Date
    11111 2011-01-01   0.0  20.0
          2011-01-02   1.0  21.0
          2011-01-03   2.0  22.0
          2012-01-01   3.0  23.0
          2012-01-02   4.0  24.0
          2012-01-03   5.0  25.0
          2012-01-04   6.0  26.0
    11112 2011-01-01   7.0  27.0
          2011-01-02   8.0  28.0
          2011-01-03   9.0  29.0
          2012-01-01  10.0  30.0
          2012-01-02  11.0  31.0
          2012-01-03  12.0  32.0
          2012-01-04  13.0  33.0
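The date arithmetic can be checked in isolation on a tiny index. The sketch below uses a hand-built `MultiIndex` named `idx` as a stand-in for `result.index`, and passes `format='%Y'` to `pd.to_datetime` so that the year strings are parsed explicitly; both choices are assumptions made for illustration.

```python
import pandas as pd

# Stand-in for the stacked index: (ID, Year, position within the list).
idx = pd.MultiIndex.from_tuples(
    [(11111, 2011, 0), (11111, 2011, 1), (11111, 2012, 0)],
    names=['ID', 'Year', None],
)

# January 1st of each year, parsed from the year as a string.
years = pd.to_datetime(idx.get_level_values(1).astype(str), format='%Y')

# The position within the list is read as a number of days after January 1st.
days = pd.to_timedelta(idx.get_level_values(2), unit='D')

# DatetimeIndex + TimedeltaIndex gives a DatetimeIndex again.
dates = (years + days).rename('Date')

# Pair the IDs with the computed dates; level names come from the inputs.
new_index = pd.MultiIndex.from_arrays([idx.get_level_values(0), dates])
print(new_index)
```

Reading the list position as a day offset means the first element of each list lands on January 1st of its year, which matches the output shown above.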


