You could use df.itertuples to iterate through each row, and use a list comprehension to reshape the data into the desired form:
import pandas as pd

df = pd.DataFrame(
    {"name": ["John", "Eric"],
     "days": [[1, 3, 5, 7], [2, 4]]})

# one (day, name) tuple per element of each row's "days" list
result = pd.DataFrame([(d, tup.name) for tup in df.itertuples() for d in tup.days])
print(result)

yields
   0     1
0  1  John
1  3  John
2  5  John
3  7  John
4  2  Eric
5  4  Eric
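If you are on pandas 1.1 or later (an assumption about your installed version), the built-in DataFrame.explode does the same one-row-per-list-element reshaping without an explicit comprehension. This is a swapped-in alternative, not one of the approaches benchmarked below; it keeps the original column names and column order rather than producing columns 0 and 1:

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["John", "Eric"],
     "days": [[1, 3, 5, 7], [2, 4]]})

# explode emits one row per element of each "days" list and repeats "name";
# ignore_index=True (added in pandas 1.1) renumbers the rows 0..n-1
exploded = df.explode("days", ignore_index=True)
print(exploded)
```

Note that the exploded "days" column keeps object dtype; call .astype(int) on it if you need integers.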
ivakar's solution, using_repeat, is the fastest:
In [48]: %timeit using_repeat(df)
1000 loops, best of 3: 834 µs per loop

In [5]: %timeit using_itertuples(df)
100 loops, best of 3: 3.43 ms per loop

In [7]: %timeit using_apply(df)
1 loop, best of 3: 379 ms per loop

In [8]: %timeit using_append(df)
1 loop, best of 3: 3.59 s per loop
Here is the setup used for the benchmarks above:
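import numpy as np
import pandas as pd

N = 10**3
df = pd.DataFrame(
    {"name": np.random.choice(list('ABCD'), size=N),
     "days": [np.random.randint(10, size=np.random.randint(5)) for i in range(N)]})

def using_itertuples(df):
    # build one (day, name) tuple per list element in a Python comprehension
    return pd.DataFrame([(d, tup.name) for tup in df.itertuples() for d in tup.days])

def using_repeat(df):
    # repeat each name len(days) times and flatten all "days" arrays with NumPy
    lens = [len(item) for item in df['days']]
    return pd.DataFrame(
        {"name": np.repeat(df['name'].values, lens),
         "days": np.concatenate(df['days'].values)})

def using_apply(df):
    # spread each list across columns, then stack back into one long column
    return (df.apply(lambda x: pd.Series(x.days), axis=1)
            .stack()
            .reset_index(level=1, drop=True)
            .to_frame('day')
            .join(df['name']))

def using_append(df):
    # grow the result one row at a time; every step copies the whole frame
    df2 = pd.DataFrame(columns=df.columns)
    for i, r in df.iterrows():
        for e in r.days:
            new_r = r.copy()
            new_r['days'] = e
            # the timings above used df2.append(new_r); DataFrame.append was
            # removed in pandas 2.0, and pd.concat grows the frame the same way
            df2 = pd.concat([df2, new_r.to_frame().T])
    return df2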
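To see why using_repeat wins, here is the same computation split into its steps on the two-row example from the top of this answer. Only the length calculation loops over rows in Python; repeating the names and flattening the lists are each a single NumPy call. This is a sketch for illustration, and the intermediate names lens, names and days are just labels:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"name": ["John", "Eric"],
     "days": [[1, 3, 5, 7], [2, 4]]})

# Step 1: how many output rows each input row contributes
lens = [len(item) for item in df["days"]]          # [4, 2]

# Step 2: repeat each name once per element of its list
names = np.repeat(df["name"].values, lens)         # John x4, Eric x2

# Step 3: flatten all the lists into one 1-D array, in row order
days = np.concatenate(df["days"].values)           # [1, 3, 5, 7, 2, 4]

result = pd.DataFrame({"name": names, "days": days})
print(result)
```

Because np.repeat and np.concatenate both preserve row order, the i-th name always lines up with the days that came from the same input row.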


