Python interview Q&A

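As someone who uses both R and Python, I have seen this type of question many times.

In R, there is a built-in function for this, unnest, in the tidyr package. But in Python (pandas), there was no built-in function for this type of problem before version 0.25.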

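I know that columns of object dtype always make the data hard to transform with pandas functions. When I receive data like this, the first thing that comes to mind is to "flatten", or unnest, the columns.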
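The snippets in this answer operate on a frame named df that the text never defines. Judging from the printed outputs, it is assumed to look like the one below; this is a reconstruction, not something stated in the answer. A minimal sketch of that assumed input, showing why the list column ends up with object dtype:

```python
import pandas as pd

# Assumed input, reconstructed from the outputs shown in this answer
df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})

# Each cell of B holds a whole Python list, so pandas stores the column as object
print(df.dtypes)
print(df)
```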

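I am using pandas and plain Python functions to solve this type of problem. If you are worried about the speed of these solutions, check user3483203's answer, since it uses numpy, and numpy is faster most of the time. I recommend Cython and numba if speed matters in your case.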

Method 0 [pandas >= 0.25]
Starting from pandas 0.25, if you only need to explode one column, you can use the explode function:

    df.explode('B')

       A  B
    0  1  1
    0  1  2
    1  2  1
    1  2  2
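Two details of explode are easy to miss: the original row labels are repeated, and the exploded column keeps its object dtype. The sketch below, which assumes the same two-row example frame as the rest of the answer, shows one way to tidy both up. The ignore_index parameter requires pandas 1.1 or later.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})  # assumed example frame

exploded = df.explode('B')
# The original row labels are repeated and B stays object dtype
print(exploded.index.tolist())
print(exploded['B'].dtype)

# ignore_index=True renumbers the rows; astype restores an integer dtype
tidy = df.explode('B', ignore_index=True).astype({'B': 'int64'})
print(tidy)
```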

Method 1: apply + pd.Series

(Easy to understand, but not recommended in terms of performance.)

    df.set_index('A').B.apply(pd.Series).stack().reset_index(level=0).rename(columns={0:'B'})
    Out[463]:
       A  B
    0  1  1
    1  1  2
    0  2  1
    1  2  2

Method 2: repeat with the DataFrame constructor

Re-create your dataframe (good performance, but awkward with multiple columns).

    df = pd.DataFrame({'A': df.A.repeat(df.B.str.len()), 'B': np.concatenate(df.B.values)})
    df
    Out[465]:
       A  B
    0  1  1
    0  1  2
    1  2  1
    1  2  2

Method 2.1: suppose that besides A there are also columns A.1, ..., A.n. With Method 2 above, it is tedious to re-create those columns one by one.

Solution: join or merge on the index after unnesting the single column

    s = pd.DataFrame({'B': np.concatenate(df.B.values)}, index=df.index.repeat(df.B.str.len()))
    s.join(df.drop(columns='B'), how='left')
    Out[477]:
       B  A
    0  1  1
    0  2  1
    1  1  2
    1  2  2

If you need exactly the same column order as before, add reindex at the end.

    s.join(df.drop(columns='B'), how='left').reindex(columns=df.columns)
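To make the point of Method 2.1 concrete, here is a minimal sketch on a hypothetical frame with an extra scalar column (A1) and lists of different lengths; neither appears in the original answer. Only B is rebuilt by hand, and the join on the index restores every other column.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with an extra scalar column next to the list column B
df = pd.DataFrame({'A': [1, 2], 'A1': ['x', 'y'], 'B': [[1, 2], [3]]})

# Unnest B alone, labelling each value with the index of the row it came from
lens = df['B'].str.len()
s = pd.DataFrame({'B': np.concatenate(df['B'].values)},
                 index=df.index.repeat(lens))

# Joining on the index brings back every other column, however many there are
out = s.join(df.drop(columns='B'), how='left').reindex(columns=df.columns)
print(out)
```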

Method 3: re-create the list

    pd.DataFrame([[x] + [z] for x, y in df.values for z in y], columns=df.columns)
    Out[488]:
       A  B
    0  1  1
    1  1  2
    2  2  1
    3  2  2

If there are more than two columns, use

    s = pd.DataFrame([[x] + [z] for x, y in zip(df.index, df.B) for z in y])
    s.merge(df, left_on=0, right_index=True)
    Out[491]:
       0  1  A       B
    0  0  1  1  [1, 2]
    1  0  2  1  [1, 2]
    2  1  1  2  [1, 2]
    3  1  2  2  [1, 2]

Method 4: using reindex or loc

    df.reindex(df.index.repeat(df.B.str.len())).assign(B=np.concatenate(df.B.values))
    Out[554]:
       A  B
    0  1  1
    0  1  2
    1  2  1
    1  2  2

    # df.loc[df.index.repeat(df.B.str.len())].assign(B=np.concatenate(df.B.values))

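Method 5: when the lists only contain unique values

    df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [3, 4]]})
    from collections import ChainMap
    d = dict(ChainMap(*map(dict.fromkeys, df['B'], df['A'])))
    pd.DataFrame(list(d.items()), columns=df.columns[::-1])
    Out[574]:
       B  A
    0  1  1
    1  2  1
    2  3  2
    3  4  2

The row order follows ChainMap's iteration order, which differs between Python versions.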
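The uniqueness requirement exists because Method 5 uses the list values as dictionary keys. A small sketch of what goes wrong otherwise, assuming a frame in which both rows hold the same values:

```python
from collections import ChainMap

import pandas as pd

# Assumed frame in which both rows hold the same list values
df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [1, 2]]})

# One dict per row, mapping each list value to that row's A
per_row = list(map(dict.fromkeys, df['B'], df['A']))

# ChainMap keeps a single entry per key (the first mapping wins),
# so only two of the four (B, A) pairs survive
merged = dict(ChainMap(*per_row))
print(len(merged))
```

With four input pairs and only two surviving keys, half of the rows would be silently lost, which is why this method is limited to lists with unique values.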

Method 6: using numpy

    newvalues = np.dstack((np.repeat(df.A.values, list(map(len, df.B.values))), np.concatenate(df.B.values)))
    pd.DataFrame(data=newvalues[0], columns=df.columns)
       A  B
    0  1  1
    1  1  2
    2  2  1
    3  2  2

Method 7: using the base functions itertools cycle and chain

A pure Python solution, just for fun.

    from itertools import cycle, chain
    l = df.values.tolist()
    l1 = [list(zip([x[0]], cycle(x[1])) if len([x[0]]) > len(x[1]) else list(zip(cycle([x[0]]), x[1]))) for x in l]
    pd.DataFrame(list(chain.from_iterable(l1)), columns=df.columns)
       A  B
    0  1  1
    1  1  2
    2  2  1
    3  2  2

Generalizing to multiple columns

    df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [3, 4]], 'C': [[1, 2], [3, 4]]})
    df
    Out[592]:
       A       B       C
    0  1  [1, 2]  [1, 2]
    1  2  [3, 4]  [3, 4]

Self-defined function:

    def unnesting(df, explode):
        # repeat each row label once per element of the first list column
        idx = df.index.repeat(df[explode[0]].str.len())
        df1 = pd.concat([
            pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
        df1.index = idx
        return df1.join(df.drop(columns=explode), how='left')

    unnesting(df, ['B', 'C'])
    Out[609]:
       B  C  A
    0  1  1  1
    0  2  2  1
    1  3  3  2
    1  4  4  2
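On pandas 1.3 or later, explode also accepts a list of columns, so the self-defined function can be checked against the built-in one. The sketch below restates the function so that it runs on its own and compares the two results. It assumes, as the function above does, that the lists in B and C have the same length in every row.

```python
import numpy as np
import pandas as pd

def unnesting(df, explode):
    # Same logic as the self-defined function, restated so this block runs alone
    idx = df.index.repeat(df[explode[0]].str.len())
    df1 = pd.concat(
        [pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx
    return df1.join(df.drop(columns=explode), how='left')

df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2], [3, 4]], 'C': [[1, 2], [3, 4]]})

manual = unnesting(df, ['B', 'C'])
builtin = df.explode(['B', 'C'])  # multi-column explode, pandas >= 1.3

# Compare values after aligning the column order and the dtype
same = np.array_equal(manual[['A', 'B', 'C']].to_numpy(),
                      builtin.to_numpy().astype('int64'))
print(same)
```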

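Column-wise unnesting

All the methods above deal with vertical unnesting and exploding. If you need to expand the lists horizontally instead, use the pd.DataFrame constructor:

    df.join(pd.DataFrame(df.B.tolist(), index=df.index).add_prefix('B_'))
    Out[33]:
       A       B       C  B_0  B_1
    0  1  [1, 2]  [1, 2]    1    2
    1  2  [3, 4]  [3, 4]    3    4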
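The horizontal expansion also works when the lists have different lengths, with one caveat worth knowing. A minimal sketch on a hypothetical ragged frame (not from the original answer):

```python
import pandas as pd

# Hypothetical frame whose lists have different lengths
df = pd.DataFrame({'A': [1, 2], 'B': [[1, 2, 3], [4]]})

wide = df.join(pd.DataFrame(df['B'].tolist(), index=df.index).add_prefix('B_'))

# Shorter lists are padded with NaN, which turns the padded columns into floats
print(wide)
print(wide.dtypes)
```

The number of new columns equals the length of the longest list, so a single very long list widens the whole frame.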

Updated function

    def unnesting(df, explode, axis):
        if axis == 1:
            # vertical: one output row per list element
            idx = df.index.repeat(df[explode[0]].str.len())
            df1 = pd.concat(
                [pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
            df1.index = idx
            return df1.join(df.drop(columns=explode), how='left')
        else:
            # horizontal: one output column per list position
            df1 = pd.concat(
                [pd.DataFrame(df[x].tolist(), index=df.index).add_prefix(x) for x in explode], axis=1)
            return df1.join(df.drop(columns=explode), how='left')

Test output

    unnesting(df, ['B', 'C'], axis=0)
    Out[36]:
       B0  B1  C0  C1  A
    0   1   2   1   2  1
    1   3   4   3   4  2
