We can also use sklearn.preprocessing.MultiLabelBinarizer:
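For real-world data we often want a sparse DataFrame instead. Most of the resulting 0/1 cells are zero, so storing only the nonzero entries can save a lot of RAM.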
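To see what MultiLabelBinarizer produces on its own, here is a minimal sketch on a hypothetical list-of-labels column (the `labels` data is made up for illustration). With default settings it returns a dense NumPy 0/1 array. With `sparse_output=True` it returns a SciPy sparse matrix that stores only the nonzero entries, which is where the RAM saving comes from.

```python
from scipy.sparse import issparse
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical list-of-labels column: one list of labels per row
labels = [['Apple', 'Orange', 'Banana'], ['Apple', 'Grape'], ['Banana']]

# Dense output: a NumPy array with one 0/1 column per class
mlb = MultiLabelBinarizer()
dense = mlb.fit_transform(labels)
print(mlb.classes_)   # classes are sorted alphabetically
print(dense)

# Sparse output: a SciPy sparse matrix that stores only the 1s
mlb_sparse = MultiLabelBinarizer(sparse_output=True)
sparse = mlb_sparse.fit_transform(labels)
print(issparse(sparse), sparse.nnz)   # nnz = number of stored nonzero entries
```

Both binarizers see the same labels, so their `classes_` (and therefore column order) match.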
Sparse solution
```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer(sparse_output=True)

# pop() removes Col3 from df; its lists become sparse 0/1 columns
df = df.join(
    pd.DataFrame.sparse.from_spmatrix(
        mlb.fit_transform(df.pop('Col3')),
        index=df.index,
        columns=mlb.classes_))
```

Result:
```
In [38]: df
Out[38]:
  Col1  Col2  Apple  Banana  Grape  Orange
0    C  33.0      1       1      0       1
1    A   2.5      1       0      1       0
2    B  42.0      0       1      0       0

In [39]: df.dtypes
Out[39]:
Col1                object
Col2               float64
Apple     Sparse[int32, 0]
Banana    Sparse[int32, 0]
Grape     Sparse[int32, 0]
Orange    Sparse[int32, 0]
dtype: object

In [40]: df.memory_usage()
Out[40]:
Index     128
Col1       24
Col2       24
Apple      16    # <--- NOTE!
Banana     16    # <--- NOTE!
Grape       8    # <--- NOTE!
Orange      8    # <--- NOTE!
dtype: int64
```
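For a self-contained run of the sparse solution, the sketch below builds a small sample frame first. The `Col1`/`Col2`/`Col3` values are an assumption reconstructed from the output shown above, not the original data. The integer width of the sparse columns (and therefore the byte counts from `memory_usage()`) may differ from that output depending on platform and library versions.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical sample data reconstructed from the output above:
# Col3 holds a list of labels for each row
df = pd.DataFrame({
    'Col1': ['C', 'A', 'B'],
    'Col2': [33.0, 2.5, 42.0],
    'Col3': [['Apple', 'Orange', 'Banana'], ['Apple', 'Grape'], ['Banana']],
})

mlb = MultiLabelBinarizer(sparse_output=True)

# Build one sparse 0/1 column per label, aligned on df's index
onehot = pd.DataFrame.sparse.from_spmatrix(
    mlb.fit_transform(df.pop('Col3')),   # pop() also drops Col3 from df
    index=df.index,
    columns=mlb.classes_,
)
df = df.join(onehot)

print(df)
print(df.dtypes)
print(df.memory_usage())
```

Each sparse column only pays for its nonzero values plus their positions, which is why `Grape` and `Orange` (one 1 each) are smaller than `Apple` and `Banana` (two 1s each).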
Dense solution
```python
mlb = MultiLabelBinarizer()

# Same as above, but the 0/1 columns are stored as a regular dense array
df = df.join(
    pd.DataFrame(
        mlb.fit_transform(df.pop('Col3')),
        columns=mlb.classes_,
        index=df.index))
```

Result:
```
In [77]: df
Out[77]:
  Col1  Col2  Apple  Banana  Grape  Orange
0    C  33.0      1       1      0       1
1    A   2.5      1       0      1       0
2    B  42.0      0       1      0       0
```
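The RAM difference is easiest to see at scale. This sketch compares total column memory for the dense and sparse approaches on synthetic data; the row count, label vocabulary size, and labels per row are arbitrary assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical synthetic data: 10,000 rows, 200 possible labels, 3 labels per row
rng = np.random.default_rng(0)
classes = [f'label_{i}' for i in range(200)]
labels = [list(rng.choice(classes, size=3, replace=False)) for _ in range(10_000)]

# Dense: every row stores a 0/1 value for every label
mlb_dense = MultiLabelBinarizer()
dense_df = pd.DataFrame(mlb_dense.fit_transform(labels), columns=mlb_dense.classes_)

# Sparse: only the 1s (and their row positions) are stored
mlb_sparse = MultiLabelBinarizer(sparse_output=True)
sparse_df = pd.DataFrame.sparse.from_spmatrix(
    mlb_sparse.fit_transform(labels), columns=mlb_sparse.classes_
)

dense_bytes = dense_df.memory_usage(index=False).sum()
sparse_bytes = sparse_df.memory_usage(index=False).sum()
print(f'dense:  {dense_bytes:,} bytes')
print(f'sparse: {sparse_bytes:,} bytes')
```

With only 3 of 200 labels set per row, the sparse frame stores about 3 values (plus their positions) per row instead of 200, so the gap widens as the label vocabulary grows.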



