For these operations, plain Python (a list comprehension over `zip`) can be far more efficient than `DataFrame.apply`:
%timeit pd.Series([set1.union(set2) for set1, set2 in zip(df['A'], df['B'])])
10 loops, best of 3: 43.3 ms per loop

%timeit df.apply(lambda x: x.A.union(x.B), axis=1)
1 loop, best of 3: 2.6 s per loop
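The same `zip` pattern works for any binary set operation, not just union. A minimal sketch, assuming a hypothetical helper named `pairwise_set_op` (not part of pandas) that takes two object-dtype Series of sets plus a function from the standard `operator` module:

```python
import operator
from typing import Callable

import pandas as pd


def pairwise_set_op(s1: pd.Series, s2: pd.Series,
                    op: Callable[[set, set], set]) -> pd.Series:
    """Hypothetical helper: apply a binary set operation row by row.

    Iterating with zip skips the per-row Series construction that
    DataFrame.apply(axis=1) performs, which is where most of its time goes.
    """
    return pd.Series([op(a, b) for a, b in zip(s1, s2)], index=s1.index)


df = pd.DataFrame({'A': [{'a', 'b'}, {'c'}], 'B': [{'b', 'c'}, {'c', 'd'}]})

union = pairwise_set_op(df['A'], df['B'], operator.or_)    # same as set.union
common = pairwise_set_op(df['A'], df['B'], operator.and_)  # same as set.intersection
only_a = pairwise_set_op(df['A'], df['B'], operator.sub)   # same as set.difference

# The row-wise apply version produces the same union, just more slowly
assert union.tolist() == df.apply(lambda x: x.A.union(x.B), axis=1).tolist()
```

Passing `operator.or_`, `operator.and_`, `operator.sub` or `operator.xor` covers union, intersection, difference and symmetric difference with one helper.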
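If we could use `+` for union, it might take about half the time (subclassing `set` just to get `+` is probably not worth it). Python's `set` already supports `-`, so for difference the operator form can be timed directly against the list comprehension: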
%timeit df['A'] - df['B']
10 loops, best of 3: 22.1 ms per loop

%timeit pd.Series([set1.difference(set2) for set1, set2 in zip(df['A'], df['B'])])
10 loops, best of 3: 35.7 ms per loop
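The subclassing idea mentioned above can be sketched as follows. `AddableSet` is a hypothetical name, and this only shows the mechanics (pandas forwards `+` and `-` on object-dtype Series to each element's own operator); it has not been timed here, so the "half the time" estimate remains unverified:

```python
import pandas as pd


class AddableSet(set):
    """Hypothetical set subclass whose ``+`` means union.

    The built-in set already defines ``-``, ``&`` and ``|`` but not ``+``;
    this subclass adds only ``__add__``.
    """

    def __add__(self, other):
        # set.union on a subclass returns a plain set, so re-wrap the result
        # to keep ``+`` available on it
        return AddableSet(self.union(other))


a = pd.Series([AddableSet({'a', 'b'}), AddableSet({'c'})])
b = pd.Series([AddableSet({'b', 'c'}), AddableSet({'c', 'd'})])

# Object-dtype arithmetic calls each element's __add__ / __sub__ pairwise
union = a + b
difference = a - b
```

Every cell would have to be converted to `AddableSet` first, which is extra work on top of the operation itself; that conversion cost is one reason the subclass may not pay off.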
The DataFrame used for the timings:
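import pandas as pd
import numpy as np

# 100,000 rows; each cell is a set built from 1 to 4 random draws from 'abcdefg'
# (draws are with replacement, so duplicates collapse and some sets are smaller)
l1 = [set(np.random.choice(list('abcdefg'), np.random.randint(1, 5))) for _ in range(100000)]
l2 = [set(np.random.choice(list('abcdefg'), np.random.randint(1, 5))) for _ in range(100000)]

df = pd.DataFrame({'A': l1, 'B': l2})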
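`%timeit` is an IPython magic, so the comparisons above only run in IPython or Jupyter. A rough plain-Python equivalent using the standard-library `timeit` module is sketched below; the smaller row count `n` and the repeat settings are assumptions chosen to keep the run short, so absolute numbers will not match the ones quoted above:

```python
import timeit

import numpy as np
import pandas as pd

n = 10_000  # assumption: fewer rows than the 100000 above, to keep the run short
letters = list('abcdefg')
l1 = [set(np.random.choice(letters, np.random.randint(1, 5))) for _ in range(n)]
l2 = [set(np.random.choice(letters, np.random.randint(1, 5))) for _ in range(n)]
df = pd.DataFrame({'A': l1, 'B': l2})


def union_comprehension():
    return pd.Series([s1.union(s2) for s1, s2 in zip(df['A'], df['B'])])


def union_apply():
    return df.apply(lambda x: x.A.union(x.B), axis=1)


def difference_operator():
    return df['A'] - df['B']


def difference_comprehension():
    return pd.Series([s1.difference(s2) for s1, s2 in zip(df['A'], df['B'])])


timings = {}
for func in (union_comprehension, union_apply,
             difference_operator, difference_comprehension):
    # Best of 3 single calls, loosely mirroring the "best of 3" that older
    # IPython versions of %timeit report
    timings[func.__name__] = min(timeit.repeat(func, number=1, repeat=3))
    print(f"{func.__name__}: {timings[func.__name__] * 1000:.1f} ms")
```

Taking the minimum of several repeats, as `%timeit` did, reduces the influence of background noise on the reported time.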


