pd.concat要求 索引
是唯一的。要删除索引重复的行,请使用
df = df.loc[~df.index.duplicated(keep='first')]
import pandas as pdfrom pandas import Timestampdf1 = pd.Dataframe( {'price': [0.7286, 0.7286, 0.7286, 0.7286], 'side': [2, 2, 2, 2], 'timestamp': [1451865675631331, 1451865675631400, 1451865675631861, 1451865675631866]}, index=pd.DatetimeIndex(['2000-1-1', '2000-1-1', '2001-1-1', '2002-1-1']))df2 = pd.Dataframe( {'bid': [0.7284, 0.7284, 0.7284, 0.7285, 0.7285], 'bid_size': [4000000, 4000000, 5000000, 1000000, 4000000], 'offer': [0.7285, 0.729, 0.7286, 0.7286, 0.729], 'offer_size': [1000000, 4000000, 4000000, 4000000, 4000000]}, index=pd.DatetimeIndex(['2000-1-1', '2001-1-1', '2002-1-1', '2003-1-1', '2004-1-1']))df1 = df1.loc[~df1.index.duplicated(keep='first')]# price side timestamp# 2000-01-01 0.7286 2 1451865675631331# 2001-01-01 0.7286 2 1451865675631861# 2002-01-01 0.7286 2 1451865675631866df2 = df2.loc[~df2.index.duplicated(keep='first')]# bid bid_size offer offer_size# 2000-01-01 0.7284 4000000 0.7285 1000000# 2001-01-01 0.7284 4000000 0.7290 4000000# 2002-01-01 0.7284 5000000 0.7286 4000000# 2003-01-01 0.7285 1000000 0.7286 4000000# 2004-01-01 0.7285 4000000 0.7290 4000000result = pd.concat([df1, df2], axis=0)print(result) bid bid_size offer offer_size price side timestamp2000-01-01 NaN NaN NaN NaN 0.7286 2 1.451866e+152001-01-01 NaN NaN NaN NaN 0.7286 2 1.451866e+152002-01-01 NaN NaN NaN NaN 0.7286 2 1.451866e+152000-01-01 0.7284 4000000 0.7285 1000000 NaN NaNNaN2001-01-01 0.7284 4000000 0.7290 4000000 NaN NaNNaN2002-01-01 0.7284 5000000 0.7286 4000000 NaN NaNNaN2003-01-01 0.7285 1000000 0.7286 4000000 NaN NaNNaN2004-01-01 0.7285 4000000 0.7290 4000000 NaN NaNNaN请注意,还有
pd.join,可以根据数据帧的索引加入Dataframe,并根据
how参数处理非唯一索引。具有重复索引的行不会被删除。
In [94]: df1.join(df2)Out[94]: price side timestamp bid bid_size offer 2000-01-01 0.7286 2 1451865675631331 0.7284 4000000 0.7285 2000-01-01 0.7286 2 1451865675631400 0.7284 4000000 0.7285 2001-01-01 0.7286 2 1451865675631861 0.7284 4000000 0.7290 2002-01-01 0.7286 2 1451865675631866 0.7284 5000000 0.7286 offer_size 2000-01-01 1000000 2000-01-01 1000000 2001-01-01 4000000 2002-01-01 4000000In [95]: df1.join(df2, how='outer')Out[95]: price side timestamp bid bid_size offer offer_size2000-01-01 0.7286 2 1.451866e+15 0.7284 4000000 0.7285 10000002000-01-01 0.7286 2 1.451866e+15 0.7284 4000000 0.7285 10000002001-01-01 0.7286 2 1.451866e+15 0.7284 4000000 0.7290 40000002002-01-01 0.7286 2 1.451866e+15 0.7284 5000000 0.7286 40000002003-01-01 NaN NaNNaN 0.7285 1000000 0.7286 40000002004-01-01 NaN NaNNaN 0.7285 4000000 0.7290 4000000



