Concat DataFrame: Reindexing only valid with uniquely valued Index objects

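pd.concat requires the index to be unique whenever it has to align the inputs on that index (for example with axis=1). To drop the rows whose index label is duplicated, use: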

df = df.loc[~df.index.duplicated(keep='first')]
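The `keep` argument decides which of the repeated rows survives. The following is a minimal sketch on a made-up three-row frame (the `toy` name and its values are assumptions for illustration, not data from the answer) showing the boolean masks that `Index.duplicated` produces for each setting:

```python
import pandas as pd

# Hypothetical toy frame: the label 'a' appears twice in the index.
toy = pd.DataFrame({"price": [1.0, 2.0, 3.0]}, index=["a", "a", "b"])

# True marks a row that would be dropped by df.loc[~mask].
mask_first = toy.index.duplicated(keep="first")  # keep the first 'a'
mask_last = toy.index.duplicated(keep="last")    # keep the last 'a'
mask_none = toy.index.duplicated(keep=False)     # drop every repeated label

kept_first = toy.loc[~mask_first]
kept_last = toy.loc[~mask_last]
kept_none = toy.loc[~mask_none]

print(kept_first)
print(kept_last)
print(kept_none)
```

With `keep=False` no copy of a repeated label is retained, which may be the safer choice when it is unclear which duplicate is the correct one.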

import pandas as pd
from pandas import Timestamp

# df1 has a duplicated index label (2000-01-01 appears twice)
df1 = pd.DataFrame(
    {'price': [0.7286, 0.7286, 0.7286, 0.7286],
     'side': [2, 2, 2, 2],
     'timestamp': [1451865675631331, 1451865675631400,
                   1451865675631861, 1451865675631866]},
    index=pd.DatetimeIndex(['2000-1-1', '2000-1-1', '2001-1-1', '2002-1-1']))

df2 = pd.DataFrame(
    {'bid': [0.7284, 0.7284, 0.7284, 0.7285, 0.7285],
     'bid_size': [4000000, 4000000, 5000000, 1000000, 4000000],
     'offer': [0.7285, 0.729, 0.7286, 0.7286, 0.729],
     'offer_size': [1000000, 4000000, 4000000, 4000000, 4000000]},
    index=pd.DatetimeIndex(['2000-1-1', '2001-1-1', '2002-1-1', '2003-1-1', '2004-1-1']))

# keep only the first row for each repeated index label
df1 = df1.loc[~df1.index.duplicated(keep='first')]
#              price  side         timestamp
# 2000-01-01  0.7286     2  1451865675631331
# 2001-01-01  0.7286     2  1451865675631861
# 2002-01-01  0.7286     2  1451865675631866

df2 = df2.loc[~df2.index.duplicated(keep='first')]
#                bid  bid_size   offer  offer_size
# 2000-01-01  0.7284   4000000  0.7285     1000000
# 2001-01-01  0.7284   4000000  0.7290     4000000
# 2002-01-01  0.7284   5000000  0.7286     4000000
# 2003-01-01  0.7285   1000000  0.7286     4000000
# 2004-01-01  0.7285   4000000  0.7290     4000000

result = pd.concat([df1, df2], axis=0)
print(result)

               bid  bid_size   offer  offer_size   price  side     timestamp
2000-01-01     NaN       NaN     NaN         NaN  0.7286     2  1.451866e+15
2001-01-01     NaN       NaN     NaN         NaN  0.7286     2  1.451866e+15
2002-01-01     NaN       NaN     NaN         NaN  0.7286     2  1.451866e+15
2000-01-01  0.7284   4000000  0.7285     1000000     NaN   NaN           NaN
2001-01-01  0.7284   4000000  0.7290     4000000     NaN   NaN           NaN
2002-01-01  0.7284   5000000  0.7286     4000000     NaN   NaN           NaN
2003-01-01  0.7285   1000000  0.7286     4000000     NaN   NaN           NaN
2004-01-01  0.7285   4000000  0.7290     4000000     NaN   NaN           NaN
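To see the failure itself, the sketch below concatenates two small frames column-wise (`axis=1`), which forces pandas to align them on the row index. The frames, the `drop_duplicate_index` helper, and the variable names are all hypothetical and exist only for illustration. The exact exception class has changed between pandas versions, so the sketch catches `Exception` broadly rather than naming one.

```python
import pandas as pd


def drop_duplicate_index(df, keep="first"):
    """Hypothetical helper: drop rows whose index label repeats."""
    return df.loc[~df.index.duplicated(keep=keep)]


# 'left' has a duplicated label; 'right' has a different, unique index.
left = pd.DataFrame({"price": [1.0, 2.0, 3.0]}, index=["a", "a", "b"])
right = pd.DataFrame({"bid": [10.0, 20.0, 30.0]}, index=["a", "b", "c"])

try:
    pd.concat([left, right], axis=1)  # alignment on a non-unique index
    concat_error = None
except Exception as exc:  # exception class differs across pandas versions
    concat_error = exc

print(type(concat_error).__name__, concat_error)

# After removing the duplicated label the same call goes through.
left_unique = drop_duplicate_index(left)
combined = pd.concat([left_unique, right], axis=1)
print(combined)
```

Labels present in only one input, such as `'c'` here, are filled with `NaN` in the columns that come from the other input.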

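Note that there is also DataFrame.join, which joins DataFrames on their index and handles non-unique indexes according to the how parameter. Rows with duplicate index labels are not removed.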

In [94]: df1.join(df2)
Out[94]:
             price  side         timestamp     bid  bid_size   offer  offer_size
2000-01-01  0.7286     2  1451865675631331  0.7284   4000000  0.7285     1000000
2000-01-01  0.7286     2  1451865675631400  0.7284   4000000  0.7285     1000000
2001-01-01  0.7286     2  1451865675631861  0.7284   4000000  0.7290     4000000
2002-01-01  0.7286     2  1451865675631866  0.7284   5000000  0.7286     4000000

In [95]: df1.join(df2, how='outer')
Out[95]:
             price  side     timestamp     bid  bid_size   offer  offer_size
2000-01-01  0.7286     2  1.451866e+15  0.7284   4000000  0.7285     1000000
2000-01-01  0.7286     2  1.451866e+15  0.7284   4000000  0.7285     1000000
2001-01-01  0.7286     2  1.451866e+15  0.7284   4000000  0.7290     4000000
2002-01-01  0.7286     2  1.451866e+15  0.7284   5000000  0.7286     4000000
2003-01-01     NaN   NaN           NaN  0.7285   1000000  0.7286     4000000
2004-01-01     NaN   NaN           NaN  0.7285   4000000  0.7290     4000000
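The same behaviour can be checked on a smaller scale. This is a sketch with made-up frames (`left`, `right`, and their values are assumptions, not the data above) in which the left frame keeps its duplicated label through both a default left join and an outer join:

```python
import pandas as pd

# Hypothetical frames: 'left' repeats the label 'a', 'right' is unique.
left = pd.DataFrame({"price": [1.0, 2.0, 3.0]}, index=["a", "a", "b"])
right = pd.DataFrame({"bid": [10.0, 20.0, 30.0]}, index=["a", "b", "c"])

# Default how='left': every row of 'left' is kept, duplicates included,
# and each one receives the matching 'bid' from 'right'.
left_join = left.join(right)

# how='outer': labels found only in 'right' are added with NaN in 'price'.
outer_join = left.join(right, how="outer")

print(left_join)
print(outer_join)
```

Because both rows labelled `'a'` match the single `'a'` row on the right, the right-hand values are repeated rather than any row being dropped.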


When reposting, please credit the source: www.mshxw.com
Article URL: https://www.mshxw.com/it/470491.html