熊猫合并()允许
outer,
left,
right连接(而不仅仅是
inner连接)两个数据帧之间,这样你就可以返回匹配的记录。此外,
merge()甚至可以泛化为返回交叉联接(两个数据帧之间的所有组合匹配),并且随后进行过滤可以返回不匹配的记录。还有,还有isin() pandas方法。
考虑下面的演示。下面是我们喜欢的两种数据框架, 计算机语言
。如图所示,第一数据帧是第二数据帧的子集。外部联接返回的记录都包含
NaN不匹配的列,以后可以将其过滤掉。交叉联接返回可以过滤的完整完整行,并
isin()在列中搜索值:
import pandas as pddf1 = pd.Dataframe({'Languages': ['C++', 'C', 'Java', 'C#', 'Python', 'PHP'], 'Uses': ['computing', 'computing', 'application', 'application', 'application', 'web'], 'Type': ['Proprietary', 'Proprietary', 'Proprietary', 'Proprietary', 'Open-Source', 'Open-Source']})df2 = pd.Dataframe({'Languages': ['C++', 'C', 'Java', 'C#', 'Python', 'PHP','Perl', 'R', 'Ruby', 'VB.NET', 'Javascript', 'Matlab'], 'Uses': ['computing', 'computing', 'application', 'application', 'application', 'web', 'application', 'computing', 'web', 'application', 'web', 'computing'], 'Type': ['Proprietary', 'Proprietary', 'Proprietary', 'Proprietary', 'Open-Source', 'Open-Source', 'Open-Source', 'Open-Source', 'Open-Source', 'Proprietary', 'Open-Source', 'Proprietary']})# OUTER JOIN mergedf = pd.merge(df1, df2, on=['Languages'], how='outer')# FILTER OUT LANGUAGES IN SMALLER THAT IS NULLmergedf = mergedf[pd.isnull(mergedf['Type_x'])][['Languages', 'Uses_y', 'Type_y']]# Languages Uses_y Type_y#6 Perl application Open-Source#7 R computing Open-Source#8 Ruby web Open-Source#9 VB.NET application Proprietary#10 Javascript web Open-Source#11 Matlab computing Proprietary# ISIN COMPARISON, RETURNING RECORDS IN LARGER NOT IN SMALLERunequaldf = df2[~df2.Languages.isin(df1['Languages'])]# Languages Type Uses#6 Perl Open-Source application#7 R Open-Source computing#8 Ruby Open-Source web#9 VB.NET Proprietary application#10 Javascript Open-Source web#11 Matlab Proprietary computing# CROSS JOIN df1['key'] = 1 # (REQUIRES A JOIN KEY OF SAME VALUE)df2['key'] = 1 crossjoindf = pd.merge(df1, df2, on=['key'])# FILTER FOR LANGUAGES IN LARGER NOT IN SMALLER (ALSO USING ISIN)crossjoindf = crossjoindf[~crossjoindf['Languages_y'].isin(crossjoindf['Languages_x'])] [['Languages_y', 'Uses_y', 'Type_y']].drop_duplicates()# Languages_y Uses_y Type_y#6 Perl application Open-Source#7 R computing Open-Source#8 Ruby web Open-Source#9 VB.NET application Proprietary#10 Javascript web Open-Source#11 Matlab computing Proprietary诚然,交叉连接在这里可能是多余且冗长的,但是如果您无与伦比的需求需要跨数据帧进行排列,那么它会很方便。



