目录
merge方法
merge语法
merge代码
join方法
join语法
join代码
merge方法
类似于数据库中的join,默认左右对象同名列为键进行纵向连接,且为内连接。
merge语法
pandas.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)
等同于DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None),
参数说明:
参数left/right可以是DataFrame,也可以是有名称的Series;参数how='inner',默认是内连接,也可以选择'left'/'right'/'outer',即左连接/右连接/全连接;参数on可以指定左右对象同名列,如没有指定且left_index/right_index都为False,则会自动使用左右对象同名列;参数left_on/right_on可以传入左右对象的列名称、行索引名称或与左右对象等长的数组;参数left_index/right_index默认是False,如果指定为True,表示左对象或右对象的行索引为键。其他参数暂略。
参数详解:
pandas.merge — pandas 1.4.2 documentation (pydata.org)
pandas.DataFrame.merge — pandas 1.4.2 documentation (pydata.org)
merge代码
单一键连接
#导入库
import pandas as pd
#构造数据
left = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']})
# on='key'可省略,使用左右对象同名列;默认how='inner'
result = pd.merge(left, right, on='key')
print(result)
多键连接
left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
'key2': ['K0', 'K1', 'K0', 'K1'],
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
'key2': ['K0', 'K0', 'K0', 'K0'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']})
result1 = pd.merge(left, right, on=['key1', 'key2']) #多键连接,默认how='inner'
result2 = pd.merge(left, right, how='left', on=['key1', 'key2']) #多键连接,指定how='left'
result3 = pd.merge(left, right, how='right', on=['key1', 'key2']) #多键连接,指定how='right'
result4 = pd.merge(left, right, how='outer', on=['key1', 'key2']) #多键连接,指定how='outer'
print('result1:n',result1)
print('result2:n',result2)
print('result3:n',result3)
print('result4:n',result4)
#也可以通过列名和行索引共同连接
left_index = pd.Index(['K0', 'K0', 'K1', 'K2'], name='key1')
left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3'],
'key2': ['K0', 'K1', 'K0', 'K1']},
index=left_index)
right_index = pd.Index(['K0', 'K1', 'K2', 'K2'], name='key1')
right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3'],
'key2': ['K0', 'K0', 'K0', 'K1']},
index=right_index)
result = left.merge(right, on=['key1', 'key2'])
print('result:n',result)
参数suffixes,用于给merge对象中除连接键外其他同名列添加后缀以作区分
left = pd.DataFrame({'A': [1, 2], 'B': [1, 2]})
right = pd.DataFrame({'A': [4, 5, 6], 'B': [2, 2, 2]})
result1 = pd.merge(left, right, on='B', how='outer') #默认参数suffixes=('_x','_y')
result2 = pd.merge(left, right, on='B', how='outer',suffixes=('1','2')) #指定参数suffixes=('1','2')
print('result1:n',result1)
print('result2:n',result2)
参数validate,用于判断merge对象的键是否满足条件,不满足条件会报错
- “one_to_one” or “1:1”: 检查左右对象的键是否都是唯一的
- “one_to_many” or “1:m”: 检查左对象的键是否都是唯一的
- “many_to_one” or “m:1”: 检查右对象的键是否都是唯一的
- “many_to_many” or “m:m”: 不产生任何约束
left = pd.DataFrame({'A' : [1,2], 'B' : [1, 2]})
right = pd.DataFrame({'A' : [4,5,6], 'B': [2, 2, 2]})
result = pd.merge(left, right, on='B', how='outer', validate="one_to_one")
参数indicator,用于判断连接键的值是否包含在左右对象中
left = pd.DataFrame({'key': [0, 1], 'left': ['a', 'b']})
right = pd.DataFrame({'key': [1, 2, 2], 'right': [2, 2, 2]})
#指定indicator=True会产生一个新的列_merge,用于表示连接键的值是否包含在左右对象中
result1 = pd.merge(left, right, on='key', how='outer', indicator=True)
#指定indicator='indicator_column'会产生一个新的列indicator_column,用于表示连接键的值是否包含在左右对象中
result2 = pd.merge(left, right, on='key', how='outer', indicator='indicator_column')
print('result1:n',result1)
print('result2:n',result2)
join方法
类似于数据库中的join,也类似于pandas中的merge,但默认使用索引连接,且为左连接。
join语法
DataFrame.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False)
参数说明:
默认使⽤索引连接,也可以通过参数on指定左对象的列名和右对象的索引进行连接;参数how='left',默认是左连接,也可以选择‘right’/‘outer’/‘inner’,即右连接/全连接/内连接;参数lsuffix/rsuffix 指定左右对象中除连接键外同名列的后缀。其他参数暂略。
参数详解:
pandas.DataFrame.join — pandas 1.4.2 documentation (pydata.org)
join代码
参数how,默认是'left'左连接,也可以选择‘right’/‘outer’/‘inner’,即右连接/全连接/内连接
left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']},
index=['K0', 'K1', 'K2'])
right = pd.DataFrame({'C': ['C0', 'C2', 'C3'],
'D': ['D0', 'D2', 'D3']},
index=['K0', 'K2', 'K3'])
result1 = left.join(right) #默认how='left'
result2 = left.join(right, how='outer') #指定how='outer'
result3 = left.join(right, how='inner') #指定how='inner'
#join可以和merge等价转换,result3和result4结果一致
result4 = pd.merge(left, right, left_index=True, right_index=True) #默认how='inner'
print('result1:n',result1)
print('result2:n',result2)
print('result3:n',result3)
print('result4:n',result4)
参数lsuffix/rsuffix,用于给join对象中除连接键外其他同名列添加后缀以作区分
left = pd.DataFrame({'k': ['K0', 'K1', 'K2'], 'v': [1, 2, 3]})
right = pd.DataFrame({'k': ['K0', 'K0', 'K3'], 'v': [4, 5, 6]})
result1 = pd.merge(left, right, on='k', suffixes=['_l', '_r'])
left = left.set_index('k')
right = right.set_index('k')
result2 = left.join(right, how='inner', lsuffix='_l', rsuffix='_r')
print('result1:n',result1)
print('result2:n',result2)
多层索引
left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']},
index=pd.Index(['K0', 'K1', 'K2'], name='key'))
index = pd.MultiIndex.from_tuples([('K0', 'Y0'), ('K1', 'Y1'),
('K2', 'Y2'), ('K2', 'Y3')],
names=['key', 'Y'])
right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']},
index=index)
result1 = left.join(right, how='inner')
# join可以和merge等价转换,result1和result2结果一致
result2 = pd.merge(left.reset_index(), right.reset_index(),on=['key'], how='inner').set_index(['key','Y'])
print('result1:n',result1)
print('result2:n',result2)
left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3'],
'key1': ['K0', 'K0', 'K1', 'K2'],
'key2': ['K0', 'K1', 'K0', 'K1']})
index = pd.MultiIndex.from_tuples([('K0', 'K0'), ('K1', 'K0'),
('K2', 'K0'), ('K2', 'K1')])
right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']},
index=index)
#左对象使用列名,右对象使用多层索引
result1 = left.join(right, on=['key1', 'key2']) #默认how='left'
result2 = left.join(right, on=['key1', 'key2'], how='inner') #指定how='inner'
print('result1:n',result1)
print('result2:n',result2)
leftindex = pd.MultiIndex.from_product([list('abc'), list('xy'), [1, 2]],
names=['abc', 'xy', 'num'])
left = pd.DataFrame({'v1': range(12)}, index=leftindex)
rightindex = pd.MultiIndex.from_product([list('abc'), list('xy')],
names=['abc', 'xy'])
right = pd.DataFrame({'v2': [100 * i for i in range(1, 7)]}, index=rightindex)
result=left.join(right, on=['abc', 'xy'], how='inner') #on=['abc', 'xy']可省略
print('result:n',result)
join多个对象
left = pd.DataFrame({'v': [1, 2, 3]}, index=['K0', 'K1', 'K2'])
right1 = pd.DataFrame({'v': [4, 5, 6]}, index=['K0', 'K0', 'K3'])
right2 = pd.DataFrame({'v': [7, 8, 9]}, index=['K1', 'K1', 'K2'])
result = left.join([right1, right2])
print('result:n',result)
参考资料:Merge, join, and concatenate | Pandas 中文 (pypandas.cn)


