JOIN 的作用INNER JOIN(内连接)LEFT JOIN(左外连接 | 左连接)RIGHT JOIN(右外连接 | 右连接)FULL JOIN(完全连接 | 完全外连接)CROSS JOIN(交叉连接 | 笛卡尔积)SEMI JOIN(半连接)ANTI JOIN(反连接)
参考了以下文档:
① Spark 3.2.1 - JOIN
②
JOIN 的作用JOIN 用来将又关系的行进行关联
常见的 JOIN 类型有:
[ INNER ] | CROSS | LEFT [ OUTER ] | [ LEFT ] SEMI | RIGHT [ OUTER ] | FULL [ OUTER ] | [ LEFT ] ANTIINNER JOIN(内连接)
Inner Join 是 Spark 中默认的连接,将两张表中符合关联条件的行返回,Inner 可以省略
-- 语法 relation [ INNER ] JOIN relation [ join_criteria ]
SELECt id
,name
,employee.deptno
,deptname
FROM employee
INNER JOIN department
ON employee.deptno = department.deptno
;
LEFT JOIN(左外连接 | 左连接)
返回左表的所有值和右表的匹配值,如果没有匹配,则附加 NULL。它也称为左外连接
注意:关联条件左边的表称为 左表(也称为主表),关联条件右边的表称为 右表(也称为副表)
-- 语法 relation LEFT [ OUTER ] JOIN relation [ join_criteria ]
SELECt id
,name
,employee.deptno
,deptname
FROM employee
LEFT JOIN department
ON employee.deptno = department.deptno;
注意:无论 ON 后面再加什么条件都不能改变结果个数,能改变的只是右表查出来的数据是不是 NULL,示例:
-- 对 employee.name 进行过滤,只对右表生效,仔细想一下
SELECt id
,name
,employee.deptno
,deptname
FROM employee
LEFT JOIN department
ON employee.deptno = department.deptno AND employee.name = 'Lisa';
-- 对 epartment.deptno 进行过滤,只对右表生效,仔细想一下
SELECt id
,name
,employee.deptno
,deptname
FROM employee
LEFT JOIN department
ON employee.deptno = department.deptno AND department.deptno = 3;
RIGHT JOIN(右外连接 | 右连接)
右连接返回右表的所有值和左表的匹配值,如果没有匹配,则附加 NULL
-- 语法格式 relation RIGHT [ OUTER ] JOIN relation [ join_criteria ]
SELECt id
,name
,employee.deptno
,deptname
FROM employee
RIGHT JOIN department
ON employee.deptno = department.deptno;
员工表中增加一条数据,部门号为 3 ,然后再次执行查询
完全连接返回两个表的所有值,在没有匹配的一侧加 NULL 值,返回的行数是两个表的总行数
-- 语法格式 relation FULL [ OUTER ] JOIN relation [ join_criteria ]
SELECt id
,name
,employee.deptno
,deptname
FROM employee
FULL JOIN department
ON employee.deptno = department.deptno;
向 部门表 中加一条数据,部门号为 7 此时再看
交叉连接返回两个表的笛卡尔积(左表 3 行,右表 2 行,笛卡尔积后结果为 6 行)
-- 格式 relation CROSS JOIN relation [ join_criteria ]
SELECt id
,name
,employee.deptno
,deptname
FROM employee
CROSS JOIN department;
SEMI JOIN(半连接)
从左表返回与右表匹配的值(并不是返回左表的全部行,只返回与右表匹配的行,右表字段是不会展示出来的,左表的字段都会显示)
-- 语法格式 relation [ LEFT ] SEMI JOIN relation [ join_criteria ]
-- Spark 中为 SEMI JOIN SELECt * FROM employee SEMI JOIN department ON employee.deptno = department.deptno; -- Hive 中为 LEFT SEMI JOIN SELECt * FROM employee LEFT SEMI JOIN department ON employee.deptno = department.deptno;ANTI JOIN(反连接)
从与右表中不匹配的左表返回值
-- 语法格式 relation [ LEFT ] ANTI JOIN relation [ join_criteria ]
-- Spark 独有的格式,Hive 中不支持 SELECt * FROM employee ANTI JOIN department ON employee.deptno = department.deptno;



