- Use case 1 (single LATERAL VIEW): split + explode + LATERAL VIEW
- Use case 2 (multiple LATERAL VIEWs): explode + LATERAL VIEW
The LATERAL VIEW clause is used in conjunction with generator functions such as EXPLODE, which will generate a virtual table containing one or more rows. LATERAL VIEW will apply the rows to each original output row.
LATERAL VIEW Clause - Spark 3.2.0 documentation (apache.org)
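As a minimal illustration of the definition above, the following sketch explodes an array built inline. The CTE name `demo`, its columns, and the view alias `item_view` are made up for this example and are not part of the data used later.

```sql
-- Hypothetical two-row input built inline, so no table has to exist.
WITH demo AS (
  SELECT 1 AS id, array('a', 'b') AS items
  UNION ALL
  SELECT 2 AS id, array('c') AS items
)
SELECT id, item
FROM demo
LATERAL VIEW explode(items) item_view AS item;
-- Each input row is joined with the rows generated from its own array:
-- (1, a), (1, b), (2, c). Row order is not guaranteed.
```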
Use case 1 (single LATERAL VIEW): split + explode + LATERAL VIEW to find the age of the oldest user for each skill
Table and data
| user_id | user_name | age | skills |
|---|---|---|---|
| 1356 | kyle | 23 | Hadoop-Hive-Spark |
| 1357 | Jack | 22 | Hadoop-Hive |
| 1358 | Sam | 26 | Mysql-Oracle |
| 1359 | Lucy | 28 | Redis-Mysql |
| 1360 | Rose | 32 | Hadoop-Hive-Spark-Flink-Hbase |
| 1361 | Harry | 25 | Flink-Hbase-ClickHouse-Kafka |
| 1362 | Kelly | 27 | Spark-Flink-Hbase |
cache table user_info
select '1356' user_id, 'kyle' user_name, 23 age, 'Hadoop-Hive-Spark' skills
union select '1357' user_id, 'Jack' user_name, 22 age, 'Hadoop-Hive' skills
union select '1358' user_id, 'Sam' user_name, 26 age, 'Mysql-Oracle' skills
union select '1359' user_id, 'Lucy' user_name, 28 age, 'Redis-Mysql' skills
union select '1360' user_id, 'Rose' user_name, 32 age, 'Hadoop-Hive-Spark-Flink-Hbase' skills
union select '1361' user_id, 'Harry' user_name, 25 age, 'Flink-Hbase-ClickHouse-Kafka' skills
union select '1362' user_id, 'Kelly' user_name, 27 age, 'Spark-Flink-Hbase' skills;
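Requirement analysis
First split the skills field into individual skills, one row per skill. Then partition the rows by skill, order each partition by age in descending order, and keep the first row, which is the oldest user for that skill.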
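The splitting step can be tried on a single literal before touching the table. This is only a sketch on one value taken from the data above; the output column names `skill_array` and `skill` are chosen for illustration.

```sql
-- split takes a regular expression as its second argument;
-- '-' has no special meaning in a regex, so it matches a literal hyphen.
SELECT split('Hadoop-Hive-Spark', '-') AS skill_array;
-- one row holding the array ["Hadoop","Hive","Spark"]

-- explode turns each array element into its own row.
SELECT explode(split('Hadoop-Hive-Spark', '-')) AS skill;
-- three rows: Hadoop, Hive, Spark
```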
with t1 as (
    -- split the skills field and turn each element into its own row
    select user_id,
           user_name,
           age,
           skill
    from user_info
    lateral view explode(split(skills,'-')) skill_table as skill
),
t2 as (
    -- partition by skill and order by age descending, so that rn = 1 marks the oldest user for each skill
    select *,
           row_number() over(partition by skill order by age desc) rn
    from t1
)
select
    user_id,
    user_name,
    age,
    skill
from t2
where rn = 1;
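If only the maximum age per skill is needed, and not the user who holds it, a plain aggregation over the exploded rows is a possible alternative to the window function. This sketch assumes the `user_info` table cached earlier; the column name `max_age` is introduced here.

```sql
-- One output row per skill, carrying only the highest age seen for it.
SELECT skill,
       max(age) AS max_age
FROM user_info
LATERAL VIEW explode(split(skills, '-')) skill_table AS skill
GROUP BY skill;
```

With the sample data above this should give, for example, 32 for Hadoop (Rose) and 28 for Mysql (Lucy). The `row_number()` version remains the one to use when user_id and user_name must be returned as well.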
Use case 2 (multiple LATERAL VIEWs): explode + LATERAL VIEW
Expand both the skills and mark fields into rows
Table and data
| user_id | user_name | age | skills | mark |
|---|---|---|---|---|
| 1356 | kyle | 23 | ["Hadoop","Hive","Spark"] | ["A","B","C"] |
| 1357 | Jack | 22 | ["Hadoop","Hive"] | ["A","D","E"] |
| 1358 | Sam | 26 | ["Mysql","Oracle"] | ["B","C"] |
| 1359 | Lucy | 28 | ["Redis","Mysql"] | ["D","E"] |
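The original does not show how this table is created. One way to build it, following the same `cache table` pattern as in use case 1, might look like the sketch below. The table name `baseTable` is taken from the query in this section; building the arrays with `array(...)` is an assumption about how the data was loaded.

```sql
-- Assumed setup: the four rows of the table above, with skills and mark as array<string>.
CACHE TABLE baseTable
SELECT '1356' AS user_id, 'kyle' AS user_name, 23 AS age,
       array('Hadoop', 'Hive', 'Spark') AS skills, array('A', 'B', 'C') AS mark
UNION ALL
SELECT '1357', 'Jack', 22, array('Hadoop', 'Hive'), array('A', 'D', 'E')
UNION ALL
SELECT '1358', 'Sam', 26, array('Mysql', 'Oracle'), array('B', 'C')
UNION ALL
SELECT '1359', 'Lucy', 28, array('Redis', 'Mysql'), array('D', 'E');
```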
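Requirement analysis
Because skills and mark are both Array columns, no split is needed: explode can be applied to each column directly, using one LATERAL VIEW per column. Every element of skills is then paired with every element of mark for the same user.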
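select
    user_id,
    user_name,
    age,
    skill,
    mark_item
FROM baseTable
-- each LATERAL VIEW expands one array column
LATERAL VIEW explode(skills) view1 AS skill
-- the alias is mark_item rather than mark, so that it does not clash with the source column mark
LATERAL VIEW explode(mark) view2 AS mark_item;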
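Chaining two LATERAL VIEWs yields, for each input row, every combination of a skill with a mark, so the row count per user is the product of the two array sizes. The sketch below checks this on the first two users from the table above. The CTE name `two_users` and the columns `expected_rows` and `actual_rows` are introduced only for this illustration.

```sql
-- Inline copy of the first two rows, so the check does not depend on baseTable.
WITH two_users AS (
  SELECT '1356' AS user_id,
         array('Hadoop', 'Hive', 'Spark') AS skills,
         array('A', 'B', 'C') AS mark
  UNION ALL
  SELECT '1357' AS user_id,
         array('Hadoop', 'Hive') AS skills,
         array('A', 'D', 'E') AS mark
)
SELECT user_id,
       size(skills) * size(mark) AS expected_rows,  -- product of the array lengths
       count(*) AS actual_rows                      -- rows produced by the two views
FROM two_users
LATERAL VIEW explode(skills) view1 AS skill
LATERAL VIEW explode(mark) view2 AS mark_item
GROUP BY user_id, skills, mark;
-- 1356: 3 * 3 = 9 rows, 1357: 2 * 3 = 6 rows
```

If the intent is instead to pair the first skill with the first mark, the second with the second, and so on, this cross-product behaviour is not what is wanted, and a positional approach would be needed.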



