11、行转列(行是水平、列是垂直 )
(行是从左往右,列是从前往后)
数据: t4表:
id tag flag a b 2 a b 1 a b 3 c d 6 c d 8 c d 8
编写sql实现如下结果:
id tag flag a b 1|2|3 c d 6|8
create table t4( id string, tag string, flag int ) row format delimited fields terminated by ' ' ; load data local inpath './hivedata/t4.txt' overwrite into table t4;
分析:
行转列就是从左完网右的数据变成从前往后
行:水平
列:垂直
思想:就是建立一张虚表,临时存储
select id,tag,
concat_ws("|",collect_set(cast(flag as string)))
from t4
group by id,tag;
result:
a b 2|1|3
c d 6|8
12、列转行(垂直转水平)
数据: t5表
uid name tags 1 goudan chihuo,huaci 2 mazi sleep 3 laotie paly
编写sql实现如下结果:
uid name tag 1 goudan chihuo 1 goudan huaci 2 mazi sleep 3 laotie paly
create table t5( uid string, name string, tags string ) row format delimited fields terminated by 't' ; load data local inpath './hivedata/t5.txt' overwrite into table t5;
列:垂直 行:水平 select uid,name,tag from t5 lateral view explode(split(tags,","))t as tag; result: 1 goudan chihuo 1 goudan huaci 2 mazi sleep 3 laotie paly
13、行转列
数据: content表:
uid contents 1 i|love|china 2 china|is|good|i|i|like
统计结果如下,如果出现次数一样,则按照content名称排序:
+----------+------+--+ | content | num | +----------+------+--+ | i | 3 | | china | 2 | | good | 1 | | is | 1 | | like | 1 | | love | 1 | +----------+------+--+
drop table content; create table content( uid int, contents string ) row format delimited fields terminated by 't' ; load data local inpath './hivedata/content.txt' overwrite into table content;
分析: 切分了单词,并且统计每一个单词出现的次数。 select content,count(1) num from content lateral view explode(split(contents,"\|")) tmp as content group by content order by num desc,content asc; result: i 3 china 2 good 1 is 1 like 1 love 1
14、列转行
数据: course1表
id course 1,a 1,b 1,c 1,e 2,a 2,c 2,d 2,f 3,a 3,b 3,c 3,e
根据编写sql,得到结果如下(表中的1表示选修,表中的0表示未选修):
+-----+----+----+----+----+----+----+--+ | id | a | b | c | d | e | f | +-----+----+----+----+----+----+----+--+ | 1 | 1 | 1 | 1 | 0 | 1 | 0 | | 2 | 1 | 0 | 1 | 1 | 0 | 1 | | 3 | 1 | 1 | 1 | 0 | 1 | 0 | +-----+----+----+----+----+----+----+--+
create table course( id int, course string ) row format delimited fields terminated by ',' ; load data local inpath './hivedata/course.txt' overwrite into table course;
分析: id进行分组,也可以进行去重 利用窗口函数列出所有的属性值 使用的是case函数: 语法:(case when xxx=xxx then 1 else 0 end)as `a`; select id, sum(case when c.course="a" then 1 else 0 end) as `a`, sum(case when c.course="b" then 1 else 0 end) as `b`, sum(case when c.course="c" then 1 else 0 end) as `c`, sum(case when c.course="d" then 1 else 0 end) as `d`, sum(case when c.course="e" then 1 else 0 end) as `e`, sum(case when c.course="f" then 1 else 0 end) as `f` from course c group by id; case函数:就是在需要进行选择的时候,要么是1要么是0就可以使用 result: 1 1 1 1 0 1 0 2 1 0 1 1 0 1 3 1 1 1 0 1 0
15、时间戳函数:unix_timestamp,from_unixtime
获取当前时间戳:
select unix_timestamp(); result: unix_timestamp(void) is deprecated. Use current_timestamp instead. OK 1631804742
获取"2019-07-31 11:57:25"对应的时间戳:
select unix_timestamp("2019-07-31 11:57:25");
获取"2019-07-31 11:57"对应的时间戳:
select unix_timestamp("2019-07-31 11:57","yyyy-MM-dd HH:mm");
获取时间戳:1564545445所对应的日期和时分秒:
select from_unixtime(1564545445); result: 2019-07-31 11:57:25
获取时间戳:1564545446所对应的日期和小时(yyyy/MM/dd HH):
select from_unixtime(1564545446,"yyyy/MM/dd HH"); result: 2019/07/31 11



