hive sql场景题第三部分详解

11、行转列（行是水平、列是垂直）

（行是从左往右，列是从前往后）

数据： t4表：

id tag flag 
a b 2
a b 1
a b 3
c d 6
c d 8
c d 8

编写sql实现如下结果：

id tag flag 
a b 1|2|3
c d 6|8

create table t4( 
id string, 
tag string, 
flag int 
)
row format delimited 
fields terminated by ' ' 
;

load data local inpath './hivedata/t4.txt' overwrite into table t4;

分析：
行转列就是从左完网右的数据变成从前往后
行：水平
列：垂直
思想：就是建立一张虚表，临时存储


select id,tag,
concat_ws("|",collect_set(cast(flag as string)))
from t4
group by id,tag;

result:
a       b       2|1|3
c       d       6|8

12、列转行(垂直转水平)

数据： t5表

uid name tags 
1	goudan	chihuo,huaci
2	mazi	sleep
3	laotie	paly

编写sql实现如下结果：

uid name tag 
1 goudan chihuo
1 goudan huaci
2 mazi sleep
3 laotie paly

create table t5( 
uid string, 
name string, 
tags string 
)
row format delimited 
fields terminated by 't' ;

load data local inpath './hivedata/t5.txt' overwrite into table t5;

列：垂直
行：水平

select uid,name,tag
from t5 
lateral view explode(split(tags,","))t as tag; 

result:
1       goudan  chihuo
1       goudan  huaci
2       mazi    sleep
3       laotie  paly

13、行转列

数据： content表：

uid contents 
1	i|love|china
2	china|is|good|i|i|like

统计结果如下,如果出现次数一样，则按照content名称排序：

+----------+------+--+
| content  | num  |
+----------+------+--+
| i        | 3    |
| china    | 2    |
| good     | 1    |
| is       | 1    |
| like     | 1    |
| love     | 1    |
+----------+------+--+

drop table content;
create table content( 
uid int, 
contents string 
)
row format delimited 
fields terminated by 't' 
;
load data local inpath './hivedata/content.txt' overwrite into table content;

分析:
切分了单词，并且统计每一个单词出现的次数。

select content,count(1) num
from content lateral view explode(split(contents,"\|")) tmp as content
group by content
order by num desc,content asc;

result:
i       3
china   2
good    1
is      1
like    1
love    1

14、列转行

数据： course1表

id course 
1,a
1,b
1,c
1,e
2,a
2,c
2,d
2,f
3,a
3,b
3,c
3,e

根据编写sql，得到结果如下(表中的1表示选修，表中的0表示未选修)：

+-----+----+----+----+----+----+----+--+
| id  | a  | b  | c  | d  | e  | f  |
+-----+----+----+----+----+----+----+--+
| 1   | 1  | 1  | 1  | 0  | 1  | 0  |
| 2   | 1  | 0  | 1  | 1  | 0  | 1  |
| 3   | 1  | 1  | 1  | 0  | 1  | 0  |
+-----+----+----+----+----+----+----+--+

create table course( 
id int, 
course string 
)
row format delimited 
fields terminated by ',' 
;

load data local inpath './hivedata/course.txt' overwrite into table course;

分析：
id进行分组，也可以进行去重
利用窗口函数列出所有的属性值

使用的是case函数：
语法：（case when xxx=xxx then 1 else 0 end）as `a`;

select id,
sum(case when c.course="a" then 1 else 0 end) as `a`,
sum(case when c.course="b" then 1 else 0 end) as `b`,
sum(case when c.course="c" then 1 else 0 end) as `c`,
sum(case when c.course="d" then 1 else 0 end) as `d`,
sum(case when c.course="e" then 1 else 0 end) as `e`,
sum(case when c.course="f" then 1 else 0 end) as `f`
from course c
group by id;

case函数:就是在需要进行选择的时候，要么是1要么是0就可以使用
result：
1       1       1       1       0       1       0
2       1       0       1       1       0       1
3       1       1       1       0       1       0

15、时间戳函数：unix_timestamp，from_unixtime

获取当前时间戳：

select unix_timestamp();

result:
unix_timestamp(void) is deprecated. Use current_timestamp instead.
OK
1631804742

获取"2019-07-31 11:57:25"对应的时间戳:

select unix_timestamp("2019-07-31 11:57:25");

获取"2019-07-31 11:57"对应的时间戳：

select unix_timestamp("2019-07-31 11:57","yyyy-MM-dd HH:mm");

获取时间戳:1564545445所对应的日期和时分秒：

select from_unixtime(1564545445);
result:
2019-07-31 11:57:25

获取时间戳:1564545446所对应的日期和小时(yyyy/MM/dd HH):

select from_unixtime(1564545446,"yyyy/MM/dd HH");

result:
2019/07/31 11

hive sql场景题第三部分详解

大数据系统相关栏目本月热门文章