栏目分类:
子分类:
返回
名师互学网用户登录
快速导航关闭
当前搜索
当前分类
子分类
实用工具
热门搜索
名师互学网 > IT > 前沿技术 > 大数据 > 大数据系统

hive sql场景题第三部分详解

hive sql场景题第三部分详解

11、行转列(行是水平、列是垂直 )

(行是从左往右,列是从前往后)

数据: t4表:

id tag flag 
a b 2
a b 1
a b 3
c d 6
c d 8
c d 8

编写sql实现如下结果:

id tag flag 
a b 1|2|3
c d 6|8
create table t4( 
id string, 
tag string, 
flag int 
)
row format delimited 
fields terminated by ' ' 
;
​
load data local inpath './hivedata/t4.txt' overwrite into table t4;
分析:
行转列就是从左完网右的数据变成从前往后
行:水平
列:垂直
思想:就是建立一张虚表,临时存储
​
​
select id,tag,
concat_ws("|",collect_set(cast(flag as string)))
from t4
group by id,tag;
​
result:
a       b       2|1|3
c       d       6|8

12、列转行(垂直转水平)

数据: t5表

uid name tags 
1	goudan	chihuo,huaci
2	mazi	sleep
3	laotie	paly

编写sql实现如下结果:

uid name tag 
1 goudan chihuo
1 goudan huaci
2 mazi sleep
3 laotie paly
create table t5( 
uid string, 
name string, 
tags string 
)
row format delimited 
fields terminated by 't' ;
​
load data local inpath './hivedata/t5.txt' overwrite into table t5;
列:垂直
行:水平
​
select uid,name,tag
from t5 
lateral view explode(split(tags,","))t as tag; 
​
result:
1       goudan  chihuo
1       goudan  huaci
2       mazi    sleep
3       laotie  paly
​

13、行转列

数据: content表:

uid contents 
1	i|love|china
2	china|is|good|i|i|like

统计结果如下,如果出现次数一样,则按照content名称排序:

+----------+------+--+
| content  | num  |
+----------+------+--+
| i        | 3    |
| china    | 2    |
| good     | 1    |
| is       | 1    |
| like     | 1    |
| love     | 1    |
+----------+------+--+
drop table content;
create table content( 
uid int, 
contents string 
)
row format delimited 
fields terminated by 't' 
;
load data local inpath './hivedata/content.txt' overwrite into table content; 
分析:
切分了单词,并且统计每一个单词出现的次数。
​
select content,count(1) num
from content lateral view explode(split(contents,"\|")) tmp as content
group by content
order by num desc,content asc;
​
result:
i       3
china   2
good    1
is      1
like    1
love    1
​

14、列转行

数据: course1表

id course 
1,a
1,b
1,c
1,e
2,a
2,c
2,d
2,f
3,a
3,b
3,c
3,e

根据编写sql,得到结果如下(表中的1表示选修,表中的0表示未选修):

+-----+----+----+----+----+----+----+--+
| id  | a  | b  | c  | d  | e  | f  |
+-----+----+----+----+----+----+----+--+
| 1   | 1  | 1  | 1  | 0  | 1  | 0  |
| 2   | 1  | 0  | 1  | 1  | 0  | 1  |
| 3   | 1  | 1  | 1  | 0  | 1  | 0  |
+-----+----+----+----+----+----+----+--+
create table course( 
id int, 
course string 
)
row format delimited 
fields terminated by ',' 
;
​
load data local inpath './hivedata/course.txt' overwrite into table course; 
分析:
id进行分组,也可以进行去重
利用窗口函数列出所有的属性值
​
使用的是case函数:
语法:(case when xxx=xxx then 1 else 0 end)as `a`;
​
select id,
sum(case when c.course="a" then 1 else 0 end) as `a`,
sum(case when c.course="b" then 1 else 0 end) as `b`,
sum(case when c.course="c" then 1 else 0 end) as `c`,
sum(case when c.course="d" then 1 else 0 end) as `d`,
sum(case when c.course="e" then 1 else 0 end) as `e`,
sum(case when c.course="f" then 1 else 0 end) as `f`
from course c
group by id;
​
case函数:就是在需要进行选择的时候,要么是1要么是0就可以使用
result:
1       1       1       1       0       1       0
2       1       0       1       1       0       1
3       1       1       1       0       1       0
​

15、时间戳函数:unix_timestamp,from_unixtime

获取当前时间戳:

select unix_timestamp();
​
result:
unix_timestamp(void) is deprecated. Use current_timestamp instead.
OK
1631804742
​

获取"2019-07-31 11:57:25"对应的时间戳:

select unix_timestamp("2019-07-31 11:57:25");

获取"2019-07-31 11:57"对应的时间戳:

select unix_timestamp("2019-07-31 11:57","yyyy-MM-dd HH:mm");

获取时间戳:1564545445所对应的日期和时分秒:

select from_unixtime(1564545445);
result:
2019-07-31 11:57:25

获取时间戳:1564545446所对应的日期和小时(yyyy/MM/dd HH):

select from_unixtime(1564545446,"yyyy/MM/dd HH");
​
result:
2019/07/31 11
转载请注明:文章转载自 www.mshxw.com
本文地址:https://www.mshxw.com/it/285894.html
我们一直用心在做
关于我们 文章归档 网站地图 联系我们

版权所有 (c)2021-2022 MSHXW.COM

ICP备案号:晋ICP备2021003244-6号