Question 1:
Which window functions do you know, and what does each one do? Give an application of one in a real business scenario. Write out the window functions and their meanings, then write any SQL statement that uses a window function and explain what that statement does.
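A possible sample answer, one sketch of many (it reuses the store table defined later in question 17):

```sql
-- Running revenue per store: sum(...) over(...) keeps the detail rows
-- while adding an aggregate computed over a window.
select sid, month, money,
       sum(money) over(partition by sid order by month) running_total
from store;
-- Meaning: for each store (partition by sid), order its rows by month and
-- emit the cumulative sum of money up to the current row.
```

Commonly cited window functions: the aggregates sum/avg/count/max/min with over(...), the ranking functions row_number/rank/dense_rank/ntile, and the offset functions lag/lead/first_value/last_value; the over() clause defines the partition, ordering, and frame they run on.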
2. Find the number of views and the total watch time for each channel.
Data: video table
uid channel min
1 1 23
2 1 12
3 1 12
4 1 32
5 1 342
6 2 13
7 2 34
8 2 13
9 2 134
create table video(
  uid int,
  channel string,
  min int
)
row format delimited fields terminated by ' ';
load data local inpath './hivedata/video.txt' into table video;
Answer:
select channel, count(*) cnt, sum(min) total
from video
group by channel;
3. Write SQL for the following.
Data:
userid,month,visits
A,2015-01,5
A,2015-01,15
B,2015-01,5
A,2015-01,8
B,2015-01,25
A,2015-01,5
A,2015-02,4
A,2015-02,6
B,2015-02,10
B,2015-02,5
A,2015-03,16
A,2015-03,22
B,2015-03,23
B,2015-03,10
B,2015-03,1
drop table visits;
create table visits(
  userid string,
  month string,
  visits int
)
row format delimited fields terminated by ',';
load data local inpath './hivedata/visits.txt' overwrite into table visits;
Requirement: for each user, compute the monthly visit total, the maximum single-month total up to and including that month, and the cumulative total through that month. The result should look like:
+---------+----------+---------+-------------+---------------+
| userid  | month    | visits  | max_visits  | total_visits  |
+---------+----------+---------+-------------+---------------+
| A       | 2015-01  | 33      | 33          | 33            |
| A       | 2015-02  | 10      | 33          | 43            |
| A       | 2015-03  | 38      | 38          | 81            |
| B       | 2015-01  | 30      | 30          | 30            |
| B       | 2015-02  | 15      | 30          | 45            |
| B       | 2015-03  | 34      | 34          | 79            |
+---------+----------+---------+-------------+---------------+
select userid, month, visits,
       max(visits) over(distribute by userid sort by month) max_visits,
       sum(visits) over(distribute by userid sort by month) total_visits
from (
  select userid, month, sum(visits) visits
  from visits
  group by userid, month
) t;
4. Count the total number of users who logged in on 7 consecutive days.
Data: login table
Uid dt login_status (1 = successful login, 0 = abnormal)
1 2019-07-11 1
1 2019-07-12 1
1 2019-07-13 1
1 2019-07-14 1
1 2019-07-15 1
1 2019-07-16 1
1 2019-07-17 1
1 2019-07-18 1
2 2019-07-11 1
2 2019-07-12 1
2 2019-07-13 0
2 2019-07-14 1
2 2019-07-15 1
2 2019-07-16 0
2 2019-07-17 1
2 2019-07-18 0
2 2019-07-19 1
2 2019-07-20 0
2 2019-07-21 1
2 2019-07-22 0
2 2019-07-23 1
2 2019-07-24 0
3 2019-07-11 1
3 2019-07-12 1
3 2019-07-13 1
3 2019-07-14 1
3 2019-07-15 1
3 2019-07-16 1
3 2019-07-17 1
3 2019-07-18 1
drop table login;
create table login(
  Uid int,
  dt string,
  login_status int
)
row format delimited fields terminated by ' ';
load data local inpath './hivedata/login.txt' into table login;
select count(*)
from (
  select distinct uid
  from (
    select uid, dt,
           lag(dt,6) over(partition by uid order by dt) pre_dt,
           sum(login_status) over(partition by uid order by dt rows between 6 preceding and current row) total
    from login
  ) t
  where date_sub(dt,6) = pre_dt and t.total = 7
) t1;
5. Which ranking functions do you know? Explain the differences between them. (A written answer is sufficient.)
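As one illustration (reusing the stu table from question 6), the three common ranking functions differ only in how they treat ties:

```sql
-- row_number(): no ties -- 1,2,3 even for equal scores (order within a tie is arbitrary)
-- rank():       ties share a rank and the next rank is skipped -- 1,1,3
-- dense_rank(): ties share a rank with no gap afterwards -- 1,1,2
select class, stu_no, score,
       row_number() over(partition by class order by score desc) rn,
       rank()       over(partition by class order by score desc) rk,
       dense_rank() over(partition by class order by score desc) drk
from stu;
```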
6. Write SQL that returns the top three students in each class (equal scores are not tied), along with the score difference from the next-higher rank.
Data: stu table
Stu_no class score
1 1901 90
2 1901 90
3 1901 83
4 1901 60
5 1902 66
6 1902 23
7 1902 99
8 1902 67
9 1902 87
drop table stu;
create table stu(
  Stu_no int,
  class string,
  score int
)
row format delimited fields terminated by '\t';
load data local inpath './hivedata/stu.txt' into table stu;
The expected result:
+--------+---------+--------+-----+----------+
| class  | stu_no  | score  | rn  | rn_diff  |
+--------+---------+--------+-----+----------+
| 1901   | 2       | 90     | 1   | 90       |
| 1901   | 1       | 90     | 2   | 0        |
| 1901   | 3       | 83     | 3   | -7       |
| 1902   | 7       | 99     | 1   | 99       |
| 1902   | 9       | 87     | 2   | -12      |
| 1902   | 8       | 67     | 3   | -20      |
+--------+---------+--------+-----+----------+
select class, stu_no, score, rn, rn_diff
from (
  select class, stu_no, score,
         row_number() over(partition by class order by score desc) rn,
         score - nvl(lag(score,1) over(partition by class order by score desc), 0) rn_diff
  from stu
) t
where t.rn < 4;
7. What approaches do you know for transposing rows and columns? Describe each one in detail. (A written answer is sufficient.)
8. Write SQL to transpose rows and columns. The data is as follows:
id,sid,subject,score
1,001,语文,90
2,001,数学,92
3,001,英语,80
4,002,语文,88
5,002,数学,90
6,002,英语,75.5
7,003,语文,70
8,003,数学,85
9,003,英语,90
10,003,政治,82
Write SQL to get the following result:
+---------+--------+--------+--------+--------+-----------+
| sid     | 语文   | 数学   | 英语   | 政治   | total     |
+---------+--------+--------+--------+--------+-----------+
| 001     | 90.0   | 92.0   | 80.0   | 0.0    | 262.0     |
| 002     | 88.0   | 90.0   | 75.5   | 0.0    | 253.5     |
| 003     | 70.0   | 85.0   | 90.0   | 82.0   | 327.0     |
| total   | 248.0  | 267.0  | 245.5  | 82.0   | 842.5     |
+---------+--------+--------+--------+--------+-----------+
drop table score;
create table score(
  id int,
  sid string,
  subject string,
  score double
)
row format delimited fields terminated by ',';
load data local inpath './hivedata/score.txt' into table score;
select sid,
       sum(if(subject = "语文", score, 0)) as `语文`,
       sum(if(subject = "数学", score, 0)) as `数学`,
       sum(case when subject = "英语" then score else 0 end) as `英语`,
       sum(case when subject = "政治" then score else 0 end) as `政治`,
       sum(score) total
from score
group by sid
union
select "total", sum(`语文`), sum(`数学`), sum(`英语`), sum(`政治`), sum(total)
from (
  select sid,
         sum(if(subject = "语文", score, 0)) as `语文`,
         sum(if(subject = "数学", score, 0)) as `数学`,
         sum(case when subject = "英语" then score else 0 end) as `英语`,
         sum(case when subject = "政治" then score else 0 end) as `政治`,
         sum(score) total
  from score
  group by sid
) t;
9. Write SQL for the following.
Data: t1 table
uid tags
1 1,2,3
2 2,3
3 1,2
Write SQL to get the following result:
uid tag
1 1
1 2
1 3
2 2
2 3
3 1
3 2
create table t1(
  uid int,
  tags string
)
row format delimited fields terminated by '\t';
load data local inpath './hivedata/t1.txt' into table t1;
select uid, tag
from t1 lateral view explode(split(tags, ",")) t as tag;
10. Rows to columns.
Data: T2 table:
Tags
1,2,3
1,2
2,3
T3 table:
id lab
1 A
2 B
3 C
Using the data in T2 and T3, write SQL to get the following result:
+--------+--------+
| tags   | labs   |
+--------+--------+
| 1,2    | A,B    |
| 1,2,3  | A,B,C  |
| 2,3    | B,C    |
+--------+--------+
create table t2(
  tags string
);
load data local inpath './hivedata/t2.txt' overwrite into table t2;
create table t3(
  id int,
  lab string
)
row format delimited fields terminated by ' ';
load data local inpath './hivedata/t3.txt' overwrite into table t3;
-- sort_array guarantees the A,B,C ordering; collect_set alone promises no order
select tags,
       concat_ws(",", sort_array(collect_set(lab))) labs
from (
  select tags, lab
  from (
    select tags, tag
    from t2 lateral view explode(split(tags, ",")) A as tag
  ) B
  join t3 on B.tag = t3.id
) C
group by tags;
11. Rows to columns.
Data: t4 table:
id tag flag
a b 2
a b 1
a b 3
c d 6
c d 8
c d 8
Write SQL to get the following result:
id tag flag
a b 1|2|3
c d 6|8
create table t4(
  id string,
  tag string,
  flag int
)
row format delimited
fields terminated by ' ';
load data local inpath './hivedata/t4.txt' overwrite into table t4;
-- collect_set de-duplicates the repeated 8; sort_array fixes the 1|2|3 order
select id, tag,
       concat_ws("|", sort_array(collect_set(cast(flag as string)))) flag
from t4
group by id, tag;
12. Columns to rows.
Data: t5 table
uid name tags
1 goudan chihuo,huaci
2 mazi sleep
3 laotie paly
Write SQL to get the following result:
uid name tag
1 goudan chihuo
1 goudan huaci
2 mazi sleep
3 laotie paly
create table t5(
  uid string,
  name string,
  tags string
)
row format delimited fields terminated by '\t';
load data local inpath './hivedata/t5.txt' overwrite into table t5;
select uid, name, tag
from t5 lateral view explode(split(tags, ",")) t1 as tag;
13. Rows to columns (word frequency).
Data: content table:
uid contents
1 i|love|china
2 china|is|good|i|i|like
The expected statistics are below; ties on count are ordered by content name:
+----------+------+
| content  | num  |
+----------+------+
| i        | 3    |
| china    | 2    |
| good     | 1    |
| is       | 1    |
| like     | 1    |
| love     | 1    |
+----------+------+
drop table content;
create table content(
  uid int,
  contents string
)
row format delimited fields terminated by '\t';
load data local inpath './hivedata/content.txt' overwrite into table content;
-- split() takes a regex, so the pipe must be escaped as '\\|'
select content, count(1) num
from content lateral view explode(split(contents, '\\|')) tmp as content
group by content
order by num desc, content;
14. Columns to rows.
Data: course table
id,course
1,a
1,b
1,c
1,e
2,a
2,c
2,d
2,f
3,a
3,b
3,c
3,e
Write SQL to get the following result (1 = enrolled, 0 = not enrolled):
+-----+----+----+----+----+----+----+
| id  | a  | b  | c  | d  | e  | f  |
+-----+----+----+----+----+----+----+
| 1   | 1  | 1  | 1  | 0  | 1  | 0  |
| 2   | 1  | 0  | 1  | 1  | 0  | 1  |
| 3   | 1  | 1  | 1  | 0  | 1  | 0  |
+-----+----+----+----+----+----+----+
create table course(
  id int,
  course string
)
row format delimited fields terminated by ',';
load data local inpath './hivedata/course.txt' overwrite into table course;
select id,
       sum(case when c.course = "a" then 1 else 0 end) as `a`,
       sum(case when c.course = "b" then 1 else 0 end) as `b`,
       sum(case when c.course = "c" then 1 else 0 end) as `c`,
       sum(case when c.course = "d" then 1 else 0 end) as `d`,
       sum(case when c.course = "e" then 1 else 0 end) as `e`,
       sum(case when c.course = "f" then 1 else 0 end) as `f`
from course c
group by id;
15. Timestamp functions: unix_timestamp, from_unixtime
Get the current timestamp:
select unix_timestamp();
Get the timestamp for "2019-07-31 11:57:25":
select unix_timestamp("2019-07-31 11:57:25");
Get the timestamp for "2019-07-31 11:57":
select unix_timestamp("2019-07-31 11:57","yyyy-MM-dd HH:mm");
Get the date and time for timestamp 1564545445:
select from_unixtime(1564545445);
Get the date and hour (yyyy/MM/dd HH) for timestamp 1564545446:
select from_unixtime(1564545446,"yyyy/MM/dd HH");
16. Date format conversion: yyyyMMdd -> yyyy-MM-dd
Data: dt table
20190730
20190731
Write SQL to get the following result:
2019-07-30
2019-07-31
create table dt(
  dt string
);
load data local inpath './hivedata/dt.txt' overwrite into table dt;
select from_unixtime(unix_timestamp(dt,"yyyyMMdd"),"yyyy-MM-dd") from dt;
17. Data: store, month, amount
sid,month,money
a,01,150
a,01,200
b,01,1000
b,01,800
c,01,250
c,01,220
b,01,6000
a,02,2000
a,02,3000
b,02,1000
b,02,1500
c,02,350
c,02,280
a,03,350
a,03,250
create table store(
  sid string,
  month string,
  money int
)
row format delimited fields terminated by ',';
load data local inpath './hivedata/store.txt' overwrite into table store;
Write Hive HQL to get each store's sales for the month and its cumulative sales through that month.
select sid, month, sales,
       sum(sales) over(partition by sid order by month) total
from (
  select sid, month, sum(money) sales
  from store
  group by sid, month
) t;
18. Has data skew ever occurred in your Hive jobs? How did you handle it, and what is the underlying principle?
A written answer is sufficient.
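One standard mitigation for a skewed group-by, sketched on the video table from question 2 (the parameters are real Hive settings; whether they help depends on the job):

```sql
set hive.map.aggr=true;           -- aggregate partially on the map side
set hive.groupby.skewindata=true; -- two-stage job: spread keys randomly, then merge
select channel, count(*) cnt
from video
group by channel;
```

For join-side skew, a map join of the small table or salting the hot key are the usual alternatives.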
19. When have you used array and map in Hive, and why?
A written answer is sufficient.
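A hedged illustration (user_profile and its columns are hypothetical, invented for this sketch):

```sql
-- array fits multi-valued attributes; map fits sparse key/value attributes
create table user_profile(
  uid int,
  tags array<string>,
  props map<string,string>
)
row format delimited
fields terminated by '\t'
collection items terminated by ','
map keys terminated by ':';

-- element access and explosion:
select uid, tags[0] first_tag, props['city'] city from user_profile;
select uid, tag from user_profile lateral view explode(tags) t as tag;
```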
20. Implement the following requirement in SQL:
Table A is a login-time log; one record is written per login within the month. Table A:
log_time uid
2018-10-01 12:34:11 123
2018-10-02 13:21:08 123
2018-10-02 14:21:08 123
2018-10-02 14:08:09 456
2018-10-04 05:10:22 123
2018-10-04 21:38:38 456
2018-10-04 22:38:38 456
2018-10-05 09:57:32 123
2018-10-06 13:22:56 123
2018-11-01 12:34:11 123
2018-11-02 13:21:08 123
2018-11-02 14:21:08 123
2018-11-02 14:08:09 456
2018-11-04 05:10:22 123
2018-11-04 21:38:38 456
2018-11-05 09:57:32 123
2018-11-06 13:22:56 123
Compute each user's maximum number of consecutive login days in the month. In the sample data, user 123's maximum streak is 3 days and user 456's is 1.
drop table login_time;
create table login_time(
  log_time timestamp,
  uid string
)
row format delimited fields terminated by '\t';
load data local inpath './hivedata/login_time.txt' overwrite into table login_time;
-- Note: the raw data may need cleaning so each user has one login record per day
select uid, max(cnt)
from (
  select uid, date_format(dt,"yyyy-MM") month, count(1) cnt
  from (
    select uid, dt,
           date_sub(dt, row_number() over(partition by uid order by dt)) grp_dt
    from (
      select distinct uid, date_format(log_time,"yyyy-MM-dd") as dt
      from login_time
    ) a
  ) b
  group by uid, grp_dt, date_format(dt,"yyyy-MM")
) c
group by uid;
21. Implement the following in SQL:
Sample data: t1 table
gender,cookie,ip,timestampe,ua
F,1707041428491566106,111.200.195.186,1208524973899,Dalvik%2F2.1.0%20%28Linux%3B%20U%3B%20Android
... (the full data set was provided as an image and is not reproduced here)
Rewrite the awk command from the image in SQL, then answer the questions above.
Write SQL for the pv/uv statistics; the other questions can be answered in words.
22. Use Hive to compute the difference between two data sets.
Data: diff_t1 table:
id name
1 zs
2 ls
diff_t2 table:
id name
1 zs
3 ww
The expected result:
id name
2 ls
3 ww
create table diff_t1(
  id string,
  name string
)
row format delimited fields terminated by ' ';
load data local inpath './hivedata/diff_t1.txt' overwrite into table diff_t1;
create table diff_t2(
  id string,
  name string
)
row format delimited fields terminated by ' ';
load data local inpath './hivedata/diff_t2.txt' overwrite into table diff_t2;
select t1.id id, t1.name name
from diff_t1 t1
left join diff_t2 t2 on t1.id = t2.id
where t2.id is null
union
select t2.id id, t2.name name
from diff_t1 t1
right join diff_t2 t2 on t1.id = t2.id
where t1.id is null;
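The same difference can be sketched with a single full outer join instead of union-ing two one-sided joins:

```sql
select nvl(t1.id, t2.id) id,
       nvl(t1.name, t2.name) name
from diff_t1 t1
full outer join diff_t2 t2 on t1.id = t2.id
where t1.id is null or t2.id is null;   -- keep only the unmatched rows
```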
23. Implement the following requirement with Hive HQL.
A website's purchase records have the following fields:
orderid,userid,productid,price,timestamp,date
121,张三,3,100,1535945356,2018-08-07
122,张三,3,200,1535945356,2018-08-08
123,李四,3,200,1535945356,2018-08-08
124,王五,1,200,1535945356,2018-08-08
125,张三,3,200,1535945356,2018-08-09
126,张三,2,200,1535945356,2018-08-09
127,李四,3,200,1535945356,2018-08-09
128,李四,3,200,1535945356,2018-08-10
129,李四,3,200,1535945356,2018-08-11
Write SQL to find the users who bought product 3 both today and yesterday, together with their spend from yesterday.
drop table product;
create table product(
  orderid string,
  userid string,
  productid int,
  price int,
  tamp int,    -- "timestamp" is a reserved word, hence the shortened name
  dt date      -- likewise "date"
)
row format delimited fields terminated by ',';
load data local inpath './hivedata/product.txt' overwrite into table product;
select *
from (
  select userid, dt,
         lag(dt,1) over(partition by userid order by dt) yestoday,
         case when datediff(dt, lag(dt,1) over(partition by userid order by dt)) = 1
              then lag(total_price) over(partition by userid order by dt)
              else null end yestoday_price
  from (
    select userid, dt, productid,
           sum(price) over(partition by userid, dt) total_price
    from product
  ) A
  where productid = 3
) B
where yestoday_price is not null;
24. Implement the following with Hive HQL:
Table user_action_log holds user behavior data:
uid time action
1 Time1 Read
3 Time2 Comment
1 Time3 Share
2 Time4 Like
1 Time5 Write
2 Time6 like
3 Time7 Write
2 Time8 Read
Analyze user behavior to find each user's first action in the table.
drop table user_action_log;
create table user_action_log(
  uid int,
  time string,
  action string
)
row format delimited fields terminated by '\t';
load data local inpath './hivedata/user_action_log.txt' overwrite into table user_action_log;
Implementation:
select uid, time, action
from (
  select uid, time, action,
         row_number() over(partition by uid order by time) rn
  from user_action_log
) A
where rn = 1;
25. Each user's maximum number of consecutive login days.
Data: user_login table
uid,dt
1,2019-08-01
1,2019-08-02
1,2019-08-03
2,2019-08-01
2,2019-08-02
3,2019-08-01
3,2019-08-03
4,2019-07-28
4,2019-07-29
4,2019-08-01
4,2019-08-02
4,2019-08-03
The expected result:
uid cnt_days
1 3
2 2
3 1
4 3
create table user_login(
  uid int,
  dt date
)
row format delimited fields terminated by ',';
load data local inpath './hivedata/user_login.txt' overwrite into table user_login;
select uid, max(cnt)
from (
  select uid, dt_sub, count(1) cnt
  from (
    select uid, dt,
           date_sub(dt, row_number() over(partition by uid order by dt)) dt_sub
    from user_login
  ) A
  group by uid, dt_sub
) B
group by uid;
26. Compute pv and uv with SQL.
Data:
user_net_log table
uid dt url
1 2019-08-06 http://www.baidu.com
2 2019-08-06 http://www.baidu.com
3 2019-08-06 http://www.baidu.com
3 2019-08-06 http://www.soho.com
3 2019-08-06 http://www.meituan.com
3 2019-08-06
The expected result:
dt uv pv
2019-08-06 3 5
create table user_net_log(
  uid int,
  dt date,
  url string
)
row format delimited fields terminated by ' ';
load data local inpath './hivedata/user_net_log.txt' overwrite into table user_net_log;
-- count(url) skips the row with a null url, giving pv = 5
select dt, count(distinct uid) uv, count(url) pv
from user_net_log
group by dt;
27. In Hive, what do the functions coalesce(), nvl(), concat_ws(), collect_list(), collect_set(), and regexp_replace() do?
A written answer is sufficient.
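A few quick illustrations (the literal values are made up for the example; the collect_* query reuses the t4 table from question 11):

```sql
select coalesce(null, null, 'a');             -- first non-null argument: 'a'
select nvl(null, 'default');                  -- two-argument null fallback: 'default'
select concat_ws('-', 'a', 'b', 'c');         -- join with a separator: 'a-b-c'
select regexp_replace('a1b2', '[0-9]', '');   -- regex replacement: 'ab'
-- collect_list keeps duplicates, collect_set de-duplicates; both aggregate into an array
select id, collect_list(flag), collect_set(flag)
from t4
group by id;
```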
28. The following table records the hero picks of each game played by a MOBA player:
pk_moba table
id names
1 亚索,挖据机,艾瑞莉亚,洛,卡莎
2 亚索,盖伦,奥巴马,牛头,皇子
3 亚索,盖伦,艾瑞莉亚,宝石,琴女
4 亚索,盖伦,赵信,老鼠,锤石
Use Hive SQL to compute the top 3 most-picked heroes and their pick rate (= games appeared in / total games).
create table pk_moba(
  id int,
  names array<string>
)
row format delimited
fields terminated by '\t'
collection items terminated by ',';
load data local inpath './hivedata/pk_moba.txt' overwrite into table pk_moba;
select name, cnt, rn,
       concat(round(cnt / 4 * 100, 0), "%") pick_rate   -- 4 = total number of games in the sample
from (
  select name, cnt,
         dense_rank() over(order by cnt desc) rn
  from (
    select name, count(name) cnt
    from pk_moba lateral view explode(names) tmp as name
    group by name
  ) A
) B
where rn < 4;
29. There are three tables:
district: two fields, district id (disid) and district name (disname).
city: two fields, city id (cityid) and district id (disid).
order: four fields, order id (orderid), user id (userid), city id (cityid), and amount.
district table:
disid disname
1 华中
2 西南
create table district(
  disid int,
  disname string
)
row format delimited fields terminated by ' ';
load data local inpath './hivedata/district.txt' overwrite into table district;
city table:
cityid disid
1 1
2 1
3 2
4 2
5 2
create table city(
  cityid int,
  disid int
)
row format delimited fields terminated by ' ';
load data local inpath './hivedata/city.txt' overwrite into table city;
order table:
oid userid cityid amount
1 1 1 1223.9
2 1 1 9999.9
3 2 2 2322
4 2 2 8909
5 2 3 6789
6 2 3 798
7 3 4 56786
8 4 5 78890
create table order_t(   -- "order" is reserved, hence order_t
  oid int,
  userid int,
  cityid int,
  amount float
)
row format delimited fields terminated by ' ';
load data local inpath './hivedata/order.txt' overwrite into table order_t;
High spenders are users whose total spend exceeds 10,000. Use Hive HQL to produce the following report:
district_name high_spender_count total_amount
select disname, count(1), sum(amount)
from (
  select disname, userid, sum(amount) amount
  from district A
  join city B on A.disid = B.disid
  join order_t C on B.cityid = C.cityid
  group by disname, userid
) D
where amount > 10000
group by disname;
30. An app's daily access data is stored in table access_log, with a date field ds, a user-type field user_type, a user account user_id, and an access time log_time. Use Hive HQL to implement:
(1) Overall UV and PV per day.
select ds, count(distinct user_id) uv, count(1) pv
from access_log
group by ds;
(2) UV and PV per user type per day.
select ds, user_type, count(distinct user_id) uv, count(1) pv
from access_log
group by ds, user_type;
(3) The earliest and latest access time per user type per day.
select ds, user_type, min(log_time), max(log_time)
from access_log
group by ds, user_type;
(4) The 10 users with the most accesses per user type per day.
select ds, user_type, user_id, cnt, rn
from (
  select ds, user_type, user_id, cnt,
         row_number() over(partition by ds, user_type order by cnt desc) rn
  from (
    select ds, user_type, user_id, count(1) cnt
    from access_log
    group by ds, user_type, user_id
  ) A
) B
where rn < 11;
31. A large table A (hundreds of millions of rows) is joined with a small table B (a few thousand rows), and the join hits data skew. What are the solutions?
A written answer is sufficient.
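A common answer is to broadcast the small table as a map join, so the skewed key never reaches a reducer. A sketch (the set parameters are real Hive settings; big_a and small_b are placeholder table names):

```sql
set hive.auto.convert.join=true;               -- auto-convert to map join
set hive.mapjoin.smalltable.filesize=25000000; -- "small table" size threshold, bytes
-- or force it with a hint on older Hive versions:
select /*+ MAPJOIN(b) */ a.*
from big_a a
join small_b b on a.key = b.key;
```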
32. There are three tables:
Table login_a (logins):
ds user_id
2019-08-06 1
2019-08-06 2
2019-08-06 3
2019-08-06 4
create table login_a(
  ds date,
  user_id int
)
row format delimited fields terminated by ' ';
load data local inpath './hivedata/login_a.txt' overwrite into table login_a;
Table read_b (reading):
ds user_id read_num
2019-08-06 1 2
2019-08-06 2 3
2019-08-06 3 6
create table read_b(
  ds date,
  user_id int,
  read_num int
)
row format delimited fields terminated by ' ';
load data local inpath './hivedata/read_b.txt' overwrite into table read_b;
Table cost_c (payments):
ds user_id price
2019-08-06 1 55.6
2019-08-06 2 55.8
create table cost_c(
  ds date,
  user_id int,
  price float
)
row format delimited fields terminated by ' ';
load data local inpath './hivedata/cost_c.txt' overwrite into table cost_c;
Based on the three tables above, implement the following with Hive HQL:
(1) The number of users who logged in and read that day, and the total number of books read.
select A.ds, count(distinct A.user_id), sum(B.read_num)
from login_a A
join read_b B on A.user_id = B.user_id and A.ds = B.ds
group by A.ds;
(2) The number of users who logged in and read but did not pay.
select A.ds, count(1)
from login_a A
join read_b B on A.user_id = B.user_id and A.ds = B.ds
left join cost_c C on B.user_id = C.user_id and B.ds = C.ds
where C.price is null
group by A.ds;
(3) Users who logged in and paid: the number of paying users and the total amount.
select A.ds, count(1), sum(C.price)
from login_a A
join cost_c C on A.user_id = C.user_id and A.ds = C.ds
group by A.ds;
33. In Hive HQL, what is the difference between left outer join and left semi join?
A written answer is sufficient.
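An illustration on the diff_t1/diff_t2 tables from question 22:

```sql
-- left semi join: an EXISTS-style filter; only left-table columns are selectable,
-- and each left row appears at most once, however many right rows match
select t1.*
from diff_t1 t1
left semi join diff_t2 t2 on t1.id = t2.id;

-- left outer join: keeps every left row and exposes right-side columns (null when unmatched)
select t1.*, t2.name
from diff_t1 t1
left outer join diff_t2 t2 on t1.id = t2.id;
```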
34. There is an order table and a channel table, structured as follows:
create table `order`(
  order_id bigint,
  user_id bigint comment 'user id',
  amount double comment 'order amount',
  channel string comment 'channel',
  time string comment 'order time, yyyy-MM-dd HH:mi:ss'
)
partitioned by (dt string comment 'day, yyyy-MM-dd');
-- note: "long" is not a Hive type (use bigint), the keyword is "partitioned by",
-- and "order" must be backquoted because it is a reserved word
Use Hive HQL to query, for 2019-08-06, each channel's number of ordering users and total order amount.
The result header should be: channel user_num order_amount
select channel,
       count(distinct user_id) user_num,
       sum(amount) order_amount
from `order`
where dt = '2019-08-06'
group by channel;
35. Table design and queries:
Design tables for student basic information, course information, and students' courses and scores, then write a query for all students whose average score is above 85.
create table stu_1(
  id string,
  name string,
  age int,
  addr string
)
row format delimited fields terminated by ',';
create table course_1(
  cid string,
  cname string
)
row format delimited fields terminated by ',';
create table course_sc(
  id string,
  cid string,
  score int
)
row format delimited fields terminated by ',';
load data local inpath '/hivedata/course_1.txt' overwrite into table course_1;
load data local inpath '/hivedata/stu_1.txt' overwrite into table stu_1;
load data local inpath '/hivedata/course_sc.txt' overwrite into table course_sc;
select s.id, s.name, t.avgscore
from stu_1 s
join (
  select id, avg(score) avgscore
  from course_sc
  group by id
  having avg(score) > 85
) t on s.id = t.id;
36. Requirement:
Given a user table u(id, name) and a blacklist table banuser(id):
1. Use a left join to query all users not on the blacklist.
2. Use not exists to query all users not on the blacklist.
create table u(
  id string,
  name string
)
row format delimited fields terminated by ',';
create table banuser(
  id string
);
load data local inpath '/hivedata/banuser.txt' overwrite into table banuser;
load data local inpath '/hivedata/u.txt' overwrite into table u;
select u.*
from u left join banuser on u.id = banuser.id
where banuser.id is null;
select u.*
from u
where not exists (select 1 from banuser where banuser.id = u.id);
37. Requirement:
course_score table data:
1,zhangsan,数学,80,2015
2,lisi,语文,90,2016
3,lisi,数学,70,2016
4,wangwu,化学,80,2017
5,zhangsan,语文,85,2015
6,zhangsan,化学,90,2015
create table course_score(
  id string,
  name string,
  course string,
  score int,
  year string
)
row format delimited fields terminated by ',';
load data local inpath './hivedata/course_score.txt' overwrite into table course_score;
1. Find the highest-score record (all 5 fields) for each course in each year.
select id, name, course, score, year
from (
  select id, name, course, score, year,
         rank() over(partition by year, course order by score desc) rn
  from course_score
) t
where rn = 1;
2. Within a single year, find the math (数学) score records (all fields) of students whose Chinese (语文) score was 90 or above.
select A.id, A.name, A.course, A.score, A.year
from course_score A
join course_score B on A.name = B.name and A.year = B.year
where B.course = '语文' and B.score >= 90
  and A.course = '数学';
Alternative for requirement 1 (join against the per-group max):
select s.id, s.name, s.course, s.score, s.year
from course_score s
join (
  select course, year, max(score) score
  from course_score
  group by course, year
) t1 on s.course = t1.course and s.year = t1.year and s.score = t1.score;
Alternative for requirement 2:
select s.id, s.name, s.course, s.score, s.year
from course_score s
join (
  select id, name, course, score, year
  from course_score
  where score >= 90 and course = "语文"
) t1 on s.name = t1.name and s.year = t1.year
where s.course = "数学";
38. Requirement:
t1_1 table:
name course score
aa English 75
bb math 85
aa math 90
create table t1_1(
  name string,
  course string,
  score int
)
row format delimited fields terminated by ' ';
load data local inpath './hivedata/t1_1.txt' overwrite into table t1_1;
Use HQL to produce the following result:
name English math
aa 75 90
bb 0 85
select name,
       max(if(course = "English", score, 0)) English,
       max(if(course = "math", score, 0)) math
from t1_1
group by name;
39. Requirement:
t1 table:
user product
A P1
B P1
A P2
B P3
Use HQL to produce the following result:
user P1 P2 P3
A 1 1 0
B 1 0 1
-- note the case: the data stores P1/P2/P3 in upper case
select username,
       max(if(product = "P1", 1, 0)) P1,
       max(if(product = "P2", 1, 0)) P2,
       max(if(product = "P3", 1, 0)) P3
from t1
group by username;
40. Requirement:
dpt (department) table:
dpt_id dpt_name
1 产品
2 技术
user table:
User_id Dpt_id
1 1
2 1
3 2
4 2
5 3
result table:
user_id dpt_id dpt_name
1 1 产品
2 1 产品
3 2 技术
4 2 技术
5 3 其他部门
The source tables are user and dpt; use HQL to produce the result table.
-- a left join plus a default keeps user 5, whose department is missing from dpt
select u.user_id, u.dpt_id,
       nvl(d.dpt_name, '其他部门') dpt_name
from `user` u
left join dpt d on u.dpt_id = d.dpt_id;
41. Requirement:
t1_order table:
order_id order_type order_time
111 N 10:00
111 A 10:05
111 B 10:10
create table t1_order(
  order_id string,
  order_type string,
  order_time string
)
row format delimited fields terminated by ' ';
load data local inpath './hivedata/t1_order.txt' overwrite into table t1_order;
Use HQL to get the following result:
order_id order_type_1 order_type_2 order_time_1 order_time_2
111 N A 10:00 10:05
111 A B 10:05 10:10
select order_id, order_type_1, order_type_2, order_time_1, order_time_2
from (
  select order_id,
         order_type order_type_1,
         lead(order_type, 1) over(partition by order_id order by order_time) order_type_2,
         order_time order_time_1,
         lead(order_time, 1) over(partition by order_id order by order_time) order_time_2
  from t1_order
) A
where order_type_2 is not null;
42. Requirement:
t1_hobby table:
name sex hobby
janson 男 打乒乓球、游泳、看电影
tom 男 打乒乓球、看电影
drop table t1_hobby;
create table t1_hobby(
  name string,
  sex string,
  hobby string
)
row format delimited fields terminated by ' ';
load data local inpath './hivedata/t1_hobby.txt' overwrite into table t1_hobby;
hobby has at most 3 values; use HQL to produce the following result:
name sex hobby1 hobby2 hobby3
janson 男 打乒乓球 游泳 看电影
tom 男 打乒乓球 看电影
select name, sex,
       split(hobby, "、")[0] hobby1,
       split(hobby, "、")[1] hobby2,
       nvl(split(hobby, "、")[2], "") hobby3
from t1_hobby;
43. Requirement:
Table t1 (note: the rows are ordinary access-log lines, all space-delimited):
8.35.201.160 - - [16/May/2018:17:38:21 +0800] "GET/uc_server/data/avatar/000/01/54/22_avatar_middle.jpg HTTP/1.1" 200 5396
Use Hive HQL to produce the following result:
ip dt url
8.35.201.160 2018-5-16 17:38:21 /uc_server/data/avatar/000/01/54/22_avatar_middle.jpg
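One way to sketch an answer with regexp_extract, assuming the raw line is loaded into a single-column staging table raw_log(line string) (a hypothetical table name; the date parsing also assumes an English locale for the month abbreviation):

```sql
select regexp_extract(line, '^(\\S+)', 1) ip,
       from_unixtime(
         unix_timestamp(regexp_extract(line, '\\[(.*?) \\+0800\\]', 1),
                        'dd/MMM/yyyy:HH:mm:ss')) dt,
       regexp_extract(line, '"GET\\s*([^ "]+)', 1) url
from raw_log;
```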



