26、请使用sql计算pv、uv?
数据:
t1表
uid dt url 1 2019-08-06 http://www.baidu.com 2 2019-08-06 http://www.baidu.com 3 2019-08-06 http://www.baidu.com 3 2019-08-06 http://www.soho.com 3 2019-08-06 http://www.meituan.com 3 2019-08-06
结果如下:
dt uv pv 2019-08-6 3 5
create table user_net_log( uid int, dt date, url string ) row format delimited fields terminated by ' '; load data local inpath './hivedata/user_net_log.txt' overwrite into table user_net_log;
select dt,count(distinct uid) uv,count(url) pv from user_net_log group by dt; rusule: Total MapReduce CPU Time Spent: 0 msec OK NULL 1 0 2019-08-06 3 5
27、hive中coalease()、nvl()、concat_ws()、collect_list()、collect_set()、regexp_replace().这几个函 数的意义?
略
28、有表如下记录了智智同学玩某moba游戏每局的英雄pick情况
pk_moba表
id names 1 亚索,挖据机,艾瑞莉亚,洛,卡莎 2 亚索,盖伦,奥巴马,牛头,皇子 3 亚索,盖伦,艾瑞莉亚,宝石,琴女 4 亚索,盖伦,赵信,老鼠,锤石
请用HiveSQL计算出出场次数最多的top3英雄及其Pick率(=出现场数/总场数)
create table pk_moba( id int, names array) row format delimited fields terminated by 't' collection items terminated by ','; load data local inpath './hivedata/pk_moba.txt' overwrite into table pk_moba;
分析:计算出出场次数最多的英雄及pick率 select name,cnt,rn,concat(round(cnt/4*100,0),"%") lv from ( select name,cnt,dense_rank() over(sort by cnt desc) rn from( select name,count(name) cnt from pk_moba lateral view explode(names) tmp as name group by name )A )B where rn<4; ressult: 亚索 4 1 100.0% 盖伦 3 2 75.0% 艾瑞莉亚 2 3 50.0%
29、有三个表,分别是:
区域(district) 区域中有两个字段分别是区域Id(disid)和区域名称(disname)
城市(city) 城市有两个字段分别是城市ID(cityid)和区域ID(disid)
订单(order) 订单有四个字段分别是订单ID(orderid)、用户ID(userid)、城市ID(cityid)和消费金额(amount)。
district表:
disid disname 1 华中 2 西南 create table district( disid int, disname string ) row format delimited fields terminated by ' '; load data local inpath './hivedata/district.txt' overwrite into table district;
city表:
cityid disid 1 1 2 1 3 2 4 2 5 2 create table city( cityid int, disid int ) row format delimited fields terminated by ' '; load data local inpath './hivedata/city.txt' overwrite into table city;
order表:
oid userid cityid amount 1 1 1 1223.9 2 1 1 9999.9 3 2 2 2322 4 2 2 8909 5 2 3 6789 6 2 3 798 7 3 4 56786 8 4 5 78890 create table order_t( oid int, userid int, cityid int, amount float ) row format delimited fields terminated by ' '; load data local inpath './hivedata/order.txt' overwrite into table order_t;
高消费者是消费金额大于1W的用户,使用hive hql生成如下报表:
区域名 高消费者人数 消费总额
分析: select disname,count(1),sum(amount) from( select disname,userid,sum(amount) amount from district A join city B on A.disid = B.disid join order_t C on B.cityid = C.cityid group by disname,userid )D where amount>10000 group by disname; result: 华中 2 22454.800415039062 西南 2 135676.0
30、某APP每天访问数据存放在表access_log里面,包含日期字段ds,用户类型字段user_type,用户账 号user_id,用户访问时间log_time,请使用hive的hql语句实现如下需求:
(1)、每天整体的访问UV、PV?
select log_time,count(distinct user_id) uv,count(1) pv from access_log group by log_time;
(2)、每天每个类型的访问UV、PV?
select log_time,user_type,count(distinct user_id) uv,count(1) pv from access_log group by log_time;
(3)、每天每个类型中最早访问时间和最晚访问时间?
select log_time,user_type,min(log_time),max(log_time) from access_log group by log_time;
(4)、每天每个类型中访问次数最高的10个用户?
select dt,user_type,cnt,rn from( select dt,user_type,cnt,row_number() over(partition by dt,user_type order by cnt desc) from (select substr(log_time,1,7) dt,user_type,count(1) cnt from access_log group by substr(log_time,1,7),user_type )A )B where rn<11;



