31、一张大表A(上亿条记录)和小表B(几千条记录),如果join出现数据倾斜,有什么解决办法?
略
32、有如下三张表:
表login_a(登录表):
ds user_id 2019-08-06 1 2019-08-06 2 2019-08-06 3 2019-08-06 4 create table login_a( ds date, user_id int ) row format delimited fields terminated by ' '; load data local inpath './hivedata/login_a.txt' overwrite into table login_a;
表read_b(阅读表):
ds user_id read_num 2019-08-06 1 2 2019-08-06 2 3 2019-08-06 3 6 create table read_b( ds date, user_id int, read_num int ) row format delimited fields terminated by ' '; load data local inpath './hivedata/read_b.txt' overwrite into table read_b;
表cost_c(付费表):
ds user_id price 2019-08-06 1 55.6 2019-08-06 2 55.8 create table cost_c( ds date, user_id int, price float ) row format delimited fields terminated by ' '; load data local inpath './hivedata/cost_c.txt' overwrite into table cost_c;
基于上述三张表,请使用hive的hql语句实现如下需求:
(1)、用户登录并且当天有个阅读的用户数,已经阅读书籍数量
select A.ds,count(distinct A.user_id),sum(B.read_num) from login_a A join read_b B on A.user_id = B.user_id and A.ds = B.ds group by A.ds;
(2)、用户登录并且阅读,但是没有付费的用户数
select A.ds,count(1) from login_a A join read_b B on A.user = B.user_id and A.ds = B.ds left join cost_c C on B.user_id = C.user_id and B.ds = C.ds group by A.ds;
(3)、用户登录并且付费,付费用户数量和金额总数
select A.ds,count(1),sum(price) from login_a A join read_b B on A.user_id = B.user_id and A.ds = B.ds left join cost_c C on B.user_id = C.user_id and B.ds = C.ds group by A.ds;
33、hive的hql中,left outer join和left semi join的区别?
left semi join,我们称之为半开连接。它是left join的一种优化形式,只能查询左表的信息,主要用于解决hive中左表的数据是否存在的问题。
34、有一个订单表和渠道表,结构如下:
create table order( order_id long, user_id long comment '用户id', amount double comment '订单金额', channel string comment '渠道', time string comment '订单时间,yyyy-MM-dd HH:mi:ss' ) partition by (dt string comment '天,yyyy-MM-dd');
请使用hive hql查询出2019-08-06号 每个渠道的下单用户数、订单总金额。
hql语句实现,结果表头如下: channel user_num order_amount
select channel,count(user_id) user_num,sum(amount) order_acount from order where to_date(time) = '2019-08-06' group by channel;
35、考虑表设计和sql查询:
设计数据库表,用来存放学生基本信息,课程信息,学生的课程及成绩,并给出查询语句,查询平均成绩大于85的所有学生。
create table stu_1( id string, name string, age int, addr string ) row format delimited fields terminated by ',' ; create table course_1( cid string, cname string ) row format delimited fields terminated by ',' ; create table course_sc( id string, cid string, score int ) row format delimited fields terminated by ',' ; load data local inpath '/hivedata/course_1.txt' overwrite into table course_1; load data local inpath '/hivedata/stu_1.txt' overwrite into table stu_1; load data local inpath '/hivedata/course_sc.txt' overwrite into table course_sc;
分析:查询平均成绩大于85的所有学生 select cs.id,avg(score) avgscore from course_sc cs group by cs.id having avg(score)>85;



