栏目分类:
子分类:
返回
名师互学网用户登录
快速导航关闭
当前搜索
当前分类
子分类
实用工具
热门搜索
名师互学网 > IT > 前沿技术 > 大数据 > 大数据系统

Hive练习题之流量统计

Hive练习题之流量统计

需求: 统计每个用户各时间段的流量总计(相隔不超过10分钟的算在一起)

数据源:

1,2020-02-18 14:20:30,2020-02-18 14:46:30,20
1,2020-02-18 14:47:20,2020-02-18 15:20:30,30
1,2020-02-18 15:37:23,2020-02-18 16:05:26,40
1,2020-02-18 16:06:27,2020-02-18 17:20:49,50
1,2020-02-18 17:21:50,2020-02-18 18:03:27,60
2,2020-02-18 14:18:24,2020-02-18 15:01:40,20
2,2020-02-18 15:20:49,2020-02-18 15:30:24,30
2,2020-02-18 16:01:23,2020-02-18 16:40:32,40
2,2020-02-18 16:44:56,2020-02-18 17:40:52,50
3,2020-02-18 14:39:58,2020-02-18 15:35:53,20
3,2020-02-18 15:36:39,2020-02-18 15:24:54,30

建表

CREATE TABLE IF NOT EXISTS test.tb_flow(
uid int,
start_time timestamp,
end_time timestamp,
flow double
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";

创建并导入数据

vim /doit16/flow.txt 
LOAD DATA LOCAL INPATH "/doit16/flow.txt" INTO TABLE test.tb_flow;

解法

select 
  uid,
  min(start_time) as start_time,
  max(end_time) as end_time,
  sum(flow) as total_flow
  from 
(
    select 
    uid,
    start_time,
    end_time,
    flow,
    sum(cn) over(partition by uid order by start_time) as flag
    from 
      (
       select 
       uid,
       start_time,
       end_time,
       flow,
       if ((unix_timestamp(start_time) - unix_timestamp(lag_time) )  / 60 > 10 ,1,0) as cn
       from
          (
           select 
           uid,
           start_time,
           end_time,
           flow,
           lag(end_time,1,start_time) over(partition by uid order by start_time) as lag_time
           from tb_flow) t1 ) t2 ) t3
group by uid, flag

转载请注明:文章转载自 www.mshxw.com
本文地址:https://www.mshxw.com/it/735813.html
我们一直用心在做
关于我们 文章归档 网站地图 联系我们

版权所有 (c)2021-2022 MSHXW.COM

ICP备案号:晋ICP备2021003244-6号