栏目分类:
子分类:
返回
名师互学网用户登录
快速导航关闭
当前搜索
当前分类
子分类
实用工具
热门搜索
名师互学网 > IT > 前沿技术 > 大数据 > 大数据系统

Hive cube / rollup / grouping sets/GROUPING

Hive cube / rollup / grouping sets/GROUPING

Hive CUBE / ROLLUP / GROUPING SETS / GROUPING__ID用法详解
  • GROUPING SETS
  • GROUPING__ID(注意这里是两个下划线)
  • CUBE
  • ROLLUP

cube / rollup / grouping sets/GROUPING__ID,经常会被问到这几个函数的区别,今天就好好整理一下。

GROUPING SETS

grouping sets用法是给group by做辅助作用的函数,在SQL中的位置是跟在group by后面。grouping sets括号中的参数必须是group by后面的字段:group by 字段1,字段2 grouping sets(字段1,字段2)。使用grouping sets的区别在于,group by会根据括号中的字段进行分组去重,然后根据select中的聚合函数对字段聚合,而加上grouping sets相当于把括号中的字段分别进行上述group by操作,然后进行union all聚合。我们来实际操作举例说明一下:

  • 准备数据
    select * from tab1;
    

结果:

monthdayvalues
2015-032015-03-10value1
2015-032015-03-10value5
2015-032015-03-12value7
2015-042015-04-12value3
2015-042015-04-13value2
2015-042015-04-13value4
2015-042015-04-16value4
2015-032015-03-10value2
2015-032015-03-10value3
2015-042015-04-12value5
2015-042015-04-13value4
2015-042015-04-15value3
2015-042015-04-15value2
2015-042015-04-16value1

使用grouping sets

SELECt 
	month,
	day,
	COUNT(DISTINCT values) AS uv,
FROM tab1
GROUP BY month,day 
GROUPING SETS (month,day) ;

等价于

SELECt month,NULL,COUNT(DISTINCT values) AS uv FROM tab1 GROUP BY month 
UNIOn ALL 
SELECt NULL,day,COUNT(DISTINCT values) AS uv FROM tab1 GROUP BY day;

输出结果都是一样的

monthdayuv
2015-03NULL5
2015-04NULL6
NULL2015-03-104
NULL2015-03-121
NULL2015-04-122
NULL2015-04-133
NULL2015-04-152
NULL2015-04-162
GROUPING__ID(注意这里是两个下划线)

GROUPING__ID,表示结果属于哪一个分组集合,一般和GROUPING SETS一起使用。
例如:

SELECt 
	month,
	day,
	COUNT(DISTINCT values) AS uv,
	GROUPING__ID 
FROM tab1
GROUP BY month,day 
GROUPING SETS (month,day) 
ORDER BY GROUPING__ID;

查询结果:

monthdayuvGROUPING__ID
2015-03NULL51
2015-04NULL61
NULL2015-03-1042
NULL2015-03-1212
NULL2015-04-1222
NULL2015-04-1332
NULL2015-04-1522
NULL2015-04-1622

也可以这样查询

SELECt 
	month,
	day,
	COUNT(DISTINCT values) AS uv,
	GROUPING__ID 
FROM tab1
GROUP BY month,day 
GROUPING SETS (month,day,(month,day)) 
ORDER BY GROUPING__ID;

等价于

SELECt month,NULL,COUNT(DISTINCT values) AS uv,1 AS GROUPING__ID FROM tab1 GROUP BY month 
UNIOn ALL 
SELECt NULL,day,COUNT(DISTINCT values) AS uv,2 AS GROUPING__ID FROM tab1 GROUP BY day
UNIOn ALL 
SELECt month,day,COUNT(DISTINCT values) AS uv,3 AS GROUPING__ID FROM tab1 GROUP BY month,day

查询结果

monthdayuvGROUPING__ID
2015-03NULL51
2015-04NULL61
NULL2015-03-1042
NULL2015-03-1212
NULL2015-04-1222
NULL2015-04-1332
NULL2015-04-1522
NULL2015-04-1622
2015-032015-03-1043
2015-032015-03-1213
2015-042015-04-1223
2015-042015-04-1333
2015-042015-04-1523
2015-042015-04-1623
CUBE

根据GROUP BY的维度的所有组合进行聚合。

SELECt 
	month,
	day,
	COUNT(DISTINCT values) AS uv,
	GROUPING__ID 
FROM tab1
GROUP BY month,day 
WITH CUBE 
ORDER BY GROUPING__ID;

等价于

SELECt NULL,NULL,COUNT(DISTINCT values) AS uv,0 AS GROUPING__ID FROM tab1
UNIOn ALL 
SELECt month,NULL,COUNT(DISTINCT values) AS uv,1 AS GROUPING__ID FROM tab1 GROUP BY month 
UNIOn ALL 
SELECt NULL,day,COUNT(DISTINCT values) AS uv,2 AS GROUPING__ID FROM tab1 GROUP BY day
UNIOn ALL 
SELECt month,day,COUNT(DISTINCT values) AS uv,3 AS GROUPING__ID FROM tab1 GROUP BY month,day

查询结果

monthdayuvGROUPING__ID
NULLNULL70
2015-03NULL51
2015-04NULL61
NULL2015-04-1222
NULL2015-04-1332
NULL2015-04-1522
NULL2015-04-1622
NULL2015-03-1042
NULL2015-03-1212
2015-032015-03-1043
2015-032015-03-1213
2015-042015-04-1623
2015-042015-04-1223
2015-042015-04-1333
2015-042015-04-1523

这里的第一条数据就相当于没有group by任何字段,直接进行了聚合操作的结果。

ROLLUP

是CUBE的子集,以最左侧的维度为主,从该维度进行层级聚合。

比如,以month维度进行层级聚合

SELECt 
	month,
	day,
	COUNT(DISTINCT values) AS uv,
	GROUPING__ID  
FROM tab1
GROUP BY month,day
WITH ROLLUP 
ORDER BY GROUPING__ID;

可以实现这样的上钻过程:
总UV->月的UV->月天的UV

monthdayuvGROUPING__ID
NULLNULL70
2015-03NULL51
2015-04NULL61
2015-032015-03-1043
2015-032015-03-1213
2015-042015-04-1223
2015-042015-04-1333
2015-042015-04-1523
2015-042015-04-1623

从查询结果我们可以看出,GROUPING__ID为0的就是总UV;GROUPING__ID为1的为月的uv;GROUPING__ID为3的是月+天uv

把month和day调换顺序,则以day维度进行层级聚合

SELECt 
	day,
	month,
	COUNT(DISTINCT values) AS uv,
	GROUPING__ID  
FROM tab1
GROUP BY day,month 
WITH ROLLUP 
ORDER BY GROUPING__ID;

可以实现这样的上钻过程:
总UV->天的UV->天月的UV

daymonthuvGROUPING__ID
NULLNULL70
2015-04-13NULL31
2015-03-12NULL11
2015-04-15NULL21
2015-03-10NULL41
2015-04-16NULL21
2015-04-12NULL21
2015-04-122015-0423
2015-03-102015-0343
2015-03-122015-0313
2015-04-132015-0433
2015-04-152015-0423
2015-04-162015-0423

从查询结果我们可以看出,GROUPING__ID为0的就是总UV;GROUPING__ID为1的为天的uv;GROUPING__ID为3的是天+月uv

转载请注明:文章转载自 www.mshxw.com
本文地址:https://www.mshxw.com/it/698936.html
我们一直用心在做
关于我们 文章归档 网站地图 联系我们

版权所有 (c)2021-2022 MSHXW.COM

ICP备案号:晋ICP备2021003244-6号