栏目分类:
子分类:
返回
名师互学网用户登录
快速导航关闭
当前搜索
当前分类
子分类
实用工具
热门搜索
名师互学网 > IT > 前沿技术 > 大数据 > 大数据系统

mysql/hive求实际活动时间

mysql/hive求实际活动时间

背景:

求每个品牌方活动实际持续时间,可能出现的情况有:

1)一个品牌方有多个重叠活动

2)一个品牌方有多个不重叠的活动,活动之间的断层不纳入总时间

数据格式:
tm_name  sdt        edt
xiaomi	2022-01-06	2022-01-09
xiaomi	2022-01-11	2022-01-15
xiaomi	2022-01-17	2022-01-20
huawei	2022-01-06	2022-01-09
huawei	2022-01-08	2022-01-15
meizu	2022-01-06	2022-01-20
meizu	2022-01-09	2022-01-15
meizu	2022-01-12	2022-01-25
TCL	2022-01-06	2022-01-20
TCL	2022-01-09	2022-01-12
TCL	2022-01-15	2022-01-25
建表语句 :
-- 建表
DROp TABLE IF EXISTS `trademark`;
CREATE TABLE `trademark`  (
  `tm_name` varchar(255) CHARACTER SET gbk COLLATE gbk_chinese_ci NULL DEFAULT NULL,
  `sdt` date NULL DEFAULT NULL,
  `edt` date NULL DEFAULT NULL
) ENGINE = InnoDB CHARACTER SET = gbk COLLATE = gbk_chinese_ci ROW_FORMAT = Dynamic;

-- 数据装填
INSERT INTO `trademark` VALUES ('xiaomi', '2022-01-06', '2022-01-09');
INSERT INTO `trademark` VALUES ('xiaomi', '2022-01-11', '2022-01-15');
INSERT INTO `trademark` VALUES ('xiaomi', '2022-01-17', '2022-01-20');
INSERT INTO `trademark` VALUES ('huawei', '2022-01-06', '2022-01-09');
INSERT INTO `trademark` VALUES ('huawei', '2022-01-08', '2022-01-15');
INSERT INTO `trademark` VALUES ('meizu', '2022-01-06', '2022-01-20');
INSERT INTO `trademark` VALUES ('meizu', '2022-01-09', '2022-01-15');
INSERT INTO `trademark` VALUES ('meizu', '2022-01-12', '2022-01-25');
INSERT INTO `trademark` VALUES ('TCL', '2022-01-06', '2022-01-20');
INSERT INTO `trademark` VALUES ('TCL', '2022-01-09', '2022-01-12');
INSERT INTO `trademark` VALUES ('TCL', '2022-01-15', '2022-01-25');
具体步骤:

1、在写有时间重叠情况时的sql时,基本都是采用将开始时间和结束时间打散union在一起,如同求最大在线人数的思路一样,此处flag标记在下面的操作会体现它的作用,注意union all和union的区别:

select tm_name,sdt dt,'1' flag from trademark
union all
select tm_name,edt dt,'-1' flag from trademark

效果:

2、将union在一起的数据进行开窗处理,条件就是按品牌分组,累积flag求日期是否重叠,若是重叠的日期sum(1)会一直变大,而不重叠的话相当于开始之后紧跟着结束sum(1)与sum(-1)相当于0,用此标志来判断是否断层,而用lead是来获取下一个时间,用来标记断层的时间范围:

with t as(
select tm_name,sdt dt,'1' flag from trademark
union all
select tm_name,edt dt,'-1' flag from trademark)
select tm_name,dt,
	sum(flag) over(partition by tm_name order by dt) f1,
	lead(dt,1,dt) over(partition by tm_name order by dt) last_dt
from t 

 效果:

如上图所示,xiaomi出现了2个断层,分别是【01-09,01-11】和【01-15,01-17】。而标记这一行的f1列数据正好都是0。 

3、用每个品牌最晚结束的时间 - 最开始的时间 - 断层的时间 = 实际活动时间

with t as(
select tm_name,sdt dt,'1' flag from trademark
union all
select tm_name,edt dt,'-1' flag from trademark)
select
	tm_name,
	datediff(max(dt),min(dt))-sum(if(f1=0,datediff(last_dt,dt)-1,0)) real_time
from (
	select tm_name,dt,
		sum(flag) over(partition by tm_name order by dt) f1,
		lead(dt,1,dt) over(partition by tm_name order by dt) last_dt
	from t ) t1 group by tm_name
扩展法二:
with t as(
select
	tm_name,
	#若开始时间小于当前最大的结束时间就用结束时间+1作为开始时间,避免时间叠加
	if(max_edt is not null and sdt<=max_edt,date_add(max_edt,interval 1 day),sdt) sdt,
	edt
from (
select
	*,
	#取出当前数据前面最大的结束时间
	max(edt) over(partition by tm_name order by sdt rows between unbounded preceding and 1 preceding) max_edt
from trademark) t1)
select 
	tm_name,
  #因为上面改了sdt,所以会导致某些重叠数据sdt>edt,这里要过滤掉
	sum(if(datediff(edt,sdt)<0,0,datediff(edt,sdt)+1)) real_time
from t
group by tm_name
结果:

mysql 

转载请注明:文章转载自 www.mshxw.com
本文地址:https://www.mshxw.com/it/701594.html
我们一直用心在做
关于我们 文章归档 网站地图 联系我们

版权所有 (c)2021-2022 MSHXW.COM

ICP备案号:晋ICP备2021003244-6号