【题目】:
现有试卷作答记录表exam_record(uid用户ID, exam_id试卷ID, start_time开始作答时间, submit_time交卷时间, score得分)
请输出自从有用户作答记录以来,每月的试卷作答记录中月活用户数、新增用户数、截止当月的单月最大新增用户数、截止当月的累积用户数:
【解题思路】:
Keywords:活跃用户数、新增用户数、截止当月单月新增最大、截止当月的累积用户数
- 活跃用户数:每月有提交记录的用户
COUNT(DISTINCT uid) AS mau
- 新增用户数:到该月为用户最早记录所在月
# 最早记录 MIN(DATE_FORMAT(start_time,'%Y%m')) OVER(PARTITION BY uid ORDER BY DATE_FORMAT(start_time,'%Y%m')) AS first_month # 最早记录等于开始作答即为该月新增 COUNT(DISTINCT CASE WHEN first_month = start_time THEN uid ELSE Null END) AS month_add_uv
- 截止当月单月新增最大和累积:窗口函数的排序参数,不需要分组参数加partition by
MAX(month_add_uv) OVER(ORDER BY start_month) AS max_month_add_uv SUM(month_add_uv) OVER(ORDER BY start_month) AS cum_sum_uv
【问题拆解】:
反向思考由结果出发,要想计算截止当月的最大新增用户数和累计用户数:
需要先计算每月的新增用户数,然后进行窗口函数排序操作实现截止效果新增用户 = 用户最早记录月份就为当月,就需要计算用户最早答题记录月份
因此,顺序为:
最早答题记录月份 —> 计算新增用户(最早答题记录等于当月) —> 计算截止当月的(窗口函数排序操作) —> 拼接
最早答题记录月份,记为t1
SELECt *, DATE_FORMAT(start_time,'%Y%m') AS start_month, MIN(DATE_FORMAT(start_time,'%Y%m')) OVER(PARTITION BY uid ORDER BY DATE_FORMTA(start_time,'%Y%m')) AS first_month FROM exam_record
计算新增用户,记为t2
SELECt start_month, COUNT(DISTINCT uid) AS mau, COUNT(DISTINCT CASE WHEN first_month = start_month THEN uid ELSE Null END) AS month_add_uv FROM t1 GROUP BY start_month
计算截止当月的
SELECt start_month,mau,month_add_uv, MAX(month_add_uv) OVER(ORDER BY start_month) AS max_month_add_uv, SUM(month_add_uv) OVER(ORDER BY start_month) AS cum_sum_uv FROM t2
拼接
WITH t1 AS
(
SELECt *,
DATE_FORMAT(start_time,'%Y%m') AS start_month,
MIN(DATE_FORMAT(start_time,'%Y%m')) OVER(PARTITION BY uid ORDER BY DATE_FORMTA(start_time,'%Y%m')) AS first_month
FROM exam_record
)
SELECt start_month,mau,month_add_uv,
MAX(month_add_uv) OVER(ORDER BY start_month) AS max_month_add_uv,
SUM(month_add_uv) OVER(ORDER BY start_month) AS cum_sum_uv
FROM(
SELECt start_month,
COUNT(DISTINCT uid) AS mau,
COUNT(DISTINCT CASE WHEN first_month = start_month THEN uid ELSE Null END) AS month_add_uv
FROM t1
GROUP BY start_month
)
2.试卷完成数同比2020年的增长率及排名变化
【题目】:
请计算2021年上半年各类试卷的做完次数相比2020年上半年同期的增长率(百分比格式,保留1位小数),以及做完次数排名变化,按增长率和21年排名降序输出
结果输出如下:
【解题思路】:
Keywords:上半年、增长率、排名变化
- 上半年的筛选:将日期转换成月份,筛选1月到6月的数据
日期函数DATE_FORMAT(submit,'%Y-%m) BETWEEN '2020-01' AND '2020-06
- 增长率的计算:growth_rate = (exam_cnt_21 - exam_cnt_20) - exam_cnt_20排名变化的计算:rank_delta = exam_cnt_rank_21 - exam_cnt_rank_20
问题拆解:
增长率和排名变化的计算都需要用到exam_cnt_21和exam_cnt_20(20年和21年各类试卷完成次数),因此需要先筛选出来筛选完之后,计算各类试卷增长率及排名通过排名计算排名变化,并形成最终输出
细节:
计算排名需要用到窗口函数RANK() OVER(ORDER BY exam_cnt_20 DESC),这里是对整个数据集范围内排名,因此不需要分组参数PARTITION BY增长率为百分数,小数位为1,这里需要用到CONCAT()和COUNT():
ROUND((exam_cnt_21 - exam_cnt_20) / exam_cnt_20 * 100) # 记为a CONCAT(a,'%') # 拼接,记为b
两年完成次数相同时,会产生空值,需要IFNULL()进行处理
IFNULL(b,0)
【完整代码】:
第一步 筛选出各类试卷完成次数
SELECt tag, SUM(IF(DATE_FORMAT(submit_time,'%Y-%m') BETWEEN '2020-01' AND '2020-06',1,0)) AS exam_cnt_20, SUM(IF(DATE_FORMAT(submit_time,'%Y-%m') BETWEEN '2021-01' AND '2021-06',1,0)) AS exam_cn_21 FROM exam_record LEFT JOIN examination_info USING(exam_id) GROUP BY tag
第二步,计算各类试卷增长率及排名,记为t1
SELECt tag,
exam_cnt_20, exam_cnt_21,
IFNULL(CONCAT(ROUND((exam_cnt_21-exam_cnt_20)/exam_cnt_20 * 100),'%'),0) AS growth_rate,
RANK() OVER(ORDER BY exam_cnt_20 DESC) AS exam_cnt_rank_20,
RANK() OVER(ORDER BY exam_cnt_21 DESC) AS exam_cnt_rank_21
FROM(
SELECt tag,
SUM(IF(DATE_FORMAT(submit_time,'%Y-%m') BETWEEN '2020-01' AND '2020-06',1,0)) AS exam_cnt_20,
SUM(IF(DATE_FORMAT(submit_time,'%Y-%m') BETWEEN '2021-01' AND '2021-06',1,0)) AS exam_cn_21
FROM exam_record
LEFT JOIN examination_info USING(exam_id)
GROUP BY tag
) t1
通过排名计算降序,形成最终输出
SELECt tag,
exam_cnt_20, exam_cnt_21,
growth_rate,
exam_cnt_rank_20, exam_cnt_rank_21,
CAST(exam_cnt_rank_21 AS SIGNED) - CAST(exam_cnt_rank_20 AS SIGNED) AS rank_delta
FROM(
SELECt tag,
exam_cnt_20, exam_cnt_21,
IFNULL(CONCAT(ROUND((exam_cnt_21-exam_cnt_20)/exam_cnt_20 * 100),'%'),0) AS growth_rate,
RANK() OVER(ORDER BY exam_cnt_20 DESC) AS exam_cnt_rank_20,
RANK() OVER(ORDER BY exam_cnt_21 DESC) AS exam_cnt_rank_21
FROM(
SELECt tag,
SUM(IF(DATE_FORMAT(submit_time,'%Y-%m') BETWEEN '2020-01' AND '2020-06',1,0)) AS exam_cnt_20,
SUM(IF(DATE_FORMAT(submit_time,'%Y-%m') BETWEEN '2021-01' AND '2021-06',1,0)) AS exam_cn_21
FROM exam_record
LEFT JOIN examination_info USING(exam_id)
GROUP BY tag
) t1
) t2
WHERe exam_cnt_20 != 0 AND exam_cnt_21 != 0
ORDER by growth_rate DESC, exam_cnt_21 DESC



