栏目分类:
子分类:
返回
名师互学网用户登录
快速导航关闭
当前搜索
当前分类
子分类
实用工具
热门搜索
名师互学网 > IT > 前沿技术 > 大数据 > 大数据系统

Sql之面试题总结

Sql之面试题总结

1. 每月及截止当月的答题情况

【题目】:

现有试卷作答记录表exam_record(uid用户ID, exam_id试卷ID, start_time开始作答时间, submit_time交卷时间, score得分)
请输出自从有用户作答记录以来,每月的试卷作答记录中月活用户数、新增用户数、截止当月的单月最大新增用户数、截止当月的累积用户数:

【解题思路】:

Keywords:活跃用户数、新增用户数、截止当月单月新增最大、截止当月的累积用户数

    活跃用户数:每月有提交记录的用户
COUNT(DISTINCT uid) AS mau
    新增用户数:到该月为用户最早记录所在月
# 最早记录
MIN(DATE_FORMAT(start_time,'%Y%m')) OVER(PARTITION BY uid ORDER BY DATE_FORMAT(start_time,'%Y%m'))	AS first_month

# 最早记录等于开始作答即为该月新增
COUNT(DISTINCT CASE WHEN first_month = start_time THEN uid ELSE Null END) AS month_add_uv
    截止当月单月新增最大和累积:窗口函数的排序参数,不需要分组参数加partition by
MAX(month_add_uv) OVER(ORDER BY start_month) AS max_month_add_uv

SUM(month_add_uv) OVER(ORDER BY start_month) AS cum_sum_uv

【问题拆解】:

反向思考由结果出发,要想计算截止当月的最大新增用户数和累计用户数:

需要先计算每月的新增用户数,然后进行窗口函数排序操作实现截止效果新增用户 = 用户最早记录月份就为当月,就需要计算用户最早答题记录月份

因此,顺序为:

最早答题记录月份 —> 计算新增用户(最早答题记录等于当月) —> 计算截止当月的(窗口函数排序操作) —> 拼接

最早答题记录月份,记为t1

SELECt *,
	   DATE_FORMAT(start_time,'%Y%m') AS start_month,
	   MIN(DATE_FORMAT(start_time,'%Y%m')) OVER(PARTITION BY uid ORDER BY DATE_FORMTA(start_time,'%Y%m')) AS 		first_month
FROM exam_record

计算新增用户,记为t2

SELECt start_month,
	   COUNT(DISTINCT uid) AS mau,
	   COUNT(DISTINCT CASE WHEN first_month = start_month THEN uid ELSE Null END) AS month_add_uv
FROM t1
GROUP BY start_month

计算截止当月的

SELECt start_month,mau,month_add_uv,
	   MAX(month_add_uv) OVER(ORDER BY start_month) AS max_month_add_uv,
	   SUM(month_add_uv) OVER(ORDER BY start_month) AS cum_sum_uv
FROM t2

拼接

WITH t1 AS
(
    SELECt *,
	   DATE_FORMAT(start_time,'%Y%m') AS start_month,
	   MIN(DATE_FORMAT(start_time,'%Y%m')) OVER(PARTITION BY uid ORDER BY DATE_FORMTA(start_time,'%Y%m')) AS 		first_month
FROM exam_record
)

SELECt start_month,mau,month_add_uv,
	   MAX(month_add_uv) OVER(ORDER BY start_month) AS max_month_add_uv,
	   SUM(month_add_uv) OVER(ORDER BY start_month) AS cum_sum_uv
FROM(
    SELECt start_month,
           COUNT(DISTINCT uid) AS mau,
           COUNT(DISTINCT CASE WHEN first_month = start_month THEN uid ELSE Null END) AS month_add_uv
    FROM t1
    GROUP BY start_month
	)
2.试卷完成数同比2020年的增长率及排名变化

【题目】:

请计算2021年上半年各类试卷的做完次数相比2020年上半年同期的增长率(百分比格式,保留1位小数),以及做完次数排名变化,按增长率和21年排名降序输出

结果输出如下:

【解题思路】:

Keywords:上半年、增长率、排名变化

    上半年的筛选:将日期转换成月份,筛选1月到6月的数据

日期函数DATE_FORMAT(submit,'%Y-%m) BETWEEN '2020-01' AND '2020-06

    增长率的计算:growth_rate = (exam_cnt_21 - exam_cnt_20) - exam_cnt_20排名变化的计算:rank_delta = exam_cnt_rank_21 - exam_cnt_rank_20

问题拆解:

增长率和排名变化的计算都需要用到exam_cnt_21和exam_cnt_20(20年和21年各类试卷完成次数),因此需要先筛选出来筛选完之后,计算各类试卷增长率及排名通过排名计算排名变化,并形成最终输出

细节:

计算排名需要用到窗口函数RANK() OVER(ORDER BY exam_cnt_20 DESC),这里是对整个数据集范围内排名,因此不需要分组参数PARTITION BY增长率为百分数,小数位为1,这里需要用到CONCAT()和COUNT():

ROUND((exam_cnt_21 - exam_cnt_20) / exam_cnt_20 * 100) # 记为a
CONCAT(a,'%')  # 拼接,记为b

两年完成次数相同时,会产生空值,需要IFNULL()进行处理

IFNULL(b,0) 

【完整代码】:

第一步 筛选出各类试卷完成次数

SELECt tag,
	   SUM(IF(DATE_FORMAT(submit_time,'%Y-%m') BETWEEN '2020-01' AND '2020-06',1,0)) AS exam_cnt_20,
	   SUM(IF(DATE_FORMAT(submit_time,'%Y-%m') BETWEEN '2021-01' AND '2021-06',1,0)) AS exam_cn_21
FROM exam_record
LEFT JOIN  examination_info USING(exam_id)
GROUP BY tag

第二步,计算各类试卷增长率及排名,记为t1

SELECt tag,
	   exam_cnt_20, exam_cnt_21,
	   IFNULL(CONCAT(ROUND((exam_cnt_21-exam_cnt_20)/exam_cnt_20 * 100),'%'),0) AS growth_rate,
	   RANK() OVER(ORDER BY exam_cnt_20 DESC) AS exam_cnt_rank_20,
	   RANK() OVER(ORDER BY exam_cnt_21 DESC) AS exam_cnt_rank_21
FROM(
	SELECt tag,
           SUM(IF(DATE_FORMAT(submit_time,'%Y-%m') BETWEEN '2020-01' AND '2020-06',1,0)) AS exam_cnt_20,
           SUM(IF(DATE_FORMAT(submit_time,'%Y-%m') BETWEEN '2021-01' AND '2021-06',1,0)) AS exam_cn_21
    FROM exam_record
    LEFT JOIN  examination_info USING(exam_id)
    GROUP BY tag	
	) t1

通过排名计算降序,形成最终输出

SELECt tag,
	   exam_cnt_20, exam_cnt_21,
	   growth_rate,
	   exam_cnt_rank_20, exam_cnt_rank_21,
	   CAST(exam_cnt_rank_21 AS SIGNED) - CAST(exam_cnt_rank_20 AS SIGNED) AS rank_delta
FROM(
	SELECt tag,
           exam_cnt_20, exam_cnt_21,
           IFNULL(CONCAT(ROUND((exam_cnt_21-exam_cnt_20)/exam_cnt_20 * 100),'%'),0) AS growth_rate,
           RANK() OVER(ORDER BY exam_cnt_20 DESC) AS exam_cnt_rank_20,
           RANK() OVER(ORDER BY exam_cnt_21 DESC) AS exam_cnt_rank_21
    FROM(
        SELECt tag,
               SUM(IF(DATE_FORMAT(submit_time,'%Y-%m') BETWEEN '2020-01' AND '2020-06',1,0)) AS exam_cnt_20,
               SUM(IF(DATE_FORMAT(submit_time,'%Y-%m') BETWEEN '2021-01' AND '2021-06',1,0)) AS exam_cn_21
        FROM exam_record
        LEFT JOIN  examination_info USING(exam_id)
        GROUP BY tag	
        ) t1
	) t2 
WHERe exam_cnt_20 != 0 AND exam_cnt_21 != 0
ORDER by growth_rate DESC, exam_cnt_21 DESC
转载请注明:文章转载自 www.mshxw.com
本文地址:https://www.mshxw.com/it/761753.html
我们一直用心在做
关于我们 文章归档 网站地图 联系我们

版权所有 (c)2021-2022 MSHXW.COM

ICP备案号:晋ICP备2021003244-6号