SQL Basics for Data Analysis

If you have questions, or spot a mistake in this post, feel free to come and discuss.

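Time

Converting a timestamp to a formatted date: timestamps are normally 10 digits (seconds). If yours has a different length, for example 13 digits (milliseconds), convert it to 10 digits first.
SPARK: from_unixtime(1234567890,'yyyy-MM-dd')
Presto: format_datetime(from_unixtime(1234567890),'yyyy-MM-dd')
Converting a formatted date to a timestamp:
SPARK: unix_timestamp('2023-03-08','yyyy-MM-dd')
Presto: to_unixtime(timestamp '2023-03-08 00:00:00')
Getting yesterday's date:
SPARK: date_sub(current_date,1)
Presto: date_add('day',-1,current_date)
datediff('2022-02-03','2022-02-01'): (Spark) returns the first date minus the second date in days; the result here is 2. The Presto equivalent is date_diff('day',date '2022-02-01',date '2022-02-03').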
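As a rough illustration of the time functions, here is a minimal Spark SQL sketch. The literal 1678233600000 is just a made-up 13-digit (millisecond) timestamp, and the formatted day you get from from_unixtime depends on the session time zone.

```sql
-- Spark SQL sketch; 1678233600000 is an illustrative millisecond timestamp.
select
  -- divide by 1000 to get a 10-digit (second) timestamp, then format it
  from_unixtime(cast(1678233600000 / 1000 as bigint), 'yyyy-MM-dd') as day_from_ms,
  -- date string back to a 10-digit timestamp
  unix_timestamp('2023-03-08', 'yyyy-MM-dd')                        as ts_seconds,
  -- yesterday's date
  date_sub(current_date(), 1)                                       as yesterday,
  -- first date minus second date, in days: 2
  datediff('2022-02-03', '2022-02-01')                              as days_between;
```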

Common functions

count, sum, avg, max, min: row count, sum, average, maximum, minimum
count(id), sum(score), avg(score)
count(1): counts all rows
count(id): counts the rows whose id is not null
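The difference between count(1) and count(id) is easiest to see on a tiny table. This is a sketch in Spark SQL with made-up inline rows (the `(values ...) as t(...)` form is also accepted by Presto).

```sql
-- Three sample rows, one of which has a null id.
select
  count(1)  as total_rows,    -- 3: every row is counted
  count(id) as non_null_ids   -- 2: the row whose id is null is skipped
from (values (1), (2), (null)) as t(id);
```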
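concat(a,b,c): when a, b and c are all strings, concatenates them
concat_ws('-',year,month,day): joins year, month and day with -, giving for example 2020-02-23
percentile_approx(comment_num, 0.5): approximate percentile; here the 50th percentile (median) of the comment count. This is the Spark name; Presto calls it approx_percentile.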
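like, not like, rlike: pattern matching for values that match, values that do not match, and values that match any of several alternatives
content_text like 'apple%': true if content_text starts with apple, otherwise false
content_text not like 'apple%': true if content_text does not start with apple, otherwise false
content_text rlike 'apple|banana|orange': true if content_text contains apple, banana or orange, otherwise false
regexp_extract(content_text,'hello(.*?)666',1): finds text of the form "hello...666" and returns the part captured by the parentheses (group 1). For more on regex syntax see https://www.runoob.com/regexp/regexp-syntax.html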
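Here is a small Spark SQL sketch of the matching functions side by side. The two sample strings are invented for the example, and the patterns use English words in place of real content.

```sql
-- Spark SQL sketch with two made-up rows.
select
  content_text,
  content_text like 'apple%'                       as starts_with_apple,
  content_text rlike 'apple|banana'                as has_fruit,
  regexp_extract(content_text, 'hello(.*?)666', 1) as between_markers
from (values ('apple pie'), ('hello world 666')) as t(content_text);
-- 'apple pie'       -> true,  true,  ''        (Spark returns an empty string when nothing matches)
-- 'hello world 666' -> false, false, ' world '
```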
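split('a-b','-'): splits the string and returns [a,b]. To take the first element, use split('a-b','-')[0] in Spark (0-based index) or split('a-b','-')[1] in Presto (1-based index)
substr('abcdefg',1,3): returns abc, the first through third characters
substring('abcdefg',3): returns cdefg, from the third character to the end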
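The indexing is easy to get wrong, so here is a quick Spark SQL check. Note that array subscripts in Spark start at 0 while substr positions start at 1.

```sql
-- Spark SQL sketch.
select
  split('a-b', '-')[0]    as first_part,  -- 'a'     (array index is 0-based in Spark)
  substr('abcdefg', 1, 3) as head,        -- 'abc'   (string position is 1-based)
  substring('abcdefg', 3) as tail;        -- 'cdefg'
```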
if(a=b,a,null): returns a when a=b and null otherwise; typically used as count(if(...)) to count only the rows that meet a condition
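The count(if(...)) trick works because count skips nulls. A minimal sketch, assuming a made-up table of ids and scores with 60 as the pass mark:

```sql
-- Sample rows are invented for illustration.
select
  count(1)                         as total_rows,  -- 4
  count(if(score >= 60, id, null)) as passed_rows  -- 2: only scores 90 and 60 yield a non-null id
from (values (1, 90), (2, 45), (3, 60), (4, 30)) as t(id, score);
```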
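case
when a>1000 then 'over 1000'
when a>500 then '500 to 1000'
when a>300 then '300 to 500'
else '300 or below'
end as price
case when: a multi-branch conditional; branches are checked from top to bottom and the first one that matches wins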
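group by: the grouping clause
select
day,
count(num) as num
from table
group by
day
Once the select list contains an aggregate function, every column that is not aggregated generally has to appear in group by, such as day here.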
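To see what the grouping does to the rows, here is the same query shape run on three made-up rows in Spark SQL. The sum column and the order by are additions for the example only.

```sql
-- Sample rows are invented; two fall on the same day.
select
  day,
  count(num) as cnt,
  sum(num)   as total
from (values ('2023-03-07', 5), ('2023-03-07', 7), ('2023-03-08', 1)) as t(day, num)
group by day
order by day;
-- 2023-03-07 -> cnt 2, total 12
-- 2023-03-08 -> cnt 1, total 1
```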
order by: the sorting clause. asc sorts from smallest to largest and is the default; desc sorts from largest to smallest.
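A one-column sketch with made-up values shows both directions:

```sql
select num
from (values (3), (1), (2)) as t(num)
order by num desc;
-- desc gives 3, 2, 1; a plain "order by num" (asc, the default) gives 1, 2, 3
```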

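Joining tables

left join
Rows from the right table are attached to the left table where the join condition holds, producing a new table. All left-table rows are kept; right-table rows are kept only when they match, and where a left row has no match the right-table columns are null. If the join key is not unique in either table, rows are duplicated in the result.
inner join
Rows from the right table are attached to the left table where the join condition holds, producing a new table. Only rows that satisfy the condition are kept from both tables. If the join key is not unique in either table, rows are duplicated in the result.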
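The duplication caused by a non-unique join key is worth seeing once. In this Spark SQL sketch, left_t and right_t are hypothetical tables built from inline rows, and id 1 appears twice on the right.

```sql
-- left_t and right_t are made-up tables for illustration.
with left_t  as (select * from (values (1, 'a'), (2, 'b')) as t(id, name)),
     right_t as (select * from (values (1, 'x'), (1, 'y')) as t(id, tag))
select left_t.id, left_t.name, right_t.tag
from left_t
left join right_t on left_t.id = right_t.id
order by left_t.id, right_t.tag;
-- left join:  (1, a, x), (1, a, y), (2, b, null)  id 1 is duplicated, id 2 is kept with a null tag
-- inner join: (1, a, x), (1, a, y)                id 2 is dropped because it has no match
```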
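union all
Stacks the rows of one query on top of another's. Requirement: both queries must return the same columns, meaning the same number of columns, in the same order, with compatible types.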
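A minimal sketch of the vertical stacking, using two made-up inline tables with identical columns:

```sql
select id, name from (values (1, 'a')) as t1(id, name)
union all
select id, name from (values (1, 'a'), (2, 'b')) as t2(id, name);
-- 3 rows: (1, a), (1, a), (2, b); union all keeps the duplicate, plain union would drop it
```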

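Arrays

contains(x, element): Presto; checks whether element is in the array x
array_contains(array('fasd','gfafa'),'fasd'): Spark; checks whether fasd is in the array
concat_ws('-',array('a','b')): Spark; joins the strings in the array with -
array_union(array1,array2): returns the elements of array1 and array2 with duplicates removed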
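Here are the Spark versions of these array functions on literal arrays, as a quick sanity check. array_union needs Spark 2.4 or later.

```sql
-- Spark SQL sketch on literal arrays.
select
  array_contains(array('fasd', 'gfafa'), 'fasd') as has_fasd,  -- true
  concat_ws('-', array('a', 'b', 'c'))           as joined,    -- 'a-b-c'
  array_union(array(1, 2), array(2, 3))          as merged;    -- [1, 2, 3]
```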
select
p_date,
concat_ws(',',collect_set(content_text)) as content_text
from table
group by
p_date
Spark text aggregation: puts each day's distinct content_text values into a single field
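To make the Spark text aggregation concrete, here it is on four made-up rows. collect_set removes duplicates and does not guarantee element order.

```sql
-- Sample rows are invented; 'a' appears twice on the same day.
select
  p_date,
  concat_ws(',', collect_set(content_text)) as content_text
from (values ('2023-03-08', 'a'), ('2023-03-08', 'a'), ('2023-03-08', 'b'), ('2023-03-09', 'c'))
  as t(p_date, content_text)
group by p_date;
-- 2023-03-08 -> 'a,b' or 'b,a' (duplicate 'a' removed, order not guaranteed)
-- 2023-03-09 -> 'c'
```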
select
p_date,
array_join(array_distinct(array_agg(content_text)), ',') as content_text
from table
group by
p_date
Presto text aggregation: puts each day's distinct content_text values into a single field
For more Spark data operations see http://help.guandata.com/hc/kb/article/1522345/
For Presto see https://prestodb.io/docs/current/functions/array.html?highlight=array

