What operations differ between Hive and SQL (how to implement a loop in Hive SQL)

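Contents

How to synchronize data in Hive (also works in SQL)

1. Requirement
2. Approach
3. Implementation
4. Reconsidering the requirement
5. Escalating the problem
6. The final escalation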

How to synchronize data in Hive (also works in SQL)

1. Requirement

Suppose we have a table with two columns, for example:

name	score
tom		100
tom		
tom
tom

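Now we want to synchronize the data: whenever one row for a name has a score, every other row with the same name should receive that score. The table then becomes: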

name	score
tom		100
tom		100
tom		100
tom		100

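2. Approach

We have already covered joins and Cartesian products. On reflection, this problem looks a lot like a Cartesian product:

within each name, every row is joined against the score recorded for that name.

tom
tom
tom		---  Cartesian-product join   ---   100
tom
tom

So the first step is to extract the 100.

Use distinct to pull out the pair (tom, 100).

Then join that pair back to the data so that 100 is attached to every row with that name.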

3. Implementation

First, extract (tom, 100):

select distinct name
  ,score
from online
where score is not null;

Then join it back to the data table:

select online.name
  ,t1.score score
from online
left join (
    select distinct name
    ,score
    from online
    where score is not null
) t1
on online.name=t1.name
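
To check the join above without a Hive cluster, the same statement can be run against an in-memory SQLite database from Python. This is only a sketch: SQLite stands in for Hive here, the helper name fill_scores is hypothetical, and the table name online and the sample rows come from the example above.

```python
import sqlite3


def fill_scores(rows):
    """Run the distinct-plus-left-join query on an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE online (name TEXT, score INTEGER)")
    conn.executemany("INSERT INTO online VALUES (?, ?)", rows)
    query = """
        SELECT online.name, t1.score AS score
        FROM online
        LEFT JOIN (
            SELECT DISTINCT name, score
            FROM online
            WHERE score IS NOT NULL
        ) t1
        ON online.name = t1.name
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# One row carries the score, the other three are empty (NULL).
rows = [("tom", 100), ("tom", None), ("tom", None), ("tom", None)]
filled = fill_scores(rows)
for name, score in filled:
    print(name, score)
```

Every tom row should come back with the score filled in, and the number of rows should match the source table.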

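4. Reconsidering the requirement

The solution above does produce the right result, but look at the table again: what if one name has two scores?

For example: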

name	score
tom		100
tom		90
tom

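What happens in this case?

Every tom row matches each of the distinct scores, so the join multiplies the rows: three tom rows become six. That clearly violates the requirement that the row count stay unchanged.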
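
The row multiplication is easy to reproduce. The sketch below runs the distinct-based join on the two-score sample and compares row counts; it assumes an in-memory SQLite database (via Python's sqlite3) as a stand-in for Hive, and the variable names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE online (name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO online VALUES (?, ?)",
    [("tom", 100), ("tom", 90), ("tom", None)],
)

# The subquery now returns two rows for tom, so every source row matches twice.
joined = conn.execute("""
    SELECT online.name, t1.score AS score
    FROM online
    LEFT JOIN (
        SELECT DISTINCT name, score
        FROM online
        WHERE score IS NOT NULL
    ) t1
    ON online.name = t1.name
""").fetchall()

source_count = conn.execute("SELECT COUNT(*) FROM online").fetchone()[0]
conn.close()

print("rows in source table:", source_count)
print("rows after join:", len(joined))
```

The join result has twice as many rows as the source table, which is the problem the next step addresses.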

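We can use the aggregation technique covered earlier (collect_set together with concat_ws) to collapse the multiple values for a name into a single string.

With that, the first step, fetching the score, becomes: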

select t1.name
  ,concat_ws(',', collect_set(cast(t1.score as string))) score
from (
  select distinct name
    ,score
  from online) t1
group by t1.name

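Since collect_set already removes duplicates and skips NULL values, the distinct subquery is not needed, and this can be written directly as: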

select name
  ,concat_ws(',', collect_set(cast(score as string))) score
from online
group by name;

Then join it back as before:

select online.name
  ,t1.score score
from online
left join (
    select name
      ,concat_ws(',', collect_set(cast(score as string))) score
    from online
    group by name
) t1
on online.name=t1.name;

5. Escalating the problem

Does this still work when there are multiple names?

For example:

name	score
tom		100
tom		90
tom
lilly	95
lilly

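A quick test settles it. Because the subquery groups by name and the join is on name, each name only picks up its own scores.

So this approach holds up.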
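
One way to run that test locally is sketched below. It assumes an in-memory SQLite database (via Python's sqlite3) in place of Hive, and it swaps concat_ws(',', collect_set(...)) for SQLite's group_concat(DISTINCT ...), which plays the same role here: it skips NULLs, removes duplicates, and joins the values with a comma.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE online (name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO online VALUES (?, ?)",
    [("tom", 100), ("tom", 90), ("tom", None), ("lilly", 95), ("lilly", None)],
)

# group_concat(DISTINCT score) is the SQLite substitute for
# concat_ws(',', collect_set(cast(score as string))) in the Hive query.
synced = conn.execute("""
    SELECT online.name, t1.score AS score
    FROM online
    LEFT JOIN (
        SELECT name, group_concat(DISTINCT score) AS score
        FROM online
        GROUP BY name
    ) t1
    ON online.name = t1.name
""").fetchall()
conn.close()

# The order of values inside the combined string is not guaranteed,
# so compare the values as sets rather than as exact strings.
scores_by_name = {name: set(score.split(",")) for name, score in synced}
print(len(synced), scores_by_name)
```

The row count stays at five, tom's rows carry both 100 and 90, and lilly's rows carry only 95.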

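6. The final escalation

Sometimes we do not want to overwrite the column this way. We need to keep track of which row originally held the data, so the synchronized value goes into a new column instead.

Something like this:

name	score	score_copy
tom		100		100,90
tom		90		100,90
tom				100,90
lilly	95		95
lilly			95

The solution is the same. We only need to keep the original score and put the aggregated result into a new column:

select online.name
  ,online.score
  ,t1.score score_copy
from online
left join (
    select name
      ,concat_ws(',', collect_set(cast(score as string))) score
    from online
    group by name
) t1
on online.name=t1.name;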
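
The new-column variant can be checked the same way. This sketch again assumes SQLite (through Python's sqlite3) as a stand-in for Hive, uses group_concat(DISTINCT ...) in place of concat_ws(',', collect_set(...)), and names the extra column score_copy for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE online (name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO online VALUES (?, ?)",
    [("tom", 100), ("tom", 90), ("tom", None), ("lilly", 95), ("lilly", None)],
)

# Keep online.score untouched and expose the per-name list as a third column.
with_copy = conn.execute("""
    SELECT online.name, online.score, t1.score AS score_copy
    FROM online
    LEFT JOIN (
        SELECT name, group_concat(DISTINCT score) AS score
        FROM online
        GROUP BY name
    ) t1
    ON online.name = t1.name
""").fetchall()
conn.close()

for name, score, score_copy in with_copy:
    print(name, score, score_copy)
```

The original score column still shows which rows had data (the empty ones stay NULL), while the third column carries the synchronized list for every row.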
