栏目分类:
子分类:
返回
名师互学网用户登录
快速导航关闭
当前搜索
当前分类
子分类
实用工具
热门搜索
名师互学网 > IT > 前沿技术 > 大数据 > 大数据系统

Hive SQL实现数组中map的拆分以及计算

Hive SQL实现数组中map的拆分以及计算

一、需求背景

  1. 本次任务主要是完成对表中的每一行数据进行求和计算。
  2. 涉及的表数据如下所示:
["Emc:0","MoVoiceCall:0","Mt:0","MoSig:2","Modata:1","HighPri:0","MoVideoCall:0","MoSms:0","MpsPri:0","McsPri:0"]
["Emc:0","MoVoiceCall:1","Mt:0","MoSig:26","Modata:1","HighPri:0","MoVideoCall:0","MoSms:0","MpsPri:0","McsPri:0"]
["Emc:0","MoVoiceCall:0","Mt:2","MoSig:15","Modata:8","HighPri:0","MoVideoCall:0","MoSms:0","MpsPri:0","McsPri:0"]
["Emc:0","MoVoiceCall:0","Mt:0","MoSig:4","Modata:0","HighPri:0","MoVideoCall:0","MoSms:0","MpsPri:0","McsPri:0"]
["Emc:0","MoVoiceCall:0","Mt:1","MoSig:2","Modata:3","HighPri:0","MoVideoCall:0","MoSms:0","MpsPri:0","McsPri:0"]
["Emc:0","MoVoiceCall:0","Mt:1","MoSig:9","Modata:3","HighPri:0","MoVideoCall:0","MoSms:0","MpsPri:0","McsPri:0"]
["Emc:0","MoVoiceCall:0","Mt:0","MoSig:0","Modata:0","HighPri:0","MoVideoCall:0","MoSms:0","MpsPri:0","McsPri:0"]
["Emc:0","MoVoiceCall:1","Mt:13","MoSig:2","Modata:27","HighPri:0","MoVideoCall:0","MoSms:0","MpsPri:0","McsPri:0"]
["Emc:0","MoVoiceCall:0","Mt:1","MoSig:0","Modata:2","HighPri:0","MoVideoCall:0","MoSms:0","MpsPri:0","McsPri:0"]
["Emc:0","MoVoiceCall:0","Mt:1","MoSig:4","Modata:1","HighPri:0","MoVideoCall:0","MoSms:0","MpsPri:0","McsPri:0"]

二、实现方法

(一) 通过对本次的需求任务的解读,将实现方法大概分为以下几个步骤:

  1. 完成数组中括号[]的去除工作。
    ·主要使用的是hive中的正则表达式替换函数:regexp_replace来完成本次的替换;
    ·语法: regexp_replace(string A, string B, string C)
    ·返回值: string
select regexp_replace(SalaryArray,'"|\[|\]|\{|\}','') from 5gr1126;

效果图展示:

  1. 我们将每一行的字符串转换成map形式
    ·我们通过使用hive中的str_to_map函数来完成本次的转换操作;
    ·语法:str_to_map(字符串参数, 分隔符1, 分隔符2)
    ·返回值:使用两个分隔符将文本拆分为键值对
select str_to_map(regexp_replace(SalaryArray,'"|\[|\]|\{|\}','')) from 5gr1126;

效果图展示:

  1. 获取map key-value对中的所有value值
    ·我们通过hive中的map_values函数来完成本次的获取值操作;
    ·语法:map_values(map)
    ·返回值:包含输入映射值的无序数组
select map_values(str_to_map(regexp_replace(SalaryArray,'"|\[|\]|\{|\}',''))) from 5gr1126;

效果图展示:

  1. 每一行数值累加求和
    ·我们通过hive UDF自定义函数来完成本次的数值累加求和
    ·输入参数:数组列表,开始位置,结束位置
    ·输出结果:最终的和
    ·逻辑主要包含对科学技术法数值的转换以及是否为数值的过滤
public class AddNumUDF extends UDF {
    public static void main(String[] args) {//测试用例
        ArrayList strArray = new ArrayList();
        strArray.add("1E2");
        strArray.add("a");
        System.out.println(new AddNumUDF().evaluate(strArray, 0, 2));
    }
    public double evaluate(ArrayList list,
                           int from, int to) {
        double result = 0;
        if (list == null || list.size() < 1) {
            return result;
        }
        List subList;
        if (to == -1) {
            subList = list.subList(from, list.size());
        } else {
            subList = list.subList(from, to);
        }
        for (String i : subList) {//循环遍历
            if (i.contains("e") || i.contains("E") || isInt(i)) {
                BigDecimal bd = new BigDecimal(i);
                String str = bd.toPlainString();
                Double a = Double.parseDouble(str);
                result += a;
            } else {
                i = "0";
                BigDecimal bd = new BigDecimal(i);
                String str = bd.toPlainString();
                Double a = Double.parseDouble(str);
                result += a;
            }
        }
        return result;
    }
    
   	public static boolean isInt(String v) {
        if (v == null)
            return false;
        String regEx1 = "[\-|\+]?\d+";
        Pattern p;
        Matcher m;
        p = Pattern.compile(regEx1);
        m = p.matcher(v);
        if (m.matches()) return true;else return false;
    }
}
  1. 程序打成jar包,上传HS2服务器本地或者HDFS;
  2. 客户端命令行添加jar包到hive的classpath:hive>add jar /xxx/udf.jar;
  3. 注册成为临时函数(给udf命名),create temporary function 函数名 as ‘UDF类全名称’;
  4. 测试
    效果图如下所示:
转载请注明:文章转载自 www.mshxw.com
本文地址:https://www.mshxw.com/it/613344.html
我们一直用心在做
关于我们 文章归档 网站地图 联系我们

版权所有 (c)2021-2022 MSHXW.COM

ICP备案号:晋ICP备2021003244-6号