Custom functions

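UDF (user-defined function): custom function, 1:1 (one input row produces one output value)

Define a custom class

Extend the UDF class

Implement the evaluate method (Hive resolves it by reflection, so it can be overloaded)

Package the project into a jar, upload it to HDFS, then load it into Hive

ADD JAR hdfs:///user/mazhichao/..jar;

Create a temporary function in Hive; it can then be used

CREATE TEMPORARY FUNCTION name AS 'package_name.ClassName';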

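import org.apache.hadoop.hive.ql.exec.UDF;

public class ValueMaskUDF extends UDF {
       // Keeps the first maxSaveStringLength characters of input and appends replaceSign
       public String evaluate(String input, int maxSaveStringLength, String replaceSign) {
             // NULL column values arrive as null; pass them through instead of throwing
             if (input == null || input.length() <= maxSaveStringLength) {
                    return input;
             }
             return input.substring(0, maxSaveStringLength) + replaceSign;
       }

       // Local smoke test, runnable without a Hive session
       public static void main(String[] args) {
             System.out.println(new ValueMaskUDF().evaluate("河北省", 2, "..."));
       }
}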

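UDAF (user-defined aggregation function): custom aggregate function, n:1 (many input rows produce one output value)

Extend the UDAF class

Define a static inner class that implements the UDAFEvaluator interface

Implement five methods: init, iterate, terminatePartial, merge, and terminate

As with a UDF, register it with ADD JAR and CREATE TEMPORARY FUNCTION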

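import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;
// Assumption: Logger is log4j 1.x, which provides getLogger(Class)
import org.apache.log4j.Logger;

public class DIYCountUDAF extends UDAF {
    // Logger instance, so that this class can write log output
    public static Logger logger = Logger.getLogger(DIYCountUDAF.class);

    // Static inner class implementing UDAFEvaluator
    public static class Evaluator implements UDAFEvaluator {
        // Total number of records counted within the current group
        private int totalRecords;

        // Constructor; runs on both the map and reduce sides and initializes the state
        public Evaluator() {
            init();
        }

        // Resets the count to 0 and logs it
        public void init() {
            totalRecords = 0;
            logger.info("init totalRecords=" + totalRecords);
        }

        // Map phase. Returns boolean: true lets processing continue, false stops it
        public boolean iterate(String input) {
            // A non-null input means a value exists, i.e. one row, so add 1
            if (input != null) {
                totalRecords += 1;
            }
            // Log how many rows of the current group have been processed so far
            logger.info("iterate totalRecords=" + totalRecords);
            return true;
        }

        // End of the map phase: returns the partial count for this task
        public int terminatePartial() {
            logger.info("terminatePartial totalRecords=" + totalRecords);
            return totalRecords;
        }

        // Reduce phase: called once per terminatePartial result produced on the map side for a key
        public boolean merge(int mapOutput) {
            totalRecords += mapOutput;
            logger.info("merge totalRecords=" + totalRecords);
            return true;
        }

        // Final result. For a count, the total is already complete once merging ends,
        // so no further processing is needed and the value is returned directly
        public int terminate() {
            logger.info("terminate totalRecords=" + totalRecords);
            return totalRecords;
        }
    }
}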
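
The order in which the five methods are called can be easier to follow in a small simulation. The following is a minimal plain-Java sketch, not Hive code: CountLifecycleDemo, CountEvaluator, mapSide, and reduceSide are hypothetical names introduced here, and the two lists stand in for two map-side input splits. It assumes the usual flow of iterate and terminatePartial on the map side, then merge and terminate on the reduce side.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java simulation of the UDAF call sequence; no Hive classes are used.
class CountLifecycleDemo {

    // Hypothetical stand-in for the evaluator above, with the same five methods.
    static class CountEvaluator {
        private int totalRecords;

        CountEvaluator() { init(); }

        void init() { totalRecords = 0; }

        boolean iterate(String input) {
            if (input != null) {
                totalRecords += 1;
            }
            return true;
        }

        int terminatePartial() { return totalRecords; }

        boolean merge(int partial) {
            totalRecords += partial;
            return true;
        }

        int terminate() { return totalRecords; }
    }

    // Map side: feed every row of one split to iterate, then emit the partial count.
    static int mapSide(List<String> split) {
        CountEvaluator evaluator = new CountEvaluator();
        for (String row : split) {
            evaluator.iterate(row);
        }
        return evaluator.terminatePartial();
    }

    // Reduce side: merge all partial counts, then emit the final count.
    static int reduceSide(List<Integer> partials) {
        CountEvaluator evaluator = new CountEvaluator();
        for (int partial : partials) {
            evaluator.merge(partial);
        }
        return evaluator.terminate();
    }

    public static void main(String[] args) {
        int first = mapSide(Arrays.asList("a", null, "b")); // null is skipped, so 2
        int second = mapSide(Arrays.asList("c"));           // 1
        System.out.println(reduceSide(Arrays.asList(first, second))); // prints 3
    }
}
```

Each side starts from a fresh evaluator, which is why the constructor calls init.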

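UDTF (user-defined table-generating function): custom table-generating function, 1:n (one input row produces multiple output rows)

In practice, lateral view explode combined with a UDF is commonly used instead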
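
To make the 1:n shape concrete, here is a small plain-Java sketch of the row expansion that explode performs conceptually. It is not Hive code and does not use the Hive API: ExplodeDemo, the explode helper, the comma delimiter, and the sample values are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of 1:n row generation; no Hive classes are used.
class ExplodeDemo {

    // Expands one (id, "a,b,c") row into one (id, item) row per item.
    static List<String[]> explode(String id, String delimited) {
        List<String[]> rows = new ArrayList<>();
        if (delimited == null) {
            return rows; // nothing to expand, so no output rows
        }
        for (String item : delimited.split(",")) {
            rows.add(new String[] {id, item});
        }
        return rows;
    }

    public static void main(String[] args) {
        // One input row becomes three output rows.
        for (String[] row : explode("u1", "hive,spark,flink")) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```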
