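Custom functions fall into three categories:
(1) UDF: one row in, one value out
(2) UDAF: aggregate function, many rows in, one value out, similar to count/max/min
(3) UDTF: one row in, multiple rows out
This article covers UDFs, which are the simplest of the three, so let's go straight to the code.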
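The three shapes listed above can be pictured with ordinary Java functions. This is only a rough illustration: the class and field names (`FunctionShapes`, `UDF_LIKE`, and so on) are made up for this sketch and are not Hive APIs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical illustration only: plain Java functions, not Hive classes.
final class FunctionShapes {
    // UDF shape: one input value per row produces one output value.
    static final Function<String, String> UDF_LIKE =
            s -> s.toUpperCase(Locale.ROOT);

    // UDAF shape: many input rows collapse into a single value (like count).
    static final Function<List<String>, Integer> UDAF_LIKE =
            rows -> rows.size();

    // UDTF shape: one input row expands into several output rows.
    static final Function<String, List<String>> UDTF_LIKE =
            s -> Arrays.asList(s.split(","));

    public static void main(String[] args) {
        System.out.println(UDF_LIKE.apply("a"));                            // A
        System.out.println(UDAF_LIKE.apply(Arrays.asList("a", "b", "c"))); // 3
        System.out.println(UDTF_LIKE.apply("a,b,c"));                       // [a, b, c]
    }
}
```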
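2. Create a project
Create a Maven Java project in IDEA. Instructions are easy to find with a web search, or you can refer to the following post:
idea-新建Maven Java工程 - MmeChan - 博客园 (IDEA: creating a new Maven Java project, by MmeChan on cnblogs)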
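3. pom.xml configuration:
<dependencies>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>3.1.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>3.1.0</version>
    </dependency>
</dependencies>
4. First UDF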
The official example, HelloUDF.java, prepends "Hello" to the input string:
// Package matches the class name used when the function is registered in Hive.
package com.yixin.hubg.udf;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;

@Description(
    name = "hello",
    value = "_FUNC_(str) - from the input string "
        + "returns the value that is \"Hello $str\"",
    extended = "Example:\n"
        + "  > SELECT _FUNC_(str) FROM src;"
)
public class HelloUDF extends UDF {
    // Hive calls evaluate() once per input row.
    public String evaluate(String str) {
        try {
            return "Hello " + str;
        } catch (Exception e) {
            // Log the failure and return a marker value instead of failing the query.
            e.printStackTrace();
            return "ERROR";
        }
    }
}
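Note that the class above overrides nothing: with the classic `UDF` base class, Hive finds a method named `evaluate` on the class by reflection. The sketch below imitates that lookup in plain Java so it can be run without Hive on the classpath. `EvaluateLookupSketch`, its nested `Hello` class, and `callEvaluate` are hypothetical names for illustration, and Hive's real resolver does considerably more (overload selection, type conversion).

```java
import java.lang.reflect.Method;

// Hypothetical sketch: imitates looking up "evaluate" by name via reflection.
final class EvaluateLookupSketch {
    // Stand-in for a UDF class; it does not extend Hive's UDF so the sketch
    // runs with the JDK alone.
    public static final class Hello {
        public String evaluate(String str) {
            return "Hello " + str;
        }
    }

    // Finds a public evaluate(String) method on the object and invokes it.
    static Object callEvaluate(Object udf, String arg) {
        try {
            Method m = udf.getClass().getMethod("evaluate", String.class);
            return m.invoke(udf, arg);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("no usable evaluate(String) method", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callEvaluate(new Hello(), "hubg")); // Hello hubg
    }
}
```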
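5. Second UDF example:
EncryptUDFMd5.java generates a salted MD5 string (the salt makes the hash harder to reverse by looking it up in tables of precomputed hashes). The implementation is simple: it follows the built-in MD5 function, but appends a salt value to the input before hashing: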
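// Package matches the class name used when the function is registered in Hive.
package com.yixin.hubg.udf;

import org.apache.commons.codec.binary.Hex;
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

@Description(name = "emd5",
    value = "_FUNC_(str) - Calculates a salted MD5 128-bit checksum for the string.",
    extended = "The value is returned as a string of 32 hex digits.\n" +
        "emd5('ABC')")
public class EncryptUDFMd5 extends UDF {
    // Reused across rows to avoid allocating a new Text per call.
    private final Text result = new Text();
    private final MessageDigest digest;
    // Fixed salt appended to every input before hashing.
    private final String salt = "hubg";

    public EncryptUDFMd5() {
        try {
            digest = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e);
        }
    }

    public Text evaluate(Text n) {
        String nStr;
        if (n == null) {
            // A NULL input is hashed as the literal string "null" plus the salt.
            nStr = "null" + salt;
        } else {
            nStr = n.toString() + salt;
        }
        // Encode explicitly as UTF-8 and hash the full byte array. The byte count
        // differs from String.length() for non-ASCII input, so the string length
        // must not be used as the byte length.
        byte[] input = nStr.getBytes(StandardCharsets.UTF_8);
        digest.reset();
        digest.update(input, 0, input.length);
        byte[] md5Bytes = digest.digest();
        String md5Hex = Hex.encodeHexString(md5Bytes);
        result.set(md5Hex);
        return result;
    }
}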
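To check a result without a Hive cluster, the same salted-MD5 computation can be reproduced with the JDK alone. This is a minimal sketch, not part of the UDF jar: `SaltedMd5Check` and `saltedMd5` are hypothetical names, and the sketch assumes the input is encoded as UTF-8 and that the salt is appended to the input, as in the UDF above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical stand-alone checker; uses only the JDK (no Hive, no commons-codec).
final class SaltedMd5Check {
    // Returns the MD5 of (input + salt) as 32 lowercase hex digits.
    // A null input becomes the literal text "null" through string concatenation.
    static String saltedMd5(String input, String salt) {
        try {
            MessageDigest digest = MessageDigest.getInstance("MD5");
            // Assumption: UTF-8 encoding of the salted string.
            byte[] bytes = (input + salt).getBytes(StandardCharsets.UTF_8);
            byte[] hash = digest.digest(bytes);
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Same input and salt as the emd5('aaa') call with salt "hubg".
        System.out.println(saltedMd5("aaa", "hubg"));
    }
}
```

Because the salt is simply concatenated, `saltedMd5("a", "bc")` equals the plain MD5 of `"abc"`, which gives a quick sanity check against any standard MD5 tool.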
6. Package and upload to the server
Build the jar: mvn package
Upload it to the Hadoop client machine and place it under the /tmp/ directory:
/tmp/MyHiveUdf-1.0-SNAPSHOT.jar
7. Run
Enter the Hive command line:
hive
Run the following HQL commands:
-- Load the jar
add jar /tmp/MyHiveUdf-1.0-SNAPSHOT.jar;
-- Create temporary functions
create TEMPORARY function hello as 'com.yixin.hubg.udf.HelloUDF';
create TEMPORARY function emd5 as 'com.yixin.hubg.udf.EncryptUDFMd5';
-- Call the UDFs
select hello('hubg'),emd5('aaa');
That's all there is to it!



