- Install Thrift

```shell
wget https://mirrors.cnnic.cn/apache/thrift/0.12.0/thrift-0.12.0.tar.gz
tar -zxvf thrift-0.12.0.tar.gz
cd thrift-0.12.0
yum install libtool flex bison pkgconfig boost-devel libevent-devel zlib-devel python-devel ruby-devel openssl-devel ant
./bootstrap.sh
./configure
make && make install
```
- Add Alibaba Cloud OSS support

```xml
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-aliyun</artifactId>
  <version>3.2.1</version>
  <scope>provided</scope>
</dependency>
```
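With the dependency in place, Hadoop still needs an OSS endpoint and credentials before `oss://` paths resolve. A minimal `core-site.xml` fragment might look like the following sketch; the endpoint and key values are placeholders, and the property names follow the hadoop-aliyun module:

```xml
<!-- Minimal OSS settings for the hadoop-aliyun module; all values are placeholders. -->
<property>
  <name>fs.oss.impl</name>
  <value>org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem</value>
</property>
<property>
  <name>fs.oss.endpoint</name>
  <value>oss-cn-hangzhou.aliyuncs.com</value>
</property>
<property>
  <name>fs.oss.accessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.oss.accessKeySecret</name>
  <value>YOUR_ACCESS_KEY_SECRET</value>
</property>
```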
- Set the Hive version (the Hive version on CDH is 2.1.1)

```shell
vim packaging/hudi-flink-bundle/pom.xml
```

Set `hive.version=2.1.1` in the pom.

- Build the Hudi bundle

```shell
mvn clean package -DskipTests -Drat.skip=true -Dscala-2.12 -T24C
mvn clean install -Drat.skip=true -Pflink-bundle-shade-hive2 -Pinclude-flink-sql-connector-hive -DskipTests -Dscala-2.12 -T24C
```
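Before running the build, it is worth confirming that the `hive.version` edit actually took effect; a quick grep over the pom does it. The snippet below fabricates a minimal `demo-pom.xml` so it is self-contained — in practice, point the grep at `packaging/hudi-flink-bundle/pom.xml`:

```shell
# Stand-in pom for the demo; in practice use packaging/hudi-flink-bundle/pom.xml.
cat > demo-pom.xml <<'EOF'
<project>
  <properties>
    <hive.version>2.1.1</hive.version>
  </properties>
</project>
EOF

# Extract the hive.version property to verify the edit.
grep -o '<hive.version>[^<]*</hive.version>' demo-pom.xml
# prints: <hive.version>2.1.1</hive.version>
```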
Issues encountered:
Issue 1: Thrift library conflict

```
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.thrift.protocol.TProtocol.getScheme()Ljava/lang/Class;
```

The cause turned out to be my own shaded flink-parquet bundle, which had packaged org.apache.thrift into the jar; excluding it and repackaging fixed the problem.
```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.2.4</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <shadedArtifactAttached>true</shadedArtifactAttached>
        <shadedClassifierName>shade</shadedClassifierName>
        <artifactSet>
          <includes>
            <include>org.apache.flink:flink-formats</include>
          </includes>
        </artifactSet>
        <relocations>
          <relocation>
            <pattern>org.apache.flink.formats.parquet</pattern>
            <shadedPattern>shaded.org.apache.flink.formats.parquet</shadedPattern>
          </relocation>
          <relocation>
            <pattern>org.apache.parquet</pattern>
            <shadedPattern>shaded.org.apache.parquet</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```
The key change is the following (only the specified module gets packaged):

```xml
<include>org.apache.flink:flink-formats</include>
```
Issue 2: When Flink wrote data into a Hudi table, the table metadata was not synced to Hive, and the job reported that it could not connect to Hive; it looked like a version mismatch. The workaround was to rebuild Hudi without bundling the Hive dependencies:

```shell
mvn clean package -DskipTests -Drat.skip=true -Dscala-2.12 -T24C
```
Issue 3: The first attempt to build Hudi 0.9 with scala.version=2.12 failed, and the cause was never found; building with scala.version=2.11 first and then rebuilding with scala.version=2.12 succeeded.



