| Component | Version |
|---|---|
| Hudi | 0.10.0 |
| Flink | 1.13.5 |
Download the source from https://github.com/apache/hudi/releases

2.1 Change the Flink version to 1.13.5

Edit the pom.xml file in the project root directory.
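Assuming the Flink version is declared as a `<flink.version>` property in the root pom.xml (which is how Hudi 0.10.0 structures it), the edit can be scripted instead of done by hand. The sketch below runs against a throwaway demo file; point the `sed` at the real root pom.xml of your Hudi source tree:

```shell
# Throwaway pom.xml standing in for the real one, so the edit can be demonstrated safely.
cat > /tmp/pom-demo.xml <<'EOF'
<project>
  <properties>
    <flink.version>1.13.1</flink.version>
  </properties>
</project>
EOF

# Point the build at Flink 1.13.5 (GNU sed; in-place edit).
sed -i 's|<flink.version>.*</flink.version>|<flink.version>1.13.5</flink.version>|' /tmp/pom-demo.xml

grep '<flink.version>' /tmp/pom-demo.xml
```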
2.2 Build command

mvn clean package -DskipTests   # optionally specify the Scala version

The built bundle is located at packaging/hudi-flink-bundle/target/hudi-flink-bundle_2.12-*.*.*-SNAPSHOT.jar

2.3 An error encountered during the build
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Hudi 0.10.0:
[INFO]
[INFO] Hudi ............................................... SUCCESS [  1.642 s]
[INFO] hudi-common ........................................ SUCCESS [  9.808 s]
[INFO] hudi-aws ........................................... SUCCESS [  1.306 s]
[INFO] hudi-timeline-service .............................. SUCCESS [  1.623 s]
[INFO] hudi-client ........................................ SUCCESS [  0.082 s]
[INFO] hudi-client-common ................................. SUCCESS [  8.027 s]
[INFO] hudi-hadoop-mr ..................................... SUCCESS [  2.825 s]
[INFO] hudi-spark-client .................................. SUCCESS [ 13.891 s]
[INFO] hudi-sync-common ................................... SUCCESS [  0.718 s]
[INFO] hudi-hive-sync ..................................... SUCCESS [  3.027 s]
[INFO] hudi-spark-datasource .............................. SUCCESS [  0.066 s]
[INFO] hudi-spark-common_2.12 ............................. SUCCESS [  7.706 s]
[INFO] hudi-spark2_2.12 ................................... SUCCESS [  9.535 s]
[INFO] hudi-spark_2.12 .................................... SUCCESS [ 25.923 s]
[INFO] hudi-utilities_2.12 ................................ FAILURE [  2.638 s]
[INFO] hudi-utilities-bundle_2.12 ......................... SKIPPED
[INFO] hudi-cli ........................................... SKIPPED
[INFO] hudi-java-client ................................... SKIPPED
[INFO] hudi-flink-client .................................. SKIPPED
[INFO] hudi-spark3_2.12 ................................... SKIPPED
[INFO] hudi-dla-sync ...................................... SKIPPED
[INFO] hudi-sync .......................................... SKIPPED
[INFO] hudi-hadoop-mr-bundle .............................. SKIPPED
[INFO] hudi-hive-sync-bundle .............................. SKIPPED
[INFO] hudi-spark-bundle_2.12 ............................. SKIPPED
[INFO] hudi-presto-bundle ................................. SKIPPED
[INFO] hudi-timeline-server-bundle ........................ SKIPPED
[INFO] hudi-hadoop-docker ................................. SKIPPED
[INFO] hudi-hadoop-base-docker ............................ SKIPPED
[INFO] hudi-hadoop-namenode-docker ........................ SKIPPED
[INFO] hudi-hadoop-datanode-docker ........................ SKIPPED
[INFO] hudi-hadoop-history-docker ......................... SKIPPED
[INFO] hudi-hadoop-hive-docker ............................ SKIPPED
[INFO] hudi-hadoop-sparkbase-docker ....................... SKIPPED
[INFO] hudi-hadoop-sparkmaster-docker ..................... SKIPPED
[INFO] hudi-hadoop-sparkworker-docker ..................... SKIPPED
[INFO] hudi-hadoop-sparkadhoc-docker ...................... SKIPPED
[INFO] hudi-hadoop-presto-docker .......................... SKIPPED
[INFO] hudi-integ-test .................................... SKIPPED
[INFO] hudi-integ-test-bundle ............................. SKIPPED
[INFO] hudi-examples ...................................... SKIPPED
[INFO] hudi-flink_2.12 .................................... SKIPPED
[INFO] hudi-kafka-connect ................................. SKIPPED
[INFO] hudi-flink-bundle_2.12 ............................. SKIPPED
[INFO] hudi-kafka-connect-bundle .......................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  01:29 min
[INFO] Finished at: 2022-02-06T17:59:02+08:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project hudi-utilities_2.12: Could not resolve dependencies for project org.apache.hudi:hudi-utilities_2.12:jar:0.10.0: The following artifacts could not be resolved: io.confluent:kafka-avro-serializer:jar:5.3.4, io.confluent:common-config:jar:5.3.4, io.confluent:common-utils:jar:5.3.4, io.confluent:kafka-schema-registry-client:jar:5.3.4: Could not find artifact io.confluent:kafka-avro-serializer:jar:5.3.4 in aliyunmaven (https://maven.aliyun.com/repository/public) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <args> -rf :hudi-utilities_2.12
The error above is resolved by downloading the missing jars manually and installing them into the local Maven repository. Note that the failed build was asking for version 5.3.4 while the commands below install 5.3.0; make sure the version of the jars you download, and the -Dversion you pass, match what the build requires.

mvn install:install-file -Dfile=/opt/myjar/common-config-5.3.0.jar -DgroupId=io.confluent -DartifactId=common-config -Dversion=5.3.0 -Dpackaging=jar

mvn install:install-file -Dfile=/opt/myjar/common-utils-5.3.0.jar -DgroupId=io.confluent -DartifactId=common-utils -Dversion=5.3.0 -Dpackaging=jar

mvn install:install-file -Dfile=/opt/myjar/kafka-avro-serializer-5.3.0.jar -DgroupId=io.confluent -DartifactId=kafka-avro-serializer -Dversion=5.3.0 -Dpackaging=jar

mvn install:install-file -Dfile=/opt/myjar/kafka-schema-registry-client-5.3.0.jar -DgroupId=io.confluent -DartifactId=kafka-schema-registry-client -Dversion=5.3.0 -Dpackaging=jar
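The four install commands follow one pattern, so they can also be generated from a list. This sketch is a dry run: it only prints the commands (remove the `echo` to execute them), and the directory and version are the ones used in this article; adjust both to your setup:

```shell
VERSION=5.3.4        # the version the failed build was actually asking for
JAR_DIR=/opt/myjar   # where the downloaded jars were placed

# Print one install:install-file command per missing Confluent artifact.
for ARTIFACT in common-config common-utils kafka-avro-serializer kafka-schema-registry-client; do
  echo mvn install:install-file \
    -Dfile="${JAR_DIR}/${ARTIFACT}-${VERSION}.jar" \
    -DgroupId=io.confluent \
    -DartifactId="${ARTIFACT}" \
    -Dversion="${VERSION}" \
    -Dpackaging=jar
done
```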
[INFO] Reactor Summary for Hudi 0.10.0:
[INFO]
[INFO] Hudi ............................................... SUCCESS [  1.370 s]
[INFO] hudi-common ........................................ SUCCESS [ 10.813 s]
[INFO] hudi-aws ........................................... SUCCESS [  1.394 s]
[INFO] hudi-timeline-service .............................. SUCCESS [  1.404 s]
[INFO] hudi-client ........................................ SUCCESS [  0.072 s]
[INFO] hudi-client-common ................................. SUCCESS [  7.295 s]
[INFO] hudi-hadoop-mr ..................................... SUCCESS [  2.848 s]
[INFO] hudi-spark-client .................................. SUCCESS [ 15.158 s]
[INFO] hudi-sync-common ................................... SUCCESS [  0.681 s]
[INFO] hudi-hive-sync ..................................... SUCCESS [  2.856 s]
[INFO] hudi-spark-datasource .............................. SUCCESS [  0.054 s]
[INFO] hudi-spark-common_2.12 ............................. SUCCESS [  7.296 s]
[INFO] hudi-spark2_2.12 ................................... SUCCESS [ 10.521 s]
[INFO] hudi-spark_2.12 .................................... SUCCESS [ 26.299 s]
[INFO] hudi-utilities_2.12 ................................ SUCCESS [ 11.262 s]
[INFO] hudi-utilities-bundle_2.12 ......................... SUCCESS [01:39 min]
[INFO] hudi-cli ........................................... SUCCESS [ 15.297 s]
[INFO] hudi-java-client ................................... SUCCESS [  2.267 s]
[INFO] hudi-flink-client .................................. SUCCESS [01:06 min]
[INFO] hudi-spark3_2.12 ................................... SUCCESS [  6.117 s]
[INFO] hudi-dla-sync ...................................... SUCCESS [  6.830 s]
[INFO] hudi-sync .......................................... SUCCESS [  0.061 s]
[INFO] hudi-hadoop-mr-bundle .............................. SUCCESS [  8.565 s]
[INFO] hudi-hive-sync-bundle .............................. SUCCESS [  1.131 s]
[INFO] hudi-spark-bundle_2.12 ............................. SUCCESS [ 11.139 s]
[INFO] hudi-presto-bundle ................................. SUCCESS [ 38.706 s]
[INFO] hudi-timeline-server-bundle ........................ SUCCESS [  8.251 s]
[INFO] hudi-hadoop-docker ................................. SUCCESS [  1.166 s]
[INFO] hudi-hadoop-base-docker ............................ SUCCESS [  0.649 s]
[INFO] hudi-hadoop-namenode-docker ........................ SUCCESS [  0.649 s]
[INFO] hudi-hadoop-datanode-docker ........................ SUCCESS [  0.627 s]
[INFO] hudi-hadoop-history-docker ......................... SUCCESS [  0.659 s]
[INFO] hudi-hadoop-hive-docker ............................ SUCCESS [  7.320 s]
[INFO] hudi-hadoop-sparkbase-docker ....................... SUCCESS [  0.731 s]
[INFO] hudi-hadoop-sparkmaster-docker ..................... SUCCESS [  0.638 s]
[INFO] hudi-hadoop-sparkworker-docker ..................... SUCCESS [  0.667 s]
[INFO] hudi-hadoop-sparkadhoc-docker ...................... SUCCESS [  0.671 s]
[INFO] hudi-hadoop-presto-docker .......................... SUCCESS [  0.704 s]
[INFO] hudi-integ-test .................................... SUCCESS [ 36.320 s]
[INFO] hudi-integ-test-bundle ............................. SUCCESS [01:47 min]
[INFO] hudi-examples ...................................... SUCCESS [  8.120 s]
[INFO] hudi-flink_2.12 .................................... SUCCESS [ 38.207 s]
[INFO] hudi-kafka-connect ................................. SUCCESS [ 19.832 s]
[INFO] hudi-flink-bundle_2.12 ............................. SUCCESS [ 27.658 s]
[INFO] hudi-kafka-connect-bundle .......................... SUCCESS [ 14.287 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  10:30 min
[INFO] Finished at: 2022-02-06T20:29:29+08:00
[INFO] ------------------------------------------------------------------------
[root@node01 hudi-0.10.0]#

3. Built bundle directories
[root@node01 packaging]# pwd
/opt/module/hudi/hudi-0.10.0/packaging
[root@node01 packaging]# ll
总用量 4
drwxr-xr-x 4 501 games   46 2月   6 20:41 hudi-flink-bundle
drwxr-xr-x 4 501 games   46 2月   6 20:38 hudi-hadoop-mr-bundle
drwxr-xr-x 4 501 games   46 2月   6 20:38 hudi-hive-sync-bundle
drwxr-xr-x 4 501 games   46 2月   6 20:39 hudi-integ-test-bundle
drwxr-xr-x 4 501 games   46 2月   6 20:41 hudi-kafka-connect-bundle
drwxr-xr-x 4 501 games   46 2月   6 20:38 hudi-presto-bundle
drwxr-xr-x 4 501 games   46 2月   6 20:38 hudi-spark-bundle
drwxr-xr-x 4 501 games  101 2月   6 20:38 hudi-timeline-server-bundle
drwxr-xr-x 4 501 games   46 2月   6 20:37 hudi-utilities-bundle
-rw-r--r-- 1 501 games 2206 12月  8 10:38 README.md
[root@node01 packaging]#

4. Jars required to integrate Flink with Hudi
The two jars needed are:

hudi-flink-bundle_2.12-0.10.0.jar
hudi-hadoop-mr-bundle-0.10.0.jar

Place both in Flink's lib directory:
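Copying the two bundles into Flink's lib directory can be scripted. The sketch below uses temporary directories standing in for the real paths in this article (/opt/module/hudi/hudi-0.10.0/packaging and /opt/module/flink/flink-1.13.5/lib), so the copy step can be shown safely; substitute your actual paths:

```shell
# Temp dirs standing in for the Hudi packaging tree and Flink's lib directory.
SRC=$(mktemp -d); DEST=$(mktemp -d)
mkdir -p "$SRC/hudi-flink-bundle/target" "$SRC/hudi-hadoop-mr-bundle/target"
touch "$SRC/hudi-flink-bundle/target/hudi-flink-bundle_2.12-0.10.0.jar"
touch "$SRC/hudi-hadoop-mr-bundle/target/hudi-hadoop-mr-bundle-0.10.0.jar"

# The actual copy step: drop both bundles into Flink's lib directory.
cp "$SRC"/hudi-flink-bundle/target/hudi-flink-bundle_2.12-*.jar "$DEST"/
cp "$SRC"/hudi-hadoop-mr-bundle/target/hudi-hadoop-mr-bundle-*.jar "$DEST"/

ls "$DEST"
```

Restart the Flink cluster after adding jars so they are picked up on the classpath.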
[root@node01 lib]# pwd
/opt/module/flink/flink-1.13.5/lib
[root@node01 lib]# ll
总用量 316964
-rw-r--r-- 1 root root   7802399 1月   1 08:27 doris-flink-1.0-SNAPSHOT.jar
-rw-r--r-- 1 root root    249571 12月 27 23:32 flink-connector-jdbc_2.12-1.13.5.jar
-rw-r--r-- 1 root root    359138 1月   1 10:17 flink-connector-kafka_2.12-1.13.5.jar
-rw-r--r-- 1 hive 1007     92315 12月 15 08:23 flink-csv-1.13.5.jar
-rw-r--r-- 1 hive 1007 106535830 12月 15 08:29 flink-dist_2.12-1.13.5.jar
-rw-r--r-- 1 hive 1007    148127 12月 15 08:23 flink-json-1.13.5.jar
-rw-r--r-- 1 root root  43317025 2月   6 20:51 flink-shaded-hadoop-2-uber-2.8.3-10.0.jar
-rw-r--r-- 1 hive 1007   7709740 12月 15 06:57 flink-shaded-zookeeper-3.4.14.jar
-rw-r--r-- 1 hive 1007  35051557 12月 15 08:28 flink-table_2.12-1.13.5.jar
-rw-r--r-- 1 hive 1007  38613344 12月 15 08:28 flink-table-blink_2.12-1.13.5.jar
-rw-r--r-- 1 root root  62447468 2月   6 20:44 hudi-flink-bundle_2.12-0.10.0.jar
-rw-r--r-- 1 root root  17276348 2月   6 20:51 hudi-hadoop-mr-bundle-0.10.0.jar
-rw-r--r-- 1 root root   1893564 1月   1 10:17 kafka-clients-2.0.0.jar
-rw-r--r-- 1 hive 1007    207909 12月 15 06:56 log4j-1.2-api-2.16.0.jar
-rw-r--r-- 1 hive 1007    301892 12月 15 06:56 log4j-api-2.16.0.jar
-rw-r--r-- 1 hive 1007   1789565 12月 15 06:56 log4j-core-2.16.0.jar
-rw-r--r-- 1 hive 1007     24258 12月 15 06:56 log4j-slf4j-impl-2.16.0.jar
-rw-r--r-- 1 root root    724213 12月 27 23:23 mysql-connector-java-5.1.9.jar
[root@node01 lib]#

5. Start the Flink SQL client
./sql-client.sh embedded

# Set the result display mode in the SQL CLI
set execution.result-mode=tableau;

6. Create table statement
CREATE TABLE t1(
  uuid VARCHAR(20),
  name VARCHAR(10),
  age INT,
  ts TIMESTAMP(3),
  `partition` VARCHAR(20)
)
PARTITIONED BY (`partition`)
WITH (
  'connector' = 'hudi',
  'path' = 'hdfs://192.168.1.161:8020/hudi-warehouse/hudi-t1',
  'write.tasks' = '1',
  'compaction.tasks' = '1',
  'table.type' = 'MERGE_ON_READ'
);

6.1 Insert data
INSERT INTO t1 VALUES('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1');
INSERT INTO t1 VALUES
('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'),
('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'),
('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'),
('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'),
('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'),
('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'),
('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4');
## Output
Flink SQL> INSERT INTO t1 VALUES
> ('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'),
> ('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'),
> ('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'),
> ('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'),
> ('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'),
> ('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'),
> ('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4');
[INFO] Submitting SQL update statement to the cluster...
[INFO] SQL update statement has been successfully submitted to the cluster:
Job ID: d6c70e43969b0f2b5124104468c5e065
Flink SQL> select * from t1;
+----+--------------------------------+--------------------------------+-------------+-------------------------+--------------------------------+
| op | uuid | name | age | ts | partition |
+----+--------------------------------+--------------------------------+-------------+-------------------------+--------------------------------+
| +I | id6 | Emma | 20 | 1970-01-01 00:00:06.000 | par3 |
| +I | id5 | Sophia | 18 | 1970-01-01 00:00:05.000 | par3 |
| +I | id8 | Han | 56 | 1970-01-01 00:00:08.000 | par4 |
| +I | id7 | Bob | 44 | 1970-01-01 00:00:07.000 | par4 |
| +I | id2 | Stephen | 33 | 1970-01-01 00:00:02.000 | par1 |
| +I | id1 | Danny | 28 | 1970-01-01 00:00:01.000 | par1 |
| +I | id4 | Fabian | 31 | 1970-01-01 00:00:04.000 | par2 |
| +I | id3 | Julian | 53 | 1970-01-01 00:00:03.000 | par2 |
+----+--------------------------------+--------------------------------+-------------+-------------------------+--------------------------------+
Received a total of 8 rows
7. Update

An update is just a re-insert: writing a row with an existing record key (uuid here) replaces the old row.

Change the age to 18:

INSERT INTO t1 VALUES('id1','Danny',18,TIMESTAMP '1970-01-01 00:00:01','par1');
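The upsert behavior above can be sketched as a keyed overwrite: a second write with the same key replaces the existing row instead of adding a duplicate. A minimal illustration in plain bash (requires bash 4+ for associative arrays; nothing Hudi-specific):

```shell
#!/usr/bin/env bash
# Simulate upsert-by-key: a second write with the same uuid replaces the row.
declare -A table

upsert() {              # usage: upsert <uuid> <name>,<age>
  table["$1"]="$2"
}

upsert id1 Danny,23     # first insert
upsert id1 Danny,18     # "update": same key, new age

echo "rows=${#table[@]} id1=${table[id1]}"   # prints: rows=1 id1=Danny,18
```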
Query result:

Flink SQL> select * from t1;
+----+--------------------------------+--------------------------------+-------------+-------------------------+--------------------------------+
| op | uuid | name | age | ts | partition |
+----+--------------------------------+--------------------------------+-------------+-------------------------+--------------------------------+
| +I | id8 | Han | 56 | 1970-01-01 00:00:08.000 | par4 |
| +I | id7 | Bob | 44 | 1970-01-01 00:00:07.000 | par4 |
| +I | id4 | Fabian | 31 | 1970-01-01 00:00:04.000 | par2 |
| +I | id3 | Julian | 53 | 1970-01-01 00:00:03.000 | par2 |
| +I | id2 | Stephen | 33 | 1970-01-01 00:00:02.000 | par1 |
| +I | id1 | Danny | 18 | 1970-01-01 00:00:01.000 | par1 |
| +I | id6 | Emma | 20 | 1970-01-01 00:00:06.000 | par3 |
| +I | id5 | Sophia | 18 | 1970-01-01 00:00:05.000 | par3 |
+----+--------------------------------+--------------------------------+-------------+-------------------------+--------------------------------+
Received a total of 8 rows

Flink SQL>

8. Jobs in Flink



