实践数据湖iceberg 第一课 入门
实践数据湖iceberg 第二课 iceberg基于hadoop的底层数据格式
实践数据湖iceberg 第三课 在sqlclient中,以sql方式从kafka读数据到iceberg
实践数据湖iceberg 第四课 在sqlclient中,以sql方式从kafka读数据到iceberg(升级版本到flink1.12.7)
实践数据湖iceberg 第五课 hive catalog特点
实践数据湖iceberg 第六课 从kafka写入到iceberg失败问题 解决
实践数据湖iceberg 第七课 实时写入到iceberg
实践数据湖iceberg 第八课 hive与iceberg集成
提示:写完文章后,目录可以自动生成,如何生成可参考右边的帮助文档
文章目录系列文章目录前言1. 准备iceberg与hive结合的包2. hive中测试iceberg
2.1. 在hive中建表2.2 直接读取flink-sql生成的iceberg表,报错2.3 为什么2.1直接在hive上建立的icerberg能直接查?从flinksql创建的icerberg报错? 总结
前言 本节学习目标: 测试从hive入口,进行iceberg格式的表CRUD
提示:以下是本篇文章正文内容,下面案例可供参考
1. 准备iceberg与hive结合的包把 iceberg-hive-runtime-0.12.1.jar 放到hive的lib下。
下载 iceberg-hive-runtime-0.12.1.jar 包,放到HIVE_HOME/lib下。 下载地址 link.
代码如下(示例):
创建表脚本:
CREATE TABLE table_a ( id bigint, name string ) PARTITIonED BY ( dept string ) STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler';
运行看看
hive (iceberg_db)> CREATE TABLE table_a (
> id bigint, name string
> ) PARTITIonED BY (
> dept string
> ) STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler';
OK
Time taken: 0.253 seconds
hive (iceberg_db)> insert into table_a values(1,'dev');
FAILED: SemanticException [Error 10044]: Line 1:12 Cannot insert into target table because column number/types are different 'table_a': Table insclause-0 has 3 columns, but query has 2 columns.
hive (iceberg_db)> insert into table_a values(1,'tom','dev');
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20220120165250_f1a960bc-af68-4b80-b722-211d5fb282ed
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1642579431487_0015, Tracking URL = http://hadoop101:8088/proxy/application_1642579431487_0015/
Kill Command = /opt/module/hadoop/bin/hadoop job -kill job_1642579431487_0015
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2022-01-20 16:52:56,533 Stage-3 map = 0%, reduce = 0%
2022-01-20 16:53:02,790 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 4.22 sec
MapReduce Total cumulative CPU time: 4 seconds 220 msec
Ended Job = job_1642579431487_0015
MapReduce Jobs Launched:
Stage-Stage-3: Map: 1 Cumulative CPU: 4.22 sec HDFS Read: 114527 HDFS Write: 3531 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 220 msec
OK
_col0 _col1 _col2
Time taken: 13.967 seconds
hive (iceberg_db)> select * from table_a;
OK
table_a.id table_a.name table_a.dept
1 tom dev
Time taken: 0.143 seconds, Fetched: 1 row(s)
2.2 直接读取flink-sql生成的iceberg表,报错
hive中读由flink-sql 生成的iceberg表,报错
hive (iceberg_db6)> select count(*) from behavior_log_ib6; WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. Query ID = root_20220120170353_9d501412-4843-4c1d-9b53-d4c31b91daa5 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=In order to limit the maximum number of reducers: set hive.exec.reducers.max= In order to set a constant number of reducers: set mapreduce.job.reduces= java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: Cannot create an instance of InputFormat class org.apache.hadoop.mapred.FileInputFormat as specified in mapredWork! at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:519) at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:328) at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:320) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:414) at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:151) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) Caused by: java.util.concurrent.ExecutionException: java.io.IOException: Cannot create an instance of InputFormat class org.apache.hadoop.mapred.FileInputFormat as specified in mapredWork! at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getNonCombinablePathIndices(CombineHiveInputFormat.java:476) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:508) ... 37 more Caused by: java.io.IOException: Cannot create an instance of InputFormat class org.apache.hadoop.mapred.FileInputFormat as specified in mapredWork! at org.apache.hadoop.hive.ql.io.HiveInputFormat.getInputFormatFromCache(HiveInputFormat.java:330) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CheckNonCombinablePathCallable.call(CombineHiveInputFormat.java:107) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CheckNonCombinablePathCallable.call(CombineHiveInputFormat.java:83) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.RuntimeException: java.lang.InstantiationException at org.apache.hive.common.util.ReflectionUtil.newInstance(ReflectionUtil.java:85) at org.apache.hadoop.hive.ql.io.HiveInputFormat.getInputFormatFromCache(HiveInputFormat.java:322) ... 6 more Caused by: java.lang.InstantiationException at sun.reflect.InstantiationExceptionConstructorAccessorImpl.newInstance(InstantiationExceptionConstructorAccessorImpl.java:48) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hive.common.util.ReflectionUtil.newInstance(ReflectionUtil.java:83) ... 7 more Job Submission failed with exception 'java.io.IOException(java.util.concurrent.ExecutionException: java.io.IOException: Cannot create an instance of InputFormat class org.apache.hadoop.mapred.FileInputFormat as specified in mapredWork!)' FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. java.util.concurrent.ExecutionException: java.io.IOException: Cannot create an instance of InputFormat class org.apache.hadoop.mapred.FileInputFormat as specified in mapredWork!
灵魂拷问: 为什么2.1直接在hive上建立的icerberg能直接查?从flinksql创建的icerberg报错?
2.3 为什么2.1直接在hive上建立的icerberg能直接查?从flinksql创建的icerberg报错?尝试了把/opt/software/iceberg-flink-runtime-0.12.1.jar 放到hive/lib 下 没有用
头疼,问题先挂起来
总结
心情很糟。。。
待续。。。



