Resource Center

Contents
- File management
- UDF management
  - Resource management
  - UDF management
- Sqoop task
  - Installing the sqoop command
  - Testing a sqoop task
    - Defining the task (compression type snappy)
    - Problems encountered
      - Task error: sqoop is missing the MySQL driver jar
      - native snappy library not available: this version of libhadoop was built without snappy support
      - Final fix: snappy shows false, Hadoop needs to be recompiled
      - Cannot load libcrypto.so (libcrypto.so: cannot open shared object file: No such file or directory)
      - Redefine the task without a compression type: succeeds
      - Compression type gzip: succeeds
      - Compression type lzo: fails with Cannot find codec class com.hadoop.compression.lzo.LzoCodec...
      - Final fix: simply a missing jar, after two detours
- MR (MapReduce) task
Resource Center

File management

Create folders and files, and upload files:
[dolphinscheduler@host1 ~]$ hdfs dfs -ls /
Found 1 items
drwxr-xr-x   - dolphinscheduler supergroup          0 2022-03-08 18:17 /dolphinscheduler
[dolphinscheduler@host1 ~]$ hdfs dfs -ls /dolphinscheduler
Found 1 items
drwxr-xr-x   - dolphinscheduler supergroup          0 2022-03-08 18:17 /dolphinscheduler/dolphin
[dolphinscheduler@host1 ~]$ hdfs dfs -ls /dolphinscheduler/dolphin
Found 2 items
drwxr-xr-x   - dolphinscheduler supergroup          0 2022-03-08 18:18 /dolphinscheduler/dolphin/resources
drwxr-xr-x   - dolphinscheduler supergroup          0 2022-03-08 18:17 /dolphinscheduler/dolphin/udfs
[dolphinscheduler@host1 ~]$ hdfs dfs -ls /dolphinscheduler/dolphin/resources
Found 2 items
drwxr-xr-x   - dolphinscheduler supergroup          0 2022-03-08 18:17 /dolphinscheduler/dolphin/resources/testHDFS
-rw-r--r--   3 dolphinscheduler supergroup         13 2022-03-08 18:18 /dolphinscheduler/dolphin/resources/testhdfs.sh
[dolphinscheduler@host1 ~]$ hdfs dfs -ls /dolphinscheduler/dolphin/resources
Found 3 items
drwxr-xr-x   - dolphinscheduler supergroup          0 2022-03-08 18:17 /dolphinscheduler/dolphin/resources/testHDFS
-rw-r--r--   3 dolphinscheduler supergroup         13 2022-03-08 18:18 /dolphinscheduler/dolphin/resources/testhdfs.sh
-rw-r--r--   3 dolphinscheduler supergroup     541261 2022-03-08 18:31 /dolphinscheduler/dolphin/resources/微信截图_20220306175420.png
[dolphinscheduler@host1 ~]$

UDF management

Resource management
[dolphinscheduler@host1 ~]$ hdfs dfs -ls /dolphinscheduler/dolphin/udfs
Found 2 items
-rw-r--r--   3 dolphinscheduler supergroup      21921 2022-03-08 18:36 /dolphinscheduler/dolphin/udfs/dolphinscheduler-registry-zookeeper-2.0.3.jar
drwxr-xr-x   - dolphinscheduler supergroup          0 2022-03-08 18:34 /dolphinscheduler/dolphin/udfs/testUDF
[dolphinscheduler@host1 ~]$

UDF management
Custom UDF functions are defined here; they are built on the jar resources managed under Resource management.
Sqoop task

Installing the sqoop command

Sqoop download address
Download it with wget straight from the terminal; this is a bit faster than a browser.
[dolphinscheduler@host1 app]$ pwd
/home/dolphinscheduler/app
[dolphinscheduler@host1 app]$ wget http://archive.apache.org/dist/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
--2022-03-08 19:06:05--  http://archive.apache.org/dist/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
Resolving archive.apache.org (archive.apache.org)... 138.201.131.134, 2a01:4f8:172:2ec5::2
Connecting to archive.apache.org (archive.apache.org)|138.201.131.134|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 17953604 (17M) [application/x-gzip]
Saving to: 'sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz'

100%[==========================================>] 17,953,604  1.75MB/s  in 16s

2022-03-08 19:06:34 (1.09 MB/s) - 'sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz' saved [17953604/17953604]

[dolphinscheduler@host1 app]$
Unpack, configure, and set environment variables
[dolphinscheduler@host1 app]$ # unpack
[dolphinscheduler@host1 app]$ tar xf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
[dolphinscheduler@host1 app]$ # edit the config file
[dolphinscheduler@host1 app]$ cd sqoop-1.4.7.bin__hadoop-2.6.0/conf/
[dolphinscheduler@host1 conf]$ ls
oraoop-site-template.xml  sqoop-env-template.cmd  sqoop-env-template.sh  sqoop-site-template.xml  sqoop-site.xml
[dolphinscheduler@host1 conf]$ cp sqoop-env-template.sh sqoop-env.sh
[dolphinscheduler@host1 conf]$ vi sqoop-env.sh
[dolphinscheduler@host1 conf]$ grep HADOOP sqoop-env.sh
export HADOOP_COMMON_HOME=/home/dolphinscheduler/app/hadoop-2.7.3
export HADOOP_MAPRED_HOME=/home/dolphinscheduler/app/hadoop-2.7.3
[dolphinscheduler@host1 conf]$ # set environment variables
[dolphinscheduler@host1 conf]$ cd
[dolphinscheduler@host1 ~]$ vi .bash_profile
[dolphinscheduler@host1 ~]$ . .bash_profile
[dolphinscheduler@host1 ~]$ grep SQOOP .bash_profile
export SQOOP_HOME=/home/dolphinscheduler/app/sqoop-1.4.7.bin__hadoop-2.6.0
export PATH=$SQOOP_HOME/bin:$PATH
[dolphinscheduler@host1 ~]$ # pressing Tab completes the sqoop commands, so the install succeeded
[dolphinscheduler@host1 ~]$ sqoop
sqoop                    sqoop-eval               sqoop-import-all-tables  sqoop-list-tables
sqoop.cmd                sqoop-export             sqoop-import-mainframe   sqoop-merge
sqoop-codegen            sqoop-help               sqoop-job                sqoop-metastore
sqoop-create-hive-table  sqoop-import             sqoop-list-databases     sqoop-version
[dolphinscheduler@host1 ~]$ sqoop

Testing a sqoop task

Defining the task (compression type snappy)
Environment configuration (set up in dolphinscheduler's environment management)
#JAVA
export JAVA_HOME=/usr/local/java/jdk1.8.0_151
export PATH=$JAVA_HOME/bin:$PATH
#hadoop
export HADOOP_HOME=/home/dolphinscheduler/app/hadoop-2.7.3
export PATH=$HADOOP_HOME/bin:$PATH
#sqoop
export SQOOP_HOME=/home/dolphinscheduler/app/sqoop-1.4.7.bin__hadoop-2.6.0
export PATH=$SQOOP_HOME/bin:$PATH

Problems encountered

Task error: sqoop is missing the MySQL driver jar
[INFO] 2022-03-08 19:22:25.400 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.sqoop.SqoopTask:[117] - command : #!/bin/sh
baseDIR=$(cd `dirname $0`; pwd)
cd $baseDIR
#JAVA
export JAVA_HOME=/usr/local/java/jdk1.8.0_151
export PATH=$JAVA_HOME/bin:$PATH
#hadoop
export HADOOP_HOME=/home/dolphinscheduler/app/hadoop-2.7.3
export PATH=$HADOOP_HOME/bin:$PATH
#sqoop
export SQOOP_HOME=/home/dolphinscheduler/app/sqoop-1.4.7.bin__hadoop-2.6.0
export PATH=$SQOOP_HOME/bin:$PATH
sqoop import -D mapred.job.name=importdb -m 1 --connect "jdbc:mysql://192.168.56.10:3306/dolphinscheduler?allowLoadLocalInfile=false&autoDeserialize=false&allowLocalInfile=false&allowUrlInLocalInfile=false" --username ds_user --password "dolphinscheduler" --query "select id,user_name from t_ds_user WHERe $CONDITIONS" --target-dir /home/dolphinscheduler/app/hadoop-2.7.3/data/tmp/dfs/data --compression-codec snappy --as-textfile --delete-target-dir --fields-terminated-by '@' --lines-terminated-by '|' --null-non-string 'NULL' --null-string 'NULL'
...skipping...
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/dolphinscheduler/app/sqoop-1.4.7.bin__hadoop-2.6.0/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/dolphinscheduler/app/sqoop-1.4.7.bin__hadoop-2.6.0/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
22/03/08 19:22:26 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
22/03/08 19:22:26 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
22/03/08 19:22:26 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
22/03/08 19:22:26 INFO tool.CodeGenTool: Beginning code generation
22/03/08 19:22:26 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Could not load db driver class: com.mysql.jdbc.Driver
java.lang.RuntimeException: Could not load db driver class: com.mysql.jdbc.Driver
at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:875)
at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:59)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:763)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:786)
at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:289)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:260)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForQuery(SqlManager.java:253)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:336)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1872)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1671)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:106)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:501)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
[INFO] 2022-03-08 19:22:26.407 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.sqoop.SqoopTask:[60] - FINALIZE_SESSION
Copy the MySQL driver jar into sqoop's lib directory and test again; the driver error no longer appears.
[dolphinscheduler@host1 ~]$ cd app/
[dolphinscheduler@host1 app]$ cp dolphinscheduler/lib/mysql-connector-java-8.0.25.jar sqoop-1.4.7.bin__hadoop-2.6.0/lib/
[dolphinscheduler@host1 app]$

native snappy library not available: this version of libhadoop was built without snappy support
[INFO] 2022-03-08 19:28:06.877 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.sqoop.SqoopTask:[178] - process start, process id is: 14955
[INFO] 2022-03-08 19:28:07.879 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.sqoop.SqoopTask:[66] - -> welcome to use bigdata scheduling system...
Warning: /home/dolphinscheduler/app/sqoop-1.4.7.bin__hadoop-2.6.0/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/dolphinscheduler/app/sqoop-1.4.7.bin__hadoop-2.6.0/../hcatalog does not exist! HCatalog jobs will fail.
...skipping...
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
[INFO] 2022-03-08 19:28:35.939 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.sqoop.SqoopTask:[66] - -> 22/03/08 19:28:35 INFO mapreduce.Job: Task Id : attempt_1646732822031_0001_m_000000_2, Status : FAILED
Error: java.lang.RuntimeException: native snappy library not available: this version of libhadoop was built without snappy support.
at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:65)
at org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:134)
at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:150)
at org.apache.hadoop.io.compress.CompressionCodec$Util.createOutputStreamWithCodecPool(CompressionCodec.java:131)
at org.apache.hadoop.io.compress.SnappyCodec.createOutputStream(SnappyCodec.java:99)
at org.apache.sqoop.mapreduce.RawKeyTextOutputFormat.getRecordWriter(RawKeyTextOutputFormat.java:102)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:647)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Adding the environment variable below and verifying again did not help; the problem persisted.
export LD_LIBRARY_PATH=/home/dolphinscheduler/app/hadoop-2.7.3/lib/native

Final fix: snappy shows false, Hadoop needs to be recompiled
Solved by installing snappy and recompiling Hadoop with snappy support enabled. The detailed steps follow.
Install snappy
Download address
[dolphinscheduler@host1 app]$ wget http://pkgs.fedoraproject.org/repo/pkgs/snappy/snappy-1.1.1.tar.gz/8887e3b7253b22a31f5486bca3cbc1c2/snappy-1.1.1.tar.gz
--2022-03-09 14:58:52--  http://pkgs.fedoraproject.org/repo/pkgs/snappy/snappy-1.1.1.tar.gz/8887e3b7253b22a31f5486bca3cbc1c2/snappy-1.1.1.tar.gz
Resolving pkgs.fedoraproject.org (pkgs.fedoraproject.org)... 38.145.60.17
Connecting to pkgs.fedoraproject.org (pkgs.fedoraproject.org)|38.145.60.17|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://src.fedoraproject.org/repo/pkgs/snappy/snappy-1.1.1.tar.gz/8887e3b7253b22a31f5486bca3cbc1c2/snappy-1.1.1.tar.gz [following]
--2022-03-09 14:58:52--  https://src.fedoraproject.org/repo/pkgs/snappy/snappy-1.1.1.tar.gz/8887e3b7253b22a31f5486bca3cbc1c2/snappy-1.1.1.tar.gz
Resolving src.fedoraproject.org (src.fedoraproject.org)... 38.145.60.21, 38.145.60.20
Connecting to src.fedoraproject.org (src.fedoraproject.org)|38.145.60.21|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1777992 (1.7M) [application/x-gzip]
Saving to: 'snappy-1.1.1.tar.gz'

100%[==========================================>] 1,777,992  564KB/s  in 3.1s

2022-03-09 14:58:56 (564 KB/s) - 'snappy-1.1.1.tar.gz' saved [1777992/1777992]

[dolphinscheduler@host1 app]$ tar xf snappy-1.1.1.tar.gz
[dolphinscheduler@host1 app]$ cd snappy-1.1.1
[dolphinscheduler@host1 snappy-1.1.1]$ ./configure
[dolphinscheduler@host1 snappy-1.1.1]$ make
[dolphinscheduler@host1 snappy-1.1.1]$ sudo make install
[dolphinscheduler@host1 snappy-1.1.1]$ ls -lh /usr/local/lib |grep snappy
-rw-r--r--. 1 root root 323K Mar  9 15:03 libsnappy.a
-rwxr-xr-x. 1 root root  953 Mar  9 15:03 libsnappy.la
lrwxrwxrwx. 1 root root   18 Mar  9 15:03 libsnappy.so -> libsnappy.so.1.2.0
lrwxrwxrwx. 1 root root   18 Mar  9 15:03 libsnappy.so.1 -> libsnappy.so.1.2.0
-rwxr-xr-x. 1 root root 161K Mar  9 15:03 libsnappy.so.1.2.0
[dolphinscheduler@host1 snappy-1.1.1]$
Install protobuf
Without it, the Hadoop build fails with: org.apache.maven.plugin.MojoExecutionException: 'protoc --version' did not return a version -> [Help 1]
sudo yum -y install protobuf
sudo yum -y install protobuf-compiler
Install cmake
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:run (make) on project hadoop-common: An Ant BuildException has occured: Execute failed: java.io.IOException: Cannot run program "cmake" (in directory "/home/dolphinscheduler/app/hadoop-2.7.3-src/hadoop-common-project/hadoop-common/target/native"): error=2, No such file or directory
[ERROR] around Ant part ...... @ 4:147 in /home/dolphinscheduler/app/hadoop-2.7.3-src/hadoop-common-project/hadoop-common/target/antrun/build-main.xml
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn -rf :hadoop-common
[dolphinscheduler@host1 hadoop-2.7.3-src]$ sudo yum install -y cmake
Install ant
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:run (make) on project hadoop-common: An Ant BuildException has occured: exec returned: 1
[ERROR] around Ant part ...... @ 4:147 in /home/dolphinscheduler/app/hadoop-2.7.3-src/hadoop-common-project/hadoop-common/target/antrun/build-main.xml
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn -rf :hadoop-common
[dolphinscheduler@host1 hadoop-2.7.3-src]$ sudo yum install -y ant
Still failing.
Install the development tools group
[dolphinscheduler@host1 hadoop-2.7.3-src]$ sudo yum -y groupinstall "Development Tools"
Still the same error as above; after installing openssl-devel, the build finally succeeds.
[dolphinscheduler@host1 hadoop-2.7.3-src]$ sudo yum install -y openssl-devel
Download the Hadoop source, install maven, and build (make sure all the prerequisites above are installed first).
[dolphinscheduler@host1 app]$ wget https://archive.apache.org/dist/hadoop/core/hadoop-2.7.3/hadoop-2.7.3-src.tar.gz
--2022-03-09 16:13:49--  https://archive.apache.org/dist/hadoop/core/hadoop-2.7.3/hadoop-2.7.3-src.tar.gz
Resolving archive.apache.org (archive.apache.org)... 138.201.131.134, 2a01:4f8:172:2ec5::2
Connecting to archive.apache.org (archive.apache.org)|138.201.131.134|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 18258529 (17M) [application/x-gzip]
Saving to: 'hadoop-2.7.3-src.tar.gz'

100%[==========================================>] 18,258,529  400KB/s  in 1m 40s

2022-03-09 16:15:31 (178 KB/s) - 'hadoop-2.7.3-src.tar.gz' saved [18258529/18258529]

[dolphinscheduler@host1 app]$ sudo yum install maven -y
Build command
[dolphinscheduler@host1 app]$ tar xf hadoop-2.7.3-src.tar.gz
[dolphinscheduler@host1 app]$ cd hadoop-2.7.3-src
[dolphinscheduler@host1 hadoop-2.7.3-src]$ mvn clean package -DskipTests -Pdist,native -Dtar -Dsnappy.lib=/usr/local/lib -Dbundle.snappy
Go into the build output directory and copy everything under native over the deployed Hadoop's native directory.
[dolphinscheduler@host1 native]$ pwd
/home/dolphinscheduler/app/hadoop-2.7.3-src/hadoop-dist/target/hadoop-2.7.3/lib/native
[dolphinscheduler@host1 native]$ cp * /home/dolphinscheduler/app/hadoop-2.7.3/lib/native/
Run hadoop checknative again; snappy now shows true.
[dolphinscheduler@host1 native]$ hadoop checknative
22/03/09 17:43:27 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
22/03/09 17:43:27 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop:  true /home/dolphinscheduler/app/hadoop-2.7.3/lib/native/libhadoop.so.1.0.0
zlib:    true /lib64/libz.so.1
snappy:  true /home/dolphinscheduler/app/hadoop-2.7.3/lib/native/libsnappy.so.1
lz4:     true revision:99
bzip2:   false
openssl: true /lib64/libcrypto.so
[dolphinscheduler@host1 native]$
Verify the sqoop task with snappy compression again; finally solved.
Cannot load libcrypto.so (libcrypto.so: cannot open shared object file: No such file or directory)

[dolphinscheduler@host1 app]$ hadoop checknative
22/03/08 19:41:52 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
22/03/08 19:41:52 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop:  true /home/dolphinscheduler/app/hadoop-2.7.3/lib/native/libhadoop.so.1.0.0
zlib:    true /lib64/libz.so.1
snappy:  false
lz4:     true revision:99
bzip2:   false
openssl: false Cannot load libcrypto.so (libcrypto.so: cannot open shared object file: No such file or directory)!
[dolphinscheduler@host1 app]$
[dolphinscheduler@host1 app]$ cd /usr/lib64/
[dolphinscheduler@host1 lib64]$ # create a symlink
[dolphinscheduler@host1 lib64]$ sudo ln -s libcrypto.so.1.0.2k libcrypto.so
[dolphinscheduler@host1 lib64]$ cd -
/home/dolphinscheduler/app
[dolphinscheduler@host1 app]$ # run again, error resolved
[dolphinscheduler@host1 app]$ hadoop checknative
22/03/08 19:46:28 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
22/03/08 19:46:28 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop:  true /home/dolphinscheduler/app/hadoop-2.7.3/lib/native/libhadoop.so.1.0.0
zlib:    true /lib64/libz.so.1
snappy:  false
lz4:     true revision:99
bzip2:   false
openssl: true /lib64/libcrypto.so
[dolphinscheduler@host1 app]$
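The fix above simply restores the unversioned .so name that the loader looks for. The same pattern can be rehearsed safely in a scratch directory without sudo (all paths below are illustrative stand-ins, not the real /usr/lib64):

```shell
# Rehearse the libcrypto fix in a throwaway directory; on the real host the
# symlink is created in /usr/lib64 and points at libcrypto.so.1.0.2k.
mkdir -p /tmp/libcrypto-demo
cd /tmp/libcrypto-demo
touch libcrypto.so.1.0.2k            # stand-in for the versioned library file
ln -sf libcrypto.so.1.0.2k libcrypto.so
readlink libcrypto.so                # prints: libcrypto.so.1.0.2k
```

The unversioned name exists only so that consumers linking against plain `-lcrypto` (here, libhadoop's openssl check) can resolve it at load time.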
Redefine the task without specifying a compression type: succeeds

[dolphinscheduler@host1 logs]$ hdfs dfs -ls /
Found 4 items
drwxr-xr-x   - dolphinscheduler supergroup          0 2022-03-08 22:24 /dolphinscheduler
drwxr-xr-x   - dolphin          supergroup          0 2022-03-08 19:28 /home
drwxr-xr-x   - dolphin          supergroup          0 2022-03-09 14:23 /testsqool
drwx------   - dolphin          supergroup          0 2022-03-08 19:28 /tmp
[dolphinscheduler@host1 logs]$ hdfs dfs -ls /testsqool
Found 2 items
-rw-r--r--   3 dolphin supergroup          0 2022-03-09 14:23 /testsqool/_SUCCESS
-rw-r--r--   3 dolphin supergroup          8 2022-03-09 14:23 /testsqool/part-m-00000
[dolphinscheduler@host1 logs]$ hdfs dfs -cat /testsqool/part-m-00000
1@admin|[dolphinscheduler@host1 logs]$

Compression type gzip: succeeds

Compression type lzo: fails with Cannot find codec class com.hadoop.compression.lzo.LzoCodec
[INFO] 2022-03-09 14:36:33.826 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.sqoop.SqoopTask:[66] - -> 22/03/09 14:36:33 INFO manager.SqlManager: Executing SQL statement: select id,user_name from t_ds_user WHERe (1 = 0)
22/03/09 14:36:33 INFO manager.SqlManager: Executing SQL statement: select id,user_name from t_ds_user WHERe (1 = 0)
22/03/09 14:36:33 INFO manager.SqlManager: Executing SQL statement: select id,user_name from t_ds_user WHERe (1 = 0)
22/03/09 14:36:33 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/dolphinscheduler/app/hadoop-2.7.3
[INFO] 2022-03-09 14:36:34.827 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.sqoop.SqoopTask:[66] - -> Note: /tmp/sqoop-dolphin/compile/3b2082173e1a4e7b76050dab82e32ea8/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
22/03/09 14:36:34 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-dolphin/compile/3b2082173e1a4e7b76050dab82e32ea8/QueryResult.jar
[INFO] 2022-03-09 14:36:35.829 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.sqoop.SqoopTask:[66] - -> 22/03/09 14:36:35 INFO tool.ImportTool: Destination directory /testsqool deleted.
22/03/09 14:36:35 INFO mapreduce.ImportJobBase: Beginning query import.
22/03/09 14:36:35 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
22/03/09 14:36:35 ERROR tool.ImportTool: Import failed: com.cloudera.sqoop.io.UnsupportedCodecException: Cannot find codec class com.hadoop.compression.lzo.LzoCodec for codec lzo
at org.apache.sqoop.io.CodecMap.getCodec(CodecMap.java:111)
at com.cloudera.sqoop.io.CodecMap.getCodec(CodecMap.java:64)
at org.apache.sqoop.mapreduce.ImportJobBase.configureOutputFormat(ImportJobBase.java:120)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:263)
at org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:748)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:522)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
[INFO] 2022-03-09 14:36:36.013 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.sqoop.SqoopTask:[202] - process has exited, execute path:/tmp/dolphinscheduler/exec/process/4640619800832/4762182296320_7/1520/1543, processId:25749 ,exitStatusCode:1 ,processWaitForStatus:true ,processExitValue:1
[INFO] 2022-03-09 14:36:36.838 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.sqoop.SqoopTask:[60] - FINALIZE_SESSION
Download and install lzo; still failing.
[dolphinscheduler@host1 ~]$ cd app/
[dolphinscheduler@host1 app]$ wget http://www.oberhumer.com/opensource/lzo/download/lzo-2.06.tar.gz
--2022-03-09 18:05:18--  http://www.oberhumer.com/opensource/lzo/download/lzo-2.06.tar.gz
Resolving www.oberhumer.com (www.oberhumer.com)... 193.170.194.40
Connecting to www.oberhumer.com (www.oberhumer.com)|193.170.194.40|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 583045 (569K) [application/x-gzip]
Saving to: 'lzo-2.06.tar.gz'

100%[==========================================>] 583,045  421KB/s  in 1.4s

2022-03-09 18:05:20 (421 KB/s) - 'lzo-2.06.tar.gz' saved [583045/583045]

[dolphinscheduler@host1 app]$ tar xf lzo-2.06.tar.gz
[dolphinscheduler@host1 app]$ cd lzo-2.06
[dolphinscheduler@host1 lzo-2.06]$ ./configure --enable-shared
[dolphinscheduler@host1 lzo-2.06]$ make
[dolphinscheduler@host1 lzo-2.06]$ sudo make install
Modifying the hadoop configuration still gave the same error; that setting apparently only matters for handling files with the lzo suffix.
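The hadoop configuration change referred to here is presumably a core-site.xml property along these lines (a sketch; the codec must also be listed in io.compression.codecs to be discoverable, and the exact codec list varies by cluster):

```xml
<!-- core-site.xml: map the lzo codec name to its implementing class -->
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
```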
io.compression.codec.lzo.class    com.hadoop.compression.lzo.LzoCodec

Final fix (it was simply a missing jar; two wasted detours)
Download the hadoop-lzo source
Unpack and build it, locate the resulting jar, and copy it into sqoop's lib directory.
[dolphinscheduler@host1 app]$ tar xf hadoop-lzo-release-0.4.20.tar.gz
[dolphinscheduler@host1 app]$ cd hadoop-lzo-release-0.4.20
[dolphinscheduler@host1 hadoop-lzo-release-0.4.20]$ mvn clean package -Dmaven.test.skip=true
...
[INFO] Building jar: /home/dolphinscheduler/app/hadoop-lzo-release-0.4.20/target/hadoop-lzo-0.4.20-javadoc.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3:07.479s
[INFO] Finished at: Wed Mar 09 18:40:09 CST 2022
[INFO] Final Memory: 28M/149M
[INFO] ------------------------------------------------------------------------
[dolphinscheduler@host1 hadoop-lzo-release-0.4.20]$ cd target/
[dolphinscheduler@host1 target]$ ll
total 436
drwxr-xr-x. 2 dolphinscheduler dolphin   4096 Mar  9 18:39 antrun
drwxr-xr-x. 4 dolphinscheduler dolphin   4096 Mar  9 18:40 apidocs
drwxr-xr-x. 5 dolphinscheduler dolphin     66 Mar  9 18:39 classes
drwxr-xr-x. 3 dolphinscheduler dolphin     25 Mar  9 18:39 generated-sources
-rw-r--r--. 1 dolphinscheduler dolphin 188788 Mar  9 18:39 hadoop-lzo-0.4.20.jar
-rw-r--r--. 1 dolphinscheduler dolphin 191960 Mar  9 18:40 hadoop-lzo-0.4.20-javadoc.jar
-rw-r--r--. 1 dolphinscheduler dolphin  51992 Mar  9 18:40 hadoop-lzo-0.4.20-sources.jar
drwxr-xr-x. 2 dolphinscheduler dolphin     71 Mar  9 18:40 javadoc-bundle-options
drwxr-xr-x. 2 dolphinscheduler dolphin     28 Mar  9 18:39 maven-archiver
drwxr-xr-x. 3 dolphinscheduler dolphin     28 Mar  9 18:39 native
drwxr-xr-x. 4 dolphinscheduler dolphin     54 Mar  9 18:39 test-classes
[dolphinscheduler@host1 target]$ cp hadoop-lzo-0.4.20.jar /home/dolphinscheduler/app/sqoop-1.4.7.bin__hadoop-2.6.0/lib/
[dolphinscheduler@host1 target]$ hdfs dfs -ls /testsqool
Found 2 items
-rw-r--r--   3 dolphin supergroup          0 2022-03-09 18:46 /testsqool/_SUCCESS
-rw-r--r--   3 dolphin supergroup         20 2022-03-09 18:46 /testsqool/part-m-00000.lzo_deflate
[dolphinscheduler@host1 target]$

MR (MapReduce) task
Find Hadoop's bundled example jar and upload it to Resource management.
[dolphinscheduler@host1 mapreduce]$ cd
[dolphinscheduler@host1 ~]$ cd app/hadoop-2.7.3/
[dolphinscheduler@host1 hadoop-2.7.3]$ find . -name hadoop-mapreduce-examples-2.7.3.jar
./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar
[dolphinscheduler@host1 hadoop-2.7.3]$ sz ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar
Create the task
Create the source directory and upload a test file
[dolphinscheduler@host1 mapreduce]$ hdfs dfs -mkdir -p /testMr/in
[dolphinscheduler@host1 mapreduce]$ ll
total 4972
-rw-r--r--. 1 dolphinscheduler dolphin  537521 Aug 18 2016 hadoop-mapreduce-client-app-2.7.3.jar
-rw-r--r--. 1 dolphinscheduler dolphin  773501 Aug 18 2016 hadoop-mapreduce-client-common-2.7.3.jar
-rw-r--r--. 1 dolphinscheduler dolphin 1554595 Aug 18 2016 hadoop-mapreduce-client-core-2.7.3.jar
-rw-r--r--. 1 dolphinscheduler dolphin  189714 Aug 18 2016 hadoop-mapreduce-client-hs-2.7.3.jar
-rw-r--r--. 1 dolphinscheduler dolphin   27598 Aug 18 2016 hadoop-mapreduce-client-hs-plugins-2.7.3.jar
-rw-r--r--. 1 dolphinscheduler dolphin   61745 Aug 18 2016 hadoop-mapreduce-client-jobclient-2.7.3.jar
-rw-r--r--. 1 dolphinscheduler dolphin 1551594 Aug 18 2016 hadoop-mapreduce-client-jobclient-2.7.3-tests.jar
-rw-r--r--. 1 dolphinscheduler dolphin   71310 Aug 18 2016 hadoop-mapreduce-client-shuffle-2.7.3.jar
-rw-r--r--. 1 dolphinscheduler dolphin  295812 Aug 18 2016 hadoop-mapreduce-examples-2.7.3.jar
drwxr-xr-x. 2 dolphinscheduler dolphin    4096 Aug 18 2016 lib
drwxr-xr-x. 2 dolphinscheduler dolphin      30 Aug 18 2016 lib-examples
drwxr-xr-x. 2 dolphinscheduler dolphin    4096 Aug 18 2016 sources
[dolphinscheduler@host1 mapreduce]$ ll >> testWord.txt
[dolphinscheduler@host1 mapreduce]$ hdfs dfs -put testWord.txt /testMr/in
[dolphinscheduler@host1 mapreduce]$
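For reference, the MR task defined in the dolphinscheduler UI amounts to the classic wordcount invocation from the examples jar (a sketch only; the /testMr paths follow the transcripts here, and the jar is the one uploaded to Resource management):

```shell
# Run the bundled wordcount example over the uploaded test file, then read
# the result. /testMr/out must not already exist.
hadoop jar hadoop-mapreduce-examples-2.7.3.jar wordcount /testMr/in /testMr/out
hdfs dfs -cat /testMr/out/part-r-00000
```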
Test result: apart from sqoop, all the other task types went pretty smoothly.
The word-count output is below; each jar file name, for instance, appears exactly once.
[dolphinscheduler@host1 mapreduce]$ hdfs dfs -ls /testMr/out
Found 2 items
-rw-r--r--   3 dolphin supergroup          0 2022-03-10 17:46 /testMr/out/_SUCCESS
-rw-r--r--   3 dolphin supergroup        662 2022-03-10 17:46 /testMr/out/part-r-00000
[dolphinscheduler@host1 mapreduce]$ hdfs dfs -cat /testMr/out/part-r-00000
-rw-r--r--.	10
0	1
1	10
10	1
1551594	1
1554595	1
17:43	1
18	12
189714	1
2	3
2016	12
27598	1
295812	1
30	1
Mar	1
4096	2
4972	1
537521	1
61745	1
71310	1
773501	1
Aug	12
dolphin	13
dolphinscheduler	13
drwxr-xr-x.	3
hadoop-mapreduce-client-app-2.7.3.jar	1
hadoop-mapreduce-client-common-2.7.3.jar	1
hadoop-mapreduce-client-core-2.7.3.jar	1
hadoop-mapreduce-client-hs-2.7.3.jar	1
hadoop-mapreduce-client-hs-plugins-2.7.3.jar	1
hadoop-mapreduce-client-jobclient-2.7.3-tests.jar	1
hadoop-mapreduce-client-jobclient-2.7.3.jar	1
hadoop-mapreduce-client-shuffle-2.7.3.jar	1
hadoop-mapreduce-examples-2.7.3.jar	1
lib	1
lib-examples	1
sources	1
testWord.txt	1
total	1
[dolphinscheduler@host1 mapreduce]$ cat testWord.txt
total 4972
-rw-r--r--. 1 dolphinscheduler dolphin  537521 Aug 18 2016 hadoop-mapreduce-client-app-2.7.3.jar
-rw-r--r--. 1 dolphinscheduler dolphin  773501 Aug 18 2016 hadoop-mapreduce-client-common-2.7.3.jar
-rw-r--r--. 1 dolphinscheduler dolphin 1554595 Aug 18 2016 hadoop-mapreduce-client-core-2.7.3.jar
-rw-r--r--. 1 dolphinscheduler dolphin  189714 Aug 18 2016 hadoop-mapreduce-client-hs-2.7.3.jar
-rw-r--r--. 1 dolphinscheduler dolphin   27598 Aug 18 2016 hadoop-mapreduce-client-hs-plugins-2.7.3.jar
-rw-r--r--. 1 dolphinscheduler dolphin   61745 Aug 18 2016 hadoop-mapreduce-client-jobclient-2.7.3.jar
-rw-r--r--. 1 dolphinscheduler dolphin 1551594 Aug 18 2016 hadoop-mapreduce-client-jobclient-2.7.3-tests.jar
-rw-r--r--. 1 dolphinscheduler dolphin   71310 Aug 18 2016 hadoop-mapreduce-client-shuffle-2.7.3.jar
-rw-r--r--. 1 dolphinscheduler dolphin  295812 Aug 18 2016 hadoop-mapreduce-examples-2.7.3.jar
drwxr-xr-x. 2 dolphinscheduler dolphin    4096 Aug 18 2016 lib
drwxr-xr-x. 2 dolphinscheduler dolphin      30 Aug 18 2016 lib-examples
drwxr-xr-x. 2 dolphinscheduler dolphin    4096 Aug 18 2016 sources
-rw-r--r--. 1 dolphinscheduler dolphin       0 Mar 10 17:43 testWord.txt
[dolphinscheduler@host1 mapreduce]$



