DolphinScheduler HDFS-related feature testing (Part 2): Resource Center, Sqoop, MR (MapReduce)

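DolphinScheduler features that involve HDFS

Resource Center
    File management
    UDF management
        Resource management
        UDF management
Sqoop task
    Installing the sqoop command
    Testing a Sqoop task
        Defining the task (compression type snappy)
        Problems encountered
            Task error (Sqoop is missing the MySQL driver jar)
            native snappy library not available: this version of libhadoop was built without snappy support
                Final fix (snappy is false; Hadoop must be recompiled)
            Cannot load libcrypto.so (libcrypto.so: cannot open shared object file: No such file or directory)
            Redefining the task without a compression type succeeds
            Compression type gzip succeeds
            Compression type lzo fails with Cannot find codec class com.hadoop.compression.lzo.LzoCodec...
                Final fix (simply a missing jar, after going around in circles)
MR (MapReduce) task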

Resource Center

File management

Create a folder, create a file, and upload a file

[dolphinscheduler@host1 ~]$ hdfs dfs -ls /
Found 1 items
drwxr-xr-x   - dolphinscheduler supergroup          0 2022-03-08 18:17 /dolphinscheduler
[dolphinscheduler@host1 ~]$ hdfs dfs -ls /dolphinscheduler
Found 1 items
drwxr-xr-x   - dolphinscheduler supergroup          0 2022-03-08 18:17 /dolphinscheduler/dolphin
[dolphinscheduler@host1 ~]$ hdfs dfs -ls /dolphinscheduler/dolphin
Found 2 items
drwxr-xr-x   - dolphinscheduler supergroup          0 2022-03-08 18:18 /dolphinscheduler/dolphin/resources
drwxr-xr-x   - dolphinscheduler supergroup          0 2022-03-08 18:17 /dolphinscheduler/dolphin/udfs
[dolphinscheduler@host1 ~]$ hdfs dfs -ls /dolphinscheduler/dolphin/resources
Found 2 items
drwxr-xr-x   - dolphinscheduler supergroup          0 2022-03-08 18:17 /dolphinscheduler/dolphin/resources/testHDFS
-rw-r--r--   3 dolphinscheduler supergroup         13 2022-03-08 18:18 /dolphinscheduler/dolphin/resources/testhdfs.sh
[dolphinscheduler@host1 ~]$ hdfs dfs -ls /dolphinscheduler/dolphin/resources
Found 3 items
drwxr-xr-x   - dolphinscheduler supergroup          0 2022-03-08 18:17 /dolphinscheduler/dolphin/resources/testHDFS
-rw-r--r--   3 dolphinscheduler supergroup         13 2022-03-08 18:18 /dolphinscheduler/dolphin/resources/testhdfs.sh
-rw-r--r--   3 dolphinscheduler supergroup     541261 2022-03-08 18:31 /dolphinscheduler/dolphin/resources/微信截图_20220306175420.png
[dolphinscheduler@host1 ~]$ 

UDF management

Resource management

[dolphinscheduler@host1 ~]$ hdfs dfs -ls /dolphinscheduler/dolphin/udfs
Found 2 items
-rw-r--r--   3 dolphinscheduler supergroup      21921 2022-03-08 18:36 /dolphinscheduler/dolphin/udfs/dolphinscheduler-registry-zookeeper-2.0.3.jar
drwxr-xr-x   - dolphinscheduler supergroup          0 2022-03-08 18:34 /dolphinscheduler/dolphin/udfs/testUDF
[dolphinscheduler@host1 ~]$ 

UDF management

Define custom UDF functions that reference the resources (jar files) uploaded under resource management

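Sqoop task

Installing the sqoop command

Sqoop download location

Downloading with wget directly in the terminal is a bit faster than going through the browser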

[dolphinscheduler@host1 app]$ pwd
/home/dolphinscheduler/app
[dolphinscheduler@host1 app]$ wget http://archive.apache.org/dist/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz 
--2022-03-08 19:06:05--  http://archive.apache.org/dist/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
正在解析主机 archive.apache.org (archive.apache.org)... 138.201.131.134, 2a01:4f8:172:2ec5::2
正在连接 archive.apache.org (archive.apache.org)|138.201.131.134|:80... 已连接。
已发出 HTTP 请求,正在等待回应... 200 OK
长度:17953604 (17M) [application/x-gzip]
正在保存至: “sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz”

100%[=================================================================================================>] 17,953,604  1.75MB/s 用时 16s    

2022-03-08 19:06:34 (1.09 MB/s) - 已保存 “sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz” [17953604/17953604])

[dolphinscheduler@host1 app]$ 

Extract, configure, and set environment variables

[dolphinscheduler@host1 app]$ 
[dolphinscheduler@host1 app]$ # extract
[dolphinscheduler@host1 app]$ tar xf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz 
[dolphinscheduler@host1 app]$ # edit the configuration file
[dolphinscheduler@host1 app]$ cd sqoop-1.4.7.bin__hadoop-2.6.0/conf/
[dolphinscheduler@host1 conf]$ ls
oraoop-site-template.xml  sqoop-env-template.cmd  sqoop-env-template.sh  sqoop-site-template.xml  sqoop-site.xml
[dolphinscheduler@host1 conf]$ cp sqoop-env-template.sh sqoop-env.sh
[dolphinscheduler@host1 conf]$ vi sqoop-env.sh 
[dolphinscheduler@host1 conf]$ grep HADOOP sqoop-env.sh 
export HADOOP_COMMON_HOME=/home/dolphinscheduler/app/hadoop-2.7.3
export HADOOP_MAPRED_HOME=/home/dolphinscheduler/app/hadoop-2.7.3
[dolphinscheduler@host1 conf]$ # set the environment variables
[dolphinscheduler@host1 conf]$ cd
[dolphinscheduler@host1 ~]$ vi .bash_profile 
[dolphinscheduler@host1 ~]$ . .bash_profile 
[dolphinscheduler@host1 ~]$ grep SQOOP .bash_profile 
export SQOOP_HOME=/home/dolphinscheduler/app/sqoop-1.4.7.bin__hadoop-2.6.0
export PATH=$SQOOP_HOME/bin:$PATH
[dolphinscheduler@host1 ~]$ # pressing Tab completes the sqoop commands, so the installation succeeded
[dolphinscheduler@host1 ~]$ sqoop
sqoop                    sqoop-eval               sqoop-import-all-tables  sqoop-list-tables        
sqoop.cmd                sqoop-export             sqoop-import-mainframe   sqoop-merge              
sqoop-codegen            sqoop-help               sqoop-job                sqoop-metastore          
sqoop-create-hive-table  sqoop-import             sqoop-list-databases     sqoop-version            
[dolphinscheduler@host1 ~]$ sqoop

Testing a Sqoop task

Defining the task (compression type snappy)

Environment configuration (set under environment management in DolphinScheduler)

#JAVA
export JAVA_HOME=/usr/local/java/jdk1.8.0_151
export PATH=$JAVA_HOME/bin:$PATH
#hadoop
export HADOOP_HOME=/home/dolphinscheduler/app/hadoop-2.7.3
export PATH=$HADOOP_HOME/bin:$PATH
#sqoop
export SQOOP_HOME=/home/dolphinscheduler/app/sqoop-1.4.7.bin__hadoop-2.6.0
export PATH=$SQOOP_HOME/bin:$PATH

Problems encountered

Task error (Sqoop is missing the MySQL driver jar)

[INFO] 2022-03-08 19:22:25.400 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.sqoop.SqoopTask:[117] - command : #!/bin/sh
BASEDIR=$(cd `dirname $0`; pwd)
cd $BASEDIR
#JAVA
export JAVA_HOME=/usr/local/java/jdk1.8.0_151
export PATH=$JAVA_HOME/bin:$PATH
#hadoop
export HADOOP_HOME=/home/dolphinscheduler/app/hadoop-2.7.3
export PATH=$HADOOP_HOME/bin:$PATH
#sqoop
export SQOOP_HOME=/home/dolphinscheduler/app/sqoop-1.4.7.bin__hadoop-2.6.0
export PATH=$SQOOP_HOME/bin:$PATH
sqoop import -D mapred.job.name=importdb -m 1 --connect "jdbc:mysql://192.168.56.10:3306/dolphinscheduler?allowLoadLocalInfile=false&autoDeserialize=false&allowLocalInfile=false&allowUrlInLocalInfile=false" --username ds_user --password "dolphinscheduler" --query "select id,user_name from t_ds_user WHERE $CONDITIONS" --target-dir /home/dolphinscheduler/app/hadoop-2.7.3/data/tmp/dfs/data --compression-codec snappy --as-textfile --delete-target-dir --fields-terminated-by '@' --lines-terminated-by '|' --null-non-string 'NULL' --null-string 'NULL'
...skipping...
        Please set $HCAT_HOME to the root of your HCatalog installation.
        Warning: /home/dolphinscheduler/app/sqoop-1.4.7.bin__hadoop-2.6.0/../accumulo does not exist! Accumulo imports will fail.
        Please set $ACCUMULO_HOME to the root of your Accumulo installation.
        Warning: /home/dolphinscheduler/app/sqoop-1.4.7.bin__hadoop-2.6.0/../zookeeper does not exist! Accumulo imports will fail.
        Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
        22/03/08 19:22:26 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
        22/03/08 19:22:26 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
        22/03/08 19:22:26 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
        22/03/08 19:22:26 INFO tool.CodeGenTool: Beginning code generation
        22/03/08 19:22:26 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Could not load db driver class: com.mysql.jdbc.Driver
        java.lang.RuntimeException: Could not load db driver class: com.mysql.jdbc.Driver
                at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:875)
                at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:59)
                at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:763)
                at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:786)
                at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:289)
                at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:260)
                at org.apache.sqoop.manager.SqlManager.getColumnTypesForQuery(SqlManager.java:253)
                at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:336)
                at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1872)
                at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1671)
                at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:106)
                at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:501)
                at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
                at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
                at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
                at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
                at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
                at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
                at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
[INFO] 2022-03-08 19:22:26.407 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.sqoop.SqoopTask:[60] - FINALIZE_SESSION

Copy the MySQL driver jar into Sqoop's lib directory and test again; the driver error no longer appears

[dolphinscheduler@host1 ~]$ cd app/
[dolphinscheduler@host1 app]$ 
[dolphinscheduler@host1 app]$ cp dolphinscheduler/lib/mysql-connector-java-8.0.25.jar  sqoop-1.4.7.bin__hadoop-2.6.0/lib/
[dolphinscheduler@host1 app]$ 
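
The driver error above only shows up once the task runs, so it can be handy to check the lib directory beforehand. The following is a minimal sketch, not part of Sqoop or DolphinScheduler: `has_mysql_driver` is a hypothetical helper name, and the `mysql-connector-java-*.jar` file name pattern is an assumption based on the jar copied above.

```shell
# Hypothetical helper: succeed if the given directory holds a MySQL Connector/J jar.
# The file name pattern is an assumption taken from the jar used in this post.
has_mysql_driver() {
    lib_dir=$1
    # When nothing matches, the glob stays unexpanded, so test each candidate.
    for jar in "$lib_dir"/mysql-connector-java-*.jar; do
        if [ -f "$jar" ]; then
            return 0
        fi
    done
    return 1
}
```

For the layout used here it could be called as `has_mysql_driver "$SQOOP_HOME/lib" || echo "MySQL driver jar missing"`.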
native snappy library not available: this version of libhadoop was built without snappy support
[INFO] 2022-03-08 19:28:06.877 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.sqoop.SqoopTask:[178] - process start, process id is: 14955
[INFO] 2022-03-08 19:28:07.879 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.sqoop.SqoopTask:[66] -  -> welcome to use bigdata scheduling system...
        Warning: /home/dolphinscheduler/app/sqoop-1.4.7.bin__hadoop-2.6.0/../hbase does not exist! HBase imports will fail.
        Please set $HBASE_HOME to the root of your HBase installation.
        Warning: /home/dolphinscheduler/app/sqoop-1.4.7.bin__hadoop-2.6.0/../hcatalog does not exist! HCatalog jobs will fail.
...skipping...
                at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
                at java.security.AccessController.doPrivileged(Native Method)
                at javax.security.auth.Subject.doAs(Subject.java:422)
                at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
                at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
        
        Container killed by the ApplicationMaster.
        Container killed on request. Exit code is 143
        Container exited with a non-zero exit code 143
        
[INFO] 2022-03-08 19:28:35.939 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.sqoop.SqoopTask:[66] -  -> 22/03/08 19:28:35 INFO mapreduce.Job: Task Id : attempt_1646732822031_0001_m_000000_2, Status : FAILED
        Error: java.lang.RuntimeException: native snappy library not available: this version of libhadoop was built without snappy support.
                at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:65)
                at org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:134)
                at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:150)
                at org.apache.hadoop.io.compress.CompressionCodec$Util.createOutputStreamWithCodecPool(CompressionCodec.java:131)
                at org.apache.hadoop.io.compress.SnappyCodec.createOutputStream(SnappyCodec.java:99)
                at org.apache.sqoop.mapreduce.RawKeyTextOutputFormat.getRecordWriter(RawKeyTextOutputFormat.java:102)
                at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:647)
                at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
                at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
                at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
                at java.security.AccessController.doPrivileged(Native Method)
                at javax.security.auth.Subject.doAs(Subject.java:422)
                at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
                at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
        
        Container killed by the ApplicationMaster.
        Container killed on request. Exit code is 143
        Container exited with a non-zero exit code 143

Adding the following environment variable and retesting did not help; the problem persisted

export LD_LIBRARY_PATH=/home/dolphinscheduler/app/hadoop-2.7.3/lib/native
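
Final fix (snappy is false; Hadoop must be recompiled)

The problem was solved by installing snappy and recompiling Hadoop with snappy support enabled. The detailed steps follow.

Install snappy

Download location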

[dolphinscheduler@host1 app]$ wget http://pkgs.fedoraproject.org/repo/pkgs/snappy/snappy-1.1.1.tar.gz/8887e3b7253b22a31f5486bca3cbc1c2/snappy-1.1.1.tar.gz
--2022-03-09 14:58:52--  http://pkgs.fedoraproject.org/repo/pkgs/snappy/snappy-1.1.1.tar.gz/8887e3b7253b22a31f5486bca3cbc1c2/snappy-1.1.1.tar.gz
正在解析主机 pkgs.fedoraproject.org (pkgs.fedoraproject.org)... 38.145.60.17
正在连接 pkgs.fedoraproject.org (pkgs.fedoraproject.org)|38.145.60.17|:80... 已连接。
已发出 HTTP 请求,正在等待回应... 302 Found
位置:https://src.fedoraproject.org/repo/pkgs/snappy/snappy-1.1.1.tar.gz/8887e3b7253b22a31f5486bca3cbc1c2/snappy-1.1.1.tar.gz [跟随至新的 URL]
--2022-03-09 14:58:52--  https://src.fedoraproject.org/repo/pkgs/snappy/snappy-1.1.1.tar.gz/8887e3b7253b22a31f5486bca3cbc1c2/snappy-1.1.1.tar.gz
正在解析主机 src.fedoraproject.org (src.fedoraproject.org)... 38.145.60.21, 38.145.60.20
正在连接 src.fedoraproject.org (src.fedoraproject.org)|38.145.60.21|:443... 已连接。
已发出 HTTP 请求,正在等待回应... 200 OK
长度:1777992 (1.7M) [application/x-gzip]
正在保存至: “snappy-1.1.1.tar.gz”

100%[=================================================================================================>] 1,777,992    564KB/s 用时 3.1s   

2022-03-09 14:58:56 (564 KB/s) - 已保存 “snappy-1.1.1.tar.gz” [1777992/1777992])

[dolphinscheduler@host1 app]$ tar xf snappy-1.1.1.tar.gz 
[dolphinscheduler@host1 app]$ cd snappy-1.1.1
[dolphinscheduler@host1 snappy-1.1.1]$ ./configure  
[dolphinscheduler@host1 snappy-1.1.1]$ make 
[dolphinscheduler@host1 snappy-1.1.1]$ sudo make install 
[dolphinscheduler@host1 snappy-1.1.1]$ ls -lh /usr/local/lib |grep snappy
-rw-r--r--. 1 root root 323K 3月   9 15:03 libsnappy.a
-rwxr-xr-x. 1 root root  953 3月   9 15:03 libsnappy.la
lrwxrwxrwx. 1 root root   18 3月   9 15:03 libsnappy.so -> libsnappy.so.1.2.0
lrwxrwxrwx. 1 root root   18 3月   9 15:03 libsnappy.so.1 -> libsnappy.so.1.2.0
-rwxr-xr-x. 1 root root 161K 3月   9 15:03 libsnappy.so.1.2.0
[dolphinscheduler@host1 snappy-1.1.1]$ 

Install protobuf

Without it, the build fails with: org.apache.maven.plugin.MojoExecutionException: 'protoc --version' did not return a version -> [Help 1]

sudo yum -y  install protobuf
sudo yum -y  install protobuf-compiler

Install cmake

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:run (make) on project hadoop-common: An Ant BuildException has occured: Execute failed: java.io.IOException: Cannot run program "cmake" (in directory "/home/dolphinscheduler/app/hadoop-2.7.3-src/hadoop-common-project/hadoop-common/target/native"): error=2, 没有那个文件或目录
[ERROR] around Ant part ...... @ 4:147 in /home/dolphinscheduler/app/hadoop-2.7.3-src/hadoop-common-project/hadoop-common/target/antrun/build-main.xml
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :hadoop-common
[dolphinscheduler@host1 hadoop-2.7.3-src]$ sudo yum install -y cmake

Install ant

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.7:run (make) on project hadoop-common: An Ant BuildException has occured: exec returned: 1
[ERROR] around Ant part ...... @ 4:147 in /home/dolphinscheduler/app/hadoop-2.7.3-src/hadoop-common-project/hadoop-common/target/antrun/build-main.xml
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :hadoop-common
[dolphinscheduler@host1 hadoop-2.7.3-src]$ sudo yum install -y ant

The build still fails.

Install the Development Tools package group

[dolphinscheduler@host1 hadoop-2.7.3-src]$ sudo yum -y groupinstall "Development Tools"

The same error persisted; after installing openssl-devel, the build finally succeeded

[dolphinscheduler@host1 hadoop-2.7.3-src]$ sudo yum install -y openssl-devel

Download the Hadoop source, install Maven, and build (make sure everything from the previous steps is installed before building)

[dolphinscheduler@host1 app]$ wget https://archive.apache.org/dist/hadoop/core/hadoop-2.7.3/hadoop-2.7.3-src.tar.gz 
--2022-03-09 16:13:49--  https://archive.apache.org/dist/hadoop/core/hadoop-2.7.3/hadoop-2.7.3-src.tar.gz
正在解析主机 archive.apache.org (archive.apache.org)... 138.201.131.134, 2a01:4f8:172:2ec5::2
正在连接 archive.apache.org (archive.apache.org)|138.201.131.134|:443... 已连接。
已发出 HTTP 请求,正在等待回应... 200 OK
长度:18258529 (17M) [application/x-gzip]
正在保存至: “hadoop-2.7.3-src.tar.gz”

100%[=================================================================================================>] 18,258,529   400KB/s 用时 1m 40s 

2022-03-09 16:15:31 (178 KB/s) - 已保存 “hadoop-2.7.3-src.tar.gz” [18258529/18258529])

[dolphinscheduler@host1 app]$ 
[dolphinscheduler@host1 app]$ sudo yum install maven -y
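
The missing build tools above were discovered one failed build at a time. A small pre-flight check can list all missing commands in one pass. This is only a sketch: `missing_tools` is a hypothetical helper, and the tool list in the usage note is an assumption drawn from the errors hit in this post. It checks commands on `PATH` only, so header packages such as openssl-devel and the snappy library itself are not covered.

```shell
# Hypothetical helper: print every command from the argument list
# that cannot be found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}
```

Before starting the Maven build it could be run as `missing_tools protoc cmake ant mvn gcc make`; empty output means all of them were found.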

Build command

[dolphinscheduler@host1 app]$ tar xf hadoop-2.7.3-src.tar.gz 
[dolphinscheduler@host1 app]$ cd hadoop-2.7.3-src
[dolphinscheduler@host1 hadoop-2.7.3-src]$
[dolphinscheduler@host1 hadoop-2.7.3-src]$ mvn clean package -DskipTests -Pdist,native -Dtar -Dsnappy.lib=/usr/local/lib -Dbundle.snappy

Go to the build output directory and copy the files under native over those in the Hadoop deployment directory

[dolphinscheduler@host1 native]$ pwd
/home/dolphinscheduler/app/hadoop-2.7.3-src/hadoop-dist/target/hadoop-2.7.3/lib/native
[dolphinscheduler@host1 native]$ cp * /home/dolphinscheduler/app/hadoop-2.7.3/lib/native/

Run checknative again; snappy now shows true

[dolphinscheduler@host1 native]$ hadoop checknative
22/03/09 17:43:27 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
22/03/09 17:43:27 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop:  true /home/dolphinscheduler/app/hadoop-2.7.3/lib/native/libhadoop.so.1.0.0
zlib:    true /lib64/libz.so.1
snappy:  true /home/dolphinscheduler/app/hadoop-2.7.3/lib/native/libsnappy.so.1
lz4:     true revision:99
bzip2:   false 
openssl: true /lib64/libcrypto.so
[dolphinscheduler@host1 native]$ 
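
Reading the checknative listing by eye is easy to get wrong when it is run on several hosts. The sketch below parses that listing from standard input. `native_lib_enabled` is a hypothetical helper, and it assumes the line format shown above (library name followed by a colon, then `true` or `false`).

```shell
# Hypothetical helper: read `hadoop checknative` output on stdin and succeed
# only if the named library is reported as true.
# Assumes lines such as "snappy:  true /path/libsnappy.so.1" or "snappy:  false".
native_lib_enabled() {
    lib=$1
    awk -v key="$lib:" '
        $1 == key { found = ($2 == "true") }
        END { exit found ? 0 : 1 }
    '
}
```

A possible use is `hadoop checknative 2>/dev/null | native_lib_enabled snappy && echo "snappy ok"`.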

Retesting the Sqoop task with the snappy compression type finally succeeded

Cannot load libcrypto.so (libcrypto.so: cannot open shared object file: No such file or directory)
[dolphinscheduler@host1 app]$ hadoop checknative
22/03/08 19:41:52 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
22/03/08 19:41:52 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop:  true /home/dolphinscheduler/app/hadoop-2.7.3/lib/native/libhadoop.so.1.0.0
zlib:    true /lib64/libz.so.1
snappy:  false 
lz4:     true revision:99
bzip2:   false 
openssl: false Cannot load libcrypto.so (libcrypto.so: 无法打开共享对象文件: 没有那个文件或目录)!
[dolphinscheduler@host1 app]$ 
[dolphinscheduler@host1 app]$ cd /usr/lib64/
[dolphinscheduler@host1 lib64]$ # create a symbolic link
[dolphinscheduler@host1 lib64]$ sudo ln -s libcrypto.so.1.0.2k libcrypto.so
[dolphinscheduler@host1 lib64]$ cd -
/home/dolphinscheduler/app
[dolphinscheduler@host1 app]$ # run it again; the error is gone
[dolphinscheduler@host1 app]$ hadoop checknative
22/03/08 19:46:28 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
22/03/08 19:46:28 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop:  true /home/dolphinscheduler/app/hadoop-2.7.3/lib/native/libhadoop.so.1.0.0
zlib:    true /lib64/libz.so.1
snappy:  false 
lz4:     true revision:99
bzip2:   false 
openssl: true /lib64/libcrypto.so
[dolphinscheduler@host1 app]$ 
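
The symlink step above hard-codes the version suffix `1.0.2k`, which differs between systems. The sketch below creates the unversioned link from whichever versioned file is present. `link_unversioned_lib` is a hypothetical helper; it simply takes the first match in glob order, which is an assumption that may not be the right choice when several versions are installed.

```shell
# Hypothetical helper: if <dir>/<name> does not exist, link it to the first
# versioned file <name>.* found in the same directory and print the target.
# Fails when no versioned file is present.
link_unversioned_lib() {
    dir=$1
    name=$2
    if [ -e "$dir/$name" ]; then
        return 0
    fi
    for candidate in "$dir/$name".*; do
        if [ -f "$candidate" ]; then
            target=$(basename "$candidate")
            ln -s "$target" "$dir/$name"
            printf '%s\n' "$target"
            return 0
        fi
    done
    return 1
}
```

For a system directory such as /usr/lib64 the function would have to run with root privileges, just like the `sudo ln -s` command above.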

Redefining the task without a compression type succeeds

[dolphinscheduler@host1 logs]$ hdfs dfs -ls /
Found 4 items
drwxr-xr-x   - dolphinscheduler supergroup          0 2022-03-08 22:24 /dolphinscheduler
drwxr-xr-x   - dolphin          supergroup          0 2022-03-08 19:28 /home
drwxr-xr-x   - dolphin          supergroup          0 2022-03-09 14:23 /testsqool
drwx------   - dolphin          supergroup          0 2022-03-08 19:28 /tmp
[dolphinscheduler@host1 logs]$ hdfs dfs -ls /testsqool
Found 2 items
-rw-r--r--   3 dolphin supergroup          0 2022-03-09 14:23 /testsqool/_SUCCESS
-rw-r--r--   3 dolphin supergroup          8 2022-03-09 14:23 /testsqool/part-m-00000
[dolphinscheduler@host1 logs]$ hdfs dfs -cat /testsqool/part-m-00000
1@admin|[dolphinscheduler@host1 logs]$ 
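
The record `1@admin|` runs straight into the shell prompt because the task uses `'|'` as the line terminator and `'@'` as the field separator, so the file contains no newline. A small filter makes such output easier to read. This is a sketch only: `sqoop_rows` is a hypothetical helper, and it assumes the data itself never contains `@` or `|`.

```shell
# Hypothetical helper: turn Sqoop text output that uses '@' between fields and
# '|' after each record into one tab-separated row per line.
# Assumes neither character occurs inside the data.
sqoop_rows() {
    tr '|' '\n' | tr '@' '\t'
}
```

It could be used as `hdfs dfs -cat /testsqool/part-m-00000 | sqoop_rows`.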

Compression type gzip succeeds

Compression type lzo fails with Cannot find codec class com.hadoop.compression.lzo.LzoCodec…
[INFO] 2022-03-09 14:36:33.826 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.sqoop.SqoopTask:[66] -  -> 22/03/09 14:36:33 INFO manager.SqlManager: Executing SQL statement: select id,user_name from t_ds_user WHERE  (1 = 0) 
        22/03/09 14:36:33 INFO manager.SqlManager: Executing SQL statement: select id,user_name from t_ds_user WHERE  (1 = 0) 
        22/03/09 14:36:33 INFO manager.SqlManager: Executing SQL statement: select id,user_name from t_ds_user WHERE  (1 = 0) 
        22/03/09 14:36:33 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/dolphinscheduler/app/hadoop-2.7.3
[INFO] 2022-03-09 14:36:34.827 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.sqoop.SqoopTask:[66] -  -> 注: /tmp/sqoop-dolphin/compile/3b2082173e1a4e7b76050dab82e32ea8/QueryResult.java使用或覆盖了已过时的 API。
        注: 有关详细信息, 请使用 -Xlint:deprecation 重新编译。
        22/03/09 14:36:34 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-dolphin/compile/3b2082173e1a4e7b76050dab82e32ea8/QueryResult.jar
[INFO] 2022-03-09 14:36:35.829 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.sqoop.SqoopTask:[66] -  -> 22/03/09 14:36:35 INFO tool.ImportTool: Destination directory /testsqool deleted.
        22/03/09 14:36:35 INFO mapreduce.ImportJobBase: Beginning query import.
        22/03/09 14:36:35 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
        22/03/09 14:36:35 ERROR tool.ImportTool: Import failed: com.cloudera.sqoop.io.UnsupportedCodecException: Cannot find codec class com.hadoop.compression.lzo.LzoCodec for codec lzo
                at org.apache.sqoop.io.CodecMap.getCodec(CodecMap.java:111)
                at com.cloudera.sqoop.io.CodecMap.getCodec(CodecMap.java:64)
                at org.apache.sqoop.mapreduce.ImportJobBase.configureOutputFormat(ImportJobBase.java:120)
                at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:263)
                at org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:748)
                at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:522)
                at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
                at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
                at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
                at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
                at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
                at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
                at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
        
[INFO] 2022-03-09 14:36:36.013 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.sqoop.SqoopTask:[202] - process has exited, execute path:/tmp/dolphinscheduler/exec/process/4640619800832/4762182296320_7/1520/1543, processId:25749 ,exitStatusCode:1 ,processWaitForStatus:true ,processExitValue:1
[INFO] 2022-03-09 14:36:36.838 TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.sqoop.SqoopTask:[60] - FINALIZE_SESSION

Downloading and installing lzo did not help; the error persisted

[dolphinscheduler@host1 ~]$ cd app/
[dolphinscheduler@host1 app]$ wget  http://www.oberhumer.com/opensource/lzo/download/lzo-2.06.tar.gz
--2022-03-09 18:05:18--  http://www.oberhumer.com/opensource/lzo/download/lzo-2.06.tar.gz
正在解析主机 www.oberhumer.com (www.oberhumer.com)... 193.170.194.40
正在连接 www.oberhumer.com (www.oberhumer.com)|193.170.194.40|:80... 已连接。
已发出 HTTP 请求,正在等待回应... 200 OK
长度:583045 (569K) [application/x-gzip]
正在保存至: “lzo-2.06.tar.gz”

100%[=================================================================================================>] 583,045      421KB/s 用时 1.4s   

2022-03-09 18:05:20 (421 KB/s) - 已保存 “lzo-2.06.tar.gz” [583045/583045])

[dolphinscheduler@host1 app]$ tar xf lzo-2.06.tar.gz 
[dolphinscheduler@host1 app]$ cd lzo-2.06
[dolphinscheduler@host1 lzo-2.06]$ ./configure --enable-shared
[dolphinscheduler@host1 lzo-2.06]$ make 
[dolphinscheduler@host1 lzo-2.06]$ sudo make install

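Changing the Hadoop configuration did not help either; this property appears to be used only for mapping the lzo file suffix to a codec

<property>
     <name>io.compression.codec.lzo.class</name>
     <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
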

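Final fix (simply a missing jar, after going around in circles)

Download the hadoop-lzo source

Extract and build it, locate the resulting jar, and copy it into Sqoop's lib directory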

[dolphinscheduler@host1 app]$ tar xf hadoop-lzo-release-0.4.20.tar.gz 
[dolphinscheduler@host1 app]$ cd hadoop-lzo-release-0.4.20
[dolphinscheduler@host1 hadoop-lzo-release-0.4.20]$ mvn clean package -Dmaven.test.skip=true
...
[INFO] Building jar: /home/dolphinscheduler/app/hadoop-lzo-release-0.4.20/target/hadoop-lzo-0.4.20-javadoc.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3:07.479s
[INFO] Finished at: Wed Mar 09 18:40:09 CST 2022
[INFO] Final Memory: 28M/149M
[INFO] ------------------------------------------------------------------------
[dolphinscheduler@host1 hadoop-lzo-release-0.4.20]$ 
[dolphinscheduler@host1 hadoop-lzo-release-0.4.20]$ cd target/
[dolphinscheduler@host1 target]$ ll
总用量 436
drwxr-xr-x. 2 dolphinscheduler dolphin   4096 3月   9 18:39 antrun
drwxr-xr-x. 4 dolphinscheduler dolphin   4096 3月   9 18:40 apidocs
drwxr-xr-x. 5 dolphinscheduler dolphin     66 3月   9 18:39 classes
drwxr-xr-x. 3 dolphinscheduler dolphin     25 3月   9 18:39 generated-sources
-rw-r--r--. 1 dolphinscheduler dolphin 188788 3月   9 18:39 hadoop-lzo-0.4.20.jar
-rw-r--r--. 1 dolphinscheduler dolphin 191960 3月   9 18:40 hadoop-lzo-0.4.20-javadoc.jar
-rw-r--r--. 1 dolphinscheduler dolphin  51992 3月   9 18:40 hadoop-lzo-0.4.20-sources.jar
drwxr-xr-x. 2 dolphinscheduler dolphin     71 3月   9 18:40 javadoc-bundle-options
drwxr-xr-x. 2 dolphinscheduler dolphin     28 3月   9 18:39 maven-archiver
drwxr-xr-x. 3 dolphinscheduler dolphin     28 3月   9 18:39 native
drwxr-xr-x. 4 dolphinscheduler dolphin     54 3月   9 18:39 test-classes
[dolphinscheduler@host1 target]$ cp hadoop-lzo-0.4.20.jar /home/dolphinscheduler/app/sqoop-1.4.7.bin__hadoop-2.6.0/lib/
[dolphinscheduler@host1 target]$ 
[dolphinscheduler@host1 target]$ hdfs dfs -ls /testsqool
Found 2 items
-rw-r--r--   3 dolphin supergroup          0 2022-03-09 18:46 /testsqool/_SUCCESS
-rw-r--r--   3 dolphin supergroup         20 2022-03-09 18:46 /testsqool/part-m-00000.lzo_deflate
[dolphinscheduler@host1 target]$ 
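
Since the root cause was a class missing from the classpath, it can help to confirm that a jar really contains the codec class before copying it around. The sketch below works on a jar listing read from standard input. `listing_has_class` is a hypothetical helper; producing the listing with `jar tf` (from the JDK) is an assumption about the tools available on the host.

```shell
# Hypothetical helper: read a jar file listing on stdin (one entry per line,
# as printed by `jar tf`) and succeed if it contains the given class.
listing_has_class() {
    # com.example.Foo -> com/example/Foo.class
    class_path=$(printf '%s' "$1" | tr '.' '/').class
    grep -qxF "$class_path"
}
```

For the jar built above this might look like `jar tf hadoop-lzo-0.4.20.jar | listing_has_class com.hadoop.compression.lzo.LzoCodec`. The `.lzo_deflate` suffix in the HDFS listing is the file extension that `LzoCodec` writes by default.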

MR (MapReduce) task

Find the examples jar that ships with Hadoop and upload it to resource management

[dolphinscheduler@host1 mapreduce]$ cd 
[dolphinscheduler@host1 ~]$ cd app/hadoop-2.7.3/
[dolphinscheduler@host1 hadoop-2.7.3]$ find . -name hadoop-mapreduce-examples-2.7.3.jar
./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar
[dolphinscheduler@host1 hadoop-2.7.3]$ sz ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar

Create the task

Create the input directory and upload a test file

[dolphinscheduler@host1 mapreduce]$ hdfs dfs -mkdir -p /testMr/in
[dolphinscheduler@host1 mapreduce]$ ll
总用量 4972
-rw-r--r--. 1 dolphinscheduler dolphin  537521 8月  18 2016 hadoop-mapreduce-client-app-2.7.3.jar
-rw-r--r--. 1 dolphinscheduler dolphin  773501 8月  18 2016 hadoop-mapreduce-client-common-2.7.3.jar
-rw-r--r--. 1 dolphinscheduler dolphin 1554595 8月  18 2016 hadoop-mapreduce-client-core-2.7.3.jar
-rw-r--r--. 1 dolphinscheduler dolphin  189714 8月  18 2016 hadoop-mapreduce-client-hs-2.7.3.jar
-rw-r--r--. 1 dolphinscheduler dolphin   27598 8月  18 2016 hadoop-mapreduce-client-hs-plugins-2.7.3.jar
-rw-r--r--. 1 dolphinscheduler dolphin   61745 8月  18 2016 hadoop-mapreduce-client-jobclient-2.7.3.jar
-rw-r--r--. 1 dolphinscheduler dolphin 1551594 8月  18 2016 hadoop-mapreduce-client-jobclient-2.7.3-tests.jar
-rw-r--r--. 1 dolphinscheduler dolphin   71310 8月  18 2016 hadoop-mapreduce-client-shuffle-2.7.3.jar
-rw-r--r--. 1 dolphinscheduler dolphin  295812 8月  18 2016 hadoop-mapreduce-examples-2.7.3.jar
drwxr-xr-x. 2 dolphinscheduler dolphin    4096 8月  18 2016 lib
drwxr-xr-x. 2 dolphinscheduler dolphin      30 8月  18 2016 lib-examples
drwxr-xr-x. 2 dolphinscheduler dolphin    4096 8月  18 2016 sources
[dolphinscheduler@host1 mapreduce]$ ll >> testWord.txt
[dolphinscheduler@host1 mapreduce]$ hdfs dfs -put testWord.txt  /testMr/in
[dolphinscheduler@host1 mapreduce]$ 

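Test result: apart from Sqoop, the other task types seem to go fairly smoothly

The word counts are as follows; for example, each jar file name appears exactly once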

[dolphinscheduler@host1 mapreduce]$ hdfs dfs -ls /testMr/out
Found 2 items
-rw-r--r--   3 dolphin supergroup          0 2022-03-10 17:46 /testMr/out/_SUCCESS
-rw-r--r--   3 dolphin supergroup        662 2022-03-10 17:46 /testMr/out/part-r-00000
[dolphinscheduler@host1 mapreduce]$ hdfs dfs -cat /testMr/out/part-r-00000
-rw-r--r--.	10
0	1
1	10
10	1
1551594	1
1554595	1
17:43	1
18	12
189714	1
2	3
2016	12
27598	1
295812	1
30	1
3月	1
4096	2
4972	1
537521	1
61745	1
71310	1
773501	1
8月	12
dolphin	13
dolphinscheduler	13
drwxr-xr-x.	3
hadoop-mapreduce-client-app-2.7.3.jar	1
hadoop-mapreduce-client-common-2.7.3.jar	1
hadoop-mapreduce-client-core-2.7.3.jar	1
hadoop-mapreduce-client-hs-2.7.3.jar	1
hadoop-mapreduce-client-hs-plugins-2.7.3.jar	1
hadoop-mapreduce-client-jobclient-2.7.3-tests.jar	1
hadoop-mapreduce-client-jobclient-2.7.3.jar	1
hadoop-mapreduce-client-shuffle-2.7.3.jar	1
hadoop-mapreduce-examples-2.7.3.jar	1
lib	1
lib-examples	1
sources	1
testWord.txt	1
总用量	1
[dolphinscheduler@host1 mapreduce]$ cat testWord.txt 
总用量 4972
-rw-r--r--. 1 dolphinscheduler dolphin  537521 8月  18 2016 hadoop-mapreduce-client-app-2.7.3.jar
-rw-r--r--. 1 dolphinscheduler dolphin  773501 8月  18 2016 hadoop-mapreduce-client-common-2.7.3.jar
-rw-r--r--. 1 dolphinscheduler dolphin 1554595 8月  18 2016 hadoop-mapreduce-client-core-2.7.3.jar
-rw-r--r--. 1 dolphinscheduler dolphin  189714 8月  18 2016 hadoop-mapreduce-client-hs-2.7.3.jar
-rw-r--r--. 1 dolphinscheduler dolphin   27598 8月  18 2016 hadoop-mapreduce-client-hs-plugins-2.7.3.jar
-rw-r--r--. 1 dolphinscheduler dolphin   61745 8月  18 2016 hadoop-mapreduce-client-jobclient-2.7.3.jar
-rw-r--r--. 1 dolphinscheduler dolphin 1551594 8月  18 2016 hadoop-mapreduce-client-jobclient-2.7.3-tests.jar
-rw-r--r--. 1 dolphinscheduler dolphin   71310 8月  18 2016 hadoop-mapreduce-client-shuffle-2.7.3.jar
-rw-r--r--. 1 dolphinscheduler dolphin  295812 8月  18 2016 hadoop-mapreduce-examples-2.7.3.jar
drwxr-xr-x. 2 dolphinscheduler dolphin    4096 8月  18 2016 lib
drwxr-xr-x. 2 dolphinscheduler dolphin      30 8月  18 2016 lib-examples
drwxr-xr-x. 2 dolphinscheduler dolphin    4096 8月  18 2016 sources
-rw-r--r--. 1 dolphinscheduler dolphin       0 3月  10 17:43 testWord.txt
[dolphinscheduler@host1 mapreduce]$ 
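
The counts above can be cross-checked locally without a cluster, since the example wordcount job splits each line on whitespace. The following is a minimal sketch under that assumption. `local_wordcount` is a hypothetical helper, and the bytewise sort is meant to mirror the key order seen in the job output.

```shell
# Hypothetical helper: count whitespace-separated tokens read from stdin and
# print "token<TAB>count", sorted bytewise by token.
local_wordcount() {
    awk '
        { for (i = 1; i <= NF; i++) count[$i]++ }
        END { for (w in count) printf "%s\t%d\n", w, count[w] }
    ' | LC_ALL=C sort
}
```

Running `local_wordcount < testWord.txt` and comparing the result with `hdfs dfs -cat /testMr/out/part-r-00000` should show the same pairs. Because the file was created with `ll >> testWord.txt`, it lists itself with size 0, which is where the tokens `0`, `17:43`, and `testWord.txt` come from.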
