dolphinscheduler HDFS feature testing (1): environment preparation

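Contents

dolphinscheduler and HDFS
  Test environment
  Hadoop pseudo-distributed deployment
    Passwordless SSH login
    Passwordless login not working
      Check the logs
      Fix
  Deploying Hadoop
    hadoop-env.sh
    core-site.xml
    hdfs-site.xml
    mapred-site.xml
    yarn-site.xml
    slaves
    Configure the Hadoop environment variables
    Format the NameNode
    Start
  Summary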

Test environment
CentOS 7 virtual machine
dolphinscheduler 2.0.5
MySQL database
Hadoop in pseudo-distributed mode (HDFS is required)
[dolphinscheduler@host1 ~]$ java -version
java version "1.8.0_151"
Java(TM) SE Runtime Environment (build 1.8.0_151-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.151-b12, mixed mode)
[dolphinscheduler@host1 ~]$ python --version
Python 2.7.5
[dolphinscheduler@host1 ~]$ 
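Hadoop pseudo-distributed deployment

The Hadoop jars bundled with dolphinscheduler are version 2.7.3.

Download the matching Hadoop release from the download page, or fetch it with wget: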

wget https://archive.apache.org/dist/hadoop/core/hadoop-2.7.3/hadoop-2.7.3.tar.gz
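Passwordless SSH login

Run ssh-keygen -t rsa to create the .ssh directory and the public/private key pair.
Append the public key to the authorized_keys file.
Alternatively, use ssh-copy-id localhost directly.
Verify by logging in with ssh. The first connection is interactive (in the transcript below it asks to confirm the host key); later ones go straight through.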

[dolphinscheduler@host1 ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/dolphinscheduler/.ssh/id_rsa): 
Created directory '/home/dolphinscheduler/.ssh'.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/dolphinscheduler/.ssh/id_rsa.
Your public key has been saved in /home/dolphinscheduler/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:lwcGY8SCzQG+OSwJbfokOd4+ljnpM4lT/q/2VrG5hbM dolphinscheduler@host1
The key's randomart image is:
+---[RSA 2048]----+
|    .=.+=        |
| . .. +..o       |
|. o .  .  o      |
| = o o   o o     |
|= + =   S B .    |
|.=.o .   B o     |
| .=.=   . =      |
| o.% . . E       |
|  +oBo=o         |
+----[SHA256]-----+
[dolphinscheduler@host1 ~]$ cd .ssh/
[dolphinscheduler@host1 .ssh]$ cat id_rsa.pub >> authorized_keys
[dolphinscheduler@host1 .ssh]$ ll
total 12
-rw-r--r--. 1 dolphinscheduler dolphin  404 Mar  8 17:07 authorized_keys
-rw-------. 1 dolphinscheduler dolphin 1679 Mar  8 17:05 id_rsa
-rw-r--r--. 1 dolphinscheduler dolphin  404 Mar  8 17:05 id_rsa.pub
[dolphinscheduler@host1 .ssh]$ chmod 600 authorized_keys
[dolphinscheduler@host1 .ssh]$ ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:3a9OTk4w9slOAnXL2gLp4FxNga/wB4nR/9Ojh0n1+lY.
ECDSA key fingerprint is MD5:06:79:53:e3:fe:35:bf:a9:11:6f:1d:b7:f8:87:88:a8.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Last login: Tue Mar  8 16:53:16 2022 from localhost
[dolphinscheduler@host1 ~]$ logout
Connection to localhost closed.
[dolphinscheduler@host1 .ssh]$ ssh localhost
Last login: Tue Mar  8 17:07:49 2022 from localhost
[dolphinscheduler@host1 ~]$ 
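
The key-installation steps above can also be scripted so that they are safe to re-run. The following is a minimal sketch, not part of the original setup: install_pubkey is a hypothetical helper name, and it assumes a POSIX shell with grep and coreutils. It appends the public key only when it is missing and applies the 700/600 permissions that sshd normally expects.

```shell
# Hypothetical helper: append a public key to authorized_keys only if it is
# not already there, then set the permissions sshd normally expects.
install_pubkey() {
    pub_file=$1   # e.g. ~/.ssh/id_rsa.pub
    ssh_dir=$2    # e.g. ~/.ssh
    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"
    touch "$ssh_dir/authorized_keys"
    # -F fixed string, -x whole line, -q quiet; patterns are read from pub_file
    if ! grep -qxF -f "$pub_file" "$ssh_dir/authorized_keys"; then
        cat "$pub_file" >> "$ssh_dir/authorized_keys"
    fi
    chmod 600 "$ssh_dir/authorized_keys"
}
```

For the setup in this article the call would be install_pubkey ~/.ssh/id_rsa.pub ~/.ssh, followed by ssh localhost to verify.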

Passwordless login not working

Check the logs
sudo cat /var/log/secure

Authentication refused

Mar  8 16:42:23 host1 sshd[8030]: pam_unix(sshd:session): session closed for user dolphinscheduler
Mar  8 16:42:28 host1 sshd[8050]: reprocess config line 142: Deprecated option RSAAuthentication
Mar  8 16:42:28 host1 sshd[8050]: Authentication refused: bad ownership or modes for directory /home/dolphinscheduler
Mar  8 16:42:30 host1 sshd[8050]: Accepted password for dolphinscheduler from 127.0.0.1 port 33578 ssh2
Mar  8 16:42:31 host1 sshd[8050]: pam_unix(sshd:session): session opened for user dolphinscheduler by (uid=0)
Mar  8 16:42:32 host1 sshd[8054]: Received disconnect from 127.0.0.1 port 33578:11: disconnected by user
Mar  8 16:42:32 host1 sshd[8054]: Disconnected from 127.0.0.1 port 33578
Mar  8 16:42:32 host1 sshd[8050]: pam_unix(sshd:session): session closed for user dolphinscheduler
Mar  8 16:42:33 host1 sshd[8075]: reprocess config line 142: Deprecated option RSAAuthentication
Mar  8 16:42:33 host1 sshd[8075]: Authentication refused: bad ownership or modes for directory /home/dolphinscheduler
Mar  8 16:42:34 host1 sshd[8075]: Connection closed by 127.0.0.1 port 33580 [preauth]
Mar  8 16:42:49 host1 sudo: dolphinscheduler : TTY=pts/4 ; PWD=/home/dolphinscheduler ; USER=root ; COMMAND=/bin/cat /var/log/secure

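Fix

The log shows sshd refusing the key because of the ownership or mode of /home/dolphinscheduler. Setting the StrictModes parameter in sshd_config to no and restarting the service makes sshd skip that check. Removing group and other write permission from the home directory, .ssh and authorized_keys is the stricter alternative, and it leaves StrictModes enabled.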

[dolphinscheduler@host1 ~]$ sudo vi /etc/ssh/sshd_config 
[dolphinscheduler@host1 ~]$ sudo systemctl restart sshd.service
[dolphinscheduler@host1 ~]$ sudo grep StrictModes /etc/ssh/sshd_config 
StrictModes no
[dolphinscheduler@host1 ~]$ 
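
Before switching StrictModes off, it can help to see which path sshd is objecting to. This is a minimal sketch, assuming the usual rule that the home directory, .ssh and authorized_keys must not be writable by group or others. is_loose is a hypothetical helper name, and the check only looks at the mode string printed by ls, not at ownership.

```shell
# Hypothetical helper: succeed if the path is writable by group or others.
is_loose() {
    # First 10 characters of ls -ld: type, then rwx for user, group, other
    perms=$(ls -ld "$1" | cut -c1-10)
    case "$perms" in
        ?????w*|????????w*) return 0 ;;   # char 6 = group write, char 9 = other write
        *) return 1 ;;
    esac
}

# Report the paths that sshd's StrictModes check would typically reject.
for p in "$HOME" "$HOME/.ssh" "$HOME/.ssh/authorized_keys"; do
    if [ -e "$p" ] && is_loose "$p"; then
        echo "too permissive: $p (try: chmod go-w $p)"
    fi
done
```

Any path it reports can be tightened with chmod go-w, after which key login may work with StrictModes left at its default.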
Deploying Hadoop

Extract the archive and edit the configuration files
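
The extraction step itself is not shown in the transcript. This is a minimal sketch, assuming the tarball fetched with wget earlier sits in the current directory and the target is the /home/dolphinscheduler/app directory seen in the later prompts. unpack_hadoop is a hypothetical helper name.

```shell
# Hypothetical helper: unpack a Hadoop release tarball into a target directory.
unpack_hadoop() {
    tarball=$1
    dest=$2
    mkdir -p "$dest"
    # -x extract, -z gunzip, -f archive file, -C change into dest first
    tar -xzf "$tarball" -C "$dest"
    # The release unpacks into one top-level directory; list it to confirm
    ls "$dest"
}

# Example call for this article's layout (left commented out):
# unpack_hadoop hadoop-2.7.3.tar.gz /home/dolphinscheduler/app
```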

hadoop-env.sh
[dolphinscheduler@host1 hadoop]$ vi hadoop-env.sh 
[dolphinscheduler@host1 hadoop]$ grep JAVA_HOME hadoop-env.sh 
# The only required environment variable is JAVA_HOME.  All others are
# set JAVA_HOME in this file, so that it is correctly defined on
export JAVA_HOME=/usr/local/java/jdk1.8.0_151
[dolphinscheduler@host1 hadoop]$ 
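
The same edit can be made without opening an editor. This is a minimal sketch, assuming hadoop-env.sh already contains a line starting with export JAVA_HOME= (as the grep output above shows) and that the JDK path contains no | or & characters. set_java_home is a hypothetical helper name.

```shell
# Hypothetical helper: point the JAVA_HOME export in hadoop-env.sh at a JDK.
set_java_home() {
    env_file=$1
    java_home=$2
    # Write to a temporary file first so the original is only replaced
    # when sed succeeds.
    sed "s|^export JAVA_HOME=.*|export JAVA_HOME=$java_home|" "$env_file" > "$env_file.tmp" &&
        mv "$env_file.tmp" "$env_file"
}

# Example call for this article's layout (left commented out):
# set_java_home hadoop-env.sh /usr/local/java/jdk1.8.0_151
```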

core-site.xml
[dolphinscheduler@host1 app]$ cd hadoop-2.7.3
[dolphinscheduler@host1 hadoop-2.7.3]$ pwd
/home/dolphinscheduler/app/hadoop-2.7.3
[dolphinscheduler@host1 hadoop-2.7.3]$ mkdir -p data/tmp
[dolphinscheduler@host1 hadoop-2.7.3]$ cd data/tmp/
[dolphinscheduler@host1 tmp]$ pwd
/home/dolphinscheduler/app/hadoop-2.7.3/data/tmp
[dolphinscheduler@host1 tmp]$ cd ../../etc/hadoop/
[dolphinscheduler@host1 hadoop]$ pwd
/home/dolphinscheduler/app/hadoop-2.7.3/etc/hadoop
[dolphinscheduler@host1 hadoop]$ 
[dolphinscheduler@host1 hadoop]$ vi core-site.xml
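[dolphinscheduler@host1 hadoop]$ cat core-site.xml 
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://host1:8020</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/dolphinscheduler/app/hadoop-2.7.3/data/tmp</value>
    </property>
</configuration>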
[dolphinscheduler@host1 hadoop]$ 

host1 is mapped in /etc/hosts:

[dolphinscheduler@host1 hadoop]$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
#::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.56.10   host1

[dolphinscheduler@host1 hadoop]$ 
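
Since fs.defaultFS refers to host1, it is worth confirming that the name maps to the VM's address rather than to loopback. This is a minimal sketch that reads a hosts file directly. lookup_host is a hypothetical helper name, and the parsing is simplified (first match wins, no DNS lookup).

```shell
# Hypothetical helper: print the address a name maps to in a hosts file.
lookup_host() {
    name=$1
    hosts_file=${2:-/etc/hosts}
    awk -v h="$name" '
        { sub(/#.*/, "") }                 # drop comments
        { for (i = 2; i <= NF; i++)        # field 1 is the address
              if ($i == h) { print $1; exit } }
    ' "$hosts_file"
}
```

With the /etc/hosts shown above, lookup_host host1 prints 192.168.56.10.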
hdfs-site.xml
[dolphinscheduler@host1 hadoop]$ vi hdfs-site.xml 
[dolphinscheduler@host1 hadoop]$ 
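[dolphinscheduler@host1 hadoop]$ cat hdfs-site.xml 
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.namenode.http.address</name>
        <value>host1:50070</value>
    </property>
</configuration>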
[dolphinscheduler@host1 hadoop]$ 

mapred-site.xml
[dolphinscheduler@host1 hadoop]$ cp mapred-site.xml.template mapred-site.xml
[dolphinscheduler@host1 hadoop]$ vi mapred-site.xml
[dolphinscheduler@host1 hadoop]$ cat mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
[dolphinscheduler@host1 hadoop]$
yarn-site.xml
[dolphinscheduler@host1 hadoop]$ vi yarn-site.xml 
[dolphinscheduler@host1 hadoop]$ cat yarn-site.xml 
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
[dolphinscheduler@host1 hadoop]$ 

slaves

In pseudo-distributed mode every role runs on the same machine, host1.

[dolphinscheduler@host1 hadoop]$ vi slaves
[dolphinscheduler@host1 hadoop]$ cat slaves 
host1
[dolphinscheduler@host1 hadoop]$ ssh host1
The authenticity of host 'host1 (192.168.56.10)' can't be established.
ECDSA key fingerprint is SHA256:3a9OTk4w9slOAnXL2gLp4FxNga/wB4nR/9Ojh0n1+lY.
ECDSA key fingerprint is MD5:06:79:53:e3:fe:35:bf:a9:11:6f:1d:b7:f8:87:88:a8.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'host1,192.168.56.10' (ECDSA) to the list of known hosts.
Last login: Tue Mar  8 17:07:52 2022 from localhost
[dolphinscheduler@host1 ~]$ logout
Connection to host1 closed.
[dolphinscheduler@host1 hadoop]$ ssh host1
Last login: Tue Mar  8 17:30:10 2022 from host1
[dolphinscheduler@host1 ~]$ 
Configure the Hadoop environment variables

The deployment user is dolphinscheduler, so the variables go in that user's .bash_profile.

[dolphinscheduler@host1 ~]$ vi .bash_profile 
[dolphinscheduler@host1 ~]$ grep HADOOP .bash_profile 
export HADOOP_HOME=/home/dolphinscheduler/app/hadoop-2.7.3
export PATH=$HADOOP_HOME/bin:$PATH
[dolphinscheduler@host1 ~]$ source .bash_profile 
[dolphinscheduler@host1 ~]$ hadoop version
Hadoop 2.7.3
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r baa91f7c6bc9cb92be5982de4719c1c8af91ccff
Compiled by root on 2016-08-18T01:41Z
Compiled with protoc 2.5.0
From source with checksum 2e4ce5f957ea4db193bce3734ff29ff4
This command was run using /home/dolphinscheduler/app/hadoop-2.7.3/share/hadoop/common/hadoop-common-2.7.3.jar
[dolphinscheduler@host1 ~]$ 

Format the NameNode
[dolphinscheduler@host1 ~]$ hadoop namenode -format
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

22/03/08 17:36:45 INFO namenode.NameNode: STARTUP_MSG: 

22/03/08 17:36:45 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
22/03/08 17:36:45 INFO namenode.NameNode: createNameNode [-format]
Formatting using clusterid: CID-fdc4e4af-4153-4b72-8cb5-ffdee8693e48
22/03/08 17:36:45 INFO namenode.FSNamesystem: No KeyProvider found.
22/03/08 17:36:45 INFO namenode.FSNamesystem: fsLock is fair:true
22/03/08 17:36:45 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
22/03/08 17:36:45 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
22/03/08 17:36:45 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
22/03/08 17:36:45 INFO blockmanagement.BlockManager: The block deletion will start around 2022 Mar 08 17:36:45
22/03/08 17:36:45 INFO util.GSet: Computing capacity for map BlocksMap
22/03/08 17:36:45 INFO util.GSet: VM type       = 64-bit
22/03/08 17:36:45 INFO util.GSet: 2.0% max memory 889 MB = 17.8 MB
22/03/08 17:36:45 INFO util.GSet: capacity      = 2^21 = 2097152 entries
22/03/08 17:36:45 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
22/03/08 17:36:45 INFO blockmanagement.BlockManager: defaultReplication         = 3
22/03/08 17:36:45 INFO blockmanagement.BlockManager: maxReplication             = 512
22/03/08 17:36:45 INFO blockmanagement.BlockManager: minReplication             = 1
22/03/08 17:36:45 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
22/03/08 17:36:45 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
22/03/08 17:36:45 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
22/03/08 17:36:45 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
22/03/08 17:36:45 INFO namenode.FSNamesystem: fsOwner             = dolphinscheduler (auth:SIMPLE)
22/03/08 17:36:45 INFO namenode.FSNamesystem: supergroup          = supergroup
22/03/08 17:36:45 INFO namenode.FSNamesystem: isPermissionEnabled = false
22/03/08 17:36:45 INFO namenode.FSNamesystem: HA Enabled: false
22/03/08 17:36:45 INFO namenode.FSNamesystem: Append Enabled: true
22/03/08 17:36:45 INFO util.GSet: Computing capacity for map INodeMap
22/03/08 17:36:45 INFO util.GSet: VM type       = 64-bit
22/03/08 17:36:45 INFO util.GSet: 1.0% max memory 889 MB = 8.9 MB
22/03/08 17:36:45 INFO util.GSet: capacity      = 2^20 = 1048576 entries
22/03/08 17:36:45 INFO namenode.FSDirectory: ACLs enabled? false
22/03/08 17:36:45 INFO namenode.FSDirectory: XAttrs enabled? true
22/03/08 17:36:45 INFO namenode.FSDirectory: Maximum size of an xattr: 16384
22/03/08 17:36:45 INFO namenode.NameNode: Caching file names occuring more than 10 times
22/03/08 17:36:45 INFO util.GSet: Computing capacity for map cachedBlocks
22/03/08 17:36:45 INFO util.GSet: VM type       = 64-bit
22/03/08 17:36:45 INFO util.GSet: 0.25% max memory 889 MB = 2.2 MB
22/03/08 17:36:45 INFO util.GSet: capacity      = 2^18 = 262144 entries
22/03/08 17:36:45 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
22/03/08 17:36:45 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
22/03/08 17:36:45 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
22/03/08 17:36:45 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
22/03/08 17:36:45 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
22/03/08 17:36:45 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
22/03/08 17:36:45 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
22/03/08 17:36:45 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
22/03/08 17:36:45 INFO util.GSet: Computing capacity for map NameNodeRetryCache
22/03/08 17:36:45 INFO util.GSet: VM type       = 64-bit
22/03/08 17:36:45 INFO util.GSet: 0.029999999329447746% max memory 889 MB = 273.1 KB
22/03/08 17:36:45 INFO util.GSet: capacity      = 2^15 = 32768 entries
22/03/08 17:36:46 INFO namenode.FSImage: Allocated new BlockPoolId: BP-866347120-192.168.56.10-1646732206003
22/03/08 17:36:46 INFO common.Storage: Storage directory /home/dolphinscheduler/app/hadoop-2.7.3/data/tmp/dfs/name has been successfully formatted.
22/03/08 17:36:46 INFO namenode.FSImageFormatProtobuf: Saving image file /home/dolphinscheduler/app/hadoop-2.7.3/data/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
22/03/08 17:36:46 INFO namenode.FSImageFormatProtobuf: Image file /home/dolphinscheduler/app/hadoop-2.7.3/data/tmp/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 362 bytes saved in 0 seconds.
22/03/08 17:36:46 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
22/03/08 17:36:46 INFO util.ExitUtil: Exiting with status 0
22/03/08 17:36:46 INFO namenode.NameNode: SHUTDOWN_MSG: 

[dolphinscheduler@host1 ~]$ 
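
The deprecation warning at the top of the output points to hdfs namenode -format as the current form of the command. Formatting again later generates a new cluster ID, which existing DataNode directories will no longer match, so a guard can be useful. This is a minimal sketch, assuming the name directory reported in the log above and that a formatted directory contains current/VERSION. is_formatted and format_if_needed are hypothetical helper names.

```shell
# Hypothetical helper: a formatted NameNode directory contains current/VERSION.
is_formatted() {
    [ -f "$1/current/VERSION" ]
}

# Hypothetical helper: format only when the name directory is still empty.
format_if_needed() {
    name_dir=$1
    if is_formatted "$name_dir"; then
        echo "already formatted: $name_dir"
    else
        # -nonInteractive makes the command abort instead of prompting
        hdfs namenode -format -nonInteractive
    fi
}

# Example call for this article's layout (left commented out):
# format_if_needed /home/dolphinscheduler/app/hadoop-2.7.3/data/tmp/dfs/name
```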

Start
[dolphinscheduler@host1 sbin]$ pwd
/home/dolphinscheduler/app/hadoop-2.7.3/sbin
[dolphinscheduler@host1 sbin]$ sh start-all.sh 
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [host1]
host1: starting namenode, logging to /home/dolphinscheduler/app/hadoop-2.7.3/logs/hadoop-dolphinscheduler-namenode-host1.out
host1: starting datanode, logging to /home/dolphinscheduler/app/hadoop-2.7.3/logs/hadoop-dolphinscheduler-datanode-host1.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/dolphinscheduler/app/hadoop-2.7.3/logs/hadoop-dolphinscheduler-secondarynamenode-host1.out
starting yarn daemons
starting resourcemanager, logging to /home/dolphinscheduler/app/hadoop-2.7.3/logs/yarn-dolphinscheduler-resourcemanager-host1.out
host1: starting nodemanager, logging to /home/dolphinscheduler/app/hadoop-2.7.3/logs/yarn-dolphinscheduler-nodemanager-host1.out
[dolphinscheduler@host1 sbin]$ jps
11267 DataNode
11443 SecondaryNameNode
11735 NodeManager
11624 ResourceManager
12042 Jps
11135 NameNode
[dolphinscheduler@host1 sbin]$ 
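
As the warning in the output says, start-all.sh is deprecated in favour of running start-dfs.sh and then start-yarn.sh from the same sbin directory. Either way, the jps listing is what confirms that all five daemons came up. This is a minimal sketch that checks a jps listing for them. missing_daemons is a hypothetical helper name, and the daemon list is the one shown in the output above.

```shell
# Hypothetical helper: read a jps listing on stdin and print every expected
# daemon that does not appear in it.
missing_daemons() {
    jps_out=$(cat)
    for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # Anchor the match so "SecondaryNameNode" does not count as "NameNode"
        echo "$jps_out" | grep -qE "^[0-9]+ $d\$" || echo "$d"
    done
}

# Example call (left commented out); prints nothing when all five are running:
# jps | missing_daemons
```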

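Summary

Our company currently uses dolphinscheduler only for simple stored procedures and datax tasks, and none of the HDFS-related features are in use yet. I deployed this local test environment ahead of time because we will probably need them later. Because of CSDN's length limits, the HDFS-related features themselves are covered in "dolphinscheduler HDFS feature testing (2)".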

When reposting, please credit the source: www.mshxw.com
Article URL: https://www.mshxw.com/it/761385.html