
Deploying Hadoop 3.1.3


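1. Prepare three machines (2 cores and 2 GB of RAM each is enough)

IP configuration:
192.168.56.211     2 cores, 2 GB RAM
192.168.56.212     2 cores, 2 GB RAM
192.168.56.213     2 cores, 2 GB RAM

Configure /etc/hosts on every machine

Add the following entries on every machine: 192.168.56.211 (master node), 192.168.56.212 (node1), 192.168.56.213 (node2).

Set the hostname on each machine (master, node1 or node2):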

[root@master ~]# vim /etc/hostname
master
[root@master ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.56.211 master
192.168.56.212 node1
192.168.56.213 node2

[root@master ~]# 
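
The three entries can also be added by script, which avoids duplicates when the step is repeated on each machine. The following is a minimal sketch, not part of the original procedure: HOSTS_FILE and add_host are names introduced here for illustration, and HOSTS_FILE defaults to a scratch file so the result can be inspected before pointing it at /etc/hosts as root.

```shell
# Append the three cluster entries to a hosts file, skipping any that already exist.
# HOSTS_FILE is an assumption for safe experimentation; set HOSTS_FILE=/etc/hosts
# (as root) once the output looks right.
HOSTS_FILE="${HOSTS_FILE:-$(mktemp)}"

add_host() {
    ip="$1"
    name="$2"
    # -x matches whole lines only, -F treats the pattern as a fixed string
    if ! grep -qxF "$ip $name" "$HOSTS_FILE" 2>/dev/null; then
        echo "$ip $name" >> "$HOSTS_FILE"
    fi
}

add_host 192.168.56.211 master
add_host 192.168.56.212 node1
add_host 192.168.56.213 node2

cat "$HOSTS_FILE"
```

Running the script a second time leaves the file unchanged, because each entry is only appended when an identical line is not already present.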


2. Configure passwordless SSH login on all three machines

ssh-keygen -t rsa
# Press Enter at every prompt, then copy the key to every machine, including the current one
ssh-copy-id master
ssh-copy-id node1
ssh-copy-id node2
# Note: repeat these steps as root on each of master, node1 and node2, so that
# every node can log in to master, node1 and node2 without a password.
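
Because the same commands have to be run on every node, they can be wrapped in a small loop. This is a hedged sketch rather than the author's method: DRY_RUN, run and distribute_key are hypothetical names, and with DRY_RUN=1 (the default here) the commands are only printed so the loop can be checked before anything is changed.

```shell
# Hosts as defined in /etc/hosts. DRY_RUN is a hypothetical safety switch:
# with DRY_RUN=1 the commands are printed instead of executed.
HOSTS="master node1 node2"
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

distribute_key() {
    # Create a key pair only if none exists yet (-N "" means an empty passphrase).
    if [ ! -f "$HOME/.ssh/id_rsa" ]; then
        run ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
    fi
    # Copy the public key to every node, including the current one.
    for host in $HOSTS; do
        run ssh-copy-id "root@$host"
    done
}

distribute_key
```

Setting DRY_RUN=0 executes the commands for real; ssh-copy-id will then prompt once for the root password of each host.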


3. Create the module and software directories under /opt

[root@master ~]# mkdir /opt/module
[root@master ~]# mkdir /opt/software

4. Download Hadoop 3.1.3 from:

https://archive.apache.org/dist/hadoop/common/hadoop-3.1.3/

5. Configure the Java environment variables on every machine

For example (JAVA_HOME here is /opt/jdk/jdk1.8.0_202; it is used again below):

[root@master ~]# echo $JAVA_HOME
/opt/jdk/jdk1.8.0_202
[root@master ~]# java -version
java version "1.8.0_202"
Java(TM) SE Runtime Environment (build 1.8.0_202-b08)
Java HotSpot(TM) 64-Bit Server VM (build 25.202-b08, mixed mode)
[root@master ~]# 

6. Upload Hadoop to the Linux machine (the environment variables must be configured on every machine)

1) Go to the directory that holds the Hadoop archive
[root@master ~]$ cd /opt/software/
2) Extract the archive into /opt/module
[root@master software]$ tar -zxvf hadoop-3.1.3.tar.gz -C /opt/module/
3) Check that the extraction succeeded
[root@master software]$ ls /opt/module/
hadoop-3.1.3
4) Add Hadoop to the environment variables
(1) Get the Hadoop installation path
[root@master hadoop-3.1.3]$ pwd
/opt/module/hadoop-3.1.3
(2) Open /etc/profile.d/my_env.sh
[root@master hadoop-3.1.3]$ sudo vim /etc/profile.d/my_env.sh
Append the following to the end of my_env.sh (Shift+G jumps to the end of the file):
#HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
➢ Save and quit: :wq
(3) Apply the change
[root@master hadoop-3.1.3]$ source /etc/profile

Check that the installation works
[root@master hadoop-3.1.3]$ hadoop version
Hadoop 3.1.3
5) Reboot (only if the hadoop command still cannot be found)
[root@master hadoop-3.1.3]$ sudo reboot
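
Before rebooting, it may be worth checking whether the profile snippet really puts both Hadoop directories on PATH. The sketch below writes the same three export lines to a file, sources it and tests PATH. ENV_FILE and on_path are names made up for this example; ENV_FILE defaults to a scratch file standing in for /etc/profile.d/my_env.sh so it can be run without root.

```shell
# ENV_FILE is a stand-in for /etc/profile.d/my_env.sh so the sketch runs unprivileged.
ENV_FILE="${ENV_FILE:-$(mktemp)}"

cat > "$ENV_FILE" <<'EOF'
#HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
EOF

# Load the variables into the current shell, as "source /etc/profile" would.
. "$ENV_FILE"

# Succeeds when the given directory is one of the PATH entries.
on_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

for dir in "$HADOOP_HOME/bin" "$HADOOP_HOME/sbin"; do
    if on_path "$dir"; then
        echo "OK      $dir"
    else
        echo "MISSING $dir"
    fi
done
```

If both lines report OK but the hadoop command is still not found, the archive was probably not extracted to /opt/module/hadoop-3.1.3.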

7. Run the WordCount example

7.1 Local (standalone) mode with the official WordCount
1) Create a wcinput directory under hadoop-3.1.3
[root@master hadoop-3.1.3]$ mkdir wcinput
2) Change into wcinput, where word.txt will be created
[root@master hadoop-3.1.3]$ cd wcinput
3) Edit word.txt
[root@master wcinput]$ vim word.txt
➢ Enter the following content
hadoop yarn
hadoop mapreduce
➢ Save and quit: :wq
4) Return to the Hadoop directory /opt/module/hadoop-3.1.3
5) Run the program (the wcoutput directory must not exist beforehand)
[root@master hadoop-3.1.3]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount wcinput wcoutput
6) View the result
[root@master hadoop-3.1.3]$ cat wcoutput/part-r-00000
The output should be:
hadoop 2
mapreduce 1
yarn 1
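
To see what the job computes, the same counts can be reproduced with ordinary shell tools. This is only an illustrative cross-check and not how Hadoop implements WordCount; count_words is a name chosen for this sketch.

```shell
# Split the input on whitespace, count each distinct word, and print
# "word<TAB>count" sorted by word, mirroring the layout of part-r-00000.
count_words() {
    tr -s '[:space:]' '\n' \
        | sed '/^$/d' \
        | LC_ALL=C sort \
        | uniq -c \
        | awk '{ printf "%s\t%s\n", $2, $1 }'
}

# The same two lines that were typed into wcinput/word.txt.
printf 'hadoop yarn\nhadoop mapreduce\n' | count_words
```

On the real file this would be run as `count_words < wcinput/word.txt`, and the result can be compared with wcoutput/part-r-00000.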

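8. Cluster deployment

1) Cluster deployment plan
Notes:
➢ Do not put the NameNode and the SecondaryNameNode on the same server.
➢ The ResourceManager also uses a lot of memory; do not put it on the same machine as the NameNode or the SecondaryNameNode.

        master                 node1                           node2
HDFS    NameNode, DataNode     DataNode                        SecondaryNameNode, DataNode
YARN    NodeManager            ResourceManager, NodeManager    NodeManager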


2) Configuration files
Hadoop has two kinds of configuration files: default files and site-specific (custom) files. A site-specific file only needs to be edited when you want to override a default value.
(1) Default configuration files:
Default file             Location inside the Hadoop jars
[core-default.xml]       hadoop-common-3.1.3.jar/core-default.xml
[hdfs-default.xml]       hadoop-hdfs-3.1.3.jar/hdfs-default.xml
[yarn-default.xml]       hadoop-yarn-common-3.1.3.jar/yarn-default.xml
[mapred-default.xml]     hadoop-mapreduce-client-core-3.1.3.jar/mapred-default.xml
(2) Site-specific configuration files:
The four files core-site.xml, hdfs-site.xml, yarn-site.xml and mapred-site.xml live in $HADOOP_HOME/etc/hadoop and can be modified to suit the project.
3) Configure the cluster
(1) Core configuration file
Configure core-site.xml

[root@master ~]$ cd $HADOOP_HOME/etc/hadoop
[root@master hadoop]$ vim core-site.xml

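The file should contain:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
 <!-- Address of the NameNode -->
 <property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:8020</value>
 </property>

 <!-- Directory where Hadoop stores its data -->
 <property>
  <name>hadoop.tmp.dir</name>
  <value>/opt/module/hadoop-3.1.3/data</value>
 </property>

 <!-- Static user for the HDFS web UI -->
 <property>
  <name>hadoop.http.staticuser.user</name>
  <value>root</value>
 </property>
</configuration>

(2) HDFS configuration file
Configure hdfs-site.xml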

[root@master hadoop]$ vim hdfs-site.xml


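The file should contain:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
 <!-- NameNode web UI address -->
 <property>
  <name>dfs.namenode.http-address</name>
  <value>master:9870</value>
 </property>

 <!-- SecondaryNameNode web UI address -->
 <property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>node2:9868</value>
 </property>
</configuration>

(3) YARN configuration file
Configure yarn-site.xml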

[root@master hadoop]$ vim yarn-site.xml


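The file should contain:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
 <!-- Use the MapReduce shuffle auxiliary service -->
 <property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
 </property>

 <!-- Host that runs the ResourceManager -->
 <property>
  <name>yarn.resourcemanager.hostname</name>
  <value>node1</value>
 </property>

 <!-- Environment variables that containers inherit -->
 <property>
  <name>yarn.nodemanager.env-whitelist</name>
  <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
 </property>

 <!-- Enable log aggregation -->
 <property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
 </property>

 <!-- URL of the log server (the JobHistory server on master) -->
 <property>
  <name>yarn.log.server.url</name>
  <value>http://master:19888/jobhistory/logs</value>
 </property>

 <!-- Keep aggregated logs for 7 days (604800 seconds) -->
 <property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>604800</value>
 </property>
</configuration>

(4) MapReduce configuration file

Configure mapred-site.xml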

[root@master hadoop]$ vim mapred-site.xml

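The file should contain:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
 <!-- Run MapReduce jobs on YARN -->
 <property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
 </property>

 <!-- Tell the application master and the map and reduce tasks where MapReduce is installed -->
 <property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=/opt/module/hadoop-3.1.3</value>
 </property>
 <property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=/opt/module/hadoop-3.1.3</value>
 </property>
 <property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=/opt/module/hadoop-3.1.3</value>
 </property>

 <!-- JobHistory server address -->
 <property>
  <name>mapreduce.jobhistory.address</name>
  <value>master:10020</value>
 </property>

 <!-- JobHistory server web UI address -->
 <property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>master:19888</value>
 </property>
</configuration>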
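
After editing the four files it is easy to lose track of which value ended up where. On a machine with Hadoop installed, `hdfs getconf -confKey fs.defaultFS` prints the effective value of a key. The sketch below is a rough offline alternative that reads a property straight from a site file. get_prop is a hypothetical helper, not a Hadoop command, and it assumes every <name> and <value> element sits on a single line, as in the files in this article. It is not a real XML parser.

```shell
# get_prop FILE NAME: print the <value> that follows <name>NAME</name>.
# Line-oriented sketch; assumes one <name> or <value> element per line.
get_prop() {
    awk -v want="$2" '
        /<name>/ {
            name = $0
            sub(/.*<name>[ \t]*/, "", name)
            sub(/[ \t]*<\/name>.*/, "", name)
            found = (name == want)
        }
        found && /<value>/ {
            value = $0
            sub(/.*<value>[ \t]*/, "", value)
            sub(/[ \t]*<\/value>.*/, "", value)
            print value
            exit
        }
    ' "$1"
}

# Demo on a scratch copy of two core-site.xml properties.
conf="$(mktemp)"
cat > "$conf" <<'EOF'
<configuration>
 <property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:8020</value>
 </property>
 <property>
  <name>hadoop.tmp.dir</name>
  <value>/opt/module/hadoop-3.1.3/data</value>
 </property>
</configuration>
EOF

get_prop "$conf" fs.defaultFS
get_prop "$conf" hadoop.tmp.dir
```

On the cluster the helper would be pointed at the real files, for example `get_prop $HADOOP_HOME/etc/hadoop/yarn-site.xml yarn.resourcemanager.hostname`.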




1) Configure workers

[root@master hadoop]$ vim /opt/module/hadoop-3.1.3/etc/hadoop/workers


Add the following lines to the file:

master
node1
node2

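Edit hadoop-env.sh and set the Java installation directory:

export JAVA_HOME=/opt/jdk/jdk1.8.0_202

In hadoop-3.1.3/sbin, add the following lines right after the shebang line of start-dfs.sh and stop-dfs.sh so that HDFS can be started as root. (HADOOP_SECURE_DN_USER is deprecated in favor of HDFS_DATANODE_SECURE_USER, which is why start-dfs.sh prints a warning in the log further down.)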

#!/usr/bin/env bash
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

 

In hadoop-3.1.3/sbin, add the following lines right after the shebang line of start-yarn.sh and stop-yarn.sh so that YARN can be started as root:

#!/usr/bin/env bash
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root

 

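2) Start the cluster
(1) If the cluster is being started for the first time, format the NameNode on the master node. (Note: formatting the NameNode generates a new cluster ID. If the DataNodes still hold data from an earlier format, their cluster ID no longer matches the NameNode's and the cluster cannot find its existing data. If the cluster fails while running and the NameNode has to be reformatted, first stop the namenode and datanode processes and delete the data and logs directories on every machine, and only then format.)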

[root@master hadoop-3.1.3]$ hdfs namenode -format
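
Because an accidental second format causes the cluster ID mismatch described above, a small guard in front of the command can help. This is a minimal sketch, assuming the NameNode metadata lives under hadoop.tmp.dir at data/dfs/name, which is the path reported in the format log later in this article. NAME_DIR and safe_to_format are hypothetical names.

```shell
# NAME_DIR defaults to the storage directory reported by the format log;
# override it for a different hadoop.tmp.dir.
NAME_DIR="${NAME_DIR:-/opt/module/hadoop-3.1.3/data/dfs/name}"

# Succeeds only when no formatted NameNode metadata exists yet.
safe_to_format() {
    [ ! -e "$1/current/VERSION" ]
}

if safe_to_format "$NAME_DIR"; then
    echo "No existing metadata in $NAME_DIR; run: hdfs namenode -format"
else
    echo "Refusing: $NAME_DIR is already formatted." >&2
    echo "Stop HDFS and clear data/ and logs/ on every node first." >&2
fi
```

The sketch only prints a recommendation; it deliberately does not run the format itself.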


(2) Start HDFS

[root@master hadoop-3.1.3]$ sbin/start-dfs.sh


(3) Start YARN on the node configured as ResourceManager (node1)

[root@node1 hadoop-3.1.3]$ sbin/start-yarn.sh

(4) Start the history server on master

[root@master hadoop-3.1.3]$ mapred --daemon start historyserver


(5) View the HDFS NameNode in a browser
(a) Open http://hadoop100:9870
(b) Browse the data stored on HDFS
(6) View the YARN ResourceManager in a browser
(a) Open http://hadoop101:8088
(b) Browse the jobs running on YARN

On Windows, add the following mappings to the hosts file (C:\Windows\System32\drivers\etc\hosts) so that these host names resolve:

192.168.56.211 hadoop100
192.168.56.212 hadoop101
192.168.56.213 hadoop102

192.168.56.211 master
192.168.56.211 06762320a7eb
192.168.56.212 node1
192.168.56.213 node2

Commands run on master:

drwxr-xr-x 9 1000 1000 149 Sep 12 2019 hadoop-3.1.3
[root@master module]# cd hadoop-3.1.3/
[root@master hadoop-3.1.3]# hdfs namenode -format
WARNING: /opt/module/hadoop-3.1.3/logs does not exist. Creating.
2021-12-10 10:20:44,056 INFO namenode.NameNode: STARTUP_MSG: 

2021-12-10 10:20:44,082 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
2021-12-10 10:20:44,370 INFO namenode.NameNode: createNameNode [-format]
Formatting using clusterid: CID-10ef8d8c-2b08-40e9-a2f0-6a5e0647fc88
2021-12-10 10:20:45,810 INFO namenode.FSEditLog: Edit logging is async:true
2021-12-10 10:20:45,828 INFO namenode.FSNamesystem: KeyProvider: null
2021-12-10 10:20:45,829 INFO namenode.FSNamesystem: fsLock is fair: true
2021-12-10 10:20:45,829 INFO namenode.FSNamesystem: Detailed lock hold time metrics enabled: false
2021-12-10 10:20:45,901 INFO namenode.FSNamesystem: fsOwner             = root (auth:SIMPLE)
2021-12-10 10:20:45,902 INFO namenode.FSNamesystem: supergroup          = supergroup
2021-12-10 10:20:45,902 INFO namenode.FSNamesystem: isPermissionEnabled = true
2021-12-10 10:20:45,902 INFO namenode.FSNamesystem: HA Enabled: false
2021-12-10 10:20:45,993 INFO common.Util: dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
2021-12-10 10:20:46,003 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit: configured=1000, counted=60, effected=1000
2021-12-10 10:20:46,006 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
2021-12-10 10:20:46,011 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
2021-12-10 10:20:46,012 INFO blockmanagement.BlockManager: The block deletion will start around 2021 Dec 10 10:20:46
2021-12-10 10:20:46,013 INFO util.GSet: Computing capacity for map BlocksMap
2021-12-10 10:20:46,014 INFO util.GSet: VM type       = 64-bit
2021-12-10 10:20:46,048 INFO util.GSet: 2.0% max memory 443 MB = 8.9 MB
2021-12-10 10:20:46,048 INFO util.GSet: capacity      = 2^20 = 1048576 entries
2021-12-10 10:20:46,054 INFO blockmanagement.BlockManager: dfs.block.access.token.enable = false
2021-12-10 10:20:46,062 INFO Configuration.deprecation: No unit for dfs.namenode.safemode.extension(30000) assuming MILLISECONDS
2021-12-10 10:20:46,062 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
2021-12-10 10:20:46,062 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.min.datanodes = 0
2021-12-10 10:20:46,062 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.extension = 30000
2021-12-10 10:20:46,062 INFO blockmanagement.BlockManager: defaultReplication         = 3
2021-12-10 10:20:46,063 INFO blockmanagement.BlockManager: maxReplication             = 512
2021-12-10 10:20:46,063 INFO blockmanagement.BlockManager: minReplication             = 1
2021-12-10 10:20:46,063 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
2021-12-10 10:20:46,063 INFO blockmanagement.BlockManager: redundancyRecheckInterval  = 3000ms
2021-12-10 10:20:46,063 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
2021-12-10 10:20:46,063 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
2021-12-10 10:20:46,118 INFO namenode.FSDirectory: GLOBAL serial map: bits=24 maxEntries=16777215
2021-12-10 10:20:46,136 INFO util.GSet: Computing capacity for map INodeMap
2021-12-10 10:20:46,136 INFO util.GSet: VM type       = 64-bit
2021-12-10 10:20:46,136 INFO util.GSet: 1.0% max memory 443 MB = 4.4 MB
2021-12-10 10:20:46,136 INFO util.GSet: capacity      = 2^19 = 524288 entries
2021-12-10 10:20:46,137 INFO namenode.FSDirectory: ACLs enabled? false
2021-12-10 10:20:46,137 INFO namenode.FSDirectory: POSIX ACL inheritance enabled? true
2021-12-10 10:20:46,137 INFO namenode.FSDirectory: XAttrs enabled? true
2021-12-10 10:20:46,137 INFO namenode.NameNode: Caching file names occurring more than 10 times
2021-12-10 10:20:46,147 INFO snapshot.SnapshotManager: Loaded config captureOpenFiles: false, skipCaptureAccessTimeOnlyChange: false, snapshotDiffAllowSnapRootDescendant: true, maxSnapshotLimit: 65536
2021-12-10 10:20:46,149 INFO snapshot.SnapshotManager: SkipList is disabled
2021-12-10 10:20:46,155 INFO util.GSet: Computing capacity for map cachedBlocks
2021-12-10 10:20:46,155 INFO util.GSet: VM type       = 64-bit
2021-12-10 10:20:46,155 INFO util.GSet: 0.25% max memory 443 MB = 1.1 MB
2021-12-10 10:20:46,155 INFO util.GSet: capacity      = 2^17 = 131072 entries
2021-12-10 10:20:46,163 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
2021-12-10 10:20:46,163 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
2021-12-10 10:20:46,163 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
2021-12-10 10:20:46,168 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
2021-12-10 10:20:46,168 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
2021-12-10 10:20:46,170 INFO util.GSet: Computing capacity for map NameNodeRetryCache
2021-12-10 10:20:46,170 INFO util.GSet: VM type       = 64-bit
2021-12-10 10:20:46,170 INFO util.GSet: 0.029999999329447746% max memory 443 MB = 136.1 KB
2021-12-10 10:20:46,170 INFO util.GSet: capacity      = 2^14 = 16384 entries
2021-12-10 10:20:46,207 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1937585949-192.168.56.211-1639102846200
2021-12-10 10:20:46,233 INFO common.Storage: Storage directory /opt/module/hadoop-3.1.3/data/dfs/name has been successfully formatted.
2021-12-10 10:20:46,271 INFO namenode.FSImageFormatProtobuf: Saving image file /opt/module/hadoop-3.1.3/data/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
2021-12-10 10:20:46,363 INFO namenode.FSImageFormatProtobuf: Image file /opt/module/hadoop-3.1.3/data/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 391 bytes saved in 0 seconds .
2021-12-10 10:20:46,380 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2021-12-10 10:20:46,385 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid = 0 when meet shutdown.
2021-12-10 10:20:46,386 INFO namenode.NameNode: SHUTDOWN_MSG: 

[root@master hadoop-3.1.3]# jps
24737 Jps
[root@master hadoop-3.1.3]# sbin/start-dfs.sh
WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
Starting namenodes on [master]
Last login: Fri Dec 10 09:24:29 CST 2021 from 192.168.56.11 on pts/1
Starting datanodes
Last login: Fri Dec 10 10:21:36 CST 2021 on pts/0
node1: WARNING: /opt/module/hadoop-3.1.3/logs does not exist. Creating.
node2: WARNING: /opt/module/hadoop-3.1.3/logs does not exist. Creating.
Starting secondary namenodes [node2]
Last login: Fri Dec 10 10:21:39 CST 2021 on pts/0
[root@master hadoop-3.1.3]# jps
25521 NameNode
26133 Jps
25689 DataNode
[root@master hadoop-3.1.3]# jps
25521 NameNode
27013 Jps
25689 DataNode
26623 NodeManager
[root@master hadoop-3.1.3]# mapred --daemon start historyserver
[root@master hadoop-3.1.3]# jps
29872 Jps
25521 NameNode
29752 JobHistoryServer
25689 DataNode
26623 NodeManager
[root@master hadoop-3.1.3]# 

Commands run on node1:

[root@node1 software]# jps
24256 DataNode
24519 Jps
[root@node1 software]# cd /opt/module/
[root@node1 module]# cd hadoop-3.1.3/
[root@node1 hadoop-3.1.3]# sbin/start-yarn.sh
Starting resourcemanager
Last login: Fri Dec 10 09:24:32 CST 2021 from 192.168.56.11 on pts/1
Starting nodemanagers
Last login: Fri Dec 10 10:22:17 CST 2021 on pts/0
[root@node1 hadoop-3.1.3]# jps
24256 DataNode
24998 ResourceManager
25175 NodeManager
25674 Jps
[root@node1 hadoop-3.1.3]# jps
24256 DataNode
24998 ResourceManager
25175 NodeManager
28584 Jps
[root@node1 hadoop-3.1.3]# 
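
Comparing jps output against the deployment plan by eye becomes tedious across three nodes. The sketch below automates the comparison: it reads jps output on standard input and prints every expected daemon that is missing. missing_daemons is a name introduced for this illustration, and the sample input is the final jps listing from the master session above.

```shell
# missing_daemons "Daemon1 Daemon2 ...": read jps output on stdin and print
# each expected daemon that is absent. No output means all were found.
missing_daemons() {
    # The second column of jps output is the daemon's class name.
    running="$(awk '{ print $2 }')"
    for d in $1; do
        printf '%s\n' "$running" | grep -qx "$d" || echo "$d"
    done
}

# Sample jps output copied from the master session.
master_jps='29872 Jps
25521 NameNode
29752 JobHistoryServer
25689 DataNode
26623 NodeManager'

printf '%s\n' "$master_jps" \
    | missing_daemons "NameNode DataNode NodeManager JobHistoryServer"
```

On a live node the input would come from jps directly, for example `jps | missing_daemons "DataNode ResourceManager NodeManager"` on node1.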

The etc/hadoop and sbin files have been uploaded to GitHub:

https://github.com/heidaodageshiwo/common/tree/springboot-hadoop3.1.3/common/src/main/java/com/common/common/hadoop
