1. Prepare three machines; 2 cores / 2 GB RAM each is enough.
| IP | Spec |
| --- | --- |
| 192.168.56.211 | 2 cores / 2 GB |
| 192.168.56.212 | 2 cores / 2 GB |
| 192.168.56.213 | 2 cores / 2 GB |
Configure /etc/hosts on every machine. Add the following entries on all three nodes:

192.168.56.211 master
192.168.56.212 node1
192.168.56.213 node2
Set each machine's hostname (master, node1, node2 respectively):

[root@master ~]# vim /etc/hostname    # set the content to "master" (node1/node2 on the other machines)
[root@master ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.56.211 master
192.168.56.212 node1
192.168.56.213 node2
2. Configure passwordless SSH login between the three machines.

ssh-keygen -t rsa    # press Enter through every prompt

Then distribute the key to every machine, including the current one:

ssh-copy-id master
ssh-copy-id node1
ssh-copy-id node2

Note: repeat this as root on each of master, node1, and node2, so that every node can log in to master, node1, and node2 without a password.
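The key distribution above can be scripted. A minimal sketch, written as a dry run that only prints the commands (since `ssh-copy-id` prompts for each node's password, run it interactively by dropping the `echo`); the hostnames assume the /etc/hosts entries configured earlier:

```shell
#!/usr/bin/env bash
# Dry run: print the ssh-copy-id command for each of the three nodes.
# Run this script once on every machine (master, node1, node2).
NODES="master node1 node2"
for host in $NODES; do
  # remove the leading "echo" to actually copy the key
  echo "ssh-copy-id $host"
done
```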
3. Create the module and software directories under /opt:

[root@master ~]# mkdir /opt/module
[root@master ~]# mkdir /opt/software
4. Download link (Hadoop 3.1.3):
https://archive.apache.org/dist/hadoop/common/hadoop-3.1.3/
5. Configure the Java environment variables on every machine.

For example (the JDK path /opt/jdk/jdk1.8.0_202 is used again below):

[root@master ~]# echo $JAVA_HOME
/opt/jdk/jdk1.8.0_202
[root@master ~]# java -version
java version "1.8.0_202"
Java(TM) SE Runtime Environment (build 1.8.0_202-b08)
Java HotSpot(TM) 64-Bit Server VM (build 25.202-b08, mixed mode)
6. Upload the Hadoop tarball to the Linux machines (the environment variables must be configured on every machine).

1) Change into the directory holding the installation package:
[root@master ~]$ cd /opt/software/

2) Extract it into /opt/module:
[root@master software]$ tar -zxvf hadoop-3.1.3.tar.gz -C /opt/module/

3) Verify the extraction:
[root@master software]$ ls /opt/module/
hadoop-3.1.3

4) Add Hadoop to the environment variables.
(1) Get the Hadoop installation path:
[root@master hadoop-3.1.3]$ pwd
/opt/module/hadoop-3.1.3
(2) Open /etc/profile.d/my_env.sh:
[root@master hadoop-3.1.3]$ sudo vim /etc/profile.d/my_env.sh
Append the following at the end of my_env.sh (Shift+G jumps to the end):
#HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
Save and quit with :wq
(3) Reload the profile so the change takes effect:
[root@master hadoop-3.1.3]$ source /etc/profile
Test the installation:
[root@master hadoop-3.1.3]$ hadoop version
Hadoop 3.1.3

5) Reboot only if the hadoop command still does not work:
[root@master hadoop-3.1.3]$ sudo reboot
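After sourcing the profile, you can sanity-check that the variables took effect. A small sketch that sets the variables exactly as my_env.sh does (using this guide's install path) and then verifies them:

```shell
#!/usr/bin/env bash
# Mirror the my_env.sh exports, then confirm HADOOP_HOME and PATH are set.
export HADOOP_HOME=/opt/module/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

echo "$HADOOP_HOME"
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) echo "hadoop bin is on PATH" ;;
  *)                      echo "hadoop bin is MISSING from PATH" ;;
esac
```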
7. Run the WordCount example.

7.1 Local (standalone) mode, using the official WordCount:

1) Create a wcinput directory under hadoop-3.1.3:
[root@master hadoop-3.1.3]$ mkdir wcinput

2) Change into wcinput:
[root@master hadoop-3.1.3]$ cd wcinput

3) Edit a word.txt file:
[root@master wcinput]$ vim word.txt
Enter the following content, then save and quit with :wq:
hadoop yarn
hadoop mapreduce

4) Return to the Hadoop directory /opt/module/hadoop-3.1.3.

5) Run the job:
[root@master hadoop-3.1.3]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount wcinput wcoutput

6) Check the result:
[root@master hadoop-3.1.3]$ cat wcoutput/part-r-00000
You should see:
hadoop 2
mapreduce 1
yarn 1
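The same counts can be reproduced with plain coreutils, which is a handy cross-check of the MapReduce output (the file name word.txt matches the input created above):

```shell
#!/usr/bin/env bash
# Build the same input file, then count words with sort | uniq -c.
printf 'hadoop yarn\nhadoop mapreduce\n' > word.txt

# one word per line -> group identical words -> count,
# printed as "word count" to match part-r-00000
tr ' ' '\n' < word.txt | sort | uniq -c | awk '{print $2, $1}'
# prints:
# hadoop 2
# mapreduce 1
# yarn 1
```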
8. 1) Cluster deployment plan

Notes:
➢ Do not install the NameNode and the SecondaryNameNode on the same server.
➢ The ResourceManager is also memory-hungry; do not put it on the same machine as the NameNode or SecondaryNameNode.
| | master | node1 | node2 |
| --- | --- | --- | --- |
| HDFS | NameNode, DataNode | DataNode | SecondaryNameNode, DataNode |
| YARN | NodeManager | ResourceManager, NodeManager | NodeManager |
2) About the configuration files

Hadoop configuration files come in two kinds: default files and custom (site) files. You only edit a custom file when you want to override a default value.

(1) Default configuration files:

| Default file | Location inside the Hadoop jars |
| --- | --- |
| core-default.xml | hadoop-common-3.1.3.jar/core-default.xml |
| hdfs-default.xml | hadoop-hdfs-3.1.3.jar/hdfs-default.xml |
| yarn-default.xml | hadoop-yarn-common-3.1.3.jar/yarn-default.xml |
| mapred-default.xml | hadoop-mapreduce-client-core-3.1.3.jar/mapred-default.xml |

(2) Custom configuration files:
The four files core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml live under $HADOOP_HOME/etc/hadoop; modify them as your project requires.
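If you want to read the defaults listed above, each default XML can be printed straight from its jar. A sketch, as a dry run that only echoes the commands; the `share/hadoop/...` jar locations are assumptions based on the standard 3.1.3 layout:

```shell
#!/usr/bin/env bash
# Dry run: print the unzip command that would show each default config file.
for pair in \
  "core-default.xml:share/hadoop/common/hadoop-common-3.1.3.jar" \
  "hdfs-default.xml:share/hadoop/hdfs/hadoop-hdfs-3.1.3.jar" \
  "yarn-default.xml:share/hadoop/yarn/hadoop-yarn-common-3.1.3.jar" \
  "mapred-default.xml:share/hadoop/mapreduce/hadoop-mapreduce-client-core-3.1.3.jar"; do
  file=${pair%%:*}   # part before the colon: the default XML name
  jar=${pair#*:}     # part after the colon: the jar that contains it
  echo "unzip -p \$HADOOP_HOME/$jar $file"
done
```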
3) Configure the cluster

(1) Core configuration file: core-site.xml

[root@master ~]$ cd $HADOOP_HOME/etc/hadoop
[root@master hadoop]$ vim core-site.xml

File contents:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:8020</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/module/hadoop-3.1.3/data</value>
    </property>
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>root</value>
    </property>
</configuration>
(2) HDFS configuration file: hdfs-site.xml

[root@master hadoop]$ vim hdfs-site.xml

File contents:

<configuration>
    <property>
        <name>dfs.namenode.http-address</name>
        <value>master:9870</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>node2:9868</value>
    </property>
</configuration>
(3) YARN configuration file: yarn-site.xml

[root@master hadoop]$ vim yarn-site.xml

File contents:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>node1</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log.server.url</name>
        <value>http://master:19888/jobhistory/logs</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
</configuration>
(4) MapReduce configuration file: mapred-site.xml

[root@master hadoop]$ vim mapred-site.xml

File contents:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=/opt/module/hadoop-3.1.3</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=/opt/module/hadoop-3.1.3</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=/opt/module/hadoop-3.1.3</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
    </property>
</configuration>
1) Configure workers:

[root@master hadoop]$ vim /opt/module/hadoop-3.1.3/etc/hadoop/workers

Add the following to the file (one hostname per line, with no trailing spaces or blank lines):

master
node1
node2
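The four site files and workers must be identical on every node. Distributing them can be scripted; a sketch, as a dry run that only echoes the rsync commands (it assumes the passwordless SSH configured earlier and this guide's install path):

```shell
#!/usr/bin/env bash
# Dry run: print the rsync commands that would push the config from master
# to the other two nodes.
CONF_DIR=/opt/module/hadoop-3.1.3/etc/hadoop
for host in node1 node2; do
  # remove the leading "echo" to actually sync the configuration
  echo "rsync -a $CONF_DIR/ $host:$CONF_DIR/"
done
```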
Edit hadoop-env.sh and add the Java home directory:

export JAVA_HOME=/opt/jdk/jdk1.8.0_202
To allow startup as root, add the following near the top of hadoop-3.1.3/sbin/start-dfs.sh and stop-dfs.sh:

#!/usr/bin/env bash
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
Likewise add the following near the top of hadoop-3.1.3/sbin/start-yarn.sh and stop-yarn.sh:

#!/usr/bin/env bash
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
2) Start the cluster

(1) If this is the first start, format the NameNode on the master node. (Note: formatting the NameNode generates a new cluster id; if the NameNode and the DataNodes end up with different cluster ids, the cluster cannot find its existing data. If the cluster fails while running and you need to re-format the NameNode, first stop the namenode and datanode processes and delete the data and logs directories on every machine, then format.)
[root@master hadoop-3.1.3]$ hdfs namenode -format
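The cleanup required before any re-format (stop the daemons, then wipe data/ and logs/ on every node) can be scripted. A sketch, again as a dry run that only prints the commands; paths follow this guide's install location:

```shell
#!/usr/bin/env bash
# Dry run: print the commands that would stop HDFS and remove data/ and
# logs/ on all three nodes before re-formatting the NameNode.
HADOOP_DIR=/opt/module/hadoop-3.1.3
echo "$HADOOP_DIR/sbin/stop-dfs.sh"
for host in master node1 node2; do
  # remove the leading "echo" to actually wipe the directories
  echo "ssh $host rm -rf $HADOOP_DIR/data $HADOOP_DIR/logs"
done
```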
(2) Start HDFS:
[root@master hadoop-3.1.3]$ sbin/start-dfs.sh
(3) Start YARN on the node where the ResourceManager is configured (node1):
[root@node1 hadoop-3.1.3]$ sbin/start-yarn.sh
3) Start the history server on master:
[root@master hadoop-3.1.3]$ mapred --daemon start historyserver
(4) Check the HDFS NameNode web UI:
(a) open http://hadoop100:9870 in a browser,
(b) and inspect the data stored on HDFS.
(5) Check the YARN ResourceManager web UI:
(a) open http://hadoop101:8088 in a browser,
(b) and inspect the jobs running on YARN.
On Windows you also need the host mappings: edit the HOSTS file under C:\Windows\System32\drivers\etc and add the following:

192.168.56.211 hadoop100
192.168.56.212 hadoop101
192.168.56.213 hadoop102
192.168.56.211 master
192.168.56.211 06762320a7eb
192.168.56.212 node1
192.168.56.213 node2
On master, the commands and output are as follows:

drwxr-xr-x 9 1000 1000 149 Sep 12 2019 hadoop-3.1.3
[root@master module]# cd hadoop-3.1.3/
[root@master hadoop-3.1.3]# hdfs namenode -format
WARNING: /opt/module/hadoop-3.1.3/logs does not exist. Creating.
2021-12-10 10:20:44,056 INFO namenode.NameNode: STARTUP_MSG:
2021-12-10 10:20:44,082 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
2021-12-10 10:20:44,370 INFO namenode.NameNode: createNameNode [-format]
Formatting using clusterid: CID-10ef8d8c-2b08-40e9-a2f0-6a5e0647fc88
...
2021-12-10 10:20:46,207 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1937585949-192.168.56.211-1639102846200
2021-12-10 10:20:46,233 INFO common.Storage: Storage directory /opt/module/hadoop-3.1.3/data/dfs/name has been successfully formatted.
2021-12-10 10:20:46,271 INFO namenode.FSImageFormatProtobuf: Saving image file /opt/module/hadoop-3.1.3/data/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
2021-12-10 10:20:46,363 INFO namenode.FSImageFormatProtobuf: Image file /opt/module/hadoop-3.1.3/data/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 391 bytes saved in 0 seconds .
2021-12-10 10:20:46,380 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2021-12-10 10:20:46,385 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid = 0 when meet shutdown.
2021-12-10 10:20:46,386 INFO namenode.NameNode: SHUTDOWN_MSG:
[root@master hadoop-3.1.3]# jps
24737 Jps
[root@master hadoop-3.1.3]# sbin/start-dfs.sh
WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
Starting namenodes on [master]
Last login: Fri Dec 10 09:24:29 CST 2021 from 192.168.56.11 on pts/1
Starting datanodes
Last login: Fri Dec 10 10:21:36 CST 2021 on pts/0
node1: WARNING: /opt/module/hadoop-3.1.3/logs does not exist. Creating.
node2: WARNING: /opt/module/hadoop-3.1.3/logs does not exist. Creating.
Starting secondary namenodes [node2]
Last login: Fri Dec 10 10:21:39 CST 2021 on pts/0
[root@master hadoop-3.1.3]# jps
25521 NameNode
26133 Jps
25689 DataNode
[root@master hadoop-3.1.3]# jps
25521 NameNode
27013 Jps
25689 DataNode
26623 NodeManager
[root@master hadoop-3.1.3]# mapred --daemon start historyserver
[root@master hadoop-3.1.3]# jps
29872 Jps
25521 NameNode
29752 JobHistoryServer
25689 DataNode
26623 NodeManager
On node1, the commands and output are as follows:

[root@node1 software]# jps
24256 DataNode
24519 Jps
[root@node1 software]# cd /opt/module/
[root@node1 module]# cd hadoop-3.1.3/
[root@node1 hadoop-3.1.3]# sbin/start-yarn.sh
Starting resourcemanager
Last login: Fri Dec 10 09:24:32 CST 2021 from 192.168.56.11 on pts/1
Starting nodemanagers
Last login: Fri Dec 10 10:22:17 CST 2021 on pts/0
[root@node1 hadoop-3.1.3]# jps
24256 DataNode
24998 ResourceManager
25175 NodeManager
25674 Jps
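The jps output can be checked against the deployment plan in section 8 mechanically. A small sketch; the sample output below is copied from the node1 transcript above (on a live node you would capture `jps_output=$(jps)` instead):

```shell
#!/usr/bin/env bash
# Verify that every daemon expected on node1 appears in the jps output.
jps_output="24256 DataNode
24998 ResourceManager
25175 NodeManager"

for daemon in DataNode ResourceManager NodeManager; do
  if echo "$jps_output" | grep -q "$daemon"; then
    echo "$daemon: OK"
  else
    echo "$daemon: MISSING"
  fi
done
```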
The etc/hadoop files and the sbin scripts have been uploaded to GitHub:
https://github.com/heidaodageshiwo/common/tree/springboot-hadoop3.1.3/common/src/main/java/com/common/common/hadoop



