- 1. Preparation
- 1.1. Software versions
- 1.2. Cluster plan
- 2. Environment setup
- 1. Change the hostnames
- 2. Disable the firewall
- 3. Edit the hosts file
- 4. Configure passwordless SSH
- 5. Install the JDK
- 6. Install Hadoop
- 1. Unpack
- 2. Add Hadoop to the environment variables: vi /etc/profile
- 3. Send profile to the other nodes and source it
- 4. Create the HDFS storage directories
- 5. Edit /hadoop-2.9.2/etc/hadoop/hadoop-env.sh and set JAVA_HOME to the real path
- 6. Edit /hadoop-2.9.2/etc/hadoop/yarn-env.sh and set JAVA_HOME to the real path
- 7. Configure /hadoop-2.9.2/etc/hadoop/core-site.xml
- 8. Configure /hadoop-2.9.2/etc/hadoop/hdfs-site.xml
- 9. Configure /hadoop-2.9.2/etc/hadoop/mapred-site.xml
- 10. Configure /hadoop-2.9.2/etc/hadoop/yarn-site.xml
- 11. Configure /hadoop-2.9.2/etc/hadoop/slaves
- 12. Send to the other nodes
- 13. Format the NameNode
- 14. Start Hadoop
- 15. Visit the web UIs
- 16. Run an example
1. Preparation
1.1. Software versions
JDK: 1.8
Hadoop: 2.9.2
OS: CentOS 7
All installation packages go under /usr/local/src.
1.2. Cluster plan
| No. | Hostname | IP address | Node roles |
|---|---|---|---|
| 1 | master | 192.168.1.101 | NameNode, SecondaryNameNode, ResourceManager |
| 2 | slave1 | 192.168.1.102 | NodeManager, DataNode |
| 3 | slave2 | 192.168.1.103 | NodeManager, DataNode |
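As a convenience, the hostname-to-IP mapping in this table can be turned into /etc/hosts entries with a small loop (a hypothetical helper, not one of the original steps):

```shell
# Hypothetical helper: emit the /etc/hosts entries for the cluster plan above
nodes="192.168.1.101:master 192.168.1.102:slave1 192.168.1.103:slave2"
for n in $nodes; do
    # each entry is "ip:hostname"; print it as "ip hostname"
    printf '%s %s\n' "${n%%:*}" "${n#*:}"
done
```

The same list variable can drive the scp loops used later in this guide.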
2. Environment setup
1. Change the hostnames
Run the matching command on each of the three nodes:
```shell
hostnamectl set-hostname master   # on master
hostnamectl set-hostname slave1   # on slave1
hostnamectl set-hostname slave2   # on slave2
```

2. Disable the firewall
The firewall must be disabled on every node in the cluster:
```shell
systemctl stop firewalld
systemctl disable firewalld
```

3. Edit the hosts file
vi /etc/hosts
Add the three node entries below (the two localhost lines are the file's existing defaults):
```
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.101 master
192.168.1.102 slave1
192.168.1.103 slave2
```
Copy the hosts file to the other nodes (you will need to type yes, then enter the target node's root password):
```shell
scp /etc/hosts root@slave1:/etc/
scp /etc/hosts root@slave2:/etc/
```

4. Configure passwordless SSH
Generate the key pair:
ssh-keygen -t rsa
Press Enter at every prompt; you will see something like this:
```
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:0f4Tz5jw1zbR3t9j8RH1bOcwhg1BZwC7jkf1sUDfQTM root@master
The key's randomart image is:
+---[RSA 2048]----+
|          .o=o+E |
|         . o.+ .=|
|        . o o+o.+|
|         o o.o=+*|
|        S = ..o*+|
|         + + * =+|
|        . o * +.O|
|           . o +=|
|              . +|
+----[SHA256]-----+
```
Copy the public key to every machine you want to log into without a password:
```shell
ssh-copy-id master
ssh-copy-id slave1
ssh-copy-id slave2
```
Test it:
```
[root@master src]# ssh slave1
Last login: Wed Nov 10 15:34:09 2021 from 192.168.1.17
[root@slave1 ~]#
```

5. Install the JDK
Unpack the archive and rename the directory:
```shell
tar -xvf jdk-8u261-linux-x64.tar.gz
mv jdk1.8.0_261 jdk1.8
```
Append the environment variables:
vi /etc/profile
Add at the end of the file:
```shell
# java environment
export JAVA_HOME=/usr/local/src/jdk1.8   # path where the JDK was unpacked
export PATH=$PATH:$JAVA_HOME/bin
```
Make the change take effect:
source /etc/profile
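As a sanity check, the two profile lines only set JAVA_HOME and append its bin directory to the search path; the effect can be inspected in any shell (a sketch, using the JDK path from above):

```shell
# Sketch: this is all the two profile lines do
JAVA_HOME=/usr/local/src/jdk1.8
PATH=$PATH:$JAVA_HOME/bin
# the JDK's bin directory is now the last PATH entry
echo "${PATH##*:}"
```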
Check that the installation worked:
```
[root@master src]# java -version
java version "1.8.0_261"
Java(TM) SE Runtime Environment (build 1.8.0_261-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.261-b12, mixed mode)
```
Copy the JDK and profile to the other nodes:
```shell
scp -r /usr/local/src/jdk1.8 root@slave1:/usr/local/src/
scp -r /usr/local/src/jdk1.8 root@slave2:/usr/local/src/
scp /etc/profile root@slave1:/etc/
scp /etc/profile root@slave2:/etc/
```
Run source /etc/profile on the other nodes to activate the environment.
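The scp calls above (and the later ones in this guide) all follow the same pattern, so a loop over the worker nodes keeps them in one place. A hedged sketch; echo only prints the commands so the block runs anywhere, drop it on the real cluster:

```shell
# Sketch: distribute files to every worker node from the cluster plan
for host in slave1 slave2; do
    echo scp -r /usr/local/src/jdk1.8 "root@$host:/usr/local/src/"
    echo scp /etc/profile "root@$host:/etc/"
done
```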
6. Install Hadoop
1. Unpack
```shell
tar -zxvf hadoop-2.9.2.tar.gz
```
2. Add Hadoop to the environment variables: vi /etc/profile
```shell
# hadoop environment
export HADOOP_HOME=/usr/local/src/hadoop-2.9.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```

3. Send profile to the other nodes and source it there to take effect
```shell
scp /etc/profile root@slave1:/etc/
scp /etc/profile root@slave2:/etc/
```

4. Create the HDFS storage directories
(Note: hadoop-2.9.2 lives under /usr/local/src/.)
/hadoop-2.9.2/hdfs/name -- stores the NameNode files
/hadoop-2.9.2/hdfs/data -- stores the data
/hadoop-2.9.2/hdfs/tmp -- stores temporary files
```shell
cd /usr/local/src/hadoop-2.9.2
mkdir hdfs
cd hdfs
mkdir name data tmp
```

5. Edit /hadoop-2.9.2/etc/hadoop/hadoop-env.sh and set JAVA_HOME to the real path
```shell
cd /usr/local/src/hadoop-2.9.2/etc/hadoop/
vi hadoop-env.sh
```
Comment out the original line and set the real path:
```shell
# The java implementation to use.
# export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/local/src/jdk1.8
```
6. Edit /hadoop-2.9.2/etc/hadoop/yarn-env.sh and set JAVA_HOME to the real path
vi yarn-env.sh
Add the real path below the original commented-out line:
```shell
# some Java parameters
# export JAVA_HOME=/home/y/libexec/jdk1.6.0/
export JAVA_HOME=/usr/local/src/jdk1.8
```

7. Configure /hadoop-2.9.2/etc/hadoop/core-site.xml
vi core-site.xml
Add the following inside the `<configuration>` section:
```xml
<!-- temporary storage directory -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/src/hadoop-2.9.2/hdfs/tmp</value>
</property>
<!-- HDFS filesystem address and port -->
<property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
</property>
```

8. Configure /hadoop-2.9.2/etc/hadoop/hdfs-site.xml
vi hdfs-site.xml
Add the following inside the `<configuration>` section:
```xml
<!-- number of data replicas -->
<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>
<!-- NameNode storage directory -->
<property>
    <name>dfs.name.dir</name>
    <value>/usr/local/src/hadoop-2.9.2/hdfs/name</value>
</property>
<!-- data storage directory -->
<property>
    <name>dfs.data.dir</name>
    <value>/usr/local/src/hadoop-2.9.2/hdfs/data</value>
</property>
<!-- disable permission checks on HDFS uploads -->
<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>
```

9. Configure /hadoop-2.9.2/etc/hadoop/mapred-site.xml
This file only ships as a template, so copy it first:
```shell
cp mapred-site.xml.template mapred-site.xml
vi mapred-site.xml
```
Add the following inside the `<configuration>` section:
```xml
<!-- run MapReduce on the YARN platform -->
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
```

10. Configure /hadoop-2.9.2/etc/hadoop/yarn-site.xml
vi yarn-site.xml
Add the following inside the `<configuration>` section:
```xml
<!-- ResourceManager address -->
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
</property>
<!-- how reducers fetch data -->
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<!-- skip the virtual memory check -->
<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>
```

11. Configure /hadoop-2.9.2/etc/hadoop/slaves
vi slaves
Delete the existing content and add:
```
slave1
slave2
```

12. Send to the other nodes
```shell
cd /usr/local/src/
scp -r hadoop-2.9.2 root@slave1:$PWD   # $PWD expands to the absolute path of the current directory
scp -r hadoop-2.9.2 root@slave2:$PWD
```

13. Format the NameNode
hadoop namenode -format
If the output contains a line like `has been successfully formatted`, the format succeeded.
14. Start Hadoop
```shell
start-all.sh
```
Check the processes on each node:
jps   # a JDK tool
On master:
```
[root@master src]# jps
15636 NameNode
17014 Jps
16493 ResourceManager
16255 SecondaryNameNode
```
On slave1 and slave2:
```
[root@slave1 src]# jps
14134 NodeManager
15739 Jps
13565 DataNode
```

15. Visit the web UIs
HDFS UI: http://192.168.1.101:50070
YARN UI: http://192.168.1.101:8088
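Whether both UIs answer can also be checked from a shell; a sketch assuming the cluster above is running (echo prints the commands so the block itself runs anywhere; remove it to actually send the requests):

```shell
# Sketch: each UI should return HTTP 200 once the daemons are up
for url in http://192.168.1.101:50070 http://192.168.1.101:8088; do
    echo curl -s -o /dev/null -w '%{http_code}' "$url"
done
```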
16. Run an example
```shell
cd hadoop-2.9.2/share/hadoop/mapreduce/
hadoop jar hadoop-mapreduce-examples-2.9.2.jar pi 5 10
```
```
[root@master mapreduce]# hadoop jar hadoop-mapreduce-examples-2.9.2.jar pi 5 10
Number of Maps = 5
Samples per Map = 10
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Starting Job
21/11/12 10:57:01 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.1.101:8032
21/11/12 10:57:01 INFO input.FileInputFormat: Total input files to process : 5
21/11/12 10:57:01 INFO mapreduce.JobSubmitter: number of splits:5
21/11/12 10:57:01 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
21/11/12 10:57:01 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1636685278166_0001
21/11/12 10:57:02 INFO impl.YarnClientImpl: Submitted application application_1636685278166_0001
21/11/12 10:57:02 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1636685278166_0001/
21/11/12 10:57:02 INFO mapreduce.Job: Running job: job_1636685278166_0001
21/11/12 10:57:08 INFO mapreduce.Job: Job job_1636685278166_0001 running in uber mode : false
21/11/12 10:57:08 INFO mapreduce.Job: map 0% reduce 0%
21/11/12 10:57:19 INFO mapreduce.Job: map 100% reduce 0%
21/11/12 10:57:24 INFO mapreduce.Job: map 100% reduce 100%
21/11/12 10:57:24 INFO mapreduce.Job: Job job_1636685278166_0001 completed successfully
21/11/12 10:57:24 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=116
        FILE: Number of bytes written=1192839
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=1300
        HDFS: Number of bytes written=215
        HDFS: Number of read operations=23
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=3
    Job Counters
        Launched map tasks=5
        Launched reduce tasks=1
        Data-local map tasks=5
        Total time spent by all maps in occupied slots (ms)=42055
        Total time spent by all reduces in occupied slots (ms)=2317
        Total time spent by all map tasks (ms)=42055
        Total time spent by all reduce tasks (ms)=2317
        Total vcore-milliseconds taken by all map tasks=42055
        Total vcore-milliseconds taken by all reduce tasks=2317
        Total megabyte-milliseconds taken by all map tasks=43064320
        Total megabyte-milliseconds taken by all reduce tasks=2372608
    Map-Reduce Framework
        Map input records=5
        Map output records=10
        Map output bytes=90
        Map output materialized bytes=140
        Input split bytes=710
        Combine input records=0
        Combine output records=0
        Reduce input groups=2
        Reduce shuffle bytes=140
        Reduce input records=10
        Reduce output records=0
        Spilled Records=20
        Shuffled Maps =5
        Failed Shuffles=0
        Merged Map outputs=5
        GC time elapsed (ms)=4892
        CPU time spent (ms)=2690
        Physical memory (bytes) snapshot=1675964416
        Virtual memory (bytes) snapshot=12723679232
        Total committed heap usage (bytes)=1073741824
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=590
    File Output Format Counters
        Bytes Written=97
Job Finished in 23.816 seconds
Estimated value of Pi is 3.28000000000000000000
[root@master mapreduce]#
```
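The pi example works by scattering sample points in the unit square and counting how many land inside the quarter circle, so pi is roughly 4 * inside / total; with only 5 maps of 10 samples each, the estimate above (3.28) is correspondingly rough. Hadoop's version draws points from a Halton quasi-random sequence; the sketch below uses plain pseudo-random sampling just to illustrate the estimator:

```shell
# Plain Monte Carlo sketch of the estimator the pi example uses
awk 'BEGIN {
    srand(1); n = 200000; inside = 0
    for (i = 0; i < n; i++) {
        x = rand(); y = rand()
        if (x * x + y * y <= 1) inside++    # point fell inside the quarter circle
    }
    printf "pi is approximately %.2f\n", 4 * inside / n
}'
```

More samples tighten the estimate, which is why the example job takes the map count and samples-per-map as arguments.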



