First, prepare a server running Linux and disable the firewall.
# Check the firewall status
systemctl status firewalld
# Stop the firewall
systemctl stop firewalld
# Start the firewall
systemctl start firewalld
# Disable the firewall permanently
systemctl disable firewalld
Modify the hostname
[root@hadoop102 ~]# vim /etc/hostname
# Content:
hadoop102
Configure a static IP address
[root@hadoop102 ~]# vim /etc/sysconfig/network-scripts/ifcfg-ens33
# Adjust these values to match your own environment
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.150.102
GATEWAY=192.168.150.2
NETMASK=255.255.255.0
DNS1=8.8.8.8
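A typo in ifcfg-ens33 (for example an octet above 255) is a common reason a node comes up unreachable. As a quick sanity check before editing the file, a small bash sketch can validate an address; the function name `valid_ip` is just an illustration, not part of any tool used here:

```shell
#!/usr/bin/env bash
# valid_ip ADDR — return 0 if ADDR is a syntactically valid IPv4 address
valid_ip() {
  local ip=$1
  # must be four dot-separated groups of 1-3 digits
  [[ $ip =~ ^([0-9]{1,3}\.){3}[0-9]{1,3}$ ]] || return 1
  local IFS=. octet
  for octet in $ip; do
    # each octet must be in the range 0-255
    (( octet <= 255 )) || return 1
  done
}

valid_ip 192.168.150.102 && echo "ok"
valid_ip 192.168.150.999 || echo "bad octet"
```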
Configure hostname mappings
[root@hadoop102 ~]# vim /etc/hosts
# The fully distributed setup needs three servers, so configure all three mappings in advance
192.168.150.102 hadoop102
192.168.150.103 hadoop103
192.168.150.104 hadoop104
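Editing /etc/hosts by hand on every node is error-prone, and repeated edits can leave duplicate entries. A minimal sketch of an idempotent helper (`add_host_mapping` is a hypothetical name; it takes the file as an argument so it can be tried on a scratch file first):

```shell
#!/usr/bin/env bash
# add_host_mapping FILE IP NAME — append "IP NAME" to FILE unless NAME is already mapped
add_host_mapping() {
  local file=$1 ip=$2 name=$3
  # -w matches the whole hostname, so "hadoop10" does not match "hadoop102"
  grep -qw "$name" "$file" 2>/dev/null || printf '%s %s\n' "$ip" "$name" >> "$file"
}

hosts=$(mktemp)
add_host_mapping "$hosts" 192.168.150.102 hadoop102
add_host_mapping "$hosts" 192.168.150.102 hadoop102   # second call is a no-op
cat "$hosts"   # prints the mapping exactly once
```

On a real node you would pass /etc/hosts as the first argument.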
Create a directory under the root of the filesystem
[root@hadoop102 ~]# mkdir -p /export/servers
Upload JDK 8 and Hadoop to the /export/servers directory
Adjust the source file paths to match your own environment
scp -r "D:\桌面\hadoop笔记资料\4_jar包\jdk-8u212-linux-x64.tar.gz" root@hadoop102:/export/servers
scp -r "D:\桌面\hadoop笔记资料\4_jar包\hadoop-3.1.3.tar.gz" root@hadoop102:/export/servers
Extract the JDK and Hadoop archives into the current directory
[root@hadoop102 servers]# cd /export/servers/
[root@hadoop102 servers]# tar -zxvf ./hadoop-3.1.3.tar.gz -C ./
[root@hadoop102 servers]# tar -zxvf ./jdk-8u212-linux-x64.tar.gz -C ./
Rename the Hadoop and JDK directories
[root@hadoop102 servers]# mv hadoop-3.1.3 hadoop
[root@hadoop102 servers]# mv jdk1.8.0_212 jdk
Configure the Hadoop and JDK environment variables
[root@hadoop102 servers]# cd /etc/profile.d/
[root@hadoop102 profile.d]# vim my_env.sh
# Content of my_env.sh:
#JAVA_HOME
export JAVA_HOME=/export/servers/jdk
export PATH=$PATH:$JAVA_HOME/bin
#HADOOP_HOME
export HADOOP_HOME=/export/servers/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
# Save, exit, then reload the environment variables
[root@hadoop102 profile.d]# source /etc/profile
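After sourcing /etc/profile it is easy to miss a variable that failed to load. A hedged sketch that reports any unset variables by name, using bash indirect expansion (`require_vars` is an illustrative name, not a standard command):

```shell
#!/usr/bin/env bash
# require_vars NAME... — print each environment variable that is unset or empty;
# return nonzero if any are missing
require_vars() {
  local status=0 v
  for v in "$@"; do
    # ${!v} expands to the value of the variable whose name is stored in $v
    if [ -z "${!v:-}" ]; then
      echo "missing: $v"
      status=1
    fi
  done
  return $status
}

export JAVA_HOME=/export/servers/jdk
unset HADOOP_HOME
require_vars JAVA_HOME HADOOP_HOME || echo "fix my_env.sh and re-source /etc/profile"
```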
If running java, javac, and hadoop produces no errors, the JDK and Hadoop were installed successfully.
Next, test Hadoop in pseudo-distributed mode by running the wordcount example.
# Enter the Hadoop root directory
[root@hadoop102 profile.d]# cd /export/servers/hadoop
[root@hadoop102 hadoop]# mkdir wcinput
[root@hadoop102 hadoop]# cd wcinput/
[root@hadoop102 wcinput]# vim word.txt
# Edit word.txt with the following content
hadoop hadoop java java javac javac linux linux linux word ord ord ord wo
# Save and exit
Run the wordcount example; the results are written to /export/servers/hadoop/wcoutput
[root@hadoop102 hadoop]# cd /export/servers/hadoop
[root@hadoop102 hadoop]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount wcinput wcoutput
View the results
[root@hadoop102 hadoop]# cd /export/servers/hadoop/wcoutput/
[root@hadoop102 wcoutput]# cat part-r-00000
hadoop 2
java 2
javac 2
linux 3
ord 3
wo 1
word 1
At this point, the Hadoop pseudo-distributed setup is complete.
Hadoop fully distributed cluster setup
A fully distributed setup requires three Linux servers, each with its IP address configured.
Clone hadoop103 and hadoop104 from hadoop102.
Modify the IP address and hostname of hadoop103
[root@hadoop103 ~]# vim /etc/hostname
# Content:
hadoop103
[root@hadoop103 ~]# vim /etc/sysconfig/network-scripts/ifcfg-ens33
# Content:
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.150.103
GATEWAY=192.168.150.2
NETMASK=255.255.255.0
DNS1=8.8.8.8
Modify the IP address and hostname of hadoop104
[root@hadoop104 ~]# vim /etc/hostname
# Content:
hadoop104
[root@hadoop104 ~]# vim /etc/sysconfig/network-scripts/ifcfg-ens33
# Content:
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.150.104
GATEWAY=192.168.150.2
NETMASK=255.255.255.0
DNS1=8.8.8.8
Test network connectivity on each of the three servers
[root@hadoop102 ~]# ping www.baidu.com
# Output on hadoop102
PING www.a.shifen.com (14.215.177.38) 56(84) bytes of data.
64 bytes from 14.215.177.38 (14.215.177.38): icmp_seq=1 ttl=128 time=25.0 ms
64 bytes from 14.215.177.38 (14.215.177.38): icmp_seq=2 ttl=128 time=24.8 ms
[root@hadoop103 ~]# ping www.baidu.com
# Output on hadoop103
PING www.wshifen.com (45.113.192.102) 56(84) bytes of data.
64 bytes from 45.113.192.102 (45.113.192.102): icmp_seq=1 ttl=128 time=259 ms
64 bytes from 45.113.192.102 (45.113.192.102): icmp_seq=2 ttl=128 time=265 ms
[root@hadoop104 ~]# ping www.baidu.com
# Output on hadoop104
PING www.wshifen.com (45.113.192.101) 56(84) bytes of data.
64 bytes from 45.113.192.101 (45.113.192.101): icmp_seq=1 ttl=128 time=224 ms
64 bytes from 45.113.192.101 (45.113.192.101): icmp_seq=2 ttl=128 time=225 ms
Verify that the three servers can reach each other
# Ping each server from the other two (only hadoop102 is shown here).
# Pinging by hostname requires the /etc/hosts mappings configured earlier.
[root@hadoop102 ~]# ping hadoop103
PING hadoop103 (192.168.150.103) 56(84) bytes of data.
64 bytes from hadoop103 (192.168.150.103): icmp_seq=1 ttl=64 time=0.658 ms
64 bytes from hadoop103 (192.168.150.103): icmp_seq=2 ttl=64 time=0.644 ms
[root@hadoop102 ~]# ping hadoop104
PING hadoop104 (192.168.150.104) 56(84) bytes of data.
64 bytes from hadoop104 (192.168.150.104): icmp_seq=1 ttl=64 time=0.899 ms
64 bytes from hadoop104 (192.168.150.104): icmp_seq=2 ttl=64 time=0.719 ms
# If this fails, check /etc/hosts; it should contain:
192.168.150.102 hadoop102
192.168.150.103 hadoop103
192.168.150.104 hadoop104
Verify that Hadoop and the JDK work on each of the three servers
Configure passwordless SSH login
Run the following on hadoop102
[root@hadoop102 ~]# cd .ssh/
[root@hadoop102 .ssh]# ssh-keygen -t rsa
# Press Enter three times to generate the key pair
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:8kVdvUkmEDFrKm2Cw3DN94RdlOH+P7F1WwNzM+87Cgk root@hadoop102
The key's randomart image is:
(ASCII randomart omitted)
[root@hadoop102 .ssh]# ssh-copy-id hadoop102
[root@hadoop102 .ssh]# ssh-copy-id hadoop103
[root@hadoop102 .ssh]# ssh-copy-id hadoop104
Repeat the same steps on hadoop103 and hadoop104
[root@hadoop103 ~]# cd .ssh/
[root@hadoop103 .ssh]# ssh-keygen -t rsa
# Press Enter three times; the output has the same form as on hadoop102
[root@hadoop103 .ssh]# ssh-copy-id hadoop102
[root@hadoop103 .ssh]# ssh-copy-id hadoop103
[root@hadoop103 .ssh]# ssh-copy-id hadoop104
Modify the Hadoop configuration files
cd /export/servers/hadoop/etc/hadoop/
Edit core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop102:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/export/servers/hadoop/data</value>
  </property>
  <property>
    <name>hadoop.http.staticuser.user</name>
    <value>root</value>
  </property>
</configuration>
Edit yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop103</value>
  </property>
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log.server.url</name>
    <value>http://hadoop102:19888/jobhistory/logs</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
  </property>
</configuration>
Edit hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>hadoop102:9870</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop104:9868</value>
  </property>
</configuration>
Edit mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop102:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop102:19888</value>
  </property>
</configuration>
Edit workers
hadoop102
hadoop103
hadoop104
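The workers file must list exactly the hosts that run DataNodes, and those names already live in /etc/hosts; one way to avoid the two files drifting apart is to derive workers from the hosts file. A sketch (`build_workers` is a hypothetical helper, demonstrated on a scratch file):

```shell
#!/usr/bin/env bash
# build_workers HOSTS_FILE — print the hostnames of entries named like hadoopNNN
build_workers() {
  awk '$2 ~ /^hadoop[0-9]+$/ {print $2}' "$1"
}

hosts=$(mktemp)
printf '192.168.150.102 hadoop102\n192.168.150.103 hadoop103\n192.168.150.104 hadoop104\n' > "$hosts"
build_workers "$hosts"   # prints hadoop102, hadoop103, hadoop104, one per line
```

On a real node: `build_workers /etc/hosts > /export/servers/hadoop/etc/hadoop/workers`.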
Copy the configuration files to hadoop103 and hadoop104
[root@hadoop102 ~]# cd /export/servers/hadoop/etc/hadoop/
scp ./core-site.xml root@hadoop103:/export/servers/hadoop/etc/hadoop/
scp ./core-site.xml root@hadoop104:/export/servers/hadoop/etc/hadoop/
scp ./hdfs-site.xml root@hadoop103:/export/servers/hadoop/etc/hadoop/
scp ./hdfs-site.xml root@hadoop104:/export/servers/hadoop/etc/hadoop/
scp ./yarn-site.xml root@hadoop103:/export/servers/hadoop/etc/hadoop/
scp ./yarn-site.xml root@hadoop104:/export/servers/hadoop/etc/hadoop/
scp ./mapred-site.xml root@hadoop103:/export/servers/hadoop/etc/hadoop/
scp ./mapred-site.xml root@hadoop104:/export/servers/hadoop/etc/hadoop/
scp ./workers root@hadoop103:/export/servers/hadoop/etc/hadoop/
scp ./workers root@hadoop104:/export/servers/hadoop/etc/hadoop/
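The ten scp invocations follow one pattern, so they can be generated with a loop. A sketch, where `emit_sync_cmds` is a hypothetical helper that only prints the commands (pipe its output to sh to actually run them):

```shell
#!/usr/bin/env bash
# emit_sync_cmds HOST... — print one scp command per host/config-file pair
emit_sync_cmds() {
  local host f
  for host in "$@"; do
    for f in core-site.xml hdfs-site.xml yarn-site.xml mapred-site.xml workers; do
      echo "scp ./$f root@$host:/export/servers/hadoop/etc/hadoop/"
    done
  done
}

emit_sync_cmds hadoop103 hadoop104          # preview the 10 commands
# emit_sync_cmds hadoop103 hadoop104 | sh   # run them for real
```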
Format the NameNode on hadoop102
[root@hadoop102 ~]# hdfs namenode -format
Start HDFS on hadoop102
[root@hadoop102 profile.d]# start-dfs.sh
Start YARN on hadoop103
[root@hadoop103 .ssh]# start-yarn.sh
Start the history server on hadoop102
[root@hadoop102 hadoop]# mapred --daemon start historyserver
Use the jps command to check the processes running on all three servers
[root@hadoop102 hadoop]# jps
2625 NameNode
3251 JobHistoryServer
3099 NodeManager
2765 DataNode
3311 Jps
[root@hadoop103 .ssh]# jps
2390 ResourceManager
2536 NodeManager
2857 Jps
2189 DataNode
[root@hadoop104 .ssh]# jps
2338 NodeManager
2436 Jps
2238 SecondaryNameNode
2143 DataNode
Access the web UIs in a browser
http://192.168.150.102:9870/explorer.html#/
http://192.168.150.103:8088/cluster
http://192.168.150.102:19888/jobhistory
Write a script to start and stop the whole Hadoop cluster
[root@hadoop104 servers]# cd /export/servers/
[root@hadoop104 servers]# mkdir bin
[root@hadoop104 servers]# cd bin
[root@hadoop104 bin]# vim my_hadoop.sh
#Script content:
#!/bin/bash
if [ $# -lt 1 ]
then
echo "No Args Input..."
exit ;
fi
case $1 in
"start")
echo " =================== Starting the Hadoop cluster ==================="
echo " --------------- starting hdfs ---------------"
ssh hadoop102 "/export/servers/hadoop/sbin/start-dfs.sh"
echo " --------------- starting yarn ---------------"
ssh hadoop103 "/export/servers/hadoop/sbin/start-yarn.sh"
echo " --------------- starting historyserver ---------------"
ssh hadoop102 "/export/servers/hadoop/bin/mapred --daemon start historyserver"
;;
"stop")
echo " =================== Stopping the Hadoop cluster ==================="
echo " --------------- stopping historyserver ---------------"
ssh hadoop102 "/export/servers/hadoop/bin/mapred --daemon stop historyserver"
echo " --------------- stopping yarn ---------------"
ssh hadoop103 "/export/servers/hadoop/sbin/stop-yarn.sh"
echo " --------------- stopping hdfs ---------------"
ssh hadoop102 "/export/servers/hadoop/sbin/stop-dfs.sh"
;;
*)
echo "Input Args Error..."
;;
esac
#Make my_hadoop.sh executable
[root@hadoop104 bin]# chmod +x my_hadoop.sh
Start and stop the whole cluster
[root@hadoop104 bin]# ./my_hadoop.sh start
[root@hadoop104 bin]# ./my_hadoop.sh stop
Write a jpsall script to view processes across the cluster
[root@hadoop104 bin]# cd /export/servers/bin/
[root@hadoop104 bin]# vim jpsall
#Script content:
#!/bin/bash
for host in hadoop102 hadoop103 hadoop104
do
echo =============== $host ===============
ssh $host jps
done
#Make jpsall executable
[root@hadoop104 bin]# chmod +x jpsall
Run the jpsall script
[root@hadoop104 bin]# ./jpsall
=============== hadoop102 ===============
4758 Jps
4072 NameNode
4524 NodeManager
4237 DataNode
4687 JobHistoryServer
=============== hadoop103 ===============
3939 Jps
3452 ResourceManager
3598 NodeManager
3263 DataNode
=============== hadoop104 ===============
2800 SecondaryNameNode
2696 DataNode
3022 Jps
2879 NodeManager
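Eyeballing jpsall output gets tedious. This sketch compares a jps listing against the daemons a node is expected to run given the configuration above (`check_procs` is an illustrative name; the sample jps output here is hardcoded for demonstration):

```shell
#!/usr/bin/env bash
# check_procs "EXPECTED..." "JPS_OUTPUT" — print each expected daemon missing from the output
check_procs() {
  local expected=$1 actual=$2 p
  for p in $expected; do
    echo "$actual" | grep -qw "$p" || echo "MISSING: $p"
  done
}

# Daemons hadoop102 should run, per the configuration files above
expected_102="NameNode DataNode NodeManager JobHistoryServer"

check_procs "$expected_102" "$(printf '2625 NameNode\n2765 DataNode\n3099 NodeManager\n')"
# → MISSING: JobHistoryServer
```

On a live cluster, the second argument would be `"$(ssh $host jps)"` inside the jpsall loop.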
Test the fully distributed Hadoop cluster
# Create an input directory on the cluster
[root@hadoop102 servers]# hadoop fs -mkdir /input
# Upload a file into the cluster's input directory
[root@hadoop102 hadoop]# hadoop fs -put /export/servers/hadoop/wcinput/word.txt /input
# List the files in the cluster's root directory
[root@hadoop102 hadoop]# hadoop fs -ls /
# Run the wordcount example
[root@hadoop102 hadoop]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /input /output
# View the job output on the cluster
[root@hadoop102 hadoop]# hadoop fs -cat /output/*
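Before checking /output, it can help to compute the expected counts locally with coreutils; for the word.txt used here, the MapReduce result should match this (`local_wordcount` is just an illustrative name):

```shell
#!/usr/bin/env bash
# local_wordcount FILE — print "word count" pairs sorted by word, like part-r-00000
local_wordcount() {
  tr -s '[:space:]' '\n' < "$1" | grep -v '^$' | sort | uniq -c | awk '{print $2, $1}'
}

tmp=$(mktemp)
echo "hadoop hadoop java java javac javac linux linux linux word ord ord ord wo" > "$tmp"
local_wordcount "$tmp"
# → the same seven lines as the earlier part-r-00000 listing (hadoop 2, java 2, ...)
```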
At this point, the fully distributed Hadoop cluster setup is complete.
Summary: the key steps of a fully distributed setup
1. Prepare three Linux machines; configure static IPs, hostnames, and host mappings; disable the firewall
2. Install the JDK and Hadoop
3. Configure the JDK and Hadoop environment variables
4. Modify the Hadoop configuration files
5. Configure passwordless SSH login
6. Start the daemons individually
7. Start the whole cluster and verify that it runs correctly



