| Role \ Server | node01 | node02 | node03 | node04 |
|---|---|---|---|---|
| zookeeper | | √ | √ | √ |
| journalnode | √ | √ | √ | |
| namenode | √ | √ | | |
| zkfc | √ | √ | | |
| datanode | | √ | √ | √ |
| resourceManager | | | √ | √ |
| nodeManager | | √ | √ | √ |
- Prepare 4 servers: node01, node02, node03, and node04. node01 must be able to log in to the other 3 servers over SSH without a password, and so must node02.
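A minimal sketch of the passwordless login setup, assuming OpenSSH's `ssh-keygen` and `ssh-copy-id` are installed and the same user exists on every host. The function name `ssh_setup_cmds` and the `KEY_TYPE` variable are made up for illustration. The key type defaults to `dsa` only because hdfs-site.xml later in this post points the fencing key at `id_dsa`; recent OpenSSH releases reject DSA keys, in which case both the key type and that path would need to change together. The function only prints commands so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Print the commands that give host SELF passwordless SSH access to the
# other hosts. Nothing is executed; pipe the output to a shell to run it.
# Usage: ssh_setup_cmds SELF HOST...
ssh_setup_cmds() {
  local self="$1"; shift
  local type="${KEY_TYPE:-dsa}"   # assumption: matches the id_dsa path in hdfs-site.xml
  local host
  # Generate one key pair with an empty passphrase at the default location.
  echo "ssh-keygen -t $type -N '' -f ~/.ssh/id_$type"
  for host in "$@"; do
    if [ "$host" != "$self" ]; then
      echo "ssh-copy-id $host"    # install the public key on the remote host
    fi
  done
}

ssh_setup_cmds node01 node01 node02 node03 node04
```

Running the same function with `node02` as the first argument gives the commands for the second NameNode host.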
Install the ZK cluster:

    #-------------- run on node02 --------------
    tar xf zookeeper-3.5.6.tar.gz
    mv zookeeper-3.5.6 /opt/bigdata
    cd /opt/bigdata/zookeeper-3.5.6/conf
    cp zoo_sample.cfg zoo.cfg
    vim zoo.cfg
    # change this line
    dataDir=/var/bigdata/hadoop/zk
    # add these lines
    server.1=node02:2888:3888
    server.2=node03:2888:3888
    server.3=node04:2888:3888
    mkdir -p /var/bigdata/hadoop/zk
    # myid lives in dataDir and holds this server's id
    echo 1 > /var/bigdata/hadoop/zk/myid
    vi /etc/profile
    # add these lines to /etc/profile
    export ZK_HOME=/opt/bigdata/zookeeper-3.5.6
    export PATH=$PATH:$ZK_HOME/bin
    source /etc/profile
    # distribute ZooKeeper to node03 and node04
    scp -r /opt/bigdata/zookeeper-3.5.6 node03:/opt/bigdata
    scp -r /opt/bigdata/zookeeper-3.5.6 node04:/opt/bigdata
    #-------------- run on node03 --------------
    mkdir -p /var/bigdata/hadoop/zk
    echo 2 > /var/bigdata/hadoop/zk/myid
    vi /etc/profile
    # add these lines to /etc/profile
    export ZK_HOME=/opt/bigdata/zookeeper-3.5.6
    export PATH=$PATH:$ZK_HOME/bin
    source /etc/profile
    #-------------- run on node04 --------------
    mkdir -p /var/bigdata/hadoop/zk
    echo 3 > /var/bigdata/hadoop/zk/myid
    vi /etc/profile
    # add these lines to /etc/profile
    export ZK_HOME=/opt/bigdata/zookeeper-3.5.6
    export PATH=$PATH:$ZK_HOME/bin
    source /etc/profile

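The id in each `myid` file has to match the number in the corresponding `server.N` line of zoo.cfg. As a small sketch of that relationship (the helper names `zk_server_lines` and `zk_myid` are hypothetical, and the ports are the ones used above), both can be derived from one ordered host list:

```shell
#!/usr/bin/env bash
# Emit the server.N lines for zoo.cfg, numbering hosts from 1 in the order given.
zk_server_lines() {
  local id=1 host
  for host in "$@"; do
    echo "server.${id}=${host}:2888:3888"
    id=$((id + 1))
  done
}

# Print the myid for TARGET, i.e. its 1-based position in the host list.
# Usage: zk_myid TARGET HOST...
zk_myid() {
  local target="$1"; shift
  local id=1 host
  for host in "$@"; do
    if [ "$host" = "$target" ]; then
      echo "$id"
      return 0
    fi
    id=$((id + 1))
  done
  return 1   # TARGET is not part of the ensemble
}

zk_server_lines node02 node03 node04
zk_myid node03 node02 node03 node04   # prints 2
```

Once the servers are running, `zkServer.sh status` on each node reports whether it is the leader or a follower.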
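Modify the Hadoop configuration files. The changes are needed on all of node01 through node04.
Changes to core-site.xml (see the official documentation at hadoop.apache.org):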

    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://mycluster</value>
    </property>
    <property>
      <name>ha.zookeeper.quorum</name>
      <value>node02:2181,node03:2181,node04:2181</value>
    </property>

Changes to hdfs-site.xml:
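
    <property>
      <name>dfs.replication</name>
      <value>2</value>
    </property>
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>/var/bigdata/hadoop/ha/dfs/name</value>
    </property>
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>/var/bigdata/hadoop/ha/dfs/data</value>
    </property>
    <property>
      <name>dfs.nameservices</name>
      <value>mycluster</value>
    </property>
    <property>
      <name>dfs.ha.namenodes.mycluster</name>
      <value>nn1,nn2</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn1</name>
      <value>node01:8020</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn2</name>
      <value>node02:8020</value>
    </property>
    <property>
      <name>dfs.namenode.http-address.mycluster.nn1</name>
      <value>node01:50070</value>
    </property>
    <property>
      <name>dfs.namenode.http-address.mycluster.nn2</name>
      <value>node02:50070</value>
    </property>
    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://node01:8485;node02:8485;node03:8485/mycluster</value>
    </property>
    <property>
      <name>dfs.journalnode.edits.dir</name>
      <value>/var/bigdata/hadoop/ha/dfs/jn</value>
    </property>
    <property>
      <name>dfs.client.failover.proxy.provider.mycluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
      <name>dfs.ha.fencing.methods</name>
      <value>sshfence</value>
    </property>
    <property>
      <name>dfs.ha.fencing.ssh.private-key-files</name>
      <value>/home/hadoop/.ssh/id_dsa</value>
    </property>
    <property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
    </property>
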
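With this many properties it is easy to mistype one, so it can help to read a value back from the file before distributing it. The following is a rough helper, not part of Hadoop: `get_prop` is a hypothetical name, and it assumes the usual layout in which a line holds at most one `<name>`/`<value>` pair and the value follows its name.

```shell
#!/usr/bin/env bash
# Print the value of property KEY from a Hadoop-style XML config file.
# Prints nothing if the property is absent.
# Usage: get_prop FILE KEY
get_prop() {
  local file="$1" key="$2"
  awk -v key="$key" '
    /<name>/ {
      # Extract the text between <name> and </name> and compare it literally.
      n = $0
      sub(/.*<name>[ \t]*/, "", n)
      sub(/[ \t]*<\/name>.*/, "", n)
      found = (n == key)
    }
    found && /<value>/ {
      # First <value> after the matching name, on the same or a later line.
      v = $0
      sub(/.*<value>/, "", v)
      sub(/<\/value>.*/, "", v)
      print v
      exit
    }
  ' "$file"
}
```

For example, `get_prop hdfs-site.xml dfs.nameservices`, run in the configuration directory, should print `mycluster` with the settings above. On a node where Hadoop is on the PATH, `hdfs getconf -confKey dfs.nameservices` reports the value Hadoop itself resolves.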
Changes to the slaves file:

    node02
    node03
    node04

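YARN configuration. Copy mapred-site.xml.template to a new file named mapred-site.xml and add:

    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>

Changes to yarn-site.xml:

    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
    </property>
    <property>
      <name>yarn.resourcemanager.ha.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>yarn.resourcemanager.cluster-id</name>
      <value>yarnCluster</value>
    </property>
    <property>
      <name>yarn.resourcemanager.ha.rm-ids</name>
      <value>rm1,rm2</value>
    </property>
    <property>
      <name>yarn.resourcemanager.hostname.rm1</name>
      <value>node03</value>
    </property>
    <property>
      <name>yarn.resourcemanager.hostname.rm2</name>
      <value>node04</value>
    </property>
    <property>
      <name>yarn.resourcemanager.zk-address</name>
      <value>node02:2181,node03:2181,node04:2181</value>
    </property>
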
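After editing the configuration, the files can be distributed with scp:

    # assuming the files were edited on node01 and the current directory is the Hadoop configuration directory
    scp core-site.xml hdfs-site.xml node02:`pwd`
    scp core-site.xml hdfs-site.xml node03:`pwd`
    scp core-site.xml hdfs-site.xml node04:`pwd`
    # mapred-site.xml, yarn-site.xml, and slaves are distributed the same way
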
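Since every node needs all of the edited files, the copy can be written as a loop. This is a sketch under assumptions: `dist_cmds` is a hypothetical helper, and the directory `/opt/bigdata/hadoop-2.6.5/etc/hadoop` is inferred from the install path used in the MapReduce example of this post. It prints the `scp` commands rather than running them.

```shell
#!/usr/bin/env bash
# Print one scp command per target host that copies every edited config file
# into directory DIR on that host. Review the output, then pipe it to a shell.
# Usage: dist_cmds DIR HOST...
dist_cmds() {
  local dir="$1"; shift
  local files="core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml slaves"
  local host
  for host in "$@"; do
    echo "scp $files $host:$dir"
  done
}

# Assumed config directory; run from that same directory on node01.
dist_cmds /opt/bigdata/hadoop-2.6.5/etc/hadoop node02 node03 node04
```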
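First startup, in this order:

- Start ZooKeeper on node02~node04: zkServer.sh start
- Start the JournalNodes on node01~node03: hadoop-daemon.sh start journalnode
- Pick one NameNode to format (for example node01): hdfs namenode -format
- Start that NameNode: hadoop-daemon.sh start namenode
- On the other NameNode node: hdfs namenode -bootstrapStandby
- On node01, format the HA state in ZooKeeper: hdfs zkfc -formatZK
- On node01: start-dfs.sh
- On node01, start YARN: start-yarn.sh. This starts the NodeManagers on node02/03/04 but leaves no ResourceManager running: the configuration places the YARN daemons on other nodes (NodeManagers on node02/03/04), so whatever the script starts on node01 itself is killed automatically.
- On node03 and node04, start the ResourceManagers: yarn-daemon.sh start resourcemanager

=========================================

From then on, the startup procedure is:

- node02~04: zkServer.sh start
- node01~03: hadoop-daemon.sh start journalnode
- node01 (or node02): start-dfs.sh
- node01: start-yarn.sh
- node03~04: yarn-daemon.sh start resourcemanager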
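The routine startup order can be captured in a small script. The sketch below only prints the sequence as `ssh` commands to be issued from node01. `startup_plan` is a hypothetical name, and the plan assumes that the Hadoop and ZooKeeper scripts are on the PATH of a non-interactive SSH session, which is not guaranteed when PATH is set only in /etc/profile.

```shell
#!/usr/bin/env bash
# Print the routine startup sequence, one command per line, in dependency order:
# ZooKeeper first, then JournalNodes, then HDFS, then YARN, then the ResourceManagers.
startup_plan() {
  local h
  for h in node02 node03 node04; do
    echo "ssh $h zkServer.sh start"
  done
  for h in node01 node02 node03; do
    echo "ssh $h hadoop-daemon.sh start journalnode"
  done
  echo "ssh node01 start-dfs.sh"
  echo "ssh node01 start-yarn.sh"
  for h in node03 node04; do
    echo "ssh $h yarn-daemon.sh start resourcemanager"
  done
}

startup_plan
```

After startup, running `jps` on each node lists the Java daemons that are actually up, which can be compared against the role table at the top of this post.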
HDFS web UI addresses:

- http://node01:50070 active
- http://node02:50070 standby

YARN web UI addresses:

- http://node03:8088/
- http://node04:8088/

Opening the home page of the standby YARN node redirects automatically to the active node.
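The active and standby roles can also be read from the command line with `hdfs haadmin -getServiceState` and `yarn rmadmin -getServiceState`. A small sketch, assuming the IDs nn1/nn2 and rm1/rm2 from the configuration above; `report_ha_states` is only an illustrative name.

```shell
#!/usr/bin/env bash
# Print the HA state of both NameNodes and both ResourceManagers.
# Must run on a node where the hdfs and yarn commands are on the PATH.
report_ha_states() {
  local id
  for id in nn1 nn2; do
    # Falls back to "unreachable" when the query command fails.
    echo "$id: $(hdfs haadmin -getServiceState "$id" 2>/dev/null || echo unreachable)"
  done
  for id in rm1 rm2; do
    echo "$id: $(yarn rmadmin -getServiceState "$id" 2>/dev/null || echo unreachable)"
  done
}
```

Calling `report_ha_states` before and after stopping the active NameNode is a simple way to watch automatic failover happen.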
Running the MapReduce example:
- Create a file and upload it to HDFS
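
    touch createtxt.sh
    touch data.txt
    vim createtxt.sh
    # write the following into createtxt.sh
    #!/bin/bash
    for n in {1..100000}; do
      echo "hello hadoop $n" >> data.txt
    done
    # make the script executable and run it
    chmod 700 createtxt.sh
    ./createtxt.sh
    hdfs dfs -mkdir -p /data/wc/input
    # upload with a block size of 1048576 bytes (1 MiB)
    hdfs dfs -D dfs.blocksize=1048576 -put data.txt /data/wc/input
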
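The 1 MiB block size matters because of the file's size. As a worked check (the helper names `data_size` and `block_count` are made up here), each line of data.txt is `hello hadoop N` plus a newline, which is 14 bytes plus the digits of N, so the size and the number of blocks can be computed directly:

```shell
#!/usr/bin/env bash
# Size in bytes of data.txt for lines 1..MAX: 13 bytes for "hello hadoop ",
# the digits of the line number, and 1 byte for the newline.
data_size() {
  local max="$1" total=0 n=1
  while [ "$n" -le "$max" ]; do
    total=$((total + 14 + ${#n}))
    n=$((n + 1))
  done
  echo "$total"
}

# Number of blocks needed for SIZE bytes at BLOCK bytes per block (ceiling division).
block_count() {
  local size="$1" block="$2"
  echo $(( (size + block - 1) / block ))
}

size=$(data_size 100000)
echo "data.txt: $size bytes, $(block_count "$size" 1048576) blocks"
# prints: data.txt: 1888895 bytes, 2 blocks
```

So the upload should be stored as two blocks, which can be confirmed on the cluster with `hdfs fsck /data/wc/input/data.txt -files -blocks`.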
- Run Hadoop's MapReduce example program

    # go to the directory of example programs shipped with Hadoop
    cd /opt/bigdata/hadoop-2.6.5/share/hadoop/mapreduce
    # run the word count example
    hadoop jar hadoop-mapreduce-examples-2.6.5.jar wordcount /data/wc/input /data/wc/output
    # when the job finishes, list the files it produced in HDFS
    hdfs dfs -ls /data/wc/output
    # two files are listed
    Found 2 items
    -rw-r--r--   2 hadoop supergroup          0 2022-03-27 18:07 /data/wc/output/_SUCCESS
    -rw-r--r--   2 hadoop supergroup     788922 2022-03-27 18:07 /data/wc/output/part-r-00000

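The size of part-r-00000 in the listing can be cross-checked by hand. Assuming the example writes one `word<TAB>count` line per distinct word, which is the default text output format, the expected size follows from the input: `hello` and `hadoop` each occur 100000 times, and every number from 1 to 100000 occurs once. The function name `expected_output_size` is hypothetical.

```shell
#!/usr/bin/env bash
# Expected size in bytes of the wordcount output for a data.txt with LINES lines.
# Each output line is "word<TAB>count<NEWLINE>".
expected_output_size() {
  local lines="$1" total n=1
  # "hadoop" and "hello" occur once per input line, so their count is LINES:
  # word length + TAB + digits of LINES + newline.
  total=$(( (6 + 1 + ${#lines} + 1) + (5 + 1 + ${#lines} + 1) ))
  # Every number 1..LINES occurs exactly once: its digits + TAB + "1" + newline.
  while [ "$n" -le "$lines" ]; do
    total=$((total + ${#n} + 3))
    n=$((n + 1))
  done
  echo "$total"
}

expected_output_size 100000   # prints 788922
```

The result matches the 788922 bytes reported by `hdfs dfs -ls`. The counts themselves can be inspected with `hdfs dfs -cat /data/wc/output/part-r-00000 | head`.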



