Servers: hadoop0, hadoop1, hadoop2
OS: CentOS 7.6
Software list and versions:
- jdk-8u202-linux-x64
- hadoop-3.2.2
- zookeeper-3.4.10
- kafka_2.12-2.7.1
- spark-3.1.2-bin-hadoop3.2
- MySQL-5.1.72-1.glibc23.x86_64.rpm-bundle
- hbase-2.3.7
- hive-3.1.2
一、Basic environment preparation
1.1、Set the hostname (all nodes)
[root@hadoop0 ~]# vi /etc/hostname
hadoop0
1.2、Edit the hosts file (all nodes)
[root@hadoop0 ~]# vi /etc/hosts
172.16.0.177 hadoop0
172.16.0.178 hadoop1
172.16.0.179 hadoop2
1.3、Disable the firewall (all nodes)
Check the firewall status: systemctl status firewalld
active (running) means the firewall is on.
To disable it permanently:
[root@hadoop0 ~]# systemctl disable firewalld
[root@hadoop0 ~]# systemctl stop firewalld
After these commands the status becomes inactive (dead), meaning the firewall is off.
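As an optional sanity check on each node (a minimal sketch; firewall-cmd ships with CentOS 7 by default):
[root@hadoop0 ~]# systemctl is-active firewalld
inactive
[root@hadoop0 ~]# firewall-cmd --state
not running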
1.4、Clock synchronization (all nodes)
1.4.1、Install and configure ntp to sync against a public time server
[root@hadoop0 ~]# yum -y install ntp
[root@hadoop0 ~]# crontab -e
Append the following line so that ntpdate syncs the clock every 5 minutes:
*/5 * * * * /usr/sbin/ntpdate cn.pool.ntp.org
1.4.2、Stop ntp:
[root@hadoop0 ~]# systemctl stop ntpd
1.4.3、Start ntp:
[root@hadoop0 ~]# systemctl start ntpd
[root@hadoop0 ~]# systemctl enable ntpd
1.4.4、Check that the crontab entry is running
[root@hadoop0 ~]# tail -f /var/log/cron
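You can also trigger one manual sync to confirm the time server is reachable (a quick sketch using the same pool server as the cron entry; ntpd must not be running at that moment):
[root@hadoop0 ~]# /usr/sbin/ntpdate cn.pool.ntp.org
[root@hadoop0 ~]# date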
1.5、Passwordless SSH login
1.5.1、First run the following on hadoop0, hadoop1 and hadoop2, from the /root directory.
[root@hadoop0 ~]# ssh-keygen -t dsa
[root@hadoop1 ~]# ssh-keygen -t dsa
[root@hadoop2 ~]# ssh-keygen -t dsa
1.5.2、On hadoop0:
[root@hadoop0 ~]# cd /root/.ssh/
[root@hadoop0 .ssh]# cat id_dsa.pub >> authorized_keys
1.5.3、Copy hadoop0's key to hadoop2
[root@hadoop0 ~]# ssh-copy-id -i /root/.ssh/id_dsa.pub hadoop2
1.5.4、Copy hadoop1's key to hadoop2
[root@hadoop1 ~]# ssh-copy-id -i /root/.ssh/id_dsa.pub hadoop2
1.5.5、Copy hadoop2's accumulated authorized_keys to hadoop0 and hadoop1
[root@hadoop2 ~]# scp /root/.ssh/authorized_keys hadoop0:/root/.ssh/
[root@hadoop2 ~]# scp /root/.ssh/authorized_keys hadoop1:/root/.ssh/
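Every node should now reach the others without a password. A quick check (each command should print the remote hostname with no password prompt):
[root@hadoop0 ~]# ssh hadoop1 hostname
hadoop1
[root@hadoop0 ~]# ssh hadoop2 hostname
hadoop2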
1.6、Disable selinux (all nodes)
Edit the config file:
[root@hadoop0 ~]# vi /etc/selinux/config
Change SELINUX=enforcing to SELINUX=disabled.
The change takes effect after a reboot.
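To turn enforcement off for the current session as well, without waiting for a reboot (optional; the config change above still governs future boots):
[root@hadoop0 ~]# setenforce 0
[root@hadoop0 ~]# getenforce
Permissive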
1.7、jdk installation
1.7.1、Create the software directory
[root@hadoop0 /]# mkdir software
1.7.2、Put the jdk package into the software directory
[root@hadoop0 software]# ll
-rw-r--r--. 1 root root 194042837 Nov 11 16:46 jdk-8u202-linux-x64.tar.gz
1.7.3、Unpack and rename
[root@hadoop0 software]# tar -zxvf jdk-8u202-linux-x64.tar.gz
[root@hadoop0 software]# mv jdk1.8.0_202/ jdk
[root@hadoop0 software]# ll
drwxr-xr-x. 7 10 143 245 Dec 16 2018 jdk
-rw-r--r--. 1 root root 194042837 Nov 11 16:46 jdk-8u202-linux-x64.tar.gz
1.7.4、Copy to hadoop1 and hadoop2
Create /software on hadoop1 and hadoop2 first if it does not exist, then:
[root@hadoop0 software]# scp -r /software/jdk hadoop1:/software/
[root@hadoop0 software]# scp -r /software/jdk hadoop2:/software/
1.7.5、Configure environment variables (all nodes)
[root@hadoop0 software]# vi /etc/profile
export JAVA_HOME=/software/jdk
export PATH=.:$PATH:$JAVA_HOME/bin
[root@hadoop0 software]# source /etc/profile
[root@hadoop0 software]# java -version
java version "1.8.0_202"
Java(TM) SE Runtime Environment (build 1.8.0_202-b08)
Java HotSpot(TM) 64-Bit Server VM (build 25.202-b08, mixed mode)
Make the same /etc/profile edit on hadoop1 and hadoop2, source it, and verify with java -version on each node.
Note: the steps above are also covered in: Hadoop集群首次搭建记录(三节点)step02:CentOS 7系统环境准备
二、Hadoop ecosystem installation
2.1、hadoop installation
2.1.1、Upload all packages to the software directory
[root@hadoop0 software]# ll
total 1581028
-rw-r--r--. 1 root root 278813748 Nov 15 10:21 apache-hive-3.1.2-bin.tar.gz
-rw-r--r--. 1 root root 395448622 Nov 11 17:29 hadoop-3.2.2.tar.gz
-rw-r--r--. 1 root root 272812222 Nov 15 10:37 hbase-2.3.7-bin.tar.gz
drwxr-xr-x. 7 10 143 245 Dec 16 2018 jdk
-rw-r--r--. 1 root root 194042837 Nov 11 16:46 jdk-8u202-linux-x64.tar.gz
-rw-r--r--. 1 root root 68778834 Nov 12 21:42 kafka_2.12-2.7.1.tgz
-rw-r--r--. 1 root root 141813760 Nov 15 13:47 MySQL-5.1.72-1.glibc23.x86_64.rpm-bundle.tar
-rw-r--r--. 1 root root 3362563 Nov 15 10:34 mysql-connector-java-5.1.49.tar.gz
-rw-r--r--. 1 root root 228834641 Nov 15 09:12 spark-3.1.2-bin-hadoop3.2.tgz
-rw-r--r--. 1 root root 35042811 Nov 12 20:02 zookeeper-3.4.10.tar.gz
2.1.2、Unpack and rename
[root@hadoop0 software]# tar -zxvf hadoop-3.2.2.tar.gz
[root@hadoop0 software]# mv hadoop-3.2.2 hadoop
2.1.3、Add environment variables
[root@hadoop0 software]# vi /etc/profile
export HADOOP_HOME=/software/hadoop
export PATH=.:$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
[root@hadoop0 software]# source /etc/profile
Configure the same environment variables on hadoop1 and hadoop2.
2.1.4、Configure hadoop-env.sh
The config files live in /software/hadoop/etc/hadoop:
[root@hadoop0 software]# cd /software/hadoop/etc/hadoop
[root@hadoop0 hadoop]# vi hadoop-env.sh
export JAVA_HOME=/software/jdk
2.1.5、Configure hdfs-site.xml
[root@hadoop0 hadoop]# vi hdfs-site.xml
<configuration>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///software/hadoop/data/datanode</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///software/hadoop/data/namenode</value>
  </property>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>hadoop0:50070</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop1:50090</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
2.1.6、Configure yarn-site.xml
[root@hadoop0 hadoop]# vi yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hadoop0:8025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hadoop0:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>hadoop0:8050</value>
  </property>
</configuration>
2.1.7、Configure core-site.xml
[root@hadoop0 hadoop]# vi core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop0/</value>
  </property>
</configuration>
2.1.8、Configure mapred-site.xml
[root@hadoop0 hadoop]# vi mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop0:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop0:19888</value>
  </property>
</configuration>
2.1.9、Configure workers
[root@hadoop0 hadoop]# vi workers
hadoop1
hadoop2
2.1.10、Copy hadoop to the other nodes
[root@hadoop0 hadoop]# scp -r /software/hadoop hadoop1:/software/
[root@hadoop0 hadoop]# scp -r /software/hadoop hadoop2:/software/
2.1.11、Format hdfs
[root@hadoop0 hadoop]# hdfs namenode -format
2.1.12、Start and test
Start hadoop:
[root@hadoop0 hadoop]# start-all.sh
If you hit the error:
ERROR: Attempting to operate on hdfs namenode as root
[root@hadoop0 hadoop]# cd /software/hadoop/etc/hadoop
[root@hadoop0 hadoop]# vi hadoop-env.sh
add the following (on the master node only):
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
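Once start-all.sh completes cleanly, jps on each node is a quick way to confirm the daemons match the roles configured above (a sketch; PIDs will differ, and hadoop2 looks like hadoop1 minus the SecondaryNameNode):
[root@hadoop0 hadoop]# jps
11201 NameNode
11542 ResourceManager
11839 Jps
[root@hadoop1 ~]# jps
10318 DataNode
10443 SecondaryNameNode
10587 NodeManager
10802 Jps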
Start the historyserver:
[root@hadoop0 ~]# cd /software/hadoop/sbin
[root@hadoop0 sbin]# mr-jobhistory-daemon.sh start historyserver
(On Hadoop 3 this script still works but is deprecated; mapred --daemon start historyserver is the newer form.)
Stop the historyserver:
[root@hadoop0 sbin]# mr-jobhistory-daemon.sh stop historyserver
2.1.13、Once startup succeeds, check the web UIs:
HDFS NameNode UI: http://hadoop0:50070 (dfs.namenode.http-address above)
YARN ResourceManager UI: http://hadoop0:8088 (the default port)
JobHistory UI: http://hadoop0:19888 (mapreduce.jobhistory.webapp.address above)
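A minimal HDFS smoke test from the command line (the /tmp/smoketest path is just an example):
[root@hadoop0 ~]# hdfs dfsadmin -report | grep "Live datanodes"
Live datanodes (2):
[root@hadoop0 ~]# hdfs dfs -mkdir -p /tmp/smoketest
[root@hadoop0 ~]# hdfs dfs -put /etc/hosts /tmp/smoketest/
[root@hadoop0 ~]# hdfs dfs -cat /tmp/smoketest/hosts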
2.2、zookeeper installation
2.2.1、Unpack and rename
[root@hadoop0 software]# tar -zxvf zookeeper-3.4.10.tar.gz
[root@hadoop0 software]# mv zookeeper-3.4.10 zk
2.2.2、Configure environment variables
[root@hadoop0 software]# vi /etc/profile
export ZOOKEEPER_HOME=/software/zk
export PATH=.:$PATH:$ZOOKEEPER_HOME/bin
[root@hadoop0 software]# source /etc/profile
Configure the same environment variables on hadoop1 and hadoop2.
2.2.3、Edit the config file
[root@hadoop0 software]# cd /software/zk/conf
[root@hadoop0 conf]# mv zoo_sample.cfg zoo.cfg
[root@hadoop0 conf]# vi zoo.cfg
dataDir=/software/zk/data
server.0=hadoop0:2888:3888
server.1=hadoop1:2888:3888
server.2=hadoop2:2888:3888
2.2.4、Create the data directory
[root@hadoop0 zk]# mkdir data
[root@hadoop0 zk]# cd data
[root@hadoop0 data]# vi myid
0
2.2.5、Copy to the other nodes
[root@hadoop0 software]# scp -r /software/zk hadoop1:/software/
[root@hadoop0 software]# scp -r /software/zk hadoop2:/software/
2.2.6、Edit myid on the other nodes
hadoop0 gets 0, hadoop1 gets 1, hadoop2 gets 2; each node's myid must match the N of its own server.N line in zoo.cfg.
[root@hadoop1 data]# vi myid
1
[root@hadoop2 data]# vi myid
2
2.2.7、Start zookeeper on all three servers
cd /software/zk/bin
[root@hadoop0 bin]# zkServer.sh start
[root@hadoop1 bin]# zkServer.sh start
[root@hadoop2 bin]# zkServer.sh start
2.2.8、Check the status
Exactly one node will show Mode: leader.
[root@hadoop0 bin]# zkServer.sh status
Using config: /software/zk/bin/../conf/zoo.cfg
Mode: follower
[root@hadoop1 ~]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /software/zk/bin/../conf/zoo.cfg
Mode: leader
[root@hadoop2 ~]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /software/zk/bin/../conf/zoo.cfg
Mode: follower
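You can also poke the ensemble with the bundled CLI (a minimal sketch; a fresh ensemble shows only the /zookeeper znode):
[root@hadoop0 bin]# zkCli.sh -server hadoop0:2181
[zk: hadoop0:2181(CONNECTED) 0] ls /
[zookeeper]
[zk: hadoop0:2181(CONNECTED) 1] quit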
2.3、hbase installation
2.3.1、Unpack and rename
[root@hadoop0 software]# tar -zxvf hbase-2.3.7-bin.tar.gz
[root@hadoop0 software]# mv hbase-2.3.7 hbase
2.3.2、Configure environment variables
[root@hadoop0 software]# vi /etc/profile
export HBASE_HOME=/software/hbase
export PATH=.:$PATH:$HBASE_HOME/bin
[root@hadoop0 software]# source /etc/profile
Configure the same environment variables on hadoop1 and hadoop2.
2.3.3、Edit the config files. In /software/hbase/conf, configure hbase-env.sh:
[root@hadoop0 conf]# vi hbase-env.sh
export JAVA_HOME=/software/jdk/
export HBASE_MANAGES_ZK=false
Note: false means HBase uses the external zookeeper installed above; true means HBase manages its own bundled zookeeper.
2.3.4、Configure hbase-site.xml
Note: the port in hbase.rootdir (hdfs://hadoop0:8020/hbase) must match hadoop's fs.defaultFS. When no port is configured, hadoop defaults to 8020; when one is set explicitly it is commonly 9000.
[root@hadoop0 conf]# vi hbase-site.xml
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hadoop0:8020/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hadoop0,hadoop1,hadoop2</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/software/zk/data</value>
  </property>
</configuration>
2.3.5、Configure the regionservers file (it lists the region servers' hostnames)
hadoop1
hadoop2
2.3.6、Copy to the other nodes
[root@hadoop0 software]# scp -r /software/hbase hadoop1:/software/
[root@hadoop0 software]# scp -r /software/hbase hadoop2:/software/
2.3.7、Start
Prerequisites:
- a、ZooKeeper is running
- b、Hadoop is running
Start from the master node hadoop0, in the /software/hbase/bin directory.
With the environment variables configured, start-hbase.sh can be run from any directory.
[root@hadoop0 bin]# ./start-hbase.sh
2.3.8、Stop
[root@hadoop0 bin]# ./stop-hbase.sh
2.3.9、Verify with the shell
On the master server:
[root@hadoop0 conf]# hbase shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/software/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/software/hbase/lib/client-facing-thirdparty/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
HBase Shell
Use "help" to get list of supported commands.
Use "exit" to quit this interactive shell.
For Reference, please visit: http://hbase.apache.org/2.0/book.html#shell
Version 2.3.7, r8b2f5141e900c851a2b351fccd54b13bcac5e2ed, Tue Oct 12 16:38:55 UTC 2021
Took 0.0011 seconds
hbase(main):001:0> list
TABLE
0 row(s)
Took 0.9189 seconds
=> []
hbase(main):002:0> quit
Use quit or exit to leave the shell.
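For a slightly deeper check you can round-trip a throwaway table from the shell (the table and column family names here are just examples):
hbase(main):001:0> create 'smoke', 'cf'
hbase(main):002:0> put 'smoke', 'row1', 'cf:msg', 'hello'
hbase(main):003:0> scan 'smoke'
hbase(main):004:0> disable 'smoke'
hbase(main):005:0> drop 'smoke'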
2.3.10、UI verification: browse to http://hadoop0:16010 (the old 0.x versions used port 60010).
Note: to shut everything down, stop hbase first, then hadoop, and finally zkServer.
2.4、mysql installation
2.4.1、Remove any preinstalled mysql/mariadb packages (if the queries below find any, remove them; otherwise skip).
[root@hadoop0 ~]# rpm -qa | grep mysql
[root@hadoop0 ~]# rpm -e mysql-libs-5.1.73-8.el6_8.x86_64 --nodeps
[root@hadoop0 ~]# rpm -qa | grep mariadb
mariadb-libs-5.5.60-1.el7_5.x86_64
[root@hadoop0 software]# rpm -e --nodeps mariadb-libs-5.5.60-1.el7_5.x86_64
2.4.2、Remove mysql's scattered directories
[root@hadoop0 software]# whereis mysql
[root@hadoop0 software]# rm -rf /usr/lib64/mysql
2.4.3、Unpack and rename
[root@hadoop0 software]# mkdir mysql
[root@hadoop0 software]# tar -xvf MySQL-5.1.72-1.glibc23.x86_64.rpm-bundle.tar -C mysql
2.4.4、Install the server
[root@hadoop0 software]# cd mysql
[root@hadoop0 mysql]# ll
total 138496
-rw-r--r--. 1 7155 wheel 7403559 Sep 12 2013 MySQL-client-5.1.72-1.glibc23.x86_64.rpm
-rw-r--r--. 1 7155 wheel 65449113 Sep 12 2013 MySQL-debuginfo-5.1.72-1.glibc23.x86_64.rpm
-rw-r--r--. 1 7155 wheel 8791454 Sep 12 2013 MySQL-devel-5.1.72-1.glibc23.x86_64.rpm
-rw-r--r--. 1 7155 wheel 20787882 Sep 12 2013 MySQL-embedded-5.1.72-1.glibc23.x86_64.rpm
-rw-r--r--. 1 7155 wheel 19526788 Sep 12 2013 MySQL-server-5.1.72-1.glibc23.x86_64.rpm
-rw-r--r--. 1 7155 wheel 1883524 Sep 12 2013 MySQL-shared-5.1.72-1.glibc23.x86_64.rpm
-rw-r--r--. 1 7155 wheel 3317236 Sep 12 2013 MySQL-shared-compat-5.1.72-1.glibc23.x86_64.rpm
-rw-r--r--. 1 7155 wheel 14643692 Sep 12 2013 MySQL-test-5.1.72-1.glibc23.x86_64.rpm
[root@hadoop0 mysql]# rpm -ivh MySQL-server-5.1.72-1.glibc23.x86_64.rpm
2.4.5、Install the client
[root@hadoop0 mysql]# rpm -ivh MySQL-client-5.1.72-1.glibc23.x86_64.rpm
2.4.6、Log in to MySQL (remember to start the mysql service before logging in)
Start the MySQL service:
[root@hadoop0 mysql]# service mysql start
Log in to MySQL:
Then log in; the initial password is in the file /root/.mysql_secret.
[root@hadoop0 software]# mysql -uroot -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 190
Server version: 5.1.72 MySQL Community Server (GPL)
Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
Alternatively, log in as root without a password.
2.4.7、Change the password. Log in to mysql with the root account first, then run:
Method 1
mysql> UPDATE mysql.user SET password=PASSWORD('123456') WHERE user='root';
mysql> FLUSH PRIVILEGES;
Method 2
mysql> SET PASSWORD=PASSWORD('root');
Log out and log back in to verify the password change worked.
2.4.8、Grant remote login privileges with these two statements:
mysql> grant all privileges on *.* to 'root'@'%' identified by 'root' with grant option;
mysql> flush privileges;
2.4.9、Create the hive database
Create the hive database:
mysql> create database hive;
Create the hive user and set its password:
mysql> create user 'hive'@'%' identified by 'hive';
Grant privileges:
mysql> grant all privileges on hive.* to 'hive'@'%';
Flush privileges:
mysql> flush privileges;
2.5、hive installation
2.5.1、Unpack and rename
[root@hadoop0 software]# tar -zxvf apache-hive-3.1.2-bin.tar.gz
[root@hadoop0 software]# mv apache-hive-3.1.2-bin hive
2.5.2、Edit the config file
hive-site.xml does not exist in the conf directory yet; just create it with vim.
[root@hadoop0 software]# cd /software/hive/conf
[root@hadoop0 conf]# vim hive-site.xml
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
</configuration>
The database name in the connection URL matches the hive database created in 2.4.9, which the hive user was granted privileges on.
2.5.3、Add the MySQL driver jar (mysql-connector-java-5.1.49-bin.jar)
Put the jar in the lib directory under hive's root path.
2.5.4、Configure environment variables
[root@hadoop0 conf]# vi /etc/profile
export HIVE_HOME=/software/hive
export PATH=$PATH:$HIVE_HOME/bin
[root@hadoop0 conf]# source /etc/profile
Configure the same environment variables on hadoop1 and hadoop2.
2.5.5、Verify the Hive installation
[root@hadoop0 lib]# hive --help
Usage ./hive <parameters> --service serviceName <service parameters>
Service List: beeline cli help hiveburninclient hiveserver2 hiveserver hwi jar lineage metastore metatool orcfiledump rcfilecat schemaTool version
Parameters parsed:
  --auxpath : Auxillary jars
  --config : Hive configuration directory
  --service : Starts specific service/component. cli is default
Parameters used:
  HADOOP_HOME or HADOOP_PREFIX : Hadoop install directory
  HIVE_OPT : Hive options
For help on a particular service:
  ./hive --service serviceName --help
Debug help: ./hive --debug --help
2.5.6、Initialize the metastore
Note: when hive runs on top of an ordinary distributed hadoop cluster, initialization happens automatically on first startup.
[root@hadoop0 lib]# schematool -dbType mysql -initSchema
2.5.7、Copy to the other nodes
[root@hadoop0 software]# scp -r /software/hive hadoop1:/software/
[root@hadoop0 software]# scp -r /software/hive hadoop2:/software/
2.5.8、Start the Hive client
[root@hadoop0 bin]# hive --service cli
Logging initialized using configuration in jar:file:/software/hive/lib/hive-common-1.2.1.jar!/hive-log4j.properties
hive>
2.5.9、Exit Hive
hive> quit;
or
hive> exit;
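Before quitting, you can optionally verify the metastore wiring end to end from the hive> prompt (smoke_test is a hypothetical table name):
hive> create table smoke_test (id int, name string);
hive> show tables;
hive> drop table smoke_test;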
2.6、kafka installation
2.6.1、Unpack and rename
[root@hadoop0 software]# tar -zxvf kafka_2.12-2.7.1.tgz
[root@hadoop0 software]# mv kafka_2.12-2.7.1 kafka
2.6.2、Edit the config file
[root@hadoop0 software]# cd kafka
[root@hadoop0 kafka]# mkdir data
[root@hadoop0 kafka]# cd config
[root@hadoop0 config]# vi server.properties
# every broker gets a distinct broker.id
broker.id=0
# add the following three lines after log.retention.hours=168
message.max.bytes=5242880
default.replication.factor=1
replica.fetch.max.bytes=5242880
# zookeeper connection string
zookeeper.connect=hadoop0:2181,hadoop1:2181,hadoop2:2181
# log file directory
log.dirs=/software/kafka/logs/
2.6.3、Copy to the other nodes
[root@hadoop0 software]# scp -r /software/kafka hadoop1:/software/
[root@hadoop0 software]# scp -r /software/kafka hadoop2:/software/
Note: edit kafka/config/server.properties on each cluster node so that broker.id is unique per node.
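This section does not show a start step; with Kafka 2.7 the brokers are typically launched on each node and smoke-tested like this (a sketch; the topic name and single-partition settings are just examples):
[root@hadoop0 kafka]# bin/kafka-server-start.sh -daemon config/server.properties
[root@hadoop1 kafka]# bin/kafka-server-start.sh -daemon config/server.properties
[root@hadoop2 kafka]# bin/kafka-server-start.sh -daemon config/server.properties
[root@hadoop0 kafka]# bin/kafka-topics.sh --create --bootstrap-server hadoop0:9092 --replication-factor 1 --partitions 1 --topic smoke
[root@hadoop0 kafka]# bin/kafka-topics.sh --list --bootstrap-server hadoop0:9092
smoke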
2.7、spark installation
2.7.1、Unpack and rename
[root@hadoop0 software]# tar -zxvf spark-3.1.2-bin-hadoop3.2.tgz
[root@hadoop0 software]# mv spark-3.1.2-bin-hadoop3.2 spark
2.7.2、Edit spark-env.sh
[root@hadoop0 software]# cd spark/conf
[root@hadoop0 conf]# vi spark-env.sh
export JAVA_HOME=/software/jdk
export SPARK_MASTER_HOST=hadoop0
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=1024m
2.7.3、Configure the workers file (named slaves in Spark 2.x)
[root@hadoop0 conf]# vi workers
hadoop1
hadoop2
2.7.4、Copy to the other nodes
[root@hadoop0 software]# scp -r /software/spark hadoop1:/software/
[root@hadoop0 software]# scp -r /software/spark hadoop2:/software/
2.7.5、Start the Spark cluster. Use the full path to Spark's script, since hadoop's sbin also puts a start-all.sh on the PATH:
[root@hadoop0 software]# /software/spark/sbin/start-all.sh
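To confirm the workers registered, open the standalone master UI at http://hadoop0:8080 (its default port), or run the bundled SparkPi example against the master (a sketch; the examples jar path is as shipped in spark-3.1.2-bin-hadoop3.2):
[root@hadoop0 software]# /software/spark/bin/spark-submit \
  --master spark://hadoop0:7077 \
  --class org.apache.spark.examples.SparkPi \
  /software/spark/examples/jars/spark-examples_2.12-3.1.2.jar 10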



