IP1 Master
IP2 Slave1
IP3 Slave2
(replace IP1/IP2/IP3 with your own IP addresses)

Passwordless SSH login

1) Generate a key pair on Master:
ssh-keygen -t rsa
Press Enter at every prompt to accept the defaults. The public key (id_rsa.pub) and private key (id_rsa) are generated in the .ssh directory under the current user's home directory.

2) Distribute the public key:

ssh-copy-id Master
ssh-copy-id Slave1
ssh-copy-id Slave2

Generate key pairs on Slave1 and Slave2 in the same way, then distribute each public key to all three machines.
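The distribution step can be scripted. Below is a minimal sketch, assuming the hostnames Master, Slave1 and Slave2 resolve on every node; DRY_RUN=echo only prints the commands so the loop can be previewed before running it for real:

```shell
# Dry-run sketch of distributing the local public key to all nodes.
# Set DRY_RUN= (empty) to actually run ssh-copy-id.
DRY_RUN=echo
hosts="Master Slave1 Slave2"
distribute_key() {
    for host in $hosts; do
        # copies ~/.ssh/id_rsa.pub into the remote authorized_keys
        $DRY_RUN ssh-copy-id "$host"
    done
}
distribute_key
```

Run the same loop on Slave1 and Slave2 after generating their keys, so every node can reach every other node without a password.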
Environment setup

1) Java

Download jdk-8u191-linux-x64.tar.gz and extract it under /usr/local (on all machines):

tar -zxvf jdk-8u191-linux-x64.tar.gz
mv jdk1.8.0_191 /usr/local

Add the Java environment variables to /etc/profile:

export JAVA_HOME=/usr/local/jdk1.8.0_191
PATH=$JAVA_HOME/bin:$PATH
CLASSPATH=.:$JAVA_HOME/lib/rt.jar
export JAVA_HOME PATH CLASSPATH

2) Scala

Download scala-2.12.7.tgz and extract it under /usr/share (on all machines):

tar -zxvf scala-2.12.7.tgz
mv scala-2.12.7 /usr/share

Add the Scala environment variables to /etc/profile:

export SCALA_HOME=/usr/share/scala-2.12.7
export PATH=$SCALA_HOME/bin:$PATH

Reload the configuration:

source /etc/profile

Fully distributed Hadoop installation
Download address:
https://archive.apache.org/dist/hadoop/common/
The following steps are performed on the Master node:

1) Download the binary package hadoop-2.8.5.tar.gz

2) Extract it and move it to the target directory; I usually put software under /opt:

tar -zxvf hadoop-2.8.5.tar.gz
mv hadoop-2.8.5 /opt

3) Modify the configuration files.

Add the following to /etc/profile:

export HADOOP_HOME=/opt/hadoop-2.8.5/
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_ROOT_LOGGER=INFO,console
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
Reload the configuration:
source /etc/profile
Edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh (use your own path):
export JAVA_HOME=/usr/local/jdk1.8.0_191
Edit $HADOOP_HOME/etc/hadoop/slaves, delete the original localhost, and replace it with:

Slave1
Slave2
Edit $HADOOP_HOME/etc/hadoop/core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://Master:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop-2.8.5/tmp</value>
    </property>
</configuration>
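To keep all three nodes identical, core-site.xml can also be generated from a heredoc. A minimal sketch, writing this guide's values into a temporary directory (on a real node, point conf_dir at $HADOOP_HOME/etc/hadoop instead):

```shell
# Generate core-site.xml with the values used in this guide.
conf_dir=$(mktemp -d)   # use $HADOOP_HOME/etc/hadoop on a real node
cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://Master:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop-2.8.5/tmp</value>
    </property>
</configuration>
EOF
```

The same pattern works for the other *-site.xml files below, which keeps the configs reproducible when rebuilding a node.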
Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>Master:50090</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/opt/hadoop-2.8.5/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/opt/hadoop-2.8.5/hdfs/data</value>
    </property>
</configuration>
Copy mapred-site.xml.template to create the xml file:
cp mapred-site.xml.template mapred-site.xml
Edit $HADOOP_HOME/etc/hadoop/mapred-site.xml (note that port 19888 belongs to the job history web UI, so its property name is mapreduce.jobhistory.webapp.address):

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>Master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>Master:19888</value>
    </property>
</configuration>
Edit $HADOOP_HOME/etc/hadoop/yarn-site.xml:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>Master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>Master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>Master:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>Master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>Master:8088</value>
    </property>
</configuration>
Copy the hadoop directory from the Master node to Slave1 and Slave2:

scp -r /opt/hadoop-2.8.5 root@Slave1:/opt
scp -r /opt/hadoop-2.8.5 root@Slave2:/opt
On Slave1 and Slave2, edit /etc/profile in the same way as on Master.

Start the cluster from the Master node; format the namenode before the first start:

# format
hadoop namenode -format

# start
/opt/hadoop-2.8.5/sbin/start-all.sh

This completes the fully distributed Hadoop setup.

Check whether the cluster started successfully:

Master shows:
SecondaryNameNode
ResourceManager
NameNode

Slaves show:
NodeManager
DataNode

MySQL (skip if already installed)
wget -c http://dev.mysql.com/get/mysql57-community-release-el7-10.noarch.rpm
yum -y install mysql57-community-release-el7-10.noarch.rpm
yum -y install mysql-community-server   # this takes a while

Start mysql:
systemctl start mysqld.service
Look up the initial root password:
grep "password" /var/log/mysqld.log
mysql -uroot -p
Enter the initial password, then change it:
ALTER USER 'root'@'localhost' IDENTIFIED BY 'new password';
Replace 'new password' with the password you want to set. Note: the password must contain uppercase and lowercase letters, digits, and special characters (such as , / ' ; :), otherwise it will be rejected.
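As a rough pre-check before running ALTER USER, the default policy can be approximated in shell (MySQL 5.7 ships with validate_password at MEDIUM: at least 8 characters with upper case, lower case, a digit and a special character). The function name is my own, and the server's validate_password plugin remains authoritative:

```shell
# Approximate MySQL 5.7's MEDIUM password policy: length >= 8,
# plus at least one upper, one lower, one digit, one special char.
check_policy() {
    pw=$1
    [ ${#pw} -ge 8 ] || return 1            # long enough?
    case $pw in *[A-Z]*) ;; *) return 1 ;; esac   # upper case
    case $pw in *[a-z]*) ;; *) return 1 ;; esac   # lower case
    case $pw in *[0-9]*) ;; *) return 1 ;; esac   # digit
    case $pw in *[!A-Za-z0-9]*) ;; *) return 1 ;; esac  # special char
    return 0
}

check_policy 'Abcd123,' && echo "candidate passes"
check_policy 'password' || echo "candidate rejected"
```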
Enable remote access (replace 'password' with the root password):
grant all privileges on *.* to 'root'@'%' identified by 'password' with grant option;
flush privileges;
exit

Hive
Download address:
https://dlcdn.apache.org/hive/
tar -zxvf apache-hive-2.3.9-bin.tar.gz -C /opt
cd /opt/
mv apache-hive-2.3.9-bin/ hive
cd hive/conf/

Copy the hive-env.sh.template file:

cp hive-env.sh.template hive-env.sh

Add the following to hive-env.sh:

export HADOOP_HOME=/opt/hadoop-2.8.5/
export HIVE_CONF_DIR=/opt/hive/conf

Create the working directories on HDFS:

hadoop fs -mkdir /tmp
hadoop fs -mkdir -p /user/hive/warehouse
hadoop fs -chmod g+w /tmp
hadoop fs -chmod g+w /user/hive/warehouse

Extract the MySQL JDBC driver and copy it into Hive's lib directory:

tar -zxvf mysql-connector-java-5.1.49.tar.gz -C /opt/
cd /opt/mysql-connector-java-5.1.49/
cp mysql-connector-java-5.1.49-bin.jar /opt/hive/lib/
cd /opt/hive/conf/
touch hive-site.xml

Add the following to hive-site.xml (replace the MySQL connection details with your own):
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://Master:3306/metastore?createDatabaseIfNotExist=true</value>
        <description>JDBC connect string for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
        <description>username to use against metastore database</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>000000</value>
        <description>password to use against metastore database</description>
    </property>
</configuration>
Initialize the Hive metastore:
cd /opt/hive bin/schematool -dbType mysql -initSchema
Start and test:

bin/hive
show databases;

Spark installation
All of the following steps are performed on the Master node.

Download address:
https://archive.apache.org/dist/spark/
1) Download the binary package spark-2.4.0-bin-hadoop2.7.tgz

2) Extract it and move it to the target directory:

tar -zxvf spark-2.4.0-bin-hadoop2.7.tgz
mv spark-2.4.0-bin-hadoop2.7 /opt

3) Modify the configuration files.

Edit /etc/profile and add the Spark environment variables (SPARK_HOME and its bin directory on PATH), then reload it with source /etc/profile.

Enter /opt/spark-2.4.0-bin-hadoop2.7/conf

Copy spark-env.sh.template to spark-env.sh:
cp spark-env.sh.template spark-env.sh
Edit $SPARK_HOME/conf/spark-env.sh and add the following (replace IP1 with your own IP):

export JAVA_HOME=/usr/local/jdk1.8.0_191
export SCALA_HOME=/usr/share/scala-2.12.7
export HADOOP_HOME=/opt/hadoop-2.8.5
export HADOOP_CONF_DIR=/opt/hadoop-2.8.5/etc/hadoop
export SPARK_MASTER_IP=IP1
export SPARK_MASTER_HOST=IP1
export SPARK_LOCAL_IP=IP1
export SPARK_WORKER_MEMORY=1g
export SPARK_WORKER_CORES=2
export SPARK_HOME=/opt/spark-2.4.0-bin-hadoop2.7
export SPARK_DIST_CLASSPATH=$(/opt/hadoop-2.8.5/bin/hadoop classpath)
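Since SPARK_LOCAL_IP is the only per-node difference, the file can be rendered per node. A sketch parameterised on NODE_IP (set it to each node's own address; the output path here is a temporary file for preview, not the real config location):

```shell
# Render a per-node spark-env.sh fragment; SPARK_LOCAL_IP varies
# per node while SPARK_MASTER_HOST stays IP1 everywhere.
NODE_IP=IP1                      # replace with this node's own IP
env_file=$(mktemp)               # use $SPARK_HOME/conf/spark-env.sh on a real node
cat > "$env_file" <<EOF
export JAVA_HOME=/usr/local/jdk1.8.0_191
export SCALA_HOME=/usr/share/scala-2.12.7
export HADOOP_HOME=/opt/hadoop-2.8.5
export HADOOP_CONF_DIR=/opt/hadoop-2.8.5/etc/hadoop
export SPARK_MASTER_HOST=IP1
export SPARK_LOCAL_IP=$NODE_IP
export SPARK_WORKER_MEMORY=1g
export SPARK_WORKER_CORES=2
export SPARK_HOME=/opt/spark-2.4.0-bin-hadoop2.7
EOF
```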
Copy slaves.template to slaves:
cp slaves.template slaves
Edit $SPARK_HOME/conf/slaves and add:

Master
Slave1
Slave2

Copy the configured spark directory to the Slave1 and Slave2 nodes:

scp -r /opt/spark-2.4.0-bin-hadoop2.7 root@Slave1:/opt
scp -r /opt/spark-2.4.0-bin-hadoop2.7 root@Slave2:/opt

Modify the configuration on Slave1 and Slave2.

On Slave1 and Slave2, edit /etc/profile and add the Spark settings, the same as on Master.

On Slave1 and Slave2, edit $SPARK_HOME/conf/spark-env.sh and change export SPARK_LOCAL_IP=IP1 to the IP of the corresponding node.

Start the cluster from the Master node.
/opt/spark-2.4.0-bin-hadoop2.7/sbin/start-all.sh
Check whether the cluster started successfully:

jps

On top of the Hadoop processes, Master additionally shows:
Master

Slaves additionally show:
Worker
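These jps checks can be automated. A sketch of a helper (the function name is mine) that compares jps output against an expected daemon list; on a real node you would call it as check_daemons "$(jps)" NameNode ... instead of the sample string used here:

```shell
# Verify that every expected daemon appears in the jps output.
check_daemons() {
    jps_out=$1; shift
    for d in "$@"; do
        # -w matches the daemon name as a whole word
        echo "$jps_out" | grep -qw "$d" || { echo "missing $d"; return 1; }
    done
    echo "all daemons running"
}

# Sample jps output for a Master node running Hadoop + Spark:
sample="1234 NameNode
2345 SecondaryNameNode
3456 ResourceManager
5678 Master
9999 Jps"
check_daemons "$sample" NameNode SecondaryNameNode ResourceManager Master
```

For the slaves the expected list would be DataNode, NodeManager and Worker.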



