- Two Huawei Cloud servers (one Master, one Worker)
- Ubuntu 20 OS
- Software (unpacked to /usr/local/, as sketched after this list):
jdk1.8
hadoop-2.7.3.tar
scala-2.13.0.tgz
spark-2.1.0-bin-hadoop2.7.tgz
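The packages are assumed to end up under /usr/local with the short directory names used throughout this post (hadoop, scala, spark, jdk1.8). A minimal unpacking sketch, assuming the archives have already been uploaded to the Master and extract to their conventional top-level directories:

```bash
# unpack into /usr/local and rename to the paths referenced by /etc/profile below
tar -xf hadoop-2.7.3.tar -C /usr/local/ && mv /usr/local/hadoop-2.7.3 /usr/local/hadoop
tar -xf scala-2.13.0.tgz -C /usr/local/ && mv /usr/local/scala-2.13.0 /usr/local/scala
tar -xf spark-2.1.0-bin-hadoop2.7.tgz -C /usr/local/ && mv /usr/local/spark-2.1.0-bin-hadoop2.7 /usr/local/spark
# the JDK archive is unpacked to /usr/local/jdk1.8 in the same way
```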
Configure hostname aliases

Edit /etc/hosts (`vim /etc/hosts`) and add the aliases:

```
192.168.0.237 Master
192.168.0.14 Worker1
```

Configure passwordless SSH login to the Worker
```bash
ssh-keygen -t rsa                                                     # run on the Master to generate the key pair
scp /root/.ssh/id_rsa.pub root@worker1:/root/.ssh/id_rsa.pub.master  # copy id_rsa.pub from the Master to the worker, renamed to id_rsa.pub.master
scp /etc/hosts root@workerN:/etc/hosts                                # keep the hosts file (and the aliases) identical on every node
# then run the following on the corresponding host:
cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys              # on the Master
cat /root/.ssh/id_rsa.pub.master >> /root/.ssh/authorized_keys       # on worker1
```
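If the key exchange worked, ssh from the Master to the worker should no longer ask for a password. A quick check (worker1 being the alias defined in /etc/hosts):

```bash
ssh root@worker1 hostname   # should print the worker's hostname without a password prompt
```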
Install the base environment

All of my software is installed under /usr/local/; the key step is configuring /etc/profile:
```bash
export JAVA_HOME=/usr/local/jdk1.8
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=.:${JAVA_HOME}/bin:$PATH

export SCALA_HOME=/usr/local/scala
export PATH=$PATH:$SCALA_HOME/bin

export HADOOP_HOME=/usr/local/hadoop
export PATH="$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH"
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop

export SPARK_HOME=/usr/local/spark
```
Run `source /etc/profile` to apply the changes, then scp the file to the Worker and run `source` there as well so the Worker picks up the same configuration.
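Before moving on, it is worth confirming that the environment actually took effect; a quick check (the exact version strings depend on the packages installed):

```bash
java -version
scala -version
hadoop version
echo $SPARK_HOME
```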
Once everything is configured, you can use scp to copy the installed software from the Master to the Worker:
```bash
scp -r <software dir> <user>@<IP>:<destination path>
scp -r /usr/local/hadoop root@Worker:/usr/local/
```

Configuration files

1. Hadoop configuration

1. $HADOOP_HOME/etc/hadoop/hadoop-env.sh
Modify the JAVA_HOME variable:

```bash
export JAVA_HOME=/usr/local/jdk1.8
```

2. $HADOOP_HOME/etc/hadoop/slaves
```
worker1
```

3. $HADOOP_HOME/etc/hadoop/core-site.xml
```xml
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://master:9000</value></property>
  <property><name>io.file.buffer.size</name><value>131072</value></property>
  <property><name>hadoop.tmp.dir</name><value>/usr/local/hadoop/tmp</value></property>
</configuration>
```

4. $HADOOP_HOME/etc/hadoop/hdfs-site.xml
```xml
<configuration>
  <property><name>dfs.namenode.secondary.http-address</name><value>master:50090</value></property>
  <property><name>dfs.replication</name><value>2</value></property>
  <property><name>dfs.namenode.name.dir</name><value>file:/usr/local/hadoop/hdfs/name</value></property>
  <property><name>dfs.datanode.data.dir</name><value>file:/usr/local/hadoop/hdfs/data</value></property>
</configuration>
```

5. $HADOOP_HOME/etc/hadoop/mapred-site.xml
Copy the template to generate the xml file:

```bash
cp mapred-site.xml.template mapred-site.xml
```
```xml
<configuration>
  <property><name>mapreduce.framework.name</name><value>yarn</value></property>
  <property><name>mapreduce.jobhistory.address</name><value>master:10020</value></property>
  <property><name>mapreduce.jobhistory.webapp.address</name><value>master:19888</value></property>
</configuration>
```

6. $HADOOP_HOME/etc/hadoop/yarn-site.xml
```xml
<configuration>
  <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
  <property><name>yarn.resourcemanager.address</name><value>master:8032</value></property>
  <property><name>yarn.resourcemanager.scheduler.address</name><value>master:8030</value></property>
  <property><name>yarn.resourcemanager.resource-tracker.address</name><value>master:8031</value></property>
  <property><name>yarn.resourcemanager.admin.address</name><value>master:8033</value></property>
  <property><name>yarn.resourcemanager.webapp.address</name><value>master:8088</value></property>
</configuration>
```
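The Worker needs the same Hadoop configuration. If the whole /usr/local/hadoop tree was already copied with scp as described earlier, only the edited config directory has to be re-synced after changes; a sketch assuming the identical layout on the worker:

```bash
scp -r /usr/local/hadoop/etc/hadoop root@worker1:/usr/local/hadoop/etc/
```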
At this point the Hadoop setup on the Master node is complete. Before starting it, we need to format the NameNode:
```bash
hadoop namenode -format
```

2. Spark

1. $SPARK_HOME/conf/spark-env.sh
```bash
cp spark-env.sh.template spark-env.sh
```

```bash
export SCALA_HOME=/usr/local/scala
export JAVA_HOME=/usr/local/jdk1.8/
export SPARK_MASTER_IP=master
export SPARK_WORKER_MEMORY=1g
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
```

2. $SPARK_HOME/conf/slaves
```bash
cp slaves.template slaves
```

```
master
worker1
```
Then scp the configured Spark directory to the worker1 node, for example:
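A sketch of that copy, following the same scp pattern used for Hadoop and assuming the same /usr/local layout on worker1:

```bash
scp -r /usr/local/spark root@worker1:/usr/local/
```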
Wrap starting the services into a startup script:
```bash
#!/bin/bash
echo -e "\033[31m ========Start The Cluster======== \033[0m"
echo -e "\033[31m Starting Hadoop Now !!! \033[0m"
/usr/local/hadoop/sbin/start-all.sh
echo -e "\033[31m Starting Spark Now !!! \033[0m"
/usr/local/spark/sbin/start-all.sh
echo -e "\033[31m The Result Of The Command \"jps\" : \033[0m"
jps    # print the running Java processes
echo -e "\033[31m ========END======== \033[0m"
```
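Saved under a name of your choice (start-cluster.sh here is only an example), the script can be made executable and run from the Master:

```bash
chmod +x start-cluster.sh
./start-cluster.sh
```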
After starting, open http://&lt;Master IP&gt;:8080 in a browser to reach the Spark master web UI.
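Besides the web UI, the cluster state can also be checked from the command line; these are standard Hadoop/YARN commands rather than part of the original setup:

```bash
hdfs dfsadmin -report   # DataNodes registered with the NameNode
yarn node -list         # NodeManagers registered with the ResourceManager
```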
Test Hadoop

```bash
hadoop fs -mkdir -p /Hadoop/Input
hadoop fs -put <test file> /Hadoop/Input
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /Hadoop/Input /Hadoop/Output
hadoop fs -cat /Hadoop/Output/*    # view the output
```

Test Spark
Run `spark-shell` to open an interactive session:
```scala
val file = sc.textFile("hdfs://master:9000/Hadoop/Input/<test file>")
val rdd = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
rdd.collect()
rdd.foreach(println)
```
View the wordCount result.
Shutdown script
```bash
#!/bin/bash
echo -e "\033[31m ===== Stopping The Cluster ====== \033[0m"
echo -e "\033[31m Stopping Spark Now !!! \033[0m"
/usr/local/spark/sbin/stop-all.sh
echo -e "\033[31m Stopping Hadoop Now !!! \033[0m"
/usr/local/hadoop/sbin/stop-all.sh
echo -e "\033[31m The Result Of The Command \"jps\" : \033[0m"
jps
echo -e "\033[31m ======END======== \033[0m"
```



