
Spark Distributed Environment Setup


Preparation
  1. Two Huawei Cloud servers (one Master, one Worker)
  2. Ubuntu 20
  3. Software (unpacking is sketched right after this list):
    jdk1.8
    hadoop-2.7.3.tar
    scala-2.13.0.tgz (note: the Spark 2.1.0 prebuilt binaries target Scala 2.11, so a scala-2.11.x release is the safer match)
    spark-2.1.0-bin-hadoop2.7.tgz
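
As a reference, the archives can be unpacked into /usr/local and renamed to the short paths used in the rest of this guide (a sketch; the archive locations and the JDK archive name are assumptions, adjust them to your downloads):

cd /usr/local
tar -xf /root/hadoop-2.7.3.tar && mv hadoop-2.7.3 hadoop                              # -> /usr/local/hadoop
tar -xzf /root/scala-2.13.0.tgz && mv scala-2.13.0 scala                              # -> /usr/local/scala
tar -xzf /root/spark-2.1.0-bin-hadoop2.7.tgz && mv spark-2.1.0-bin-hadoop2.7 spark    # -> /usr/local/spark
# Unpack your JDK archive as well and rename the extracted directory to /usr/local/jdk1.8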
Network configuration

Configure host aliases with vim /etc/hosts:

192.168.0.237 Master
192.168.0.14 Worker1
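
A quick way to confirm the aliases resolve on both nodes:

ping -c 1 Master
ping -c 1 Worker1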
Configure passwordless SSH login to the worker
# Run on Master: generate the key pair (ssh-keygen produces /root/.ssh/id_rsa.pub)
ssh-keygen -t rsa -P "" -f /root/.ssh/id_rsa
# Copy id_rsa.pub from the master node to the worker host, renaming it to id_rsa.pub.master
scp /root/.ssh/id_rsa.pub root@worker1:/root/.ssh/id_rsa.pub.master

scp /etc/hosts root@workerN:/etc/hosts
# Keep the hosts file, and therefore the aliases, identical on every node
# Then run the following command on the corresponding host:
cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
# on the Master host
cat /root/.ssh/id_rsa.pub.master >> /root/.ssh/authorized_keys
# on the Worker1 host
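
To confirm the key exchange worked, run a passwordless login test from Master (a minimal check):

ssh root@worker1 hostname   # should print the worker's hostname without prompting for a password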
Install the base environment

All of my software is installed under /usr/local/; the key step is configuring /etc/profile:

export JAVA_HOME=/usr/local/jdk1.8
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=.:${JAVA_HOME}/bin:$PATH


export SCALA_HOME=/usr/local/scala
export PATH=$PATH:$SCALA_HOME/bin


export HADOOP_HOME=/usr/local/hadoop
export PATH="$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH"
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop

export SPARK_HOME=/usr/local/spark

Run source /etc/profile to make it take effect, then scp the file to the worker and source it there so the worker node picks up the configuration as well.

Once configured, you can use scp to copy the installed software from Master to Worker:

scp -r <software directory> <user>@<IP address>:<destination>
scp -r /usr/local/hadoop root@Worker1:/usr/local/
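
The same pattern covers the remaining packages and /etc/profile itself (a sketch, assuming the /usr/local paths configured above):

scp -r /usr/local/jdk1.8 root@Worker1:/usr/local/
scp -r /usr/local/scala root@Worker1:/usr/local/
scp /etc/profile root@Worker1:/etc/profile
# then log in to Worker1 and run: source /etc/profile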
Configuration files

1. Hadoop configuration

1. $HADOOP_HOME/etc/hadoop/hadoop-env.sh

Set the JAVA_HOME variable:

export JAVA_HOME=/usr/local/jdk1.8
2. $HADOOP_HOME/etc/hadoop/slaves
worker1
3. $HADOOP_HOME/etc/hadoop/core-site.xml

        
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/tmp</value>
    </property>
</configuration>

4. $HADOOP_HOME/etc/hadoop/hdfs-site.xml

    
<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>master:50090</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/hdfs/data</value>
    </property>
</configuration>

5. $HADOOP_HOME/etc/hadoop/mapred-site.xml

Copy the template to create the xml file:
cp mapred-site.xml.template mapred-site.xml


 
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
    </property>
</configuration>

6. $HADOOP_HOME/etc/hadoop/yarn-site.xml


         
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:8088</value>
    </property>
</configuration>


At this point the Hadoop setup on the master node is complete. Before starting it, we need to format the namenode:

hadoop namenode -format
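
On Hadoop 2.x the hadoop namenode form is deprecated in favour of the hdfs front end; the equivalent command is:

hdfs namenode -format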
2. Spark

1. $SPARK_HOME/conf/spark-env.sh
cp spark-env.sh.template spark-env.sh
export SCALA_HOME=/usr/local/scala
export JAVA_HOME=/usr/local/jdk1.8/
export SPARK_MASTER_IP=master
export SPARK_WORKER_MEMORY=1g
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
2. $SPARK_HOME/conf/slaves
cp slaves.template slaves
master
worker1

Then scp the configured Spark directory to the worker1 node.
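
For example, following the same pattern used for Hadoop above:

scp -r /usr/local/spark root@Worker1:/usr/local/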

Start the services

Wrap the startup in a script:

#!/bin/bash
echo -e "\033[31m ========Start The Cluster======== \033[0m"
echo -e "\033[31m Starting Hadoop Now !!! \033[0m"
/usr/local/hadoop/sbin/start-all.sh
echo -e "\033[31m Starting Spark Now !!! \033[0m"
/usr/local/spark/sbin/start-all.sh
echo -e "\033[31m The Result Of The Command \"jps\" : \033[0m"
jps  # print the running Java processes
echo -e "\033[31m ========END======== \033[0m"

Open http://ip:8080 to access the Spark web UI.
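
The web UIs can also be probed from the command line (a minimal check, assuming curl is installed; 8088 is the yarn.resourcemanager.webapp.address configured above):

curl -s -o /dev/null -w "%{http_code}\n" http://Master:8080   # Spark master UI, expect 200
curl -s -o /dev/null -w "%{http_code}\n" http://Master:8088   # YARN ResourceManager UI, expect 200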

Test Hadoop
hadoop fs -mkdir -p /Hadoop/Input
hadoop fs -put <test-file> /Hadoop/Input
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /Hadoop/Input /Hadoop/Output
hadoop fs -cat /Hadoop/Output/*  # view the output
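
If you do not already have an input file, a tiny one can be created and uploaded first (a sketch; testfile is a hypothetical name standing in for the test file above):

echo "hello spark hello hadoop hello world" > /root/testfile   # sample word-count input
hadoop fs -mkdir -p /Hadoop/Input
hadoop fs -put /root/testfile /Hadoop/Input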
Test Spark

Run spark-shell to open an interactive shell:

val file=sc.textFile("hdfs://master:9000/Hadoop/Input/<test-file>")
val rdd = file.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_+_)
rdd.collect()
rdd.foreach(println)

View the word count result.
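
By default spark-shell runs in local mode; to attach it to the standalone cluster started above, pass the master URL (the default standalone port is 7077):

spark-shell --master spark://master:7077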

Shutdown script

#!/bin/bash
echo -e "\033[31m ===== Stopping The Cluster ====== \033[0m"
echo -e "\033[31m Stopping Spark Now !!! \033[0m"
/usr/local/spark/sbin/stop-all.sh
echo -e "\033[31m Stopping Hadoop Now !!! \033[0m"
/usr/local/hadoop/sbin/stop-all.sh
echo -e "\033[31m The Result Of The Command \"jps\" : \033[0m"
jps
echo -e "\033[31m ======END======== \033[0m"