
Extending Cloudera Manager with SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1-el6.parcel

I. Prerequisites
  1. CentOS 6.5
  2. Cloudera Manager 5.6
  3. JDK 1.8, which Spark 2.3 requires
  4. The download links given in online write-ups no longer work, so this guide modifies the existing SPARK2-2.3.0.cloudera2-1.cdh5.13.3.p0.316101-el7.parcel instead
  5. To build the parcel from scratch, see the reference below

Reference link 1

II. Adaptation

1. Building the parcel

Package naming rules:
Take SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1-el6.parcel as an example. The part before the first hyphen is the package name; the part after the last hyphen is the target platform (el6 means CentOS 6, el7 means CentOS 7); everything in between is the version string. The version is the main thing to change, and it must match the file and directory names used in the later steps.
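The naming rule above can be sketched with plain shell parameter expansion. This is just an illustration of the rule; the variable names are made up for this example:

```shell
# Split a parcel filename into name / version / platform per the rule above.
parcel="SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1-el6.parcel"
base="${parcel%.parcel}"           # drop the .parcel suffix
name="${base%%-*}"                 # before the first '-'  -> SPARK2
platform="${base##*-}"             # after the last '-'    -> el6
version="${base#${name}-}"         # strip the name prefix...
version="${version%-${platform}}"  # ...and the platform suffix
echo "$name $version $platform"
```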

Edit meta/parcel.json to:

{
    "schema_version": 1,
    "name": "SPARK2",
    "version": "2.3.0.cloudera3-1.cdh5.6.0.p0.1",
    "components": [
        {
            "name": "spark2",
            "pkg_release": "na",
            "pkg_version": "na",
            "version": "2.3.0.cloudera3"
        }
    ],
    "depends": "CDH (>= 5.5), CDH (<< 6)",
    "extraVersionInfo": {
        "baseVersion": "cdh5.6",
        "patchCount": "0"
    },
    "groups": [
        "spark"
    ],
    "packages": [],
    "provides": [
        "spark2"
    ],
    "replaces": "SPARK",
    "scripts": {
        "defines": "spark2_env.sh"
    },
    "setActiveSymlink": true,
    "users": {
        "spark": {
            "extra_groups": [],
            "home": "/var/lib/spark",
            "longname": "Spark",
            "shell": "/sbin/nologin"
        }
    }
}

Edit meta/spark2_env.sh to:

#!/bin/bash
CDH_DIRNAME=${PARCEL_DIRNAME:-"SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1-el6"}
export CDH_SPARK2_HOME=$PARCELS_ROOT/$CDH_DIRNAME/lib/spark2

Edit lib/spark2/cloudera/spark2_version.properties:

# Autogenerated build properties
version=2.3.0.cloudera3
git.hash=9f5baab06f127486a030024877fc13a3992f100f
cloudera.hash=9f5baab06f127486a030024877fc13a3992f100f
cloudera.cdh.hash=na
cloudera.cdh-packaging.hash=na
cloudera.base-branch=na
cloudera.build-branch=
cloudera.pkg.version=na
cloudera.pkg.release=na
cloudera.cdh.release=2.3.0.cloudera3
cloudera.build.time=2018.04.11-01:17:20GMT
cloudera.pkg.name=spark2

Upload the parcel contents to the server. Note that the unpacked directory name does not carry the OS suffix:

# ll
total 185416
drwxr-xr-x 6 root root      4096 Nov 26 11:15 SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1

Then make a few scripts executable:

# cd SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1/bin/
# chmod +x spark2-shell pyspark2 spark2-submit
# ll
total 12
-rwxr-xr-x 1 root root 692 Apr 11  2018 pyspark2
-rwxr-xr-x 1 root root 653 Apr 11  2018 spark2-shell
-rwxr-xr-x 1 root root 654 Apr 11  2018 spark2-submit

# cd ../lib/spark2/bin/
# chmod +x docker-image-tool.sh find-spark-home  pyspark run-example spark-class spark-shell spark-submit
# ll
total 96
-rw-r--r-- 1 root root 1064 Apr 11  2018 beeline.cmd
-rwxr-xr-x 1 root root 3826 Apr 11  2018 docker-image-tool.sh
-rwxr-xr-x 1 root root 1933 Apr 11  2018 find-spark-home
-rw-r--r-- 1 root root 2681 Apr 11  2018 find-spark-home.cmd
-rw-r--r-- 1 root root 1892 Apr 11  2018 load-spark-env.cmd
-rw-r--r-- 1 root root 2739 Apr 11  2018 load-spark-env.sh
-rwxr-xr-x 1 root root 2989 Apr 11  2018 pyspark
-rw-r--r-- 1 root root 1540 Apr 11  2018 pyspark2.cmd
-rw-r--r-- 1 root root 1170 Apr 11  2018 pyspark.cmd
-rwxr-xr-x 1 root root 1030 Apr 11  2018 run-example
-rw-r--r-- 1 root root 1223 Apr 11  2018 run-example.cmd
-rwxr-xr-x 1 root root 3196 Apr 11  2018 spark-class
-rw-r--r-- 1 root root 2545 Apr 11  2018 spark-class2.cmd
-rw-r--r-- 1 root root 1180 Apr 11  2018 spark-class.cmd
-rw-r--r-- 1 root root 1097 Apr 11  2018 sparkR2.cmd
-rw-r--r-- 1 root root 1168 Apr 11  2018 sparkR.cmd
-rwxr-xr-x 1 root root 3017 Apr 11  2018 spark-shell
-rw-r--r-- 1 root root 1631 Apr 11  2018 spark-shell2.cmd
-rw-r--r-- 1 root root 1178 Apr 11  2018 spark-shell.cmd
-rw-r--r-- 1 root root 1118 Apr 11  2018 spark-sql2.cmd
-rw-r--r-- 1 root root 1173 Apr 11  2018 spark-sql.cmd
-rwxr-xr-x 1 root root 1040 Apr 11  2018 spark-submit
-rw-r--r-- 1 root root 1155 Apr 11  2018 spark-submit2.cmd
-rw-r--r-- 1 root root 1180 Apr 11  2018 spark-submit.cmd
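The chmod calls above can be generalised. A minimal sketch (the helper name is made up) that marks every non-.cmd file in the parcel's two bin/ directories executable; the .cmd files are Windows batch scripts and can stay non-executable:

```shell
# Hypothetical helper: mark all non-Windows scripts in the parcel's two
# bin/ directories executable, mirroring the manual chmod calls above.
make_bins_executable() {
    root="$1"  # unpacked parcel directory
    for d in "$root/bin" "$root/lib/spark2/bin"; do
        if [ -d "$d" ]; then
            find "$d" -maxdepth 1 -type f ! -name '*.cmd' -exec chmod +x {} +
        fi
    done
}

# Usage: make_bins_executable SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1
```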

Replace the conf directory with a symlink:

# pwd
/home/root/SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1/lib/spark2
# rm -rf conf 
# ln -s /etc/spark2/conf conf
# ll 
total 112
drwxr-xr-x 2 root root  4096 Nov 18 17:13 bin
drwxr-xr-x 2 root root  4096 Nov 18 17:13 cloudera
lrwxrwxrwx 1 root root    16 Nov 24 13:51 conf -> /etc/spark2/conf
drwxr-xr-x 5 root root  4096 Nov 18 17:13 data
drwxr-xr-x 4 root root  4096 Nov 18 17:13 examples
drwxr-xr-x 2 root root 12288 Nov 18 17:13 jars
drwxr-xr-x 2 root root  4096 Nov 18 17:13 kafka-0.10
drwxr-xr-x 2 root root  4096 Nov 18 17:13 kafka-0.9
-rw-r--r-- 1 root root 18045 Apr 11  2018 LICENSE
drwxr-xr-x 2 root root  4096 Nov 18 17:13 licenses
-rw-r--r-- 1 root root 24913 Apr 11  2018 NOTICE
drwxr-xr-x 6 root root  4096 Nov 18 17:13 python
-rw-r--r-- 1 root root  3809 Apr 11  2018 README.md
-rw-r--r-- 1 root root   313 Apr 11  2018 RELEASE
drwxr-xr-x 2 root root  4096 Nov 18 17:13 sbin
-rw-r--r-- 1 root root    20 Apr 11  2018 work
drwxr-xr-x 2 root root  4096 Nov 18 17:13 yarn

Create the archive. Note that the archive name does include the OS suffix:

tar -zcvf SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1-el6.parcel SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1

Verify that the parcel is usable with Cloudera's validator tool.
Official package-validator download link

# java -jar validator.jar -f SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1-el6.parcel 
Validating: SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1-el6.parcel
Validating: SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1/meta/parcel.json
Validating: SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1/meta/alternatives.json
Validation succeeded.

Generate the hash checksum file:

# sha1sum SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1-el6.parcel | cut -d ' ' -f 1 > SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1-el6.parcel.sha

# cat SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1-el6.parcel.sha
b95e64a0c5a0a183c75b2fdb6c284c4ca2c2aeaa
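Before copying anything, the parcel/.sha pair can be re-checked the same way the Cloudera Manager agent will. A sketch (the function name is made up):

```shell
# Verify a parcel against its .sha companion file: the agent compares the
# parcel's SHA-1 with the hash stored in <parcel>.sha before activating it.
verify_parcel() {
    parcel="$1"
    expected=$(cat "${parcel}.sha")
    actual=$(sha1sum "$parcel" | cut -d ' ' -f 1)
    [ "$actual" = "$expected" ]
}

# Usage: verify_parcel SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1-el6.parcel && echo OK
```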

Copy the parcel into Cloudera's parcel repo and change the file owner and group:

# cp SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1-el6.parcel /opt/cloudera/parcel-repo/
# cp SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1-el6.parcel.sha /opt/cloudera/parcel-repo/

# chown cloudera-scm:cloudera-scm /opt/cloudera/parcel-repo/SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1-el6.parcel

# chown cloudera-scm:cloudera-scm /opt/cloudera/parcel-repo/SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1-el6.parcel.sha

# ll /opt/cloudera/parcel-repo/
total 1607468
-rw-r----- 1 cloudera-scm cloudera-scm 1457371397 Nov 25 18:20 CDH-5.6.0-1.cdh5.6.0.p0.45-el6.parcel
-rw-r----- 1 cloudera-scm cloudera-scm         41 Nov 25 18:20 CDH-5.6.0-1.cdh5.6.0.p0.45-el6.parcel.sha
-rw-r--r-- 1 cloudera-scm cloudera-scm  188658409 Nov 26 10:45 SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1-el6.parcel
-rw-r--r-- 1 cloudera-scm cloudera-scm         41 Nov 26 10:45 SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1-el6.parcel.sha

Then configure the JDK in the web UI.

Check for the new parcel, then distribute and activate it.

2. Building the CSD

Edit meta/version to:

2.3.0.cloudera3

Edit descriptor/service.sdl. The main change is the version information; a few conflicting entries were also deleted:

{
  "name" : "SPARK2_ON_YARN",
  "label" : "Spark 2",
  "description" : "Apache Spark is an open source cluster computing system. This service runs Spark 2 as an application on YARN. Before adding this service, ensure that you have installed the Spark2 binaries, which are not included in CDH.",
  "version" : "2.3.0.cloudera3",
  "compatibility" : { "cdhVersion" : { "min" : "5.6", "max" : "6.0" } },
  "runAs" : {
    "user" : "spark",
    "group" : "spark",
    "principal" : "spark"
  },
  "inExpressWizard" : true,
  "icon" : "images/icon.png",
  "parcel" : {
    "repoUrl" : "http://archive.cloudera.com/spark2/parcels/2.3.0.cloudera3/",
    "requiredTags" : ["spark2", "cdh"],
    "optionalTags" : ["spark-plugin", "spark2-plugin"]
  },
......
// The entries below were deleted:
    {
        "name" : "navigator_lineage_enabled",
        "type" : "provided"
    }
    {
        "key" : "spark.lineage.enabled",
        "value" : "${navigator_lineage_enabled}"
    }
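After hand-deleting entries from an SDL file it is easy to leave a dangling comma behind. A quick syntax check using Python's stdlib JSON parser from the shell; this is a sketch with a made-up helper name, and it assumes the edited file itself contains no comments:

```shell
# Syntax-check an edited service.sdl with python's stdlib JSON parser.
# python3 is used here; the python2 on a CentOS 6 host has json.tool too.
check_sdl() {
    if python3 -m json.tool "$1" > /dev/null 2>&1; then
        echo "valid JSON"
    else
        echo "syntax error in $1" >&2
        return 1
    fi
}

# Usage: check_sdl descriptor/service.sdl
```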

Then package it up:

# jar -cvf SPARK2_ON_YARN-2.3.0.cloudera3.jar *
# ll
total 44
drwxr-xr-x 3 root root  4096 Nov 22 13:30 aux
drwxr-xr-x 2 root root  4096 Nov 19 11:59 descriptor
drwxr-xr-x 2 root root  4096 Nov 22 10:46 images
drwxr-xr-x 2 root root  4096 Nov 22 13:31 meta
drwxr-xr-x 2 root root  4096 Nov 22 13:41 meta-INF
drwxr-xr-x 2 root root  4096 Nov 22 10:48 scripts
-rw-r--r-- 1 root root 18976 Nov 26 10:03 SPARK2_ON_YARN-2.3.0.cloudera3.jar

Copy the jar into Cloudera's CSD directory, then restart the server:

# cp SPARK2_ON_YARN-2.3.0.cloudera3.jar /opt/cloudera/csd/

# service cloudera-scm-server restart

Add the service.

Find Spark 2 in the service list. If it is missing, check the server logs; there may be JSON conflicts in the CSD.

Choose roles as needed.

Add the Spark History Server and a Gateway, then keep clicking Continue and wait for the installation to finish.

Verify:

# su hdfs 
$ spark2-shell 
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://nn.root.com:4040
Spark context available as 'sc' (master = yarn, app id = application_1637893774929_0001).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0.cloudera2
      /_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_45)
Type in expressions to have them evaluated.
Type :help for more information.

scala> 

Check port 18089 (the Spark 2 History Server web UI).

III. Troubleshooting

1. spark2-shell reports an error on startup:
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/Logger
        at java.lang.Class.getDeclaredMethods0(Native Method)
        at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
        at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
        at java.lang.Class.getMethod0(Class.java:3018)
        at java.lang.Class.getMethod(Class.java:1784)
        at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
        at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: org.slf4j.Logger
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 7 more

Check the following two points.
First, whether the conf symlinks are set up correctly:

# A healthy symlink chain
# ll /etc/spark2/conf
lrwxrwxrwx 1 root root 29 Nov 26 10:50 /etc/spark2/conf -> /etc/alternatives/spark2-conf
# ll /etc/alternatives/spark2-conf
lrwxrwxrwx 1 root root 40 Nov 26 10:50 /etc/alternatives/spark2-conf -> /etc/spark2/conf.cloudera.spark2_on_yarn

# Symlink chain when no Gateway role has been assigned. This chain can also work, but in my case the target directory was empty, so it had to be relinked as shown above.
# ll /etc/spark2/conf
lrwxrwxrwx 1 root root 29 Nov 26 10:50 /etc/spark2/conf -> /etc/alternatives/spark2-conf
# ll /etc/alternatives/spark2-conf
lrwxrwxrwx 1 root root 81 Nov 25 10:35 /etc/alternatives/spark2-conf -> /opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1/etc/spark2/conf.dist
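The two cases can be told apart mechanically. A sketch (the helper name is made up) that resolves the symlink chain and flags the empty-directory situation described above:

```shell
# Resolve a conf symlink chain and check that the final directory actually
# contains files: the broken case above resolves to an empty conf.dist.
conf_target_ok() {
    target=$(readlink -f "$1")
    echo "resolves to: $target"
    [ -n "$(ls -A "$target" 2>/dev/null)" ]
}

# Usage: conf_target_ok /etc/spark2/conf || echo "empty conf dir, relink it"
```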

Second, whether spark-env.sh pulls the Hadoop jars onto the classpath:

export SPARK_DIST_CLASSPATH=$(paste -sd: "$SELF/classpath.txt")
or
export SPARK_DIST_CLASSPATH=$(${HADOOP_HOME}/bin/hadoop classpath)

SPARK_EXTRA_LIB_PATH="/opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/lib/hadoop/lib/native"
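The NoClassDefFoundError above is exactly what a missing slf4j jar on SPARK_DIST_CLASSPATH looks like. A sketch (the helper name is made up) to confirm the Hadoop classpath actually provides it:

```shell
# Check whether a classpath string carries the slf4j jars whose absence
# triggers the java.lang.NoClassDefFoundError: org/slf4j/Logger above.
check_slf4j() {
    case "$1" in
        *slf4j*) echo "slf4j present" ;;
        *)       echo "slf4j missing, spark2-shell will fail" >&2; return 1 ;;
    esac
}

# On a live node, feed it the real value, e.g.:
#   check_slf4j "$(${HADOOP_HOME}/bin/hadoop classpath)"
```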
Source: https://www.mshxw.com/it/603966.html