- CentOS 6.5
- Cloudera Manager 5.6
- Spark 2.3 requires JDK 1.8
- The download links given in online guides no longer work, so this build is based on modifying the existing SPARK2-2.3.0.cloudera2-1.cdh5.13.3.p0.316101-el7.parcel
- To build one from scratch, see the reference below

Reference link 1
II. Starting the adaptation

1. Parcel naming rules

Take SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1-el6.parcel as an example: the part before the first `-` is the package name, and the part after the last `-` is the target platform (`el6` means CentOS 6, `el7` means CentOS 7). Everything in between is the version string, which is the main thing to change, so that the files edited later stay consistent with it.
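As a quick illustration of that split (variable names here are just for demonstration), the three parts can be pulled out with plain shell parameter expansion:

```shell
# Split a parcel file name into package name, version, and platform.
# Illustration only; POSIX parameter expansion, no external tools needed.
name="SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1-el6.parcel"
pkg="${name%%-*}"                          # before the first '-'  -> SPARK2
platform="${name##*-}"                     # after the last '-'    -> el6.parcel
platform="${platform%.parcel}"             #                       -> el6
version="${name#"$pkg"-}"                  # strip leading "SPARK2-"
version="${version%-"$platform".parcel}"   # strip trailing "-el6.parcel"
echo "$pkg | $version | $platform"
```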
Modify meta/parcel.json to:
```json
{
  "schema_version": 1,
  "name": "SPARK2",
  "version": "2.3.0.cloudera3-1.cdh5.6.0.p0.1",
  "components": [
    {
      "name": "spark2",
      "pkg_release": "na",
      "pkg_version": "na",
      "version": "2.3.0.cloudera3"
    }
  ],
  "depends": "CDH (>= 5.5), CDH (<< 6)",
  "extraVersionInfo": {
    "baseVersion": "cdh5.6",
    "patchCount": "0"
  },
  "groups": [
    "spark"
  ],
  "packages": [],
  "provides": [
    "spark2"
  ],
  "replaces": "SPARK",
  "scripts": {
    "defines": "spark2_env.sh"
  },
  "setActiveSymlink": true,
  "users": {
    "spark": {
      "extra_groups": [],
      "home": "/var/lib/spark",
      "longname": "Spark",
      "shell": "/sbin/nologin"
    }
  }
}
```
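One thing worth double-checking before going further: the `version` field in meta/parcel.json must exactly match the suffix of the parcel directory name, otherwise Cloudera Manager will reject the parcel. A minimal sketch of that check (a throwaway fixture under /tmp stands in for the real parcel tree):

```shell
# Sketch: verify that meta/parcel.json "version" matches the directory name.
dir="SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1"
mkdir -p "/tmp/$dir/meta"
printf '{"name": "SPARK2", "version": "2.3.0.cloudera3-1.cdh5.6.0.p0.1"}\n' \
    > "/tmp/$dir/meta/parcel.json"
v=$(sed -n 's/.*"version": "\([^"]*\)".*/\1/p' "/tmp/$dir/meta/parcel.json")
[ "SPARK2-$v" = "$dir" ] && echo "parcel.json version OK"
```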
Modify meta/spark2_env.sh to:
```shell
#!/bin/bash
CDH_DIRNAME=${PARCEL_DIRNAME:-"SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1-el6"}
export CDH_SPARK2_HOME=$PARCELS_ROOT/$CDH_DIRNAME/lib/spark2
```
Modify lib/spark2/cloudera/spark2_version.properties to:
```properties
# Autogenerated build properties
version=2.3.0.cloudera3
git.hash=9f5baab06f127486a030024877fc13a3992f100f
cloudera.hash=9f5baab06f127486a030024877fc13a3992f100f
cloudera.cdh.hash=na
cloudera.cdh-packaging.hash=na
cloudera.base-branch=na
cloudera.build-branch=
cloudera.pkg.version=na
cloudera.pkg.release=na
cloudera.cdh.release=2.3.0.cloudera3
cloudera.build.time=2018.04.11-01:17:20GMT
cloudera.pkg.name=spark2
```
Upload the parcel directory to the system. Note that the unpacked directory name must not carry the OS version suffix:
```shell
# ll
total 185416
drwxr-xr-x 6 root root 4096 Nov 26 11:15 SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1
```
Then restore the execute permission on a few files:
```shell
# cd SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1/bin/
# chmod +x spark2-shell pyspark2 spark2-submit
# ll
total 12
-rwxr-xr-x 1 root root 692 Apr 11  2018 pyspark2
-rwxr-xr-x 1 root root 653 Apr 11  2018 spark2-shell
-rwxr-xr-x 1 root root 654 Apr 11  2018 spark2-submit
# cd ../lib/spark2/bin/
# chmod +x docker-image-tool.sh find-spark-home pyspark run-example spark-class spark-shell spark-submit
# ll
total 96
-rw-r--r-- 1 root root 1064 Apr 11  2018 beeline.cmd
-rwxr-xr-x 1 root root 3826 Apr 11  2018 docker-image-tool.sh
-rwxr-xr-x 1 root root 1933 Apr 11  2018 find-spark-home
-rw-r--r-- 1 root root 2681 Apr 11  2018 find-spark-home.cmd
-rw-r--r-- 1 root root 1892 Apr 11  2018 load-spark-env.cmd
-rw-r--r-- 1 root root 2739 Apr 11  2018 load-spark-env.sh
-rwxr-xr-x 1 root root 2989 Apr 11  2018 pyspark
-rw-r--r-- 1 root root 1540 Apr 11  2018 pyspark2.cmd
-rw-r--r-- 1 root root 1170 Apr 11  2018 pyspark.cmd
-rwxr-xr-x 1 root root 1030 Apr 11  2018 run-example
-rw-r--r-- 1 root root 1223 Apr 11  2018 run-example.cmd
-rwxr-xr-x 1 root root 3196 Apr 11  2018 spark-class
-rw-r--r-- 1 root root 2545 Apr 11  2018 spark-class2.cmd
-rw-r--r-- 1 root root 1180 Apr 11  2018 spark-class.cmd
-rw-r--r-- 1 root root 1097 Apr 11  2018 sparkR2.cmd
-rw-r--r-- 1 root root 1168 Apr 11  2018 sparkR.cmd
-rwxr-xr-x 1 root root 3017 Apr 11  2018 spark-shell
-rw-r--r-- 1 root root 1631 Apr 11  2018 spark-shell2.cmd
-rw-r--r-- 1 root root 1178 Apr 11  2018 spark-shell.cmd
-rw-r--r-- 1 root root 1118 Apr 11  2018 spark-sql2.cmd
-rw-r--r-- 1 root root 1173 Apr 11  2018 spark-sql.cmd
-rwxr-xr-x 1 root root 1040 Apr 11  2018 spark-submit
-rw-r--r-- 1 root root 1155 Apr 11  2018 spark-submit2.cmd
-rw-r--r-- 1 root root 1180 Apr 11  2018 spark-submit.cmd
```
Replace the conf directory with a symlink:
```shell
# pwd
/home/root/SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1/lib/spark2
# rm -rf conf
# ln -s /etc/spark2/conf conf
# ll
total 112
drwxr-xr-x 2 root root  4096 Nov 18 17:13 bin
drwxr-xr-x 2 root root  4096 Nov 18 17:13 cloudera
lrwxrwxrwx 1 root root    16 Nov 24 13:51 conf -> /etc/spark2/conf
drwxr-xr-x 5 root root  4096 Nov 18 17:13 data
drwxr-xr-x 4 root root  4096 Nov 18 17:13 examples
drwxr-xr-x 2 root root 12288 Nov 18 17:13 jars
drwxr-xr-x 2 root root  4096 Nov 18 17:13 kafka-0.10
drwxr-xr-x 2 root root  4096 Nov 18 17:13 kafka-0.9
-rw-r--r-- 1 root root 18045 Apr 11  2018 LICENSE
drwxr-xr-x 2 root root  4096 Nov 18 17:13 licenses
-rw-r--r-- 1 root root 24913 Apr 11  2018 NOTICE
drwxr-xr-x 6 root root  4096 Nov 18 17:13 python
-rw-r--r-- 1 root root  3809 Apr 11  2018 README.md
-rw-r--r-- 1 root root   313 Apr 11  2018 RELEASE
drwxr-xr-x 2 root root  4096 Nov 18 17:13 sbin
-rw-r--r-- 1 root root    20 Apr 11  2018 work
drwxr-xr-x 2 root root  4096 Nov 18 17:13 yarn
```
Pack the archive. This time the name must include the OS version suffix:
```shell
tar -zcvf SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1-el6.parcel SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1
```
Verify that the parcel is valid.

Official site link for the parcel validator
```shell
# java -jar validator.jar -f SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1-el6.parcel
Validating: SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1-el6.parcel
Validating: SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1/meta/parcel.json
Validating: SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1/meta/alternatives.json
Validation succeeded.
```
Generate the hash checksum file:
```shell
# sha1sum SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1-el6.parcel | cut -d ' ' -f 1 > SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1-el6.parcel.sha
# cat SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1-el6.parcel.sha
b95e64a0c5a0a183c75b2fdb6c284c4ca2c2aeaa
```
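Before copying anything, it may be worth confirming that the .sha file really matches the parcel, since Cloudera Manager will flag a parcel whose hash file disagrees. A small sketch, run on a stand-in file rather than the real parcel:

```shell
# Sketch: recompute the SHA-1 and compare it with the .sha file,
# using a tiny stand-in file instead of the real parcel.
printf 'parcel-bytes' > /tmp/demo.parcel
sha1sum /tmp/demo.parcel | cut -d ' ' -f 1 > /tmp/demo.parcel.sha
[ "$(sha1sum /tmp/demo.parcel | cut -d ' ' -f 1)" = "$(cat /tmp/demo.parcel.sha)" ] \
    && echo "sha OK"
```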
Copy the parcel and its .sha file into the Cloudera parcel repo, and change their owner and group:
```shell
# cp SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1-el6.parcel /opt/cloudera/parcel-repo/
# cp SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1-el6.parcel.sha /opt/cloudera/parcel-repo/
# chown -R cloudera-scm /opt/cloudera/parcel-repo/SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1-el6.parcel
# chgrp -R cloudera-scm /opt/cloudera/parcel-repo/SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1-el6.parcel
# chown -R cloudera-scm /opt/cloudera/parcel-repo/SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1-el6.parcel.sha
# chgrp -R cloudera-scm /opt/cloudera/parcel-repo/SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1-el6.parcel.sha
# ll /opt/cloudera/parcel-repo/
total 1607468
-rw-r----- 1 cloudera-scm cloudera-scm 1457371397 Nov 25 18:20 CDH-5.6.0-1.cdh5.6.0.p0.45-el6.parcel
-rw-r----- 1 cloudera-scm cloudera-scm         41 Nov 25 18:20 CDH-5.6.0-1.cdh5.6.0.p0.45-el6.parcel.sha
-rw-r--r-- 1 cloudera-scm cloudera-scm  188658409 Nov 26 10:45 SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1-el6.parcel
-rw-r--r-- 1 cloudera-scm cloudera-scm         41 Nov 26 10:45 SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1-el6.parcel.sha
```
Then configure the JDK in the Cloudera Manager web UI.
Check for the new parcel, then distribute and activate it.
Next, adapt the CSD jar. Modify meta/version to:
2.3.0.cloudera3
Modify descriptor/service.sdl, mainly the version information; a few conflicting entries are deleted:
```json
{
  "name" : "SPARK2_ON_YARN",
  "label" : "Spark 2",
  "description" : "Apache Spark is an open source cluster computing system. This service runs Spark 2 as an application on YARN. Before adding this service, ensure that you have installed the Spark2 binaries, which are not included in CDH.",
  "version" : "2.3.0.cloudera3",
  "compatibility" : { "cdhVersion" : { "min" : "5.6", "max" : "6.0" } },
  "runAs" : {
    "user" : "spark",
    "group" : "spark",
    "principal" : "spark"
  },
  "inExpressWizard" : true,
  "icon" : "images/icon.png",
  "parcel" : {
    "repoUrl" : "http://archive.cloudera.com/spark2/parcels/2.3.0.cloudera3/",
    "requiredTags" : ["spark2", "cdh"],
    "optionalTags" : ["spark-plugin", "spark2-plugin"]
  },
  ......
```

The following entries were deleted:

```json
{
  "name" : "navigator_lineage_enabled",
  "type" : "provided"
}
{
  "key" : "spark.lineage.enabled",
  "value" : "${navigator_lineage_enabled}"
}
```
Then package it into a jar:
```shell
# jar -cvf SPARK2_ON_YARN-2.3.0.cloudera3.jar *
# ll
total 44
drwxr-xr-x 3 root root  4096 Nov 22 13:30 aux
drwxr-xr-x 2 root root  4096 Nov 19 11:59 descriptor
drwxr-xr-x 2 root root  4096 Nov 22 10:46 images
drwxr-xr-x 2 root root  4096 Nov 22 13:31 meta
drwxr-xr-x 2 root root  4096 Nov 22 13:41 meta-INF
drwxr-xr-x 2 root root  4096 Nov 22 10:48 scripts
-rw-r--r-- 1 root root 18976 Nov 26 10:03 SPARK2_ON_YARN-2.3.0.cloudera3.jar
```
Copy the jar into the Cloudera CSD directory, then restart the server:
```shell
# cp SPARK2_ON_YARN-2.3.0.cloudera3.jar /opt/cloudera/csd/
# service cloudera-scm-server restart
```
Add the service. Find spark2 in the service list; if it is missing, check the server logs, since the CSD JSON may still contain conflicts. Assign roles as needed, adding the Spark History Server and Gateway, then keep clicking Continue and wait for the installation to finish.
Verify:
```shell
# su hdfs
$ spark2-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://nn.root.com:4040
Spark context available as 'sc' (master = yarn, app id = application_1637893774929_0001).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0.cloudera2
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_45)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
```
Check port 18089 (the Spark2 History Server UI). If it fails with the following error:
```
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/Logger
	at java.lang.Class.getDeclaredMethods0(Native Method)
	at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
	at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
	at java.lang.Class.getMethod0(Class.java:3018)
	at java.lang.Class.getMethod(Class.java:1784)
	at sun.launcher.LauncherHelper.validateMainClass(LauncherHelper.java:544)
	at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:526)
Caused by: java.lang.ClassNotFoundException: org.slf4j.Logger
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 7 more
```
check the following two points.

1. Whether the conf symlink chain is correct:
```shell
# A healthy symlink chain
# ll /etc/spark2/conf
lrwxrwxrwx 1 root root 29 Nov 26 10:50 /etc/spark2/conf -> /etc/alternatives/spark2-conf
# ll /etc/alternatives/spark2-conf
lrwxrwxrwx 1 root root 40 Nov 26 10:50 /etc/alternatives/spark2-conf -> /etc/spark2/conf.cloudera.spark2_on_yarn
# The chain when no Gateway role was assigned; this would also work in principle,
# but in my case that target directory was empty, so relink it as above
# ll /etc/spark2/conf
lrwxrwxrwx 1 root root 29 Nov 26 10:50 /etc/spark2/conf -> /etc/alternatives/spark2-conf
# ll /etc/alternatives/spark2-conf
lrwxrwxrwx 1 root root 81 Nov 25 10:35 /etc/alternatives/spark2-conf -> /opt/cloudera/parcels/SPARK2-2.3.0.cloudera3-1.cdh5.6.0.p0.1/etc/spark2/conf.dist
```
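Rebuilding that two-level chain by hand can be sketched as follows. This is demonstrated under a throwaway /tmp prefix; on a real host the Gateway role normally deploys these links via alternatives, and the paths would be the /etc ones shown above:

```shell
# Sketch: recreate the two-level conf symlink chain under a demo prefix.
root=/tmp/spark2-conf-demo
mkdir -p "$root/etc/spark2/conf.cloudera.spark2_on_yarn" "$root/etc/alternatives"
ln -sfn "$root/etc/spark2/conf.cloudera.spark2_on_yarn" "$root/etc/alternatives/spark2-conf"
ln -sfn "$root/etc/alternatives/spark2-conf" "$root/etc/spark2/conf"
readlink "$root/etc/spark2/conf"
```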
2. Whether spark-env.sh pulls in the Hadoop jars:

```shell
export SPARK_DIST_CLASSPATH=$(paste -sd: "$SELF/classpath.txt")
```

or

```shell
export SPARK_DIST_CLASSPATH=$(${HADOOP_HOME}/bin/hadoop classpath)
SPARK_EXTRA_LIB_PATH="/opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/lib/hadoop/lib/native"
```
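Putting the second variant together, a minimal spark-env.sh fragment might look like the sketch below. The CDH parcel path is the one used throughout this post; HADOOP_HOME is an assumption here and should be adjusted to your own layout:

```shell
# spark-env.sh fragment (sketch): put the CDH Hadoop jars, including slf4j,
# on the Spark distribution classpath so the History Server can start.
export HADOOP_HOME="/opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/lib/hadoop"
export SPARK_DIST_CLASSPATH="$("${HADOOP_HOME}/bin/hadoop" classpath)"
SPARK_EXTRA_LIB_PATH="${HADOOP_HOME}/lib/native"
```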



