
74. Submitting Spark Jobs to the Cluster via the Oozie API

74.1 Test Environment
  • A Kerberos-enabled CDH cluster
  • CM and CDH version: 5.13.1
74.2 Walkthrough

Authenticate with kinit, then upload the Spark example jar to an HDFS directory:

[root@ip-186-31-16-68 ~]# kinit fayson
Password for fayson@FAYSON.COM: 
[root@ip-186-31-16-68 ~]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: fayson@FAYSON.COM
Valid starting       Expires              Service principal
02/22/2018 21:12:41  02/23/2018 21:12:41  krbtgt/FAYSON.COM@FAYSON.COM
        renew until 03/01/2018 21:12:41
[root@ip-186-31-16-68 ~]# 
hadoop fs -mkdir -p /fayson/jars
hadoop fs -put /opt/cloudera/parcels/CDH/jars/spark-examples-1.6.0-cdh5.13.1-hadoop2.6.0-cdh5.13.1.jar /fayson/jars
hadoop fs -ls /fayson/jars
  • Define a workflow.xml file for the Spark Action:
    • The parameters used in workflow.xml are dynamic; their values are supplied later in the Java code.

    
    
<workflow-app name="My Workflow" xmlns="uri:oozie:workflow:0.5">
    <start to="spark-node"/>
    <kill name="Kill">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <action name="spark-node">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>${master}</master>
            <mode>${mode}</mode>
            <name>${name}</name>
            <class>${class}</class>
            <jar>${jar}</jar>
            <spark-opts>${sparkOpts}</spark-opts>
            <arg>${arg}</arg>
            <file>${file}</file>
        </spark>
        <ok to="End"/>
        <error to="Kill"/>
    </action>
    <end name="End"/>
</workflow-app>

Upload the completed workflow.xml to the HDFS directory /user/fayson/oozie/testoozie:

hadoop fs -mkdir -p /user/fayson/oozie/testoozie
hadoop fs -put workflow.xml /user/fayson/oozie/testoozie
hadoop fs -ls /user/fayson/oozie/testoozie
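The `${...}` placeholders in workflow.xml are EL expressions that Oozie resolves against the submission Properties at run time. As a standalone illustration of that substitution (plain Java, no Oozie dependency; `resolve` is a hypothetical helper mimicking the behavior, not an Oozie API):

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParamSubstitutionDemo {
    // Replace each ${key} in the template with its value from props,
    // mimicking how Oozie resolves workflow parameters at submit time.
    // Unresolved placeholders are left as-is.
    static String resolve(String template, Properties props) {
        Pattern p = Pattern.compile("\\$\\{([^}]+)\\}");
        Matcher m = p.matcher(template);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = props.getProperty(m.group(1), m.group(0));
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("master", "yarn-cluster");
        props.put("class", "org.apache.spark.examples.SparkPi");
        String xml = "<master>${master}</master><class>${class}</class>";
        System.out.println(resolve(xml, props));
        // prints <master>yarn-cluster</master><class>org.apache.spark.examples.SparkPi</class>
    }
}
```

This is why the same workflow.xml can be reused for different jobs: only the Properties passed to the client change.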

Prepare the JAAS file oozie-login.conf:

com.sun.security.jgss.initiate {
    com.sun.security.auth.module.Krb5LoginModule required
    storeKey=true
    useKeyTab=true
    debug=true
    keyTab="/Volumes/Transcend/keytab/fayson.keytab"
    principal="fayson@FAYSON.COM";
};
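If you would rather not ship a separate JAAS file alongside the client, the same configuration can be generated at startup before any Kerberos login is attempted. A minimal sketch (the keytab path and principal below are placeholders, substitute your own):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class JaasSetup {
    // Write a JAAS login configuration equivalent to oozie-login.conf
    // and point the JVM at it via java.security.auth.login.config.
    static Path writeJaasConf(String keytab, String principal) throws IOException {
        String conf =
            "com.sun.security.jgss.initiate {\n" +
            "    com.sun.security.auth.module.Krb5LoginModule required\n" +
            "    storeKey=true\n" +
            "    useKeyTab=true\n" +
            "    keyTab=\"" + keytab + "\"\n" +
            "    principal=\"" + principal + "\";\n" +
            "};\n";
        Path path = Files.createTempFile("oozie-login", ".conf");
        Files.write(path, conf.getBytes(StandardCharsets.UTF_8));
        System.setProperty("java.security.auth.login.config", path.toString());
        return path;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder keytab and principal for illustration only.
        Path p = writeJaasConf("/tmp/fayson.keytab", "fayson@FAYSON.COM");
        System.out.println("JAAS config written to " + p);
    }
}
```

The property must be set before the first Kerberos authentication attempt, since the login configuration is read lazily on first use.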
  • Create a Java project with Maven; the pom.xml dependencies are as follows:

    
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>cdh-project</artifactId>
        <groupId>com.cloudera</groupId>
        <version>1.0-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>
    <artifactId>oozie-demo</artifactId>
    <packaging>jar</packaging>
    <name>oozie-demo</name>
    <url>http://maven.apache.org</url>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpclient</artifactId>
            <version>4.5.4</version>
        </dependency>
        <dependency>
            <groupId>net.sourceforge.spnego</groupId>
            <artifactId>spnego</artifactId>
            <version>7.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.oozie</groupId>
            <artifactId>oozie-client</artifactId>
            <version>4.1.0</version>
        </dependency>
    </dependencies>
</project>

Write SparkWorkflowDemo.java:

package com.cloudera.kerberos;
import org.apache.oozie.client.*;
import java.util.List;
import java.util.Properties;

public class SparkWorkflowDemo {
    private static String oozieURL = "http://ip-186-31-16-68.ap-southeast-1.compute.internal:11000/oozie";
    public static void main(String[] args) {
        System.setProperty("java.security.krb5.conf", "/Volumes/Transcend/keytab/krb5.conf");
        System.setProperty("javax.security.auth.useSubjectCredsOnly", "false");
        System.setProperty("sun.security.jgss.debug", "true"); // enable Kerberos debug output
        System.setProperty("java.security.auth.login.config", "/Volumes/Transcend/keytab/oozie-login.conf");
        System.setProperty("user.name", "fayson");
        AuthOozieClient oozieClient = new AuthOozieClient(oozieURL, AuthOozieClient.AuthType.KERBEROS.name());
        oozieClient.setDebugMode(1);
        try {
            System.out.println(oozieClient.getServerBuildVersion());
            Properties properties = oozieClient.createConfiguration();
            properties.put("oozie.wf.application.path", "${nameNode}/user/fayson/oozie/testoozie");
            properties.put("name", "MyfirstSpark");
            properties.put("nameNode", "hdfs://ip-186-31-16-68.ap-southeast-1.compute.internal:8020");
            properties.put("oozie.use.system.libpath", "true");
            properties.put("master", "yarn-cluster");
            properties.put("mode", "cluster");
            properties.put("class", "org.apache.spark.examples.SparkPi");
            properties.put("arg", "100");
            properties.put("sparkOpts", "--num-executors 4 --driver-memory 2g --driver-cores 1 --executor-memory 2g --executor-cores 1");
            properties.put("jar", "${nameNode}/fayson/jars/spark-examples-1.6.0-cdh5.13.1-hadoop2.6.0-cdh5.13.1.jar");
            properties.put("oozie.libpath", "${nameNode}/fayson/jars");
            properties.put("jobTracker", "ip-186-31-16-68.ap-southeast-1.compute.internal:8032");
            properties.put("file", "${nameNode}/fayson/jars");
            // Run the workflow
            String jobid = oozieClient.run(properties);
            System.out.println(jobid);
            // Wait 10 seconds for the workflow to start before querying its status
            try {
                Thread.sleep(10000L);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            // Query the job's run state by workflow id
            WorkflowJob workflowJob = oozieClient.getJobInfo(jobid);
            // Fetch the job log
            System.out.println(oozieClient.getJobLog(jobid));
            // Iterate over all actions in the workflow
            List<WorkflowAction> list = workflowJob.getActions();
            for (WorkflowAction action : list) {
                // Print each action's external id, i.e. the YARN application ID
                System.out.println(action.getExternalId());
            }
        } catch (OozieClientException e) {
            e.printStackTrace();
        }
    }
}
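Rather than sleeping a fixed 10 seconds before querying the job, a production client would poll until the job leaves a running state. A minimal polling sketch, decoupled from the cluster by taking the status lookup as a function (with a live client you would pass `id -> oozieClient.getJobInfo(id).getStatus().name()`; the fake status sequence below is for illustration only):

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Function;

public class JobPoller {
    // Poll the job status until it reaches a terminal state or we give up.
    // statusLookup abstracts oozieClient.getJobInfo(jobId).getStatus().name().
    static String waitForCompletion(String jobId,
                                    Function<String, String> statusLookup,
                                    int maxAttempts, long sleepMillis) throws InterruptedException {
        String status = "RUNNING";
        for (int i = 0; i < maxAttempts; i++) {
            status = statusLookup.apply(jobId);
            if (!status.equals("RUNNING") && !status.equals("PREP")) {
                return status;   // SUCCEEDED, KILLED, FAILED, ...
            }
            Thread.sleep(sleepMillis);
        }
        return status;           // still not terminal after maxAttempts polls
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated status sequence standing in for a live Oozie server.
        Iterator<String> fake = Arrays.asList("PREP", "RUNNING", "SUCCEEDED").iterator();
        String terminal = waitForCompletion("0000000-job", id -> fake.next(), 10, 10L);
        System.out.println(terminal); // prints SUCCEEDED
    }
}
```

Bounding the number of attempts keeps the client from hanging forever if the job is stuck in PREP, e.g. when the cluster has no free YARN resources.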

Summary:
1. The workflow.xml file must be defined in advance and uploaded to HDFS.
2. Parameters are passed by creating a Properties object via oozieClient.createConfiguration(), storing the key/value pairs in it, and handing it to oozieClient.run(properties).
3. When specifying the path of the jar or the workflow on HDFS, include the full HDFS URI; otherwise the path is resolved against the local filesystem by default.
4. Submitting jobs to a Kerberos-enabled cluster requires loading a JAAS configuration in the program.
5. oozie-client provides the AuthOozieClient API for Kerberos authentication.

Reprinted from www.mshxw.com; original article: https://www.mshxw.com/it/632783.html