
An Example Implementation of Joining Data in Spark


package TTest;

import com.hw.chinamobile.service.SparkConfig;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.Optional;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;

public class TTest42 {
    public static void main(String[] args) {
        JavaSparkContext jsc= SparkConfig.Instance("Combine");
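        // Parse each input line into a (key, value) pair: field 0 is the key, field 1 the value.
        JavaPairRDD<String, String> rdd1 = jsc.textFile("").mapToPair(new PairFunction<String, String, String>() {
            @Override
            public Tuple2<String, String> call(String s) throws Exception {
                String[] fields = s.split(",", -1);
                return new Tuple2<>(fields[0], fields[1]);
            }
        });
        JavaPairRDD<String, String> rdd2 = jsc.textFile("").mapToPair(new PairFunction<String, String, String>() {
            @Override
            public Tuple2<String, String> call(String s) throws Exception {
                String[] fields = s.split(",", -1);
                return new Tuple2<>(fields[0], fields[1]);
            }
        });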
        getFullJoinResult(rdd1,rdd2);
    }
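    private static JavaPairRDD<String, String> getFullJoinResult(JavaPairRDD<String, String> rdd1, JavaPairRDD<String, String> rdd2) {
        JavaPairRDD<String, String> rddCombine = rdd1.fullOuterJoin(rdd2).mapToPair(new PairFunction<Tuple2<String, Tuple2<Optional<String>, Optional<String>>>, String, String>() {
            @Override
            public Tuple2<String, String> call(Tuple2<String, Tuple2<Optional<String>, Optional<String>>> tuple) throws Exception {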
                String[] word1 = (tuple._2._1.isPresent()) ? tuple._2._1.get().split(",", -1) : null; // left-side value as array word1
                String[] word2 = (tuple._2._2.isPresent()) ? tuple._2._2.get().split(",", -1) : null; // right-side value as array word2
                String key = ""; // key field
                String value1 = "";
                String value2 = "";
                String value3 = "";
                String value4 = "";
                String value5 = "";
                String value6 = "";
                if (word1 == null) { // left side is missing: take the fields from the right side
                    key = word2[0];
                    value1 = word2[1];
                    value2 = word2[2];
                    value3 = word2[3];
                    value4 = word2[4];
                    value5 = word2[5];
                    value6 = word2[6];
                } else if (word2 == null) { // right side is missing: take the fields from the left side
                    key = tuple._1;
                    value1 = word1[0];
                    value2 = word1[1];
                    value3 = word1[2];
                    value4 = word1[0];
                    value5 = word1[1];
                    value6 = word1[2];
                } else { // both sides present: the first three values come from the left, the last three from the right
                    key = tuple._1();
                    value1 = word1[0];
                    value2 = word1[1];
                    value3 = word1[2];
                    value4 = word2[4];
                    value5 = word2[5];
                    value6 = word2[6];
                }
                String line = String.join(",", key, value1, value2, value3, value4, value5, value6);
                return new Tuple2<>(key, line);
            }
        });
        return rddCombine;
    }
}
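
The per-key merge rule of the full outer join can also be tried without a Spark cluster. The following is only a minimal sketch under stated assumptions, not the method used in the listing: the class `FullJoinMergeSketch`, its methods, and the constant `FIELDS_PER_SIDE` are hypothetical names, `java.util.Optional` stands in for `org.apache.spark.api.java.Optional`, each side is assumed to carry exactly three comma-separated fields, and a missing side is padded with empty fields.

```java
import java.util.Arrays;
import java.util.Optional;

// Hypothetical helper, not part of the original listing: the per-key merge step
// of a full outer join, written without Spark so it can be run on its own.
final class FullJoinMergeSketch {

    // Assumption: each side carries exactly this many comma-separated fields.
    static final int FIELDS_PER_SIDE = 3;

    // Splits a present value into its fields; a missing side becomes empty fields.
    static String[] fieldsOrBlank(Optional<String> side) {
        String[] blank = new String[FIELDS_PER_SIDE];
        Arrays.fill(blank, "");
        return side.map(v -> v.split(",", -1)).orElse(blank);
    }

    // Builds one output line: key, then the left fields, then the right fields.
    static String merge(String key, Optional<String> left, Optional<String> right) {
        String[] l = fieldsOrBlank(left);
        String[] r = fieldsOrBlank(right);
        return key + "," + String.join(",", l) + "," + String.join(",", r);
    }

    public static void main(String[] args) {
        // Key present only on the left, only on the right, and on both sides.
        System.out.println(merge("k1", Optional.of("a,b,c"), Optional.empty())); // k1,a,b,c,,,
        System.out.println(merge("k2", Optional.empty(), Optional.of("x,y,z"))); // k2,,,,x,y,z
        System.out.println(merge("k3", Optional.of("a,b,c"), Optional.of("x,y,z"))); // k3,a,b,c,x,y,z
    }
}
```

Padding the missing side keeps every output line at the same column count, which may be easier to load downstream. Whether that matches the intended output format of the job is an assumption.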

When reposting, please credit the source: www.mshxw.com
Article URL: https://www.mshxw.com/it/674036.html