栏目分类:
子分类:
返回
名师互学网用户登录
快速导航关闭
当前搜索
当前分类
子分类
实用工具
热门搜索
名师互学网 > IT > 前沿技术 > 大数据 > 大数据系统

【spark】4. Dataset转换算子使用

【spark】4. Dataset转换算子使用

1.groupByKey、mapGroups、flatMapGroups结合使用
package com.DataSet;

import bean.Dept;
import bean.Employee;
import org.apache.spark.sql.*;

import java.util.ArrayList;
import java.util.List;


public class DataSetConvert {
    private static SparkSession spark = SparkSession.builder().master("local[*]").appName("handle data").getOrCreate();

    public static void main(String[] args) {
        spark.conf().set("spark.sql.crossJoin.enabled", "true");
        spark.sparkContext().setLogLevel("WARN");
        Encoder employeeEncoder = Encoders.bean(Employee.class);
        String path = "spark-hello/src/main/resources/employees.json";
        Dataset ds = spark.read().json(path).as(employeeEncoder);


        Dataset out = flatMapGroups(groupByKey(ds));
        out.show();

        Dataset out2 = mapGroups(groupByKey(ds));
        out2.show();


    }

    public static KeyValueGroupedDataset groupByKey(Dataset ds) {
        return ds.groupByKey(e -> e.getName(), Encoders.STRING());
    }

    public static Dataset mapGroups(KeyValueGroupedDataset kvgDS) {
        Dataset out = kvgDS.mapGroups((key, eList) -> {
            Dept dept = new Dept();
            eList.forEachRemaining(e -> {
                dept.addEmployee(e);
            });
            return dept;
        }, Encoders.bean(Dept.class));
        return out;
    }

    public static Dataset flatMapGroups(KeyValueGroupedDataset kvgDS) {
        Dataset out = kvgDS.flatMapGroups((key, eList) -> {
            List employees = new ArrayList<>();
            eList.forEachRemaining(e -> {
                employees.add(e);
            });
            return employees.iterator();

        }, Encoders.bean(Employee.class));
        return out;
    }


}

转载请注明:文章转载自 www.mshxw.com
本文地址:https://www.mshxw.com/it/775894.html
我们一直用心在做
关于我们 文章归档 网站地图 联系我们

版权所有 (c)2021-2022 MSHXW.COM

ICP备案号:晋ICP备2021003244-6号