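You can create a DataFrame from the List<String> and then use selectExpr and split to get the DataFrame you need. (The code below uses the Spark 2.x Java API, where a DataFrame is a Dataset<Row>; on Spark 1.6 use the DataFrame class instead.)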
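import java.util.ArrayList;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;

public class SparkSample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("SparkSample").setMaster("local[*]");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        SQLContext sqc = new SQLContext(jsc);
        // sample data: one comma-separated record per string
        List<String> data = new ArrayList<String>();
        data.add("dev, engg, 10000");
        data.add("karthik, engg, 20000");
        // DataFrame with a single string column named "value"
        Dataset<Row> df = sqc.createDataset(data, Encoders.STRING()).toDF();
        df.printSchema();
        df.show();
        // split each value on ',' into three named columns
        Dataset<Row> df1 = df.selectExpr(
                "split(value, ',')[0] as name",
                "split(value, ',')[1] as degree",
                "split(value, ',')[2] as salary");
        df1.printSchema();
        df1.show();
        jsc.stop();
    }
}

You will get the following output.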
root
 |-- value: string (nullable = true)

+--------------------+
|               value|
+--------------------+
|    dev, engg, 10000|
|karthik, engg, 20000|
+--------------------+

root
 |-- name: string (nullable = true)
 |-- degree: string (nullable = true)
 |-- salary: string (nullable = true)

+-------+------+------+
|   name|degree|salary|
+-------+------+------+
|    dev|  engg| 10000|
|karthik|  engg| 20000|
+-------+------+------+
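For illustration only, here is a plain-Java sketch (no Spark) that splits the same sample rows on a comma. It shows why the degree and salary fields keep a leading space: the split happens on "," alone, so the space after each comma stays at the start of the next field. Spark SQL's split also takes a Java regular expression as its pattern, so for this input the pieces should line up. SplitDemo and splitRow are hypothetical names for this sketch.

```java
import java.util.Arrays;
import java.util.List;

public class SplitDemo {
    // Hypothetical helper: split one record on ',' only, like split(value, ',')
    static String[] splitRow(String line) {
        return line.split(",");
    }

    public static void main(String[] args) {
        // Same sample rows as the Spark example
        List<String> data = Arrays.asList("dev, engg, 10000", "karthik, engg, 20000");
        for (String line : data) {
            String[] parts = splitRow(line);
            // Brackets make the leading spaces in the 2nd and 3rd fields visible
            System.out.println("[" + parts[0] + "][" + parts[1] + "][" + parts[2] + "]");
        }
    }
}
```

Running it prints each field in brackets, e.g. the first row as [dev][ engg][ 10000].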
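The sample data you provided contains spaces. If you want to remove them and have salary typed as an integer, you can use the trim and cast functions as shown below.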
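// requires: import static org.apache.spark.sql.functions.col;
//           import static org.apache.spark.sql.functions.trim;
df1 = df1.select(
        trim(col("name")).as("name"),
        trim(col("degree")).as("degree"),
        trim(col("salary")).cast("integer").as("salary"));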
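As a plain-Java analogue (no Spark needed), the trim-then-convert step on a single row looks like the sketch below. The Employee class and parse helper are hypothetical names for this illustration, not part of any Spark API. One difference to keep in mind: Integer.parseInt throws NumberFormatException on a value it cannot parse, whereas Spark's cast in non-ANSI mode returns null for such a value instead of failing.

```java
import java.util.Arrays;
import java.util.List;

public class TrimCastDemo {
    // Hypothetical row type standing in for one row of df1 after trim/cast
    static final class Employee {
        final String name;
        final String degree;
        final int salary;

        Employee(String name, String degree, int salary) {
            this.name = name;
            this.degree = degree;
            this.salary = salary;
        }

        @Override
        public String toString() {
            return name + "|" + degree + "|" + salary;
        }
    }

    // Split on ',', trim every field (like trim), and parse salary as an int (like cast("integer"))
    static Employee parse(String line) {
        String[] parts = line.split(",");
        return new Employee(parts[0].trim(), parts[1].trim(), Integer.parseInt(parts[2].trim()));
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("dev, engg, 10000", "karthik, engg, 20000");
        for (String line : data) {
            System.out.println(parse(line));
        }
    }
}
```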


