
Saving an RDD to a File


Method 1:
scala> val p= spark.read.format("json").load("file:///usr/local/spark/examples/src/main/resources/people.json")
p: org.apache.spark.sql.DataFrame = [age: bigint, name: string]
 
scala> p.select("name", "age").write.format("csv").save("file:///usr/local/spark/mycode/newpeople.csv")

// select("name", "age") chooses which columns to save; write.format("csv").save() then writes them out as a CSV file
 

write.format() supports output formats such as json, parquet, jdbc, orc, libsvm, csv, and text. To write a plain text file, use write.format("text").
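Note that the text data source accepts only a single string column, so multiple columns are typically concatenated into one before saving. A minimal spark-shell sketch, continuing from the `p` DataFrame above (the output path is hypothetical, and `concat_ws` comes from Spark's SQL functions):

```scala
// The text format requires exactly one string column, so join
// name and age into a single comma-separated column first.
scala> import org.apache.spark.sql.functions.concat_ws

scala> p.select(concat_ws(",", $"name", $"age")).write.format("text").save("file:///usr/local/spark/mycode/peopletext")
```

Like the csv writer, this produces a directory of part files rather than a single file.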

  • Inspect the result
scala> val t = sc.textFile("file:///usr/local/spark/mycode/newpeople.csv")
t: org.apache.spark.rdd.RDD[String] = file:///usr/local/spark/mycode/newpeople.csv MapPartitionsRDD[1] at textFile at <console>:24
scala> t.foreach(println)
Justin,19
Michael,
Andy,30
Method 2:
scala> val p = spark.read.format("json").load("file:///usr/local/spark/examples/src/main/resources/people.json")
p: org.apache.spark.sql.DataFrame = [age: bigint, name: string]
 
scala> p.rdd.saveAsTextFile("file:///usr/local/spark/mycode/newpeople.txt")
  • Inspect the result
scala> val t = sc.textFile("file:///usr/local/spark/mycode/newpeople.txt")
t: org.apache.spark.rdd.RDD[String] = file:///usr/local/spark/mycode/newpeople.txt MapPartitionsRDD[11] at textFile at <console>:28
 
scala> t.foreach(println)
[null,Michael]
[30,Andy]
[19,Justin]
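The bracketed output appears because `p.rdd` is an `RDD[Row]`, and saveAsTextFile writes each Row's default toString, e.g. `[19,Justin]`. To control the format, map each Row to a string before saving. A sketch continuing from the session above (the output path is hypothetical):

```scala
// p.rdd is an RDD[org.apache.spark.sql.Row]; Row.mkString(sep) joins
// the fields with a separator, so each line becomes "age,name"
// (following the schema order [age: bigint, name: string]) without brackets.
scala> p.rdd.map(row => row.mkString(",")).saveAsTextFile("file:///usr/local/spark/mycode/people_plain.txt")
```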
Reprinted from www.mshxw.com
Original article: https://www.mshxw.com/it/681643.html