- 1. Read and write CSV files
- 2. Change data types
- 3. Remove duplicate rows
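1. Read and write CSV files
// Read the CSV from HDFS and name its ten columns (assumes `spark` is an existing SparkSession, as in spark-shell).
val df1 = spark.read.csv("hdfs://106.12.48.46:9000/sparktest/服创大赛-原始数据.csv").toDF("timestamp","imsi","lac_id","cell_id","phone","timestamp1","tmp0","tmp1","nid","npid")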
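The heading mentions writing as well as reading, so here is a minimal sketch of the write side. It assumes a local SparkSession and a temporary output directory; the two sample rows and the `outDir` path are made up for illustration and are not part of the original data.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("csv-write-sketch").getOrCreate()
import spark.implicits._

// Two made-up rows standing in for the real data.
val sample = Seq(("1538352000", "imsi-a"), ("1538352060", "imsi-b")).toDF("timestamp", "imsi")

// Hypothetical output location. Spark writes a directory of part files, not a single file.
val outDir = java.nio.file.Files.createTempDirectory("csv-sketch").resolve("out").toString
sample.write.option("header", "true").mode("overwrite").csv(outDir)

// Read it back, taking column names from the header row.
val reloaded = spark.read.option("header", "true").csv(outDir)
```

Writing the header row avoids the need to rename columns with `toDF(...)` when the file is read again.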
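2. Change data types
// Cast the timestamp column to int. withColumn keeps the other columns; DataFrames are immutable, so keep the returned value.
val withIntTimestamp = df1.withColumn("timestamp", df1("timestamp").cast("int"))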
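To confirm that a cast took effect, the schema can be inspected afterwards. The sketch below assumes a local SparkSession and made-up sample rows. It casts `timestamp` to `long` rather than `int`, on the assumption that the values might be millisecond epochs, which do not fit in a 32-bit int; if they are second-resolution values, `int` works too.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{IntegerType, LongType}

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("cast-sketch").getOrCreate()
import spark.implicits._

// Without an explicit schema, CSV columns are read as strings; these rows are made up.
val raw = Seq(("1538352000", "100"), ("1538352060", "200")).toDF("timestamp", "lac_id")

// Replace each column with a typed version of itself, keeping all other columns.
val typed = raw
  .withColumn("timestamp", col("timestamp").cast("long"))
  .withColumn("lac_id", col("lac_id").cast("int"))

// Print the resulting column types.
typed.printSchema()
```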
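3. Remove duplicate rows
// dropDuplicates() removes duplicate rows (not columns) and returns a new DataFrame, so keep the returned value.
val deduped = df1.dropDuplicates()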
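`dropDuplicates` can also compare only a subset of columns. A small sketch of the difference, assuming a local SparkSession and made-up rows (which of the matching rows survives the subset version is not guaranteed):

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("dedup-sketch").getOrCreate()
import spark.implicits._

// Made-up rows for illustration.
val events = Seq(
  ("1538352000", "imsi-a", "cell-1"),
  ("1538352000", "imsi-a", "cell-1"), // exact duplicate of the first row
  ("1538352000", "imsi-a", "cell-2")  // same timestamp and imsi, different cell
).toDF("timestamp", "imsi", "cell_id")

// Rows must match on every column to count as duplicates.
val exactDedup = events.dropDuplicates()

// Rows only need to match on the listed columns.
val keyDedup = events.dropDuplicates(Seq("timestamp", "imsi"))
```

Here `exactDedup` keeps two rows and `keyDedup` keeps one.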



