The studentinfo file: local machine --> Linux --> HDFS --> Hive --> read via Spark
1. Upload the local file to the Linux machine
Use the command rz -E to upload the file studentinfo into the /dataset/ directory on Linux.
2. Upload the studentinfo file from /dataset/ on Linux to HDFS
hdfs dfs -mkdir -p /dataset
hdfs dfs -put studentinfo /dataset/
3. Use hive or beeline to run SQL and create the Hive table student
CREATE DATABASE IF NOT EXISTS spark_integrition;
USE spark_integrition;
CREATE EXTERNAL TABLE student (
    name STRING,
    age  INT,
    gpa  STRING
)
ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t'
    LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/dataset/hive';
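The table definition above expects one record per line with the three fields separated by tabs. A minimal Python sketch of that file format (the names and values below are made-up sample data, not from the original dataset):

```python
# Write a few hypothetical rows in the tab-delimited layout the
# student table expects: name \t age \t gpa, one record per line.
rows = [
    ("ulysses thompson", 64, "1.90"),
    ("katie carson", 25, "3.65"),
]

with open("studentinfo", "w") as f:
    for name, age, gpa in rows:
        f.write(f"{name}\t{age}\t{gpa}\n")

# Read it back and split the way Hive's DELIMITED row format would.
with open("studentinfo") as f:
    parsed = [line.rstrip("\n").split("\t") for line in f]

print(parsed[0])  # ['ulysses thompson', '64', '1.90']
```

If the fields in the real file are separated by anything other than a single tab, the FIELDS TERMINATED BY clause must be adjusted to match, or every column will load as NULL.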
4. Load the HDFS data into Hive
LOAD DATA INPATH '/dataset/studentinfo' OVERWRITE INTO TABLE student;
5. Query the Hive table through Spark SQL (in spark-shell)
scala> spark.sql("use spark_integrition")
scala> val resultDF = spark.sql("select * from student limit 10")
scala> resultDF.show()
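The query in step 5 is plain SQL, so its semantics can be illustrated without a cluster. In this sketch SQLite stands in for Hive purely for demonstration, with a schema mirroring the student table and hypothetical rows:

```python
import sqlite3

# In-memory SQLite database standing in for the Hive metastore/warehouse;
# the schema mirrors the student table defined above, the data is made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, age INTEGER, gpa TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(f"student{i}", 20 + i, f"{2.0 + i / 10:.2f}") for i in range(15)],
)

# Same statement Spark runs via spark.sql("select * from student limit 10"):
# LIMIT 10 caps the result at the first ten rows returned.
result = conn.execute("select * from student limit 10").fetchall()
print(len(result))  # 10
```

In spark-shell the same query returns a DataFrame, so resultDF.show() prints at most those ten rows in tabular form.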


