Prerequisites:
- HDFS is running on the Hadoop cluster in the virtual machine
- Hive is installed on the virtual machine, with its metastore configured to use MySQL
- The Hadoop environment is configured on Windows, and Spark SQL code already runs inside IDEA
- The virtual machine's firewall is disabled
Many posts online make these steps more complicated than they need to be; here is a concise summary:
1. Add the dependencies to pom.xml (the MySQL driver, the Hive dependency, and the spark-hive dependency):
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.27</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.12</artifactId>
    <version>3.0.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>1.2.1</version>
</dependency>
2. Copy the hive-site.xml file from the hive/conf directory on the virtual machine into the project's resources directory (adjust the connection URL, username, and password to match your own MySQL setup):
<configuration>
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true</value>
        <description>JDBC connect string for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
        <description>username to use against metastore database</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
        <description>password to use against metastore database</description>
    </property>
</configuration>
3. Check whether hive-site.xml has been copied automatically into the project's target/classes directory in IDEA; if not, place it there manually, otherwise Spark can only run locally and will not reach the cluster's Hive metastore.
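A quick way to confirm from code that hive-site.xml actually made it onto the classpath is to look it up through the class loader; this helper is not part of the original steps, just an illustrative sketch:

```scala
object ClasspathCheck {
  // Returns Some(location) when the named resource is visible on the
  // classpath, None when it is missing -- the exact case where Spark
  // silently falls back to a local embedded metastore.
  def findResource(name: String): Option[String] =
    Option(getClass.getClassLoader.getResource(name)).map(_.toString)

  def main(args: Array[String]): Unit =
    println(findResource("hive-site.xml")
      .getOrElse("hive-site.xml not on classpath!"))
}
```

If this prints the warning, copy hive-site.xml into target/classes (or rebuild so IDEA copies it from resources) before running again.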
4. Enable Hive support: simply call enableHiveSupport() when creating the SparkSession.
import org.apache.spark.sql.SparkSession

// create the SparkSession with Hive support enabled
val spark: SparkSession = SparkSession
  .builder()
  .enableHiveSupport()
  .master("local[*]")
  .appName("sql")
  .getOrCreate()
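Once the session is up, Hive tables on the cluster can be queried directly through spark.sql. A minimal smoke test might look like the following; note that "default.student" is a placeholder, so substitute a table that actually exists in your Hive warehouse:

```scala
import org.apache.spark.sql.SparkSession

object HiveSmokeTest {
  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession
      .builder()
      .enableHiveSupport()
      .master("local[*]")
      .appName("sql")
      .getOrCreate()

    // If the metastore connection works, this lists the databases
    // defined on the cluster, not just "default"
    spark.sql("show databases").show()

    // Placeholder query -- replace with a table from your own warehouse
    spark.sql("select * from default.student").show()

    // release the session when done
    spark.stop()
  }
}
```

Seeing only the "default" database usually means hive-site.xml was not picked up and Spark created a local metastore instead.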


