I recently took over a big-data cluster running CDH 6.3.1. All the clusters I had built before were vanilla Apache Hadoop, where debugging Spark SQL reads against Hive from an IDE was easy. The CDH-packaged cluster took some getting used to: the hive-site.xml found in the CDH environment was basically useless for this purpose, and much of what I found online turned out not to be a real solution either. In the end I rewrote hive-site.xml following the vanilla Apache layout, which let me launch a Spark program from local IDEA and read Hive tables.
Project structure and hive-site.xml configuration
<?xml version="1.0"?>
<configuration>
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://[IP of the host running the metastore]:9083</value>
    </property>
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://[IP of the MySQL host backing Hive]:3306/hive</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>password</value>
    </property>
    <property>
        <name>hive.zookeeper.quorum</name>
        <value>cdh-06.prod.ycsInsight.yonyou.com,cdh-02.prod.ycsInsight.yonyou.com,cdh-08.prod.ycsInsight.yonyou.com</value>
    </property>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://[NameNode IP]:8020</value>
    </property>
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
    <property>
        <name>datanucleus.autoCreateSchema</name>
        <value>true</value>
    </property>
    <property>
        <name>datanucleus.autoStartMechanism</name>
        <value>checked</value>
    </property>
</configuration>
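The original post does not show the exact project tree, but for a locally launched SparkSession to pick up these settings, hive-site.xml has to be on the classpath; Spark reads it from the classpath root. A sketch of the assumed Maven-conventional layout (file names other than pom.xml and hive-site.xml are illustrative):

```text
spark-test/
├── pom.xml
└── src/
    └── main/
        ├── resources/
        │   └── hive-site.xml    <- the configuration above
        └── scala/
            └── HiveTest.scala
```

Placing the file under src/main/resources is what makes it land on the runtime classpath when the program is started from IDEA.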
Test code:
import org.apache.spark.sql.SparkSession

object HiveTest {
  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession
      .builder
      .master("local[*]")
      .appName("Spark Hive Example")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("show databases").show()
    spark.sql("use default") // switch to a database that actually exists on your cluster
    spark.sql("show tables").show()
    // spark.sql("select * from person").show()
    spark.stop()
  }
}
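If editing hive-site.xml is inconvenient, the key settings can also be passed programmatically on the builder via the standard `SparkSession.Builder.config(key, value)` API — a minimal sketch, where the host name and warehouse path are placeholder assumptions taken from the configuration above, not values you can use verbatim:

```scala
import org.apache.spark.sql.SparkSession

object HiveTestInline {
  // Placeholder values -- substitute your cluster's real metastore host.
  val metastoreHost = "cdh-02.prod.ycsInsight.yonyou.com"
  val metastoreUri: String = s"thrift://$metastoreHost:9083"

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
      .master("local[*]")
      .appName("Spark Hive inline config")
      // Takes the place of the hive.metastore.uris entry in hive-site.xml
      .config("hive.metastore.uris", metastoreUri)
      // Warehouse location, matching hive.metastore.warehouse.dir above
      .config("spark.sql.warehouse.dir", "/user/hive/warehouse")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("show databases").show()
    spark.stop()
  }
}
```

Settings passed this way override anything read from a classpath hive-site.xml, which can be handy for switching clusters without touching resource files.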
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.example</groupId>
    <artifactId>spark-test</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <spark.version>2.4.0</spark.version>
        <hive.version>2.1.1</hive.version>
        <scala.version>2.11.12</scala.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.11</artifactId>
            <version>${spark.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>org.spark-project.hive</groupId>
                    <artifactId>hive-metastore</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.spark-project.hive</groupId>
                    <artifactId>hive-exec</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>${hive.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>*</groupId>
                    <artifactId>*</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-jdbc</artifactId>
            <version>1.1.0</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.38</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.2.2</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>3.1.0</version>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
Execution result:
Feel free to leave a comment if you run into problems.



