Extract and rename
```bash
tar -zxvf spark-3.1.2-bin-hadoop2.7.tgz -C /opt/
cd /opt/
mv spark-3.1.2-bin-hadoop2.7/ spark
cd spark/conf
```

Add symlinks to the Hadoop configuration files
```bash
ln -s /opt/hadoop/etc/hadoop/core-site.xml
ln -s /opt/hadoop/etc/hadoop/hdfs-site.xml
```
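A quick sanity check (not from the original steps, but cheap to run): confirm the links were created in spark/conf and resolve to the Hadoop directory.

```bash
# Both links should point into /opt/hadoop/etc/hadoop/
ls -l core-site.xml hdfs-site.xml
```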
Add the hive-site.xml configuration file

```bash
touch hive-site.xml
vim hive-site.xml
```

hive-site.xml:
```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://server3:3306/hive_db?createDatabaseIfNotExist=true&amp;useSSL=false</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
    <description>password to use against metastore database</description>
  </property>
  <property>
    <name>datanucleus.schema.autoCreateTables</name>
    <value>true</value>
  </property>
</configuration>
```
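Before going further, it may be worth confirming that the MySQL instance named in ConnectionURL is reachable with these credentials. A minimal check, assuming the mysql client is installed on this node:

```bash
# Uses the host, port, user, and password from hive-site.xml above;
# hive_db itself will be created on first use (createDatabaseIfNotExist=true).
mysql -h server3 -P 3306 -u root -p123456 -e "SHOW DATABASES;"
```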
Start Hive and initialize the metastore (Hive 2.3.x needs to be installed)

Note: Spark is not very good at initializing the metastore on its own, so go to the Hive installation directory and initialize it manually.
The initialization command:

```bash
schematool -dbType mysql -initSchema
```
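After the init succeeds, schematool can also report what it finds; a quick verification, run from the Hive installation directory:

```bash
# Prints the metastore connection details and the schema version detected.
bin/schematool -dbType mysql -info
```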
Edit spark-env.sh

Add the following configuration:
```bash
export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
export JAVA_HOME=/opt/jdk
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=server1:2181,server2:2181,server3:2181 -Dspark.deploy.zookeeper.dir=/spark"
```
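This ZOOKEEPER recovery mode only works if the quorum listed in spark.deploy.zookeeper.url is healthy. A quick check, assuming ZooKeeper's scripts are on the PATH of each node:

```bash
# Run on server1, server2, and server3; one node should report "leader",
# the others "follower".
zkServer.sh status
```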
Edit workers

I have already set up hostname mappings for the IPs here:
| Node | Hostname |
|---|---|
| Node 1 | server1 |
| Node 2 | server2 |
| Node 3 | server3 |
Contents of workers:
```
server1
server2
```

Start Spark

On node 3, run:
```bash
sbin/start-all.sh
```

On node 2, run:
```bash
sbin/start-master.sh
```

Test and verify
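One quick way to confirm the daemons came up is jps on each node; a sketch of what to expect under this layout:

```bash
# On server3 you should see a Master process (the active master);
# on server2 a standby Master; and on server1 and server2 a Worker each,
# matching the workers file above.
jps
```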
Open a browser and enter the Master node's IP with port 8080 to view the Spark Web UI:
The machine running the browser also has the hostname mapping configured:
`server2:8080`

Test the integration with Hive

Before starting Spark SQL, don't forget to start Hive's metastore service; if you don't know how to start it, simply starting Hive works as well.
```bash
bin/spark-sql --master spark://server3:7077 --driver-class-path /opt/mysql-connector-java-5.1.49/mysql-connector-java-5.1.49-bin.jar
```
```sql
show databases;
show tables;
```

The command to start the metastore service is as follows:
```bash
hive --service metastore
```
Note: Don't exit after starting it; just open another session window.
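If you would rather not keep a dedicated session open, one option is to background the service instead; a sketch, with an arbitrary log path:

```bash
# Run the metastore detached from the terminal and capture its output.
nohup hive --service metastore > /tmp/metastore.log 2>&1 &

# The metastore listens on port 9083 by default; confirm it is up on the
# node where you started it (assumes a netcat build that supports -z).
nc -z localhost 9083 && echo "metastore is up"
```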



