Runtime environment:
- OS: Ubuntu 16
- JDK: 1.8.0_261-b12
- hadoop: 3.2.2
- spark: 3.1.2
- Download and install
The installation package can be downloaded from the official site: https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.2.2/hadoop-3.2.2.tar.gz
Then extract it to a directory of your choice; mine is:
/home/ffzs/softwares/hadoop-3.2.2
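If you prefer doing this from the command line, here is a minimal sketch (archive.apache.org is an alternative to the mirror-selection page above; the target directory follows my layout):
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.2.2/hadoop-3.2.2.tar.gz
tar -xzf hadoop-3.2.2.tar.gz -C /home/ffzs/softwares/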
- Set up passwordless SSH login
Generate a key pair; if you already have one (for example from an earlier git setup), skip this step:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Append your public key to the SSH authorized_keys file so that SSH login to this machine no longer asks for a password:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
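To verify, an SSH connection to localhost should now succeed without a password prompt:
ssh localhost
exit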
- Modify the configuration
First set the Java path: open hadoop-env.sh in the hadoop-3.2.2/etc/hadoop directory and set $JAVA_HOME:
export JAVA_HOME=/home/ffzs/softwares/jdk1.8.0_261
In the same directory, edit core-site.xml and add the following to set the default HDFS URI:
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/ffzs/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
Edit hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/ffzs/hadoop/tmp/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/ffzs/hadoop/tmp/dfs/data</value>
  </property>
</configuration>
Edit mapred-site.xml:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
Edit yarn-site.xml:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
  </property>
</configuration>
Now initialize HDFS using the hdfs command in hadoop-3.2.2/bin:
./hdfs namenode -format
If the run finishes with output like the following, the initialization succeeded:
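Roughly, the tail of the log should contain a line like this (the directory follows the hdfs-site.xml above; the exact wording can vary by version):
INFO common.Storage: Storage directory /home/ffzs/hadoop/tmp/dfs/name has been successfully formatted.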
- Start
Start HDFS:
(base) [~/softwares/hadoop-3.2.2]$ ./sbin/start-dfs.sh
Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [ffzs-ub]
Start YARN:
(base) [~/softwares/hadoop-3.2.2]$ ./sbin/start-yarn.sh
Starting resourcemanager
Starting nodemanagers
Check the running processes with jps:
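On a single-node setup you would typically expect to see NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager alongside Jps. A quick HDFS sanity check (the user directory here is only illustrative):
./bin/hdfs dfs -mkdir -p /user/ffzs
./bin/hdfs dfs -ls /
The YARN ResourceManager web UI should also be reachable at http://localhost:8088/ by default.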
2. Spark standalone mode
- Download and install
Download Spark from the official site: https://downloads.apache.org/spark/spark-3.1.2/spark-3.1.2-bin-hadoop3.2.tgz
Extract it to a suitable directory:
/home/ffzs/softwares/spark-3.1.2-bin-hadoop3.2
- Configuration
In the spark-3.1.2-bin-hadoop3.2/conf directory:
cp spark-env.sh.template spark-env.sh
Then add JAVA_HOME and HADOOP_HOME to spark-env.sh (the Hadoop path is the install directory from above):
export JAVA_HOME=/home/ffzs/softwares/jdk1.8.0_261
export HADOOP_HOME=/home/ffzs/softwares/hadoop-3.2.2
- Run
Start Spark with start-all.sh in the sbin directory:
(base) [~/softwares/spark-3.1.2-bin-hadoop3.2]$ ./sbin/start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /home/ffzs/softwares/spark-3.1.2-bin-hadoop3.2/logs/spark-ffzs-org.apache.spark.deploy.master.Master-1-ffzs-ub.out
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /home/ffzs/softwares/spark-3.1.2-bin-hadoop3.2/logs/spark-ffzs-org.apache.spark.deploy.worker.Worker-1-ffzs-ub.out
jps now shows the Master and Worker processes:
The Spark web UI is available at http://localhost:8080/:
You can test the cluster by launching spark-shell:
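For example (the master URL matches the hostname used by start-all.sh above; the one-liner just sums 1..100 on the cluster):
./bin/spark-shell --master spark://ffzs-ub:7077
scala> sc.parallelize(1 to 100).sum()
res0: Double = 5050.0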
3. Hive configuration
MySQL is used as the metastore database. Here a MySQL instance is started via docker, with user root and password 123zxc.
mysql:
  image: mysql:8
  container_name: mysql
  networks:
    - spring
  restart: always
  ports:
    - 33060:33060
    - 3306:3306
  volumes:
    - ./mysql/db:/var/lib/mysql
    - ./mysql/conf.d:/etc/mysql/conf.d
  environment:
    - MYSQL_ROOT_PASSWORD=123zxc
  command: --default-authentication-plugin=mysql_native_password
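Assuming this service definition sits under the services: key of a docker-compose.yml (and the spring network is declared elsewhere in that file), the container can be brought up with:
docker-compose up -d mysql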
Then create a hive-site.xml file in the spark-3.1.2-bin-hadoop3.2/conf directory with the following content. Note that the configuration for MySQL 8 differs from MySQL 5; I am using MySQL 8:
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://ffzs-ub:3306/hive_db?createDatabaseIfNotExist=true&amp;useSSL=false&amp;serverTimezone=GMT&amp;allowPublicKeyRetrieval=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123zxc</value>
  </property>
  <property>
    <name>datanucleus.schema.autoCreateAll</name>
    <value>true</value>
  </property>
</configuration>
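The MySQL JDBC driver is not bundled with Spark. One option is to fetch the 8.0.26 connector used below from Maven Central and pass it via --driver-class-path (or drop it into Spark's jars/ directory):
wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.26/mysql-connector-java-8.0.26.jar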
Launch it via Spark SQL:
spark-sql --driver-class-path mysql-connector-java-8.0.26.jar --master spark://ffzs-ub:7077
After it starts, the corresponding Spark SQL application shows up at http://localhost:8080/:
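A minimal smoke test at the spark-sql prompt (database and table names are illustrative) to confirm the metastore is actually backed by MySQL:
spark-sql> CREATE DATABASE IF NOT EXISTS test_db;
spark-sql> CREATE TABLE IF NOT EXISTS test_db.t1 (id INT, name STRING);
spark-sql> INSERT INTO test_db.t1 VALUES (1, 'ffzs');
spark-sql> SELECT * FROM test_db.t1;
If everything is wired up, the DBS and TBLS tables inside the hive_db database in MySQL should now contain the corresponding rows.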



