First, start the MySQL client:
mysql -u root -p
Create a spark database and a student table to use for testing:
mysql> create database spark;
Query OK, 1 row affected (0.02 sec)

mysql> use spark;
Database changed

mysql> create table student (id int(4), name char(20), gender char(4), age int(4));
Query OK, 0 rows affected (0.04 sec)

mysql> insert into student values(1,'Xueqian','F',23);
Query OK, 1 row affected (0.01 sec)

mysql> insert into student values(2,'Weiliang','M',24);
Query OK, 1 row affected (0.01 sec)

mysql> select * from student;
+------+----------+--------+------+
| id   | name     | gender | age  |
+------+----------+--------+------+
|    1 | Xueqian  | F      | 23   |
|    2 | Weiliang | M      | 24   |
+------+----------+--------+------+
2 rows in set (0.01 sec)
Next, you need to put the mysql-connector-java JAR into the jars folder of your Spark installation. This JAR comes in many versions; I used 8.0.15. Download whichever version you need from the Maven repository; the method is linked below:
JAR download
Once the download finishes, drop the JAR into the jars folder and you can connect.
Here is the test code:
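As an aside, if you prefer not to copy the JAR into Spark's jars directory, you can also hand it to Spark at launch time with the --jars flag. A minimal sketch; the path below is a placeholder for wherever you saved the connector:

```shell
# Start the PySpark shell with the connector on the driver/executor classpath
pyspark --jars /path/to/mysql-connector-java-8.0.15.jar

# Or, when submitting a script:
spark-submit --jars /path/to/mysql-connector-java-8.0.15.jar your_script.py
```

Either approach works; copying into the jars folder makes the driver available to every job, while --jars scopes it to a single session.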
from pyspark import SparkConf
from pyspark.sql import SparkSession

spark = SparkSession.builder.config(conf=SparkConf()).getOrCreate()

# Wrap the chained calls in parentheses so Python treats them as one expression
df = (spark.read
      .format("jdbc")
      # Connector/J 8.x renamed the driver class; com.mysql.jdbc.Driver is a deprecated alias
      .option("driver", "com.mysql.cj.jdbc.Driver")
      .option("url", "jdbc:mysql://localhost:3306/spark?serverTimezone=UTC")
      .option("dbtable", "student")
      .option("user", "root")
      .option("password", "root")
      .load())

df.show()
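Writes are symmetric with reads: the same JDBC options work on df.write. A minimal sketch, assuming the same spark database and credentials as above and a running MySQL server; the target table name newstudent is just an illustration:

```python
# Write the DataFrame back to MySQL; "append" adds rows,
# "overwrite" drops and recreates the target table
(df.write
   .format("jdbc")
   .option("driver", "com.mysql.cj.jdbc.Driver")
   .option("url", "jdbc:mysql://localhost:3306/spark?serverTimezone=UTC")
   .option("dbtable", "newstudent")  # hypothetical target table
   .option("user", "root")
   .option("password", "root")
   .mode("append")
   .save())
```

After this runs, a select * from newstudent in the mysql client should show the same two rows as the student table.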



