Dependency:
com.ververica:flink-sql-connector-mysql-cdc:2.1.0 (scope: provided)
1. The simplest code:
package com.ververica.cdc.connectors.mysql.source;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

import com.ververica.cdc.connectors.mysql.testutils.UniqueDatabase;
import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema;
import org.junit.Ignore;
import org.junit.Test;

public class MySqlSourceExampleTest extends MySqlSourceTestBase {

    // NOTE: the declaration of the inventoryDatabase field (a UniqueDatabase) is not
    // shown in this listing; MYSQL_CONTAINER is assumed to come from the base class.

    @Test
    @Ignore("Test ignored because it won't stop and is used for manual test")
    public void testConsumingAllEvents() throws Exception {
        inventoryDatabase.createAndInitialize();

        // Emit every change event as a JSON string
        MySqlSource<String> mySqlSource =
                MySqlSource.<String>builder()
                        .hostname(MYSQL_CONTAINER.getHost())
                        .port(MYSQL_CONTAINER.getDatabasePort())
                        .databaseList(inventoryDatabase.getDatabaseName())
                        .tableList(inventoryDatabase.getDatabaseName() + ".products")
                        .username(inventoryDatabase.getUsername())
                        .password(inventoryDatabase.getPassword())
                        .serverId("5401-5404")
                        .deserializer(new JsonDebeziumDeserializationSchema())
                        .includeSchemaChanges(true) // output the schema changes as well
                        .build();

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // enable checkpoint
        env.enableCheckpointing(3000);

        // set the source parallelism to 4, and print with parallelism 1
        env.fromSource(mySqlSource, WatermarkStrategy.noWatermarks(), "MySqlParallelSource")
                .setParallelism(4)
                .print()
                .setParallelism(1);

        env.execute("Print MySQL Snapshot + Binlog");
    }
}
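includeSchemaChanges(true) turns on schema-change awareness, so changes to the table structure are emitted as events too.
Here I only look at some of the upgrades in MySQL CDC: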
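- Support for all MySQL data types
  This includes complex types such as enum, array, and geometry (spatial) types.
- Support for metadata columns
  In a Flink DDL, a declaration such as db_name STRING METADATA FROM 'database_name' gives access to meta information like the database name (database_name), the table name (table_name), and the change time (op_ts). This is very useful for data integration over sharded databases and tables.
- DataStream API with parallel reading
  In 2.0, features such as the lock-free algorithm and parallel reading were exposed only through the SQL API, not through the DataStream API. Version 2.1 adds DataStream API support: a source can be created with MySqlSourceBuilder. Users can capture several tables at once and build a whole-database synchronization pipeline on top of that. MySqlSourceBuilder#includeSchemaChanges additionally captures schema changes.
- Support for the currentFetchEventTimeLag, currentEmitEventTimeLag, and sourceIdleTime metrics
  These metrics follow the connector metric conventions of FLIP-33 [1]; see FLIP-33 for the meaning of each one. currentEmitEventTimeLag records the difference between the time the source emits a record to the downstream node and the time that record was produced in the DB, so it measures the delay between data being produced in the DB and leaving the source node. It can also be used to tell whether the source has entered the binlog reading phase:
  - a value of 0 means the source is still in the full (historical snapshot) reading phase;
  - a value greater than 0 means it has entered the binlog reading phase.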
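The rule for currentEmitEventTimeLag can be written down as a tiny helper. This is only a sketch: the class name ReadPhase, the method name, and the UNKNOWN case for negative values are my own assumptions for illustration and are not part of Flink CDC. How you actually obtain the metric value (REST API, metrics reporter, and so on) is left out.

```java
public class ReadPhase {

    /** Phase of the source, judged only from the currentEmitEventTimeLag metric. */
    public enum Phase { SNAPSHOT, BINLOG, UNKNOWN }

    /**
     * Applies the rule described above: 0 means the snapshot phase,
     * a value greater than 0 means the binlog phase.
     * Negative values are not covered by that rule, so they map to UNKNOWN
     * (an assumption made for this sketch).
     */
    public static Phase fromEmitEventTimeLag(long lagMillis) {
        if (lagMillis == 0L) {
            return Phase.SNAPSHOT;
        }
        if (lagMillis > 0L) {
            return Phase.BINLOG;
        }
        return Phase.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(fromEmitEventTimeLag(0L));    // SNAPSHOT
        System.out.println(fromEmitEventTimeLag(1500L)); // BINLOG
    }
}
```

A monitoring job could poll the metric periodically and raise an alert if the phase never leaves SNAPSHOT.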
Reading data normally:
Debugging after a schema change:
name1 -> name2
The complete Struct:
Struct{source=Struct{version=1.5.4.Final,connector=mysql,name=mysql_binlog_source,ts_ms=1637117644411,db=bi_dev,table=test,server_id=1921684100,gtid=ef6f9e15-1218-11ec-997f-968db1336f14:2840388,file=mysql-bin.000056,pos=509499737,row=0},historyRecord={"source":{"file":"mysql-bin.000056","pos":509499737,"server_id":1921684100},"position":{"transaction_id":null,"ts_sec":1637117644,"file":"mysql-bin.000056","pos":509499965,"gtids":"ef6f9e15-1218-11ec-997f-968db1336f14:1-2840387","server_id":1921684100},"databaseName":"bi_dev","ddl":"ALTER TABLE `bi_dev`.`test` \r\nCHANGE COLUMN `name1` `name2` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL AFTER `id`","tableChanges":[{"type":"ALTER","id":"\"bi_dev\".\"test\"","table":{"defaultCharsetName":"utf8mb4","primaryKeyColumnNames":["id"],"columns":[{"name":"id","jdbcType":4,"typeName":"INT","typeExpression":"INT","charsetName":null,"length":11,"position":1,"optional":false,"autoIncremented":true,"generated":true},{"name":"name2","jdbcType":12,"typeName":"VARCHAR","typeExpression":"VARCHAR","charsetName":"utf8mb4","length":255,"position":2,"optional":true,"autoIncremented":false,"generated":false},{"name":"date4","jdbcType":91,"typeName":"DATE","typeExpression":"DATE","charsetName":null,"position":3,"optional":true,"autoIncremented":false,"generated":false},{"name":"datetime1","jdbcType":93,"typeName":"DATETIME","typeExpression":"DATETIME","charsetName":null,"position":4,"optional":true,"autoIncremented":false,"generated":false},{"name":"timestamp1","jdbcType":2014,"typeName":"TIMESTAMP","typeExpression":"TIMESTAMP","charsetName":null,"position":5,"optional":true,"autoIncremented":true,"generated":true}]}}]}}
The statement that made the change is:
"databaseName":"bi_dev","ddl":"ALTER TABLE `bi_dev`.`test` CHANGE COLUMN `name1` `name2` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL AFTER `id`"
Let's run this statement again, slightly modified, to rename the current name2 to name3:
Check the result:
The details after the change:
[
    {
        "type":"ALTER",
        "id":"\"bi_dev\".\"test\"",
        "table":{
            "defaultCharsetName":"utf8mb4",
            "primaryKeyColumnNames":[
                "id"
            ],
            "columns":[
                {
                    "name":"id",
                    "jdbcType":4,
                    "typeName":"INT",
                    "typeExpression":"INT",
                    "charsetName":null,
                    "length":11,
                    "position":1,
                    "optional":false,
                    "autoIncremented":true,
                    "generated":true
                },
                {
                    "name":"name2",
                    "jdbcType":12,
                    "typeName":"VARCHAR",
                    "typeExpression":"VARCHAR",
                    "charsetName":"utf8mb4",
                    "length":255,
                    "position":2,
                    "optional":true,
                    "autoIncremented":false,
                    "generated":false
                },
                {
                    "name":"date4",
                    "jdbcType":91,
                    "typeName":"DATE",
                    "typeExpression":"DATE",
                    "charsetName":null,
                    "position":3,
                    "optional":true,
                    "autoIncremented":false,
                    "generated":false
                },
                {
                    "name":"datetime1",
                    "jdbcType":93,
                    "typeName":"DATETIME",
                    "typeExpression":"DATETIME",
                    "charsetName":null,
                    "position":4,
                    "optional":true,
                    "autoIncremented":false,
                    "generated":false
                },
                {
                    "name":"timestamp1",
                    "jdbcType":2014,
                    "typeName":"TIMESTAMP",
                    "typeExpression":"TIMESTAMP",
                    "charsetName":null,
                    "position":5,
                    "optional":true,
                    "autoIncremented":true,
                    "generated":true
                }
            ]
        }
    }
]
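To act on a rename, the old and new column names have to be pulled out of the ddl field of the history record. Below is a minimal sketch that does this with a regular expression. The class ColumnRename is hypothetical (not part of Flink CDC or Debezium), and it only handles backtick-quoted identifiers in a CHANGE [COLUMN] clause like the one above; a real implementation would want a proper SQL parser.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: holds the old and new name of a renamed column. */
public class ColumnRename {

    // Matches: CHANGE [COLUMN] `old_name` `new_name`
    private static final Pattern CHANGE_COLUMN =
            Pattern.compile(
                    "CHANGE\\s+(?:COLUMN\\s+)?`([^`]+)`\\s+`([^`]+)`",
                    Pattern.CASE_INSENSITIVE);

    public final String oldName;
    public final String newName;

    private ColumnRename(String oldName, String newName) {
        this.oldName = oldName;
        this.newName = newName;
    }

    /**
     * Returns the rename found in the DDL, or empty if the statement has no
     * CHANGE clause or the clause keeps the same column name (a pure type change).
     */
    public static Optional<ColumnRename> parse(String ddl) {
        Matcher m = CHANGE_COLUMN.matcher(ddl);
        if (!m.find()) {
            return Optional.empty();
        }
        String from = m.group(1);
        String to = m.group(2);
        if (from.equals(to)) {
            return Optional.empty();
        }
        return Optional.of(new ColumnRename(from, to));
    }
}
```

Called with the ALTER TABLE statement shown earlier, parse returns a rename whose oldName is name1 and whose newName is name2.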
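So metadata-change operations are definitely part of the stream now, and they are parsed differently from ordinary data records, which means the consumer needs a check to tell the two apart. I will add the complete parsing code later; the plan is to detect metadata operations and then apply the corresponding metadata change to the downstream Doris table. To be continued.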
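Until the full parsing code is ready, here is a rough sketch of that check. It assumes the records arrive as JSON strings (as produced with JsonDebeziumDeserializationSchema), that schema-change records carry a top-level historyRecord field as in the Struct dump above, and that data-change records carry an op field. The class RecordKind is hypothetical, and the substring test is a heuristic only: a table with a column literally named historyRecord or op would confuse it, so real code should parse the JSON with a proper library.

```java
public class RecordKind {

    /** Kind of change event, as far as this heuristic can tell. */
    public enum Kind { SCHEMA_CHANGE, DATA_CHANGE, UNKNOWN }

    /**
     * Classifies a change event serialized as JSON.
     * Assumption: schema changes contain a "historyRecord" key,
     * data changes contain an "op" key.
     */
    public static Kind classify(String json) {
        if (json == null) {
            return Kind.UNKNOWN;
        }
        // Check historyRecord first: it only appears on schema-change events.
        if (json.contains("\"historyRecord\"")) {
            return Kind.SCHEMA_CHANGE;
        }
        if (json.contains("\"op\"")) {
            return Kind.DATA_CHANGE;
        }
        return Kind.UNKNOWN;
    }
}
```

In the job, records classified as SCHEMA_CHANGE would go to the branch that updates the downstream table metadata, and DATA_CHANGE records would follow the normal write path.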



