Sending data from canal to Kafka



1. Installing canal

Official documentation:

https://github.com/alibaba/canal/wiki/Canal-Kafka-RocketMQ-QuickStart

Versions: canal 1.1.3, JDK 1.8+, MySQL 5.7

After downloading and extracting the package, two configuration files need to be changed:

canal.properties
example/instance.properties

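Before modifying them, it helps to understand what the two files do. canal.properties mainly contains the common settings, the instance settings, and the MQ/Kafka settings; everything else can be left at its default.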

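I have annotated the common settings below, and those are basically the only things that need changing. So what is the instance configuration? A single canal can pull binlogs from databases on several machines at once, and each database obviously has its own binlog position and its own name. To keep those settings separate, canal introduces the concept of an instance. Put simply, every database whose binlog you extract can be treated as an instance with its own configuration file.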

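For example, suppose I have two machines with four databases each and want to fetch their binlogs through a single canal. You might say that running two separate canal processes would do, and that is indeed the best approach, but if you want to handle it with one canal you need to create two instance configuration files.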

canal.destinations = example,example1

The conf directory must then also contain two directories, example and example1.
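
The relationship between canal.destinations and the directories under conf can be sketched in a few lines of Python. This is only an illustration of the layout described above, not part of canal: the helper name instance_config_paths is made up, and it assumes each instance directory holds a file called instance.properties, as the example directory in this post does.

```python
from pathlib import PurePosixPath


def instance_config_paths(destinations: str, conf_dir: str = "../conf") -> dict:
    """Map each destination name to the instance.properties path expected for it.

    `destinations` is the raw value of canal.destinations, e.g. "example,example1".
    `conf_dir` mirrors canal.conf.dir from canal.properties.
    """
    # Split the comma-separated list and ignore stray whitespace or empty entries.
    names = [name.strip() for name in destinations.split(",") if name.strip()]
    return {
        name: str(PurePosixPath(conf_dir) / name / "instance.properties")
        for name in names
    }


paths = instance_config_paths("example,example1")
for name, path in paths.items():
    print(name, "->", path)
```

Each destination listed in canal.destinations should have a matching directory, so a check like this can be run before starting the server to spot a missing instance file.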

canal.properties:

#################################################
#########               common argument         ############# 
#################################################
#canal.manager.jdbc.url=jdbc:mysql://127.0.0.1:3306/canal_manager?useUnicode=true&characterEncoding=UTF-8
#canal.manager.jdbc.username=root
#canal.manager.jdbc.password=121212
canal.id = 1
canal.ip = 
canal.port = 11111
canal.metrics.pull.port = 11112

# ZooKeeper configuration
canal.zkServers = 10.40.2.94:2181,10.40.2.95:2181,10.40.2.96:2181

# flush data to zk
canal.zookeeper.flush.period = 1000
canal.withoutNetty = false

# tcp, kafka, RocketMQ
# Defaults to tcp, in which case you can view the data in a terminal with the official example; here we change it to kafka
canal.serverMode = kafka

# flush meta cursor/parse position to file
canal.file.data.dir = ${canal.conf.dir}
canal.file.flush.period = 1000
## memory store RingBuffer size, should be Math.pow(2,n)
canal.instance.memory.buffer.size = 16384
## memory store RingBuffer used memory unit size , default 1kb
canal.instance.memory.buffer.memunit = 1024 
## memory store gets mode used MEMSIZE or ITEMSIZE
canal.instance.memory.batch.mode = MEMSIZE
canal.instance.memory.rawEntry = true

## detecting config
canal.instance.detecting.enable = false
#canal.instance.detecting.sql = insert into retl.xdual values(1,now()) on duplicate key update x=now()
canal.instance.detecting.sql = select 1
canal.instance.detecting.interval.time = 3
canal.instance.detecting.retry.threshold = 3
canal.instance.detecting.heartbeatHaEnable = false

# support maximum transaction size, more than the size of the transaction will be cut into multiple transactions delivery
canal.instance.transaction.size =  1024
# mysql fallback connected to new master should fallback times
canal.instance.fallbackIntervalInSeconds = 60

# network config
canal.instance.network.receiveBufferSize = 16384
canal.instance.network.sendBufferSize = 16384
canal.instance.network.soTimeout = 30

# binlog filter config
canal.instance.filter.druid.ddl = true
canal.instance.filter.query.dcl = false
canal.instance.filter.query.dml = false
canal.instance.filter.query.ddl = false
canal.instance.filter.table.error = false
canal.instance.filter.rows = false
canal.instance.filter.transaction.entry = false

# binlog format/image check
canal.instance.binlog.format = ROW,STATEMENT,MIXED 
canal.instance.binlog.image = FULL,MINIMAL,NOBLOB

# binlog ddl isolation
canal.instance.get.ddl.isolation = false

# parallel parser config
canal.instance.parser.parallel = true
## concurrent thread number, default 60% available processors, suggest not to exceed Runtime.getRuntime().availableProcessors()
#canal.instance.parser.parallelThreadSize = 16
## disruptor ringbuffer size, must be power of 2
canal.instance.parser.parallelBufferSize = 256

# table meta tsdb info
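# For the tsdb concept, see the official documentation. Roughly: the table structure canal obtains for a DDL may be wrong,
# so to solve this every DDL is parsed directly and the result is regenerated into the meta store. It can be kept in MySQL; the default is H2.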
# 

canal.instance.tsdb.enable = true
canal.instance.tsdb.dir = ${canal.file.data.dir:../conf}/${canal.instance.destination:}
#canal.instance.tsdb.url = jdbc:h2:${canal.instance.tsdb.dir}/h2;CACHE_SIZE=1000;MODE=MYSQL;

# Address and database name of the database that stores the meta data, plus its username and password
canal.instance.tsdb.url = jdbc:mysql://127.0.0.1:3306/canal_tsdb
canal.instance.tsdb.dbUsername = admin
canal.instance.tsdb.dbPassword = internal

# dump snapshot interval, default 24 hour
canal.instance.tsdb.snapshot.interval = 24
# purge snapshot expire , default 360 hour(15 days)
canal.instance.tsdb.snapshot.expire = 360

# aliyun ak/sk , support rds/mq
canal.aliyun.accessKey =
canal.aliyun.secretKey =

#################################################
#########               destinations            ############# 
#################################################

# canal instances
canal.destinations = example
# conf root dir
canal.conf.dir = ../conf
# auto scan instance dir add/remove and start/stop instance
canal.auto.scan = true
canal.auto.scan.interval = 5

#canal.instance.tsdb.spring.xml = classpath:spring/tsdb/h2-tsdb.xml
canal.instance.tsdb.spring.xml = classpath:spring/tsdb/mysql-tsdb.xml

canal.instance.global.mode = spring
canal.instance.global.lazy = false
#canal.instance.global.manager.address = 127.0.0.1:1099
#canal.instance.global.spring.xml = classpath:spring/memory-instance.xml
canal.instance.global.spring.xml = classpath:spring/file-instance.xml
#canal.instance.global.spring.xml = classpath:spring/default-instance.xml

##################################################
#########                    MQ                      #############
##################################################

# Kafka broker addresses
canal.mq.servers = 10.40.2.94:9092,10.40.2.94:9093,10.40.2.94:9094
canal.mq.retries = 0
canal.mq.batchSize = 16384
canal.mq.maxRequestSize = 1048576
canal.mq.lingerMs = 100
canal.mq.bufferMemory = 33554432
canal.mq.canalBatchSize = 50
canal.mq.canalGetTimeout = 100
canal.mq.flatMessage = true
canal.mq.compressionType = none
canal.mq.acks = all
# use transaction for kafka flatMessage batch produce
canal.mq.transaction = false
#canal.mq.properties. =

example/instance.properties:


#################################################
## mysql serverId , v1.0.26+ will autoGen
# canal.instance.mysql.slaveId=0

# enable gtid use true/false
canal.instance.gtidon=false

# position info
canal.instance.master.address=10.40.2.175:3306
canal.instance.master.journal.name= mysql-bin.000006
canal.instance.master.position= 123439266
canal.instance.master.timestamp=
canal.instance.master.gtid=

# rds oss binlog
canal.instance.rds.accesskey=
canal.instance.rds.secretkey=
canal.instance.rds.instanceId=

# table meta tsdb info
canal.instance.tsdb.enable=true
canal.instance.tsdb.url=jdbc:mysql://127.0.0.1:3306/canal_tsdb
canal.instance.tsdb.dbUsername=admin
canal.instance.tsdb.dbPassword=internal

#canal.instance.standby.address =
#canal.instance.standby.journal.name =
#canal.instance.standby.position =
#canal.instance.standby.timestamp =
#canal.instance.standby.gtid=

# username/password
canal.instance.dbUsername=admin
canal.instance.dbPassword=internal
canal.instance.connectionCharset = UTF-8
# enable druid Decrypt database password
canal.instance.enableDruid=false
#canal.instance.pwdPublicKey=MFwwDQYJKoZIhvcNAQEBBQADSwAwSAJBALK4BUxdDltRRE5/zXpVEVPUgunvscYFtEip3pmLlhrWpacX7y7GCMo2/JM6LeHmiiNdH1FWgGCpUfircSwlWKUCAwEAAQ==

# table regex
# Database and table filter; here I only extract the logs of sourcedb

canal.instance.filter.regex=sourcedb\\..*
# table black regex
canal.instance.filter.black.regex=

# mq config
# MQ/Kafka topic configuration

canal.mq.topic=example
# dynamic topic route by schema or table regex
#canal.mq.dynamicTopic=mytest1.user,mytest2\\..*,.*\\..*
canal.mq.partition=0
# hash partition config
#canal.mq.partitionsNum=3
#canal.mq.partitionHash=test.table:id^name,.*\\..*
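
To make the table filter above more concrete, here is a rough Python approximation of how a value such as sourcedb\\..* selects tables. It rests on several assumptions: that the doubled backslash is properties-file escaping for a single backslash, that patterns are comma-separated, and that each pattern must match the whole schema.table name. canal's real filter is Java code and may differ in corner cases; the function names below are invented for this sketch.

```python
import re


def properties_unescape(value: str) -> str:
    """Turn the doubled backslashes of a .properties value into single ones."""
    return value.replace("\\\\", "\\")


def matches_filter(filter_value: str, schema: str, table: str) -> bool:
    """Return True if schema.table is fully matched by any comma-separated pattern."""
    patterns = [p for p in properties_unescape(filter_value).split(",") if p]
    full_name = f"{schema}.{table}"
    return any(re.fullmatch(p, full_name) for p in patterns)


# The raw text exactly as it is written in instance.properties.
raw = r"sourcedb\\..*"

print(matches_filter(raw, "sourcedb", "jlwan9"))   # table in sourcedb
print(matches_filter(raw, "otherdb", "jlwan9"))    # table in another schema
print(matches_filter(raw, "sourcedbx", "t"))       # schema that merely starts with "sourcedb"
```

The third call shows why the dot is escaped: with a literal dot required after sourcedb, a schema named sourcedbx is not picked up by accident.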

Once both files have been modified, start the service:

bin/startup.sh 

If you set canal.serverMode to tcp, the default port is 11111. If you set serverMode to kafka, that port is no longer there.

2. Kafka configuration and caveats

By default canal uses Kafka 1.1.1 built for Scala 2.11.

		<!-- https://mvnrepository.com/artifact/org.apache.kafka/kafka -->
		<dependency>
			<groupId>org.apache.kafka</groupId>
			<artifactId>kafka_2.11</artifactId>
			<version>1.1.1</version>
		</dependency>

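My environment ran Kafka 1.0.1 by default, and repeated tests all failed. I then tried the latest release, 2.1.0, and that failed too. In the end I had no choice but to install the same open-source version that canal uses, 1.1.1 with Scala 2.11. I wasted a lot of time here looking for the cause, and it turned out to be a version problem.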

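Also, the example that canal provides works, but the sample code it provides does not work at all; presumably something in it is written incorrectly.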

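The next step is to test whether Kafka receives the data. This also held me up for a long time, because I kept using bin/kafka-console-consumer.sh: that command reports no error, but it never consumed any data. Only the following command worked properly.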

Kafka consumer command:

bin/kafka-simple-consumer-shell.sh  --broker-list 10.40.2.94:9092  --topic example
{"data":null,"database":"sourcedb","es":1555595121000,"id":2,"isDdl":true,"mysqlType":null,"old":null,"pkNames":null,"sql":"create table jlwang4(id int)/*oggddlversion=144*/","sqlType":null,"table":"jlwang4","ts":1555595325843,"type":"CREATE"}
{"data":null,"database":"sourcedb","es":1555595360000,"id":2,"isDdl":true,"mysqlType":null,"old":null,"pkNames":null,"sql":"create table jlwang5(id int)/*oggddlversion=154*/","sqlType":null,"table":"jlwang5","ts":1555595361084,"type":"CREATE"}
{"data":null,"database":"sourcedb","es":1555595549000,"id":7,"isDdl":true,"mysqlType":null,"old":null,"pkNames":null,"sql":"create table jlwang6(id int)/*oggddlversion=156*/","sqlType":null,"table":"jlwang6","ts":1555595549508,"type":"CREATE"}
{"data":null,"database":"sourcedb","es":1555596117000,"id":2,"isDdl":true,"mysqlType":null,"old":null,"pkNames":null,"sql":"create table jlwan8(id int)/*oggddlversion=168*/","sqlType":null,"table":"jlwan8","ts":1555596117855,"type":"CREATE"}
{"data":null,"database":"sourcedb","es":1555597933000,"id":2,"isDdl":true,"mysqlType":null,"old":null,"pkNames":null,"sql":"create table jlwan9(id int)/*oggddlversion=174*/","sqlType":null,"table":"jlwan9","ts":1555597933438,"type":"CREATE"}
{"data":null,"database":"sourcedb","es":1555599029000,"id":22,"isDdl":true,"mysqlType":null,"old":null,"pkNames":null,"sql":"create table jlwan10(id int)/*oggddlversion=176*/","sqlType":null,"table":"jlwan10","ts":1555599029681,"type":"CREATE"}
{"data":[{"id":"100"}],"database":"sourcedb","es":1555599089000,"id":24,"isDdl":false,"mysqlType":{"id":"int"},"old":null,"pkNames":null,"sql":"","sqlType":{"id":4},"table":"jlwan9","ts":1555599089375,"type":"INSERT"}
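
Because canal.mq.flatMessage is set to true, each Kafka record is a single JSON object like the ones above. A minimal Python sketch of how a downstream consumer might interpret such a record follows. It only uses fields visible in the output above (database, table, type, isDdl, sql, data); the summarize function is a made-up helper, and reading from Kafka itself is left out.

```python
import json


def summarize(message: str) -> str:
    """Describe one canal flat message in a single line."""
    event = json.loads(message)
    target = f'{event["database"]}.{event["table"]}'
    if event["isDdl"]:
        # DDL events carry the statement in "sql" and have no row data.
        return f'DDL {event["type"]} on {target}: {event["sql"]}'
    # DML events carry the affected rows in "data" (a list of column/value maps).
    rows = event["data"] or []
    return f'{event["type"]} on {target}: {len(rows)} row(s)'


# Two records copied from the consumer output above.
ddl = (
    '{"data":null,"database":"sourcedb","es":1555595121000,"id":2,"isDdl":true,'
    '"mysqlType":null,"old":null,"pkNames":null,'
    '"sql":"create table jlwang4(id int)/*oggddlversion=144*/","sqlType":null,'
    '"table":"jlwang4","ts":1555595325843,"type":"CREATE"}'
)
insert = (
    '{"data":[{"id":"100"}],"database":"sourcedb","es":1555599089000,"id":24,'
    '"isDdl":false,"mysqlType":{"id":"int"},"old":null,"pkNames":null,"sql":"",'
    '"sqlType":{"id":4},"table":"jlwan9","ts":1555599089375,"type":"INSERT"}'
)

print(summarize(ddl))
print(summarize(insert))
```

Note that column values arrive as strings ("100" rather than 100), so a consumer that needs typed values would have to convert them, for instance by looking at mysqlType.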

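Most of canal's configuration does not need to be changed, so overall it is fairly easy to set up, although I feel it is not as well done as Maxwell.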



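Copyright notice: this is an original article by tom_fans, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.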