mysql mgr数据不一致_MGR主从不一致问题排查与修复

  • Post author:
  • Post category:mysql


运行环境

linux:CentOS release 6.8 (Final)

kernel:2.6.32-642.6.2.el6.x86_64

mysql Server version: 5.7.21-20-log Percona Server (GPL),MGR单主模式

1.故障现象

解析binlog

mysqlbinlog -v –base64-output=DECODE-ROWS mysql-bin.000042 –include-gtids=’1bb1b861-f776-11e6-3319-010204058070:113111655′

1bb1b861-f776-11e6-3319-010204058070:113111655 该GTID的binlog内容 ycc01 与ycc03服务器内容一致

ycc01.prod.bj2 mysql_3319

SET @@SESSION.GTID_NEXT= ‘1bb1b861-f776-11e6-3319-010204058070:113111655’/*!*/;

# at418534101#190412 12:41:44 server id 58070 end_log_pos 418534182 Query thread_id=4163376 exec_time=0 error_code=0SET TIMESTAMP=1555044104/*!*/;

BEGIN/*!*/;

# at418534182#190412 12:41:44 server id 58070 end_log_pos 418534262 Table_map: `ycc_cfcenter`.`config_center_client_list` mapped to number 108# at418534262#190412 12:41:44 server id 58070 end_log_pos 418534338 Delete_rows: table id 108flags: STMT_END_F

### DELETE FROM `ycc_cfcenter`.`config_center_client_list`

### WHERE

### @1=380183102### @2=206### @3=’10.1.97.131’### @4=’5566’### @5=1555044043209# at418534338#190412 12:41:44 server id 58070 end_log_pos 418534365 Xid = 604665111COMMIT/*!*/;

ycc03.prod.bj2 mysql_3319

SET @@SESSION.GTID_NEXT= ‘1bb1b861-f776-11e6-3319-010204058070:113111655’/*!*/;

# at747896436#190412 12:41:44 server id 58070 end_log_pos 747896500 Query thread_id=4163376 exec_time=0 error_code=0SET TIMESTAMP=1555044104/*!*/;

BEGIN/*!*/;

# at747896500#190412 12:41:44 server id 58070 end_log_pos 747896580 Table_map: `ycc_cfcenter`.`config_center_client_list` mapped to number 108# at747896580#190412 12:41:44 server id 58070 end_log_pos 747896656 Delete_rows: table id 108flags: STMT_END_F

### DELETE FROM `ycc_cfcenter`.`config_center_client_list`

### WHERE

### @1=380183102### @2=206### @3=’10.1.97.131’### @4=’5566’### @5=1555044043209# at747896656#190412 12:41:44 server id 58070 end_log_pos 747896683 Xid = 263747865COMMIT/*!*/;

mysqlbinlog -v –base64-output=DECODE-ROWS mysql-bin.000042 –include-gtids=’1bb1b861-f776-11e6-3319-010204058070:113111656′

1bb1b861-f776-11e6-3319-010204058070:113111656 该GTID的binlog内容 ycc01 与ycc03服务器内容不一致

ycc01.db.prod.bj2 mysql_3319

SET @@SESSION.GTID_NEXT= ‘1bb1b861-f776-11e6-3319-010204058070:113111656’/*!*/;

# at418534426#190412 12:41:44 server id 58070 end_log_pos 418534507 Query thread_id=4163392 exec_time=0 error_code=0SET TIMESTAMP=1555044104/*!*/;

BEGIN/*!*/;

# at418534507#190412 12:41:44 server id 58070 end_log_pos 418534587 Table_map: `ycc_cfcenter`.`config_center_client_list` mapped to number 108# at418534587#190412 12:41:44 server id 58070 end_log_pos 418534663 Delete_rows: table id 108flags: STMT_END_F

### DELETE FROM `ycc_cfcenter`.`config_center_client_list`

### WHERE

### @1=380183123### @2=136### @3=’10.1.94.137’### @4=’5566’### @5=1555044044493# at418534663#190412 12:41:44 server id 58070 end_log_pos 418534690 Xid = 604665114COMMIT/*!*/;

ycc03.db.prod.bj2 mysql_3319

SET @@SESSION.GTID_NEXT= ‘1bb1b861-f776-11e6-3319-010204058070:113111656’/*!*/;

# at747896744#190412 12:41:44 server id 58070 end_log_pos 747896808 Query thread_id=4163376 exec_time=1 error_code=0SET TIMESTAMP=1555044104/*!*/;

BEGIN/*!*/;

# at747896808#190412 12:41:44 server id 58070 end_log_pos 747896888 Table_map: `ycc_cfcenter`.`config_center_client_list` mapped to number 108# at747896888#190412 12:41:44 server id 58070 end_log_pos 747896964 Write_rows: table id 108flags: STMT_END_F

### INSERT INTO `ycc_cfcenter`.`config_center_client_list`

### SET

### @1=380183774### @2=206### @3=’10.1.97.131’### @4=’5566’### @5=1555044104617# at747896964#190412 12:41:44 server id 58070 end_log_pos 747896991 Xid = 263747869COMMIT/*!*/;

2.原因排查

官方已经反馈社区版MySQL 5.7.26和MySQL 8.0.16 中会修复,如果是企业版客户可以申请最新的hotfix版本。

在未升级 MySQL 版本前,若再发生此类故障,在修复时需要人工检查,检查切换时binlog中 GTID 信息与新主节点对应 GTID 的信息是否一致。

如果不一致需要人工修复至一致状态,一致才可以将被踢出的原主节点加回集群。

参考文档

3.修复过程

3.1 确定故障点

查看故障节点error.log

error.log

查看故障节点当前GTID

故障节点当前GTID

解析故障节点包含【1bb1b861-f776-11e6-3319-010204058070:1-113111661】 GTID 的binlog

##mysqlbinlog -vvv relay-bin-group_replication_applier.000053 –include-gtids=”1bb1b861-f776-11e6-3319-010204058070:113111661″

故障节点binlog解析

解析主节点binlog 确定相同内容binlog event 对应的GTID值

##mysqlbinlog -vvv mysql-bin.000042 –include-gtids=”1bb1b861-f776-11e6-3319-010204058070:113111660-113111665′

主库binlog解析

3.2 确定故障点结果汇总

1.故障节点提示 Duplicate entry ‘136-10.1.94.137-5566’ for key ‘uniq_idx_project_id’  有唯一约束冲突

2.故障节点当前GTID  【1bb1b861-f776-11e6-3319-010204058070:1-113111660】

3.故障节点与主节点GTID相差一个值,相同binlog event 主节点GTID【1bb1b861-f776-11e6-3319-010204058070:113111662】  故障节点【1bb1b861-f776-11e6-3319-010204058070:113111661】

3.3 故障节点修复过程

3.3.1 停止故障节点MGR服务

STOP group_replication;

SET GLOBAL super_read_only=0

3.3.2删除冲突键数据

DELETE FROM config_center_client_list WHERE project_id=136 AND client_ip=’10.1.94.137′ AND PORT=5566

3.3.3清除故障节点GTID信息

RESET MASTER

3.3.4重置故障节点GTID信息,故障节点因唯一约束冲突未重做的binlog event,在主节点对应的GTID为【1bb1b861-f776-11e6-3319-010204058070:113111662】。

SET @@GLOBAL.GTID_PURGED=’1bb1b861-f776-11e6-3319-010204058070:1-113111661′;

START group_replication;

SHOW MASTER STATUS

3.3.5 在故障节点执行,需要跳过的GTID只有一个,也可以用以下方法,这样就不需要reset masert。

STOP group_replication;

SET GLOBAL super_read_only=0

set sql_log_bin=0

DELETE FROM config_center_client_list WHERE project_id=136 AND client_ip=’10.1.94.137′ AND PORT=5566

set sql_log_bin=1

set session gtid_next=”1bb1b861-f776-11e6-3319-010204058070:113111661″;

begin;commit;

set session gtid_next=”AUTOMATIC”;

START group_replication;



版权声明:本文为weixin_39929723原创文章,遵循 CC 4.0 BY-SA 版权协议,转载请附上原文出处链接和本声明。