Mysql 5.7 Gtid內(nèi)部實(shí)際案例分析

139次閱讀

共計(jì) 7634 個(gè)字符，預(yù)計(jì)需要花費(fèi) 20 分鐘才能閱讀完成。

本篇內(nèi)容介紹了“Mysql 5.7 Gtid 內(nèi)部實(shí)際案例分析”的有關(guān)知識(shí)，在實(shí)際案例的操作過(guò)程中，不少人都會(huì)遇到這樣的困境，接下來(lái)就讓丸趣 TV 小編帶領(lǐng)大家學(xué)習(xí)一下如何處理這些情況吧！希望大家仔細(xì)閱讀，能夠?qū)W有所成！

一、觸發(fā)條件

本案列我測(cè)試過(guò) 4 個(gè)版本
percona Mysql 5.7.14
官方社區(qū) Mysql 5.7.17
percona Mysql 5.7.19
percona Mysql 5.7.15
其中 percona Mysql 5.7.14 和官方社區(qū) Mysql 5.7.17 有這個(gè)問(wèn)題。其他版本未知

已知 percona Mysql 5.7.14 或者官方社區(qū) Mysql 5.7.17。

mysqldump 備份沒(méi)有使用 -F, –flush-logs 選項(xiàng)

Gtid 打開(kāi)。

二、故障描述

本故障主要是新搭建的 Gtid 主從庫(kù)，運(yùn)行一段時(shí)間后重啟主從必然報(bào)錯(cuò)如下：

Last_IO_Error: Got fatal error 1236 from master when reading data from binary log:  The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.

三、故障分析

為什么重啟后會(huì)報(bào)錯(cuò)找到不事物呢，然后發(fā)現(xiàn)這個(gè) Gtid 事物在主庫(kù)的 binlog 中已經(jīng)沒(méi)有了，應(yīng)該是很久以前的。其實(shí)這個(gè)問(wèn)題我們要回到 mysqldump 出來(lái)的文件如何進(jìn)行 Gtid 的初始化以及 mysql.gtid_executed 表中。
在 mysqldump 不使用 –set-gtid-purged 的時(shí)候必然會(huì)在 dump 出來(lái)的腳本中包含

-- GTID state at the beginning of the backup 
 SET @@GLOBAL.GTID_PURGED= e859a28b-b66d-11e7-8371-000c291f347d:1-41

這樣一個(gè)設(shè)置 GTID_PURGED 的語(yǔ)句，它包含了主庫(kù)上已經(jīng)執(zhí)行的全部 Gtid 事物。從第五節(jié)的源碼和總結(jié)部分我們知道這個(gè)語(yǔ)句至少做了三個(gè)更改(DBA 可見(jiàn)的只有三個(gè))：

mysql.gtid_executed 表的寫(xiě)入

gtid_executed 變量的修改

gtid_purged 變量的修改

而完成了這一步實(shí)際上 mysql.gtid_executed 表是包含了全部的執(zhí)行過(guò)的 Gtid 事物的，但是隨后我們看到 dump 腳本包含了如下語(yǔ)句

 210 -- Table structure for table `gtid_executed`
 211 --
 212
 213 DROP TABLE IF EXISTS `gtid_executed`;
 214 /*!40101 SET @saved_cs_client = @@character_set_client */;
 215 /*!40101 SET character_set_client = utf8 */;
 216 CREATE TABLE `gtid_executed` ( 217 `source_uuid` char(36) NOT NULL COMMENT  uuid of the source where the transaction was originally executed. ,
 218 `interval_start` bigint(20) NOT NULL COMMENT  First number of interval. ,
 219 `interval_end` bigint(20) NOT NULL COMMENT  Last number of interval. ,
 220 PRIMARY KEY (`source_uuid`,`interval_start`)
 221 ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
 222 /*!40101 SET character_set_client = @saved_cs_client */;
 223
 224 --
 225 -- Dumping data for table `gtid_executed`
 226 --
 227
 228 LOCK TABLES `gtid_executed` WRITE;
 229 /*!40000 ALTER TABLE `gtid_executed` DISABLE KEYS */;
 230 INSERT INTO `gtid_executed` VALUES (e859a28b-b66d-11e7-8371-000c291f347d ,1,32);
 231 /*!40000 ALTER TABLE `gtid_executed` ENABLE KEYS */;
 232 UNLOCK TABLES;

顯然這里我們?cè)?source 的時(shí)候從庫(kù)的 mysql.gtid_executed 將被重新初始化為：

e859a28b-b66d-11e7-8371-000c291f347d ,1,32

而實(shí)際的已經(jīng)執(zhí)行過(guò)的 Gtid 是：

e859a28b-b66d-11e7-8371-000c291f347d:1-41

如前文第五節(jié)我們通過(guò)源碼分析后總結(jié)如下：

mysql.gtid_executed 表修改時(shí)機(jī)
在 binlog 發(fā)生切換 (rotate) 的時(shí)候保存直到上一個(gè) binlog 文件執(zhí)行過(guò)的全部 Gtid，它不是實(shí)時(shí)更新的。

因此此時(shí)表中并沒(méi)有完全包含全部執(zhí)行過(guò)的 Gtid 事物，而在前文第六節(jié)的源碼分析中我們知道在 Gtid 模塊啟動(dòng)的時(shí)候必須要讀取兩個(gè) Gtid 持久化的介質(zhì)：

mysql.gtid_executed

binlog

來(lái)判斷 Gtid 的集合，顯然從庫(kù)不可能在 binlog 包含這個(gè) Gtid 事物，所以這樣的操作步驟就導(dǎo)致了數(shù)據(jù)庫(kù)從庫(kù)后的報(bào)錯(cuò)，而這里的正確的步驟是壓根不進(jìn)行 mysql.gtid_executed 的重建和導(dǎo)入，我發(fā)現(xiàn)在 percona Mysql 5.7.15 和 percona Mysql 5.7.19 正是這樣的。但是為了防范這個(gè)問(wèn)題，我在搭建的 Gtid 從庫(kù)導(dǎo)完數(shù)據(jù)后加入了兩個(gè)個(gè)步驟如下：

reset master；set global gtid_purged=e859a28b-b66d-11e7-8371-000c291f347d:1-41

這兩步也就是為了從新初始化 mysql.gtid_executed 表，讓其正確。
此問(wèn)題還可以在 mysqldump 的時(shí)候加入 -F, –flush-logs 選項(xiàng)規(guī)避，但是 - F 會(huì)加入如下的 MDL LOCK：

2017-12-18T08:16:39.580985Z 6 Query FLUSH /*!40101 LOCAL */ TABLES
2017-12-18T08:16:39.612598Z 6 Query FLUSH TABLES WITH READ LOCK
2017-12-18T08:16:39.613406Z 6 Refresh
/root/mysql5.7.14/percona-server-5.7.14-7/sql/mysqld, Version: 5.7.14-7-debug-log (Source distribution). started with:
Tcp port: 13001 Unix socket: /root/mysql5.7.14/percona-server-5.7.14-7/mysql-test/var/tmp/mysqld.1.sock
Time Id Command Argument
2017-12-18T08:16:39.965847Z 6 Query SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ
2017-12-18T08:16:39.966298Z 6 Query START TRANSACTION /*!40100 WITH CONSISTENT SNAPSHOT */
2017-12-18T08:16:39.966792Z 6 Query SHOW VARIABLES LIKE  gtid\_mode 
2017-12-18T08:16:39.969587Z 6 Query SELECT @@GLOBAL.GTID_EXECUTED
2017-12-18T08:16:39.970216Z 6 Query SHOW STATUS LIKE  binlog_snapshot_% 
2017-12-18T08:16:39.975242Z 6 Query UNLOCK TABLES

這把鎖是 GLOBAL 級(jí)別的 MDL_SHARED(S)鎖，它會(huì)等到你說(shuō)有的 SELECT/DML/DDL 語(yǔ)句結(jié)束后才能獲得，同時(shí)會(huì)堵塞全部的 SELECT/DML/DDL 雖然持有時(shí)間很短如下：

mysql  flush tables with read lock;
Query OK, 0 rows affected (0.01 sec)
2017-08-03T18:19:11.603911Z 3 [Note] (acquire_lock)THIS MDL LOCK acquire ok!
2017-08-03T18:19:11.603947Z 3 [Note] (MDL PRINT) Thread id is 3: 
2017-08-03T18:19:11.603971Z 3 [Note] (--- MDL PRINT) Namespace is:GLOBAL 
2017-08-03T18:19:11.603994Z 3 [Note] (----- MDL PRINT) Mdl type is:MDL_SHARED(S) 
2017-08-03T18:19:11.604045Z 3 [Note] (------ MDL PRINT) Mdl duration is:MDL_EXPLICIT 
2017-08-03T18:19:11.604073Z 3 [Note] (------- MDL PRINT) Mdl status is:EMPTY 
2017-08-03T18:19:11.604133Z 3 [Note] (acquire_lock)THIS MDL LOCK acquire ok!

當(dāng)然要了解 MDL LOCK 的朋友可以參考我的文章
http://blog.itpub.net/7728585/viewspace-2143093/
MYSQL METADATA LOCK(MDL LOCK) 學(xué)習(xí)(1) 理論知識(shí)和加鎖類(lèi)型測(cè)試

四、故障模擬

知道了原因后也是很好模擬我使用的版本是社區(qū)版 5.7.17，搭建過(guò)程就是前面說(shuō)的步驟。只是導(dǎo)完數(shù)據(jù)后不使用 reset master 和 set gtid_purged 表進(jìn)行重新初始化 mysql.gtid_executed 表。搭建完成后做幾個(gè)事物狀態(tài)正常如下：

mysql  show slave status \G
*************************** 1. row ***************************
 Master_Log_File: binlog.000002
 Read_Master_Log_Pos: 5077
 Relay_Log_File: test1-relay-bin.000002
 Relay_Log_Pos: 2498
 Relay_Master_Log_File: binlog.000002
 Slave_IO_Running: Yes
 Slave_SQL_Running: Yes
 Exec_Master_Log_Pos: 5077
 Relay_Log_Space: 2705
 Last_IO_Errno: 0
 Last_IO_Error: 
 Seconds_Behind_Master: 0
 Retrieved_Gtid_Set: e859a28b-b66d-11e7-8371-000c291f347d:42-49
 Executed_Gtid_Set: e859a28b-b66d-11e7-8371-000c291f347d:1-49
 Auto_Position: 1

但是這個(gè)時(shí)候我們發(fā)現(xiàn) mysql.gtid_executed 表已經(jīng)出現(xiàn)了問(wèn)題如下：

mysql  select * from gtid_executed;
+--------------------------------------+----------------+--------------+
| source_uuid | interval_start | interval_end |
+--------------------------------------+----------------+--------------+
| e859a28b-b66d-11e7-8371-000c291f347d | 1 | 32 |
| e859a28b-b66d-11e7-8371-000c291f347d | 42 | 42 |
| e859a28b-b66d-11e7-8371-000c291f347d | 43 | 43 |
| e859a28b-b66d-11e7-8371-000c291f347d | 44 | 44 |
| e859a28b-b66d-11e7-8371-000c291f347d | 45 | 45 |
| e859a28b-b66d-11e7-8371-000c291f347d | 46 | 46 |
| e859a28b-b66d-11e7-8371-000c291f347d | 47 | 47 |
| e859a28b-b66d-11e7-8371-000c291f347d | 48 | 48 |
| e859a28b-b66d-11e7-8371-000c291f347d | 49 | 49 |
+--------------------------------------+----------------+--------------+

很容易發(fā)現(xiàn) 33-41 之間是沒(méi)有持久化的。如果這個(gè)時(shí)候如果我們使用 purge binary logs to 來(lái)清理掉主庫(kù)的日志那么必將出現(xiàn)問(wèn)題，如果不清理也會(huì)出現(xiàn) Gtid 事物重新執(zhí)行的情況。我們做清理模擬線上錯(cuò)誤。主庫(kù)執(zhí)行：

mysql  show binary logs;
+---------------+-----------+
| Log_name | File_size |
+---------------+-----------+
| binlog.000001 | 9974 |
| binlog.000002 | 5121 |
| binlog.000003 | 194 |
+---------------+-----------+
3 rows in set (0.01 sec)
mysql  purge binary logs to  binlog.000003 
Query OK, 0 rows affected (0.04 sec)
mysql  show binary logs;
+---------------+-----------+
| Log_name | File_size |
+---------------+-----------+
| binlog.000003 | 194 |
+---------------+-----------+
1 row in set (0.00 sec)

備庫(kù)重啟后錯(cuò)誤重現(xiàn)：

mysql  show slave status \G
*************************** 1. row ***************************
 Slave_IO_State: 
 Master_Host: 192.168.190.62
 Master_User: repl
 Master_Port: 3308
 Connect_Retry: 60
 Master_Log_File: binlog.000003
 Read_Master_Log_Pos: 194
 Relay_Log_File: test1-relay-bin.000005
 Relay_Log_Pos: 4
 Relay_Master_Log_File: binlog.000003
 Slave_IO_Running: No
 Slave_SQL_Running: Yes
 Exec_Master_Log_Pos: 194
 Relay_Log_Space: 194
 Last_IO_Errno: 1236
 Last_IO_Error: Got fatal error 1236 from master when reading data from binary log:  The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.
 Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
 Retrieved_Gtid_Set: e859a28b-b66d-11e7-8371-000c291f347d:42-49
 Executed_Gtid_Set: e859a28b-b66d-11e7-8371-000c291f347d:1-32:42-49
 Auto_Position: 1

我們發(fā)現(xiàn) I /O thread 試圖獲取主庫(kù)的 33-41 的 Gtid 事物的事物，已經(jīng)不能獲取，實(shí)際上即使能獲取也會(huì)造成事物的重新執(zhí)行，我們看到 Executed_Gtid_Set 已經(jīng)出現(xiàn)了兩個(gè)連續(xù)的區(qū)間：

Executed_Gtid_Set: e859a28b-b66d-11e7-8371-000c291f347d:1-32:42-49

“Mysql 5.7 Gtid 內(nèi)部實(shí)際案例分析”的內(nèi)容就介紹到這里了，感謝大家的閱讀。如果想了解更多行業(yè)相關(guān)的知識(shí)可以關(guān)注丸趣 TV 網(wǎng)站，丸趣 TV 小編將為大家輸出更多高質(zhì)量的實(shí)用文章！

正文完

發(fā)表至：數(shù)據(jù)庫(kù)

2023-07-20

版權(quán)聲明：本站原創(chuàng)文章，由丸趣 2023-07-20發(fā)表，共計(jì)7634字。

轉(zhuǎn)載說(shuō)明：除特殊說(shuō)明外本站除技術(shù)相關(guān)以外文章皆由網(wǎng)絡(luò)搜集發(fā)布，轉(zhuǎn)載請(qǐng)注明出處。

navicat外鍵的設(shè)置方法

oracle數(shù)據(jù)庫(kù)備份和恢復(fù)的過(guò)程是怎樣的

MySQL優(yōu)化原理是什么

如何解決SQLite數(shù)據(jù)庫(kù)的內(nèi)存占用過(guò)高問(wèn)題

opatch auto如何將11.2.0.4 rac升級(jí)到11.2.0.4.160419

久久精品人人爽,华人av在线,亚洲性视频网站,欧美专区一二三

Mysql 5.7 Gtid內(nèi)部實(shí)際案例分析