MySQL中一條SQL使磁盤暴漲并導致MySQL Crash該怎么辦

138次閱讀

共計 11526 個字符，預計需要花費 29 分鐘才能閱讀完成。

今天就跟大家聊聊有關 MySQL 中一條 SQL 使磁盤暴漲并導致 MySQL Crash 該怎么辦，可能很多人都不太了解，為了讓大家更加了解，丸趣 TV 小編給大家總結了以下內容，希望大家根據這篇文章可以有所收獲。

收到一個 MySQL 實例的磁盤告警，看到監控圖，磁盤使用率增長速度非常快，在兩個多小時的時間里，已經漲了 170 多 G。
檢查到 binlog 目錄并不大，而 datadir 已經 180 多 G 了，發現 ibtmp1 非常大，并且持續增長。

趕緊上去看 processlist，線程不多，檢查到有一個 SELECT case … when … 的線程，狀態是 sending data，并且已經持續幾千秒了。
有點懵，沒有第一時間 kill 掉這個線程，再一次 show processlist 的時候……發現大概已經掛了………………

mysql show processlist;

ERROR 2006 (HY000): MySQL server has gone away

No connection. Trying to reconnect…

ERROR 2002 (HY000): Can t connect to local MySQL server through socket $datadir/mysqld.sock (2)

ERROR: Can t connect to the server

那好，既然已經掛了，那么開始找原因吧…

看一下 datadir 掛載的磁盤，已經釋放空了，果然已經被重啟了。
ext4 197G 4.6G 191G 3% $datadir

檢查一下 mysqld 進程，發現已經被 mysqld_safe 拉起來了。

檢查一下 error log：
紅字是關于 mysqld crash 的信息，很顯然，datadir 掛載的磁盤滿了。
沒有多余的空間寫 binlog 和 ibtmp1 臨時表空間文件（5.7 新增）。
而且 ibtmp1 文件最后達到了 201876045824bytes，將近 190G，而掛載的磁盤總大小才不到 200G。

藍字是關于 sql thread 的問題，在 mysqld 起來之后，sql thread 也出現了問題。
這個之后再修復。

[ERROR] InnoDB: posix_fallocate(): Failed to preallocate data for file $datadir/ibtmp1, desired size 67108864 bytes. Operating system error number 28. Check that the disk is not full or a disk quota exceeded. Make sure the file system supports this function. Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/ operating-system-error-codes.html

[ERROR] Disk is full writing $datadir/mysql-bin.000004 (Errcode: 15870576 – No space left on device). Waiting for someone to free space…

[ERROR] Retry in 60 secs. Message reprinted in 600 secs

[Warning] InnoDB: 1048576 bytes should have been written. Only 647168 bytes written. Retrying for the remaining bytes.

[Warning] InnoDB: Retry attempts for writing partial data failed.

[ERROR] InnoDB: Write to file $datadir/ibtmp1 failed at offset 201911697408, 1048576 bytes should have been written, only 647168 were written. Operating system error number 28. Check that your OS and file system support files of this size. Check also that the disk is not full or a disk quota exceeded.

[ERROR] InnoDB: Error number 28 means No space left on device

[Note] InnoDB: Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/operating-system-error-codes.html

[Warning] InnoDB: Error while writing 67108864 zeroes to $datadir/ibtmp1 starting at offset 201876045824

[ERROR] $basedir/bin/mysqld: The table $tmpdir/#sql_37c5_0 is full

[ERROR] InnoDB: posix_fallocate(): Failed to preallocate data for file ./thread_quartz/QRTZ_FIRED_TRIGGERS.ibd, desired size 32768 bytes. Operating system error number 28. Check that the disk is not full or a disk quota exceeded. Make sure the file system supports this function. Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/ operating-system-error-codes.html

[Warning] InnoDB: Retry attempts for writing partial data failed.

[Warning] InnoDB: Error while writing 32768 zeroes to ./thread_quartz/QRTZ_FIRED_TRIGGERS.ibd starting at offset 442362017-07-06T11:49:21.893377Z mysqld_safe Number of processes running now: 0

mysqld_safe mysqld restarted

……………………………………………………………………………………………………………………

[Note] InnoDB: Last MySQL binlog file position 0 690908428, file name mysql-bin.000004

[Note] InnoDB: Starting in background the rollback of uncommitted transactions

[Note] InnoDB: Rollback of non-prepared transactions completed

[Note] InnoDB: Removed temporary tablespace data file: ibtmp1

[Note] InnoDB: Creating shared tablespace for temporary tables

[Note] InnoDB: Setting file $datadir/ibtmp1 size to 12 MB. Physically writing the file full; Please wait …

[Note] InnoDB: File $datadir/ibtmp1 size is now 12 MB.

[Note] InnoDB: 96 redo rollback segment(s) found. 96 redo rollback segment(s) are active.

[Note] InnoDB: 32 non-redo rollback segment(s) are active.

[Note] InnoDB: Waiting for purge to start

[Note] InnoDB: 5.7.12 started; log sequence number 4828513952

[Note] InnoDB: page_cleaner: 1000ms intended loop took 7748ms. The settings might not be optimal. (flushed=0 and evicted=0, during the time.)

[Note] InnoDB: Loading buffer pool(s) from $datadir/ib_buffer_pool

[Note] Plugin FEDERATED is disabled.

[Note] InnoDB: Buffer pool(s) load completed at 170706 19:49:30

[Note] Recovering after a crash using $basedir/mysql-bin

[ERROR] Error in Log_event::read_log_event(): read error , data_len: 579, event_type: 2

[Note] Starting crash recovery…

[Note] InnoDB: Starting recovery for XA transactions…

[Note] InnoDB: Transaction 6729603 in prepared state after recovery

[Note] InnoDB: Transaction contains changes to 1 rows

[Note] InnoDB: 1 transactions in prepared state after recovery

[Note] Found 1 prepared transaction(s) in InnoDB

[Note] Crash recovery finished.

[Note] Crashed binlog file $basedir/mysql-bin.000004 size is 690909184, but recovered up to 690908428. Binlog trimmed to 690908428 bytes.

[Warning] Failed to set up SSL because of the following SSL library error: SSL context is not usable without certificate and private key

……………………………………………………………………………………………………………………

[ERROR] Error in Log_event::read_log_event(): read error , data_len: 835, event_type: 2

[Warning] Error reading GTIDs from relaylog: -1

[Note] Slave I/O thread: Start asynchronous replication to master ***** in log mysql-bin.000014 at position 286486095

[Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the START SLAVE Syntax in the MySQL Manual for more information.

[Note] Slave I/O thread for channel : connected to master ***** ,replication started in log mysql-bin.000014 at position 286486095

[Warning] Slave SQL for channel : If a crash happens this configuration does not guarantee that the relay log info will be consistent, Error_code: 0

[Note] Slave SQL thread for channel initialized, starting replication in log mysql-bin.000014 at position 286485153, relay log $datadir/mysql-relay.000013 position: 286485326

[ERROR] Error in Log_event::read_log_event(): read error , data_len: 835, event_type: 2

[ERROR] Error reading relay log event for channel : slave SQL thread aborted because of I/O error

[ERROR] Slave SQL for channel : Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master s binary log is corrupted (you can check this by running mysqlbinlog on the binary log), the slave s relay log is corrupted (you can check this by running mysqlbinlog on the relay log), a network problem, or a bug in the master s or slave s MySQL code. If you want to check the master s binary log or slave s relay log, you will be able to know their names by issuing SHOW SLAVE STATUS on this slave. Error_code: 1594

[ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with SLAVE START . We stopped at log mysql-bin.000014 position 286485153

[Note] Event Scheduler: Loaded 0 events

[Note] $basedir/mysqld: ready for connections.

Version: 5.7.12 socket: $datadir/mysqld.sock port: 3306 Source distribution

因為已經掛了，而且沒有來得及將當時執行的 sql 完整記錄下來，只能看到 show processlist 前面的一小部分。

聯系到業務，業務說這一臺是 slave，無讀的業務，但偶爾會手動執行報表相關的 SELECT 操作。

那么檢查一下 crash 前的兩個小時的 slow log：
發現反復有這么幾條 SQL：

……………………

00:00:11.392881 select * from tb ……………………

00:00:04.779748 select (case when ……………………

00:00:04.779748 select (case when t2.col1 ……………………

00:00:03.328248 select (case when t2.col1 ……………………

00:00:04.276773 select count(t1.id) from tb1

00:00:05.039027 select (case when t2.col ……………………

00:00:10.263063 select (case when t2.col ……………………

00:00:03.131713 select t2.* from tb1 t1 ……………………

00:00:15.909456 select t2.* from tb1 t1 ……………………

00:00:14.367047 select * from tb ……………………

……………………

雖然沒有一條超過 20 秒的，但不要忘了，在磁盤被打爆之前的一秒，有一條執行了幾千秒的 SQL。
并且 case when 只有一個 order by。
出于取出其中一條，檢查一下執行計劃：

+——–+—————+———+———+———————+——–+———-+———————————————-+

+——–+—————+———+———+———————+——–+———-+———————————————-+

+——–+—————+———+———+———————+——–+———-+———————————————-+

看起來不是很好，但是也不至于用到 190G 臨時表。
雖然臨時表是長期積累下來的，但看監控，磁盤使用率確實是在這一個多小時迅速積攢起來的。

首先這個從庫是沒有業務的，而是通過客戶那邊可能手動連接過來做一些查詢服務的。
知道這個信息之后，繼續對照 slow log，發現 slow log 關于 case when 的這一條 sql 并不完全一樣……
這可能意味著客戶仍然在不停地調試這條 SQL…
而那條執行了幾千秒的 case when 可能寫得十分爛。

可惜的是，在 mysqld crash 的最后一瞬間，該 sql 仍然沒有執行完成，導致沒有被記錄到 slow log。
那條 sql 估計也永遠無法再復現。

那么究竟是多爛的 SQL，或者多大的表，可以用到這么多基于磁盤的臨時表或文件排序呢？？
趕緊檢查一下庫表的情況：

① 表 1：100w，表 2：40w，也沒有 blob，text 等字段。
② 重啟后的 datadir 只有 3、4G 的數據量

所以我有個大但的想法：
可能客戶在調試的過程中，產生了一條類似這樣的 SQL：
我對照了已經記錄在 slow log 的幾條 sql，脫敏，格式化之后還原如下：

SELECT (CASE

WHEN t2.col= $info_000 THEN $a

WHEN t2.col= $info_001 THEN $b

WHEN t2.col= $info_002 THEN $c

WHEN t2.col= $info_003 THEN $d

WHEN t2.col= $info_004 THEN $e

WHEN t2.col= $info_005 THEN $f

WHEN t2.col= $info_006 THEN $g

ELSE t2.col

END ) AS 來源 XX,

(CASE

WHEN t2.clo2= $info_000 THEN $a

WHEN t2.clo2= $info_001 THEN $b

WHEN t2.clo2= $info_002 THEN $c

WHEN t2.clo2= $info_003 THEN $d

WHEN t2.clo2= $info_004 THEN $e

WHEN t2.clo2= $info_005 THEN $f

WHEN t2.clo2= $info_006 THEN $g

ELSE t2.col

END ) AS 目標 XX,

t2.col4 AS XX 時間,

t2.col5 AS 金額

FROM $tb1 t1 JOIN $tb2 t2

WHERE t1.col3 = 4 AND (t2.col LIKE $info% OR t2.clo2 LIKE $info%)

ORDER BY t1.col4 DESC;

對，這兩張表做了一個 JOIN，但是沒有條件。
產生了一個很大很大很大的笛卡爾積。

那么算一下：
列：13 + 19 = 32
行：1088181*440650 = 479,506,957,650 rows

再測試一下：
將兩張表 dump 成 sql 文件，看了一下大小：330MB，并不大。
導入自己的實驗環境，再將上述產生笛卡爾積的 SQL 跑一下。

經過一段時間等待：

已經執行了半個小時多了。并且該 SQL 仍然在執行：
說出來你們可能不信，此時的 ibtmp1 大小：

# du -sh *

4.0K auto.cnf

4.0K ib_buffer_pool

2.9G ibdata1

48M ib_logfile0

48M ib_logfile1

54G ibtmp1

12M mysql

1.1M performance_schema

676K sys

633M test

47M test_2

大概也好解釋，為什么兩張并不是很大的表，在執行某些查詢之后會產生這么大的臨時表了…

當然，在做實驗的等待時間里，已經將復制的 ERROR 給修復了。
當時復制的報錯：

……

Slave_IO_Running: Yes

Slave_SQL_Running: No

……

Seconds_Behind_Master: NULL

Master_SSL_Verify_Server_Cert: No

Last_IO_Errno: 0

Last_IO_Error:

Last_SQL_Errno: 1594

Last_SQL_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master s binary log is corrupted (you can check this by running mysqlbinlog on the binary log), the slave s relay log is corrupted (you can check this by running mysqlbinlog on the relay log), a network problem, or a bug in the master s or slave s MySQL code. If you want to check the master s binary log or slave s relay log, you will be able to know their names by issuing SHOW SLAVE STATUS on this slave.

結合之前貼的 error log 可以看到更多信息，推斷是 relay log 有損壞。

那么就好解決了：

STOP SLAVE;

— master_log_file 和 master_log_pos 為報錯時所記錄的位置。

CHANGE MASTER master_log_file= xxx , master_log_pos=xxx;

START SLAVE;

此處不再贅述。

到目前為止，我實驗環境的 datadir 已經滿了，但是和此處正式環境有區別的是。
在磁盤寫滿之時，并沒有馬上導致 mysql crash，而是輸出了：

[ERROR] InnoDB: Write to file ./ibtmp1failed at offset 83579895808, 1048576 bytes should have been written, only 958464 were written. Operating system error number 28. Check that your OS and file system support files of this size. Check also that the disk is not full or a disk quota exceeded.

[ERROR] InnoDB: Error number 28 means No space left on device

[Note] InnoDB: Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/operating-system-error-codes.html

[Warning] InnoDB: Error while writing 67108864 zeroes to ./ibtmp1 starting at offset 83563118592

[ERROR] /data/mysql-base/mysql57/bin/mysqld: The table #sql_b26_0 is full

[ERROR] Disk is full writing /data/mysql-data/mysql57-3357/binlog/mysql-bin.000015 (Errcode: 15868064 – No space left on device). Waiting for someone to free space…

[ERROR] Retry in 60 secs. Message reprinted in 600 secs

實際上正常情況下，mysql 在磁盤滿了的情況下，會每隔 1 分鐘做一次檢查，每隔 10 分鐘將檢查信息輸出到錯誤日志。

既然 mysqld 沒有被干掉，我再次執行了一下那條可怕的 sql 呢？……

當然……error log 如愿輸出被 kill 的消息……
并且 mysqld 進程被殺掉，此時：

[ERROR] /data/mysql-base/mysql57/bin/mysqld: Binary logging not possible. Message: An error occurred during flush stage of the commit. binlog_error_action is set to ABORT_SERVER . Hence aborting the server.

14:42:48 UTC – mysqld got signal 6 ;

This could be because you hit a bug. It is also possible that this binary

or one of the libraries it was linked against is corrupt, improperly built,

or misconfigured. This error can also be caused by malfunctioning hardware.

Attempting to collect some information that could help diagnose the problem.

As this is a crash and something is definitely wrong, the informati

有趣的是，“information”這個單詞都沒有輸出完，大概是真的一個字符都寫不下了吧。

試圖用 vim 在 error log 里添加幾個字符并保存，也報錯。

總結一下：
〇在線上環境執行的 sql，建議進行審核，報表等 OLAP 需求也需要正式一些。
〇考慮是否可以限制 ibtmp1 的大小，也就是設定 innodb_temp_data_file_path 的最大值。
〇可能需要定期重啟 mysqld 來收縮臨時表空間。
〇做好監控和及時響應，這次就是響應時間過長，達到閾值，然后到機子上檢查時，已經晚了，眼睜睜地看到 mysqld 在我眼前 crash。

看完上述內容，你們對 MySQL 中一條 SQL 使磁盤暴漲并導致 MySQL Crash 該怎么辦有進一步的了解嗎？如果還想了解更多知識或者相關內容，請關注丸趣 TV 行業資訊頻道，感謝大家的支持。

正文完