How MHA Works: A MySQL High-Availability Architecture

This article explains how MHA, a high-availability architecture for MySQL, works, and walks through installing, configuring, and testing it.

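MHA role deployment

An MHA service has two roles: MHA Manager (the management node) and MHA Node (the data node).

MHA Manager: usually deployed on a dedicated machine, or directly on one of the slaves (the latter is not recommended). It can manage multiple master/slave clusters, each of which is called an application. It has two jobs:

(1) Running automatic master switchover and failover commands

(2) Running the other helper scripts: manual master switchover, and master/slave status checks

MHA Node: runs on every MySQL server (master, slave, and manager). It speeds up failover by means of scripts that can parse and purge logs. Its jobs are:

(1) Copying the master's binlog data

(2) Comparing the relay log files of the slaves

(3) Periodically deleting relay logs without stopping the slave's SQL thread

MHA currently targets a one-master, multiple-slave topology. Building MHA requires at least three database servers in a replication cluster, one master and two slaves: one acts as master, one as standby master, and one as a plain slave. Because at least three servers are needed, Taobao modified MHA for hardware-cost reasons, and Taobao's TMHA now supports one master with one slave.

For your own use, one master and one slave also works, but if the master host itself goes down, MHA cannot switch over and cannot recover the missing binlog events. If only the mysqld process on the master crashes, the switchover still succeeds and the binlog can still be recovered.

Official introduction: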
https://code.google.com/p/mysql-master-ha/

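MHA Manager can manage several master/slave replication groups. The way MHA works can be summarized as follows:

(1) Save the binary log events (binlog events) from the crashed master;

(2) Identify the slave with the most recent updates;

(3) Apply the differential relay log to the other slaves;

(4) Apply the binary log events (binlog events) saved from the master;

(5) Promote one slave to be the new master;

(6) Point the other slaves at the new master and resume replication.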
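Step (2) can be illustrated with a small sketch. It is not how MHA implements the check (MHA does this internally in Perl); it only shows the idea of comparing the binlog file and position each slave has received. The input format (one "host file position" line per slave) and the function name are assumptions made for this example.

```shell
#!/bin/sh
# Illustrative only: a simplified, position-based comparison.
# Input (stdin): one line per slave, "host binlog_file position".
# Output: the host that has read furthest into the master's binlog.
pick_latest_slave() {
  # Binlog file names share a prefix and end in a zero-padded sequence
  # number, so a byte-wise sort orders the files; positions sort numerically.
  LC_ALL=C sort -k2,2 -k3,3n | tail -n 1 | awk '{print $1}'
}
```

Feeding it the file and position reported by each slave prints the name of the most up-to-date one.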

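MHA components

(1) Manager tools:

- masterha_check_ssh: checks the SSH configuration for MHA.

- masterha_check_repl: checks MySQL replication.

- masterha_manager: starts MHA.

- masterha_check_status: checks the current MHA running status.

- masterha_master_monitor: monitors whether the master is down.

- masterha_master_switch: controls failover (automatic or manual).

- masterha_conf_host: adds or removes configured server entries.

(2) Node tools (normally triggered by MHA Manager scripts; no manual operation needed):

- save_binary_logs: saves and copies the master's binary logs.

- apply_diff_relay_logs: identifies differential relay log events and applies them to the other slaves.

- filter_mysqlbinlog: strips unnecessary ROLLBACK events (MHA no longer uses this tool).

- purge_relay_logs: purges relay logs (without blocking the SQL thread).

(3) Custom extensions:

- secondary_check_script: checks master availability over multiple network routes;

- master_ip_failover_script: updates the master IP used by the application (must be modified);

- shutdown_script: forcibly shuts down the master node;

- report_script: sends a report;

- init_conf_load_script: loads initial configuration parameters;

- master_ip_online_change_script: updates the master node's IP address (must be modified).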

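MHA environment preparation

OS: CentOS 6.8

MySQL: 5.7.18

MHA package: MHA 0.57

Role    IP address    Hostname  server_id  Type
Master  10.180.2.163  MHA-M1    13306      Writes
S1      10.180.2.164  MHA-S1    23306      Reads (monitoring could also be deployed here; one MHA group can have several monitoring nodes)
S2      10.180.2.165  MHA-S2    33306      Reads; monitors the replication group (monitoring should generally not run on the master, otherwise a master outage prevents the switchover)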

Installing the MHA Node package

(1) On all nodes, install the Perl module that MHA Node requires (DBD::mysql) and download the MHA package

yum install perl-DBD-MySQL -y    # may require the EPEL repository

Download (MHA 0.57): https://mega.nz/#F!G4oRjARB!SWzFS59bUv9VrKwdAeIGVw

(2) Install MHA Node on all nodes (including the Manager node):

tar xf mha4mysql-node-0.57.tar.gz
cd mha4mysql-node-0.57
perl Makefile.PL
make && make install

After installation, the following files are present:

[root@MHA-S1 bin]# ll
total 48
-r-xr-xr-x 1 root root 16381 Aug 7 14:06 apply_diff_relay_logs
-r-xr-xr-x 1 root root 4807 Aug 7 14:06 filter_mysqlbinlog
lrwxrwxrwx 1 root root 26 Aug 8 17:10 mysql -> /usr/local/mysql/bin/mysql
lrwxrwxrwx 1 root root 32 Aug 8 17:09 mysqlbinlog -> /usr/local/mysql/bin/mysqlbinlog
-r-xr-xr-x 1 root root 8261 Aug 7 14:06 purge_relay_logs
-rwxr-xr-x 1 root root 314 Aug 8 16:21 purge_relay.sh
-r-xr-xr-x 1 root root 7525 Aug 7 14:06 save_binary_logs
[root@MHA-S1 bin]# pwd
/usr/local/bin

Add the directory to the system PATH:

echo "export PATH=\$PATH:/usr/local/bin" >> /etc/profile
source /etc/profile

Installing the MHA Manager package (on the Manager node, MHA-S2 here)

tar xf mha4mysql-manager-0.57.tar.gz
cd mha4mysql-manager-0.57
perl Makefile.PL
make && make install

After installation, the following script files are present under /usr/local/bin:

[root@MHA-S2 bin]# pwd
/usr/local/bin
[root@MHA-S2 bin]# ll
total 140
-r-xr-xr-x 1 root root 16381 Aug 7 14:07 apply_diff_relay_logs
-r-xr-xr-x 1 root root 4807 Aug 7 14:07 filter_mysqlbinlog
-rwxr-xr-x 1 root root 166 Aug 9 17:18 manager.sh
-r-xr-xr-x 1 root root 1995 Aug 7 17:28 masterha_check_repl
-r-xr-xr-x 1 root root 1779 Aug 7 17:28 masterha_check_ssh
-r-xr-xr-x 1 root root 1865 Aug 7 17:28 masterha_check_status
-r-xr-xr-x 1 root root 3201 Aug 7 17:28 masterha_conf_host
-r-xr-xr-x 1 root root 2517 Aug 7 17:28 masterha_manager
-r-xr-xr-x 1 root root 2165 Aug 7 17:28 masterha_master_monitor
-r-xr-xr-x 1 root root 2373 Aug 7 17:28 masterha_master_switch
-r-xr-xr-x 1 root root 5171 Aug 7 17:28 masterha_secondary_check
-r-xr-xr-x 1 root root 1739 Aug 7 17:28 masterha_stop
-rwxr-xr-x 1 root root 2169 Aug 9 10:49 master_ip_failover
-rwxr-xr-x 1 root root 3648 Aug 7 17:30 master_ip_failover.old
-rwxr-xr-x 1 root root 10369 Aug 12 21:33 master_ip_online_change
-rwxr-xr-x 1 root root 9870 Aug 7 17:30 master_ip_online_change.old
lrwxrwxrwx 1 root root 26 Aug 8 17:10 mysql -> /usr/local/mysql/bin/mysql
lrwxrwxrwx 1 root root 32 Aug 8 17:09 mysqlbinlog -> /usr/local/mysql/bin/mysqlbinlog
-rw------- 1 root root 0 Aug 12 20:04 nohup.out
-rwxr-xr-x 1 root root 11867 Aug 7 17:30 power_manager
-r-xr-xr-x 1 root root 8261 Aug 7 14:07 purge_relay_logs
-rwxr-xr-x 1 root root 314 Aug 8 16:20 purge_relay.sh
-r-xr-xr-x 1 root root 7525 Aug 7 14:07 save_binary_logs
-rwxr-xr-x 1 root root 1360 Aug 7 17:30 send_report

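Copy the sample scripts to /usr/local/bin. They ship in the unpacked package, and copying them is optional: the scripts are incomplete templates that the developers left for users to finish. If you enable a parameter that points at one of these scripts without having modified the script itself, MHA throws an error, which cost me a lot of time.

[root@MHA-S2 scripts]# ll
total 32
-rwxr-xr-x 1 root root 3443 Jan 8 2012 master_ip_failover # Manages the VIP during automatic failover. Optional: with keepalived you can write your own script to manage the VIP, for example monitor MySQL and stop keepalived when MySQL fails, so that the VIP floats automatically
-rwxr-xr-x 1 root root 9186 Jan 8 2012 master_ip_online_change # Manages the VIP during an online switchover. Optional: a simple shell script of your own works too
-rwxr-xr-x 1 root root 11867 Jan 8 2012 power_manager # Shuts down the host after a failure. Optional
-rwxr-xr-x 1 root root 1360 Jan 8 2012 send_report # Sends an alert after a failover. Optional: a simple shell script of your own works too
[root@MHA-S2 scripts]# cp * /usr/local/bin/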

Configuring passwordless SSH login

ssh-keygen
ssh-copy-id root@xxx    # repeat for every host, including the local one, or the masterha_check_ssh step later will fail
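As a minimal sketch, the commands to run on each node can be generated for the three hosts used in this article. The function name is made up for the example, and it only prints the commands so they can be reviewed before being run by hand.

```shell
#!/bin/sh
# Host names are the three nodes used in this article.
HOSTS="MHA-M1 MHA-S1 MHA-S2"

# Print (rather than run) the key-distribution commands for one node.
# The local host is deliberately included in the list.
print_ssh_setup() {
  echo "ssh-keygen -t rsa"
  for h in $HOSTS; do
    echo "ssh-copy-id root@$h"
  done
}
```

Running print_ssh_setup on each of the three nodes lists one ssh-keygen call followed by one ssh-copy-id call per host.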

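Setting up master/slave replication

For details, see the earlier document on setting up a dual-master replication environment.

Make sure replication is working on both slaves: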

Slave_IO_Running: Yes
Slave_SQL_Running: Yes
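The same check can be scripted. The sketch below reads the text produced by SHOW SLAVE STATUS\G and succeeds only when both threads report Yes; the function name is an assumption for this example.

```shell
#!/bin/sh
# Reads the output of SHOW SLAVE STATUS\G on stdin.
# Exit status 0 only if both replication threads report "Yes".
replication_ok() {
  awk -F': *' '
    /Slave_IO_Running:/  { io  = ($2 ~ /^Yes/) }
    /Slave_SQL_Running:/ { sql = ($2 ~ /^Yes/) }
    END { exit !(io && sql) }
  '
}
```

A typical call would be mysql -e 'SHOW SLAVE STATUS\G' | replication_ok && echo healthy, run on each slave with whatever connection options your environment needs.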

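Set read_only on both slaves. The slaves serve read traffic; the setting is not written to the configuration file because a slave may be promoted to master at any time.

root@localhost:mysql3306.sock [(none)]> set global read_only=1;

Create the monitoring user (run on the master):

grant all privileges on *.* to 'root'@'%' identified by '123456';

flush privileges;

Replication is now set up. Next, configure MHA.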

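MHA environment configuration

(1) Create the MHA working directory

mkdir -p /etc/mha

Edit the app1.cnf configuration file. After editing, its contents are as follows:

[root@MHA-S2 ~]# /etc/mha/=/var/log/masterha/app1/=/var/log/masterha/=/data/mysql//=/usr/local/bin/=/usr/local/bin/===/===/usr/local/bin/=/usr/local/bin/masterha_secondary_check -s MHA-S1 -s MHA-===MHA-==
// candidate_master: marks the host as a candidate master. With this parameter set, the slave is promoted to master on a switchover even if it is not the slave with the most recent events in the cluster.
check_repl_delay=
// By default, if a slave is more than 100 MB of relay logs behind the master, MHA does not choose it as the new master, because recovering that slave would take a long time. With check_repl_delay=0, MHA ignores replication delay when choosing the new master. This is very useful for a host with candidate_master=1, because that candidate must become the new master during the switchover.

=MHA-S1 
port=

=MHA-=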
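For orientation, here is a sketch of what a complete app1.cnf for this three-node setup could look like, written out by a small shell function. It is not the author's file: the values are pieced together from paths and names that appear elsewhere in this article (log path, check scripts, host names, port 3306, ping interval), while manager_workdir, the section layout and the repl_password placeholder are assumptions.

```shell
#!/bin/sh
# Writes an example MHA application config to the path given as $1.
# Assumptions: manager_workdir, the [serverN] layout and the CHANGE_ME
# placeholder; everything else mirrors values seen in this article.
write_sample_conf() {
  cat > "$1" <<'EOF'
[server default]
manager_workdir=/var/log/masterha/app1
manager_log=/var/log/masterha/app1/manager.log
master_binlog_dir=/data/mysql
user=root
password=123456
repl_user=repl
repl_password=CHANGE_ME
ssh_user=root
ping_interval=1
secondary_check_script=/usr/local/bin/masterha_secondary_check -s MHA-S1 -s MHA-S2
master_ip_failover_script=/usr/local/bin/master_ip_failover

[server1]
hostname=MHA-M1
port=3306

[server2]
hostname=MHA-S1
port=3306
candidate_master=1
check_repl_delay=0

[server3]
hostname=MHA-S2
port=3306
EOF
}
```

Calling write_sample_conf ./app1.cnf.example produces a file you can compare against your own configuration before copying anything into /etc/mha.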

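(2) Set how relay logs are purged (on every slave node):

set global relay_log_purge=0;

Note:

During a switchover, MHA relies on relay log information to recover the slaves, so automatic relay log purging must be turned OFF and relay logs purged manually instead. By default, a slave deletes its relay logs automatically once the SQL thread has executed them. In an MHA environment, however, those relay logs may be needed to recover other slaves, so automatic deletion has to be disabled. Periodic purging has to take replication delay into account: on an ext3 filesystem, deleting a large file takes a while and can cause serious replication delay. To avoid this, a hard link to the relay log is created first, because on Linux removing a large file through a hard link is fast. (The same hard-link approach is commonly used when dropping large tables in MySQL.)

MHA Node includes the purge_relay_logs tool. It creates hard links for the relay logs, executes SET GLOBAL relay_log_purge=1, waits a few seconds so that the SQL thread switches to a new relay log, and then executes SET GLOBAL relay_log_purge=0.

The purge_relay_logs script takes the following parameters:

--user                     MySQL user name
--password                 MySQL password
--port                     Port number
--workdir                  Where to create the relay log hard links; the default is /var/tmp. Hard links cannot be created across partitions, so specify a location on the same partition as the relay logs. After the script runs successfully, the hard-linked relay log files are deleted.
--disable_relay_log_purge  By default, if relay_log_purge=1 the script purges nothing and exits. With this parameter set, when relay_log_purge=1 the script sets relay_log_purge to 0, purges the relay logs, and finally leaves the parameter OFF.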

(3) Schedule a periodic relay log purge script (for example once a day, on all servers)

[root@MHA-S2 bin]# purge_relay.!/bin/====== [ ! -
 $log_dir ---user=$user --password=$ --disable_relay_log_purge --port=$port --workdir=$work_dir   $log_dir/purge_relay_logs.log

Add it to crontab:

[root@MHA-S2 bin]# crontab -l
0 4 * * * /bin/bash /root/purge_relay_log.sh

Run the script by hand once to check whether it reports any errors.
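A purge script along these lines might look like the sketch below. It uses only the purge_relay_logs flags described above; every value (user, password, port, directories) is a placeholder assumed for the example, and the function names are made up.

```shell
#!/bin/sh
# All values below are placeholders (assumptions); adjust to your environment.
user=root
passwd=123456
port=3306
log_dir=/var/log/masterha
work_dir=/data    # must be on the same partition as the relay logs

# Build the purge_relay_logs command line from the settings above.
purge_cmd() {
  echo "purge_relay_logs --user=$user --password=$passwd --port=$port --workdir=$work_dir --disable_relay_log_purge"
}

# Run the purge and append its output to a log file.
run_purge() {
  mkdir -p "$log_dir" || return 1
  # Word splitting of the command string is intentional here.
  $(purge_cmd) >> "$log_dir/purge_relay_logs.log" 2>&1
}
```

Calling purge_cmd shows the exact command before anything is executed; the cron job would then invoke run_purge.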

Checking the SSH configuration

[root@MHA-S2 bin]# masterha_check_ssh --conf=/etc/mha/app1.cnf 
Mon Aug 14 18:07:02 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Aug 14 18:07:02 2017 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Mon Aug 14 18:07:02 2017 - [info] Reading server configuration from /etc/mha/app1.cnf..
Mon Aug 14 18:07:02 2017 - [info] Starting SSH connection tests..
Mon Aug 14 18:07:03 2017 - [debug] 
Mon Aug 14 18:07:02 2017 - [debug] Connecting via SSH from root@MHA-M1(10.180.2.163:22) to root@MHA-S1(10.180.2.164:22)..
Mon Aug 14 18:07:02 2017 - [debug] ok.
Mon Aug 14 18:07:02 2017 - [debug] Connecting via SSH from root@MHA-M1(10.180.2.163:22) to root@MHA-S2(10.180.2.165:22)..
Mon Aug 14 18:07:03 2017 - [debug] ok.
Mon Aug 14 18:07:03 2017 - [debug] 
Mon Aug 14 18:07:03 2017 - [debug] Connecting via SSH from root@MHA-S1(10.180.2.164:22) to root@MHA-M1(10.180.2.163:22)..
Mon Aug 14 18:07:03 2017 - [debug] ok.
Mon Aug 14 18:07:03 2017 - [debug] Connecting via SSH from root@MHA-S1(10.180.2.164:22) to root@MHA-S2(10.180.2.165:22)..
Mon Aug 14 18:07:03 2017 - [debug] ok.
Mon Aug 14 18:07:04 2017 - [debug] 
Mon Aug 14 18:07:03 2017 - [debug] Connecting via SSH from root@MHA-S2(10.180.2.165:22) to root@MHA-M1(10.180.2.163:22)..
Mon Aug 14 18:07:03 2017 - [debug] ok.
Mon Aug 14 18:07:04 2017 - [debug] Connecting via SSH from root@MHA-S2(10.180.2.165:22) to root@MHA-S1(10.180.2.164:22)..
Mon Aug 14 18:07:04 2017 - [debug] ok.
Mon Aug 14 18:07:04 2017 - [info] All SSH connection tests passed successfully.

Checking the overall replication environment

Running masterha_check_repl reported an error:

Tue Aug 8 17:46:31 2017 - [info] Checking master_ip_failover_script status:
Tue Aug 8 17:46:31 2017 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=MHA-M1 --orig_master_ip=10.180.2.163 --orig_master_port=3306
Bareword "FIXME_xxx" not allowed while "strict subs" in use at /usr/local/bin/master_ip_failover line 93.
Execution of /usr/local/bin/master_ip_failover aborted due to compilation errors.
Tue Aug 8 17:46:31 2017 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln229] Failed to get master_ip_failover_script status with return code 255:0.
Tue Aug 8 17:46:31 2017 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. at /usr/local/bin/masterha_check_repl line 48
Tue Aug 8 17:46:31 2017 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Tue Aug 8 17:46:31 2017 - [info] Got exit code 1 (Not master dead).

It turns out there are two ways to handle failover: a virtual IP address, or a global configuration file. MHA does not prescribe either one and leaves the choice to the user. The virtual-IP approach involves other software such as keepalived, and also requires modifying the master_ip_failover script. For now, comment out the master_ip_failover_script= /usr/local/bin/master_ip_failover option in app1.cnf so that the check passes.

#master_ip_failover_script= /usr/local/bin/master_ip_failover 
Tue Aug 8 17:49:40 2017 - [info] Got exit code 0 (Not master dead). 
 
MySQL Replication Health is OK.

Checking the MHA Manager status

[root@MHA-S2 mha]# masterha_check_status --conf=/etc/mha/app1.cnf 
app1 is stopped(2:NOT_RUNNING).

Starting it manually

[root@MHA-S2 mha]# nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &
[1] 16774
[root@MHA-S2 mha]# ps -ef|grep masterha
root 16774 15297 4 17:52 pts/3 00:00:00 perl /usr/local/bin/masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover
[root@MHA-S2 mha]# masterha_check_status --conf=/etc/mha/app1.cnf 
app1 (pid:16774) is running(0:PING_OK), master:MHA-M1
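For monitoring scripts it can be handy to pull just the state name out of this output. The helper below is a small sketch that works on the two line formats shown above; the function name is an assumption.

```shell
#!/bin/sh
# Extract the state name from one line of masterha_check_status output.
# "app1 (pid:16774) is running(0:PING_OK), master:MHA-M1" -> PING_OK
# "app1 is stopped(2:NOT_RUNNING)."                        -> NOT_RUNNING
mha_state() {
  sed -n 's/.*([0-9][0-9]*:\([A-Z_]*\)).*/\1/p'
}
```

For example, masterha_check_status --conf=/etc/mha/app1.cnf | mha_state prints PING_OK while the manager is healthy.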

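--remove_dead_master_conf: after a master/slave switchover, the old master's entry is removed from the configuration file. (If you repair the old master after an unplanned switchover and want to add it back into MHA, remember to restore the server1 section in app1.cnf.)

--manager_log: where the log is stored.

--ignore_last_failover: by default, if MHA detects consecutive outages less than 8 hours apart, it does not fail over again; the restriction is there to avoid a ping-pong effect. This parameter tells MHA to ignore the file left by the previous switchover. By default, after a switchover MHA creates an app1.failover.complete file in the log directory (the /data directory I configured above); if that file exists at the next switchover, the switchover is refused unless the file was deleted by hand after the first one. For convenience, --ignore_last_failover is set here.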
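If you prefer not to use --ignore_last_failover, you can check by hand whether a recent marker file would block the next switchover. The sketch below tests whether a given file was modified within the last 8 hours; the function name is an assumption, and the path to app1.failover.complete depends on your configuration.

```shell
#!/bin/sh
# Exit status 0 if the marker file exists and was modified
# within the last 8 hours (480 minutes), non-zero otherwise.
recent_failover() {
  marker=$1
  [ -f "$marker" ] && [ -n "$(find "$marker" -mmin -480 2>/dev/null)" ]
}
```

Call it with the full path of the marker file, for example recent_failover /path/to/app1.failover.complete && echo "next failover would be refused".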

Checking the startup log

[root@MHA-S2 app1]# vi manager.log 
Tue Aug 8 17:52:37 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Aug 8 17:52:37 2017 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Tue Aug 8 17:52:37 2017 - [info] Reading server configuration from /etc/mha/app1.cnf..
Tue Aug 8 17:52:37 2017 - [info] MHA::MasterMonitor version 0.57.
Tue Aug 8 17:52:38 2017 - [info] GTID failover mode = 1
Tue Aug 8 17:52:38 2017 - [info] Dead Servers:
Tue Aug 8 17:52:38 2017 - [info] Alive Servers:
Tue Aug 8 17:52:38 2017 - [info] MHA-M1(10.180.2.163:3306)
Tue Aug 8 17:52:38 2017 - [info] MHA-S1(10.180.2.164:3306)
Tue Aug 8 17:52:38 2017 - [info] MHA-S2(10.180.2.165:3306)
Tue Aug 8 17:52:38 2017 - [info] Alive Slaves:
Tue Aug 8 17:52:38 2017 - [info] MHA-S1(10.180.2.164:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Tue Aug 8 17:52:38 2017 - [info] GTID ON
Tue Aug 8 17:52:38 2017 - [info] Replicating from MHA-M1(10.180.2.163:3306)
Tue Aug 8 17:52:38 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Aug 8 17:52:38 2017 - [info] MHA-S2(10.180.2.165:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Tue Aug 8 17:52:38 2017 - [info] GTID ON
Tue Aug 8 17:52:38 2017 - [info] Replicating from MHA-M1(10.180.2.163:3306)
Tue Aug 8 17:52:38 2017 - [info] Current Alive Master: MHA-M1(10.180.2.163:3306)
Tue Aug 8 17:52:38 2017 - [info] Checking slave configurations..
Tue Aug 8 17:52:38 2017 - [info] Checking replication filtering settings..
Tue Aug 8 17:52:38 2017 - [info] binlog_do_db= , binlog_ignore_db=
Tue Aug 8 17:52:38 2017 - [info] Replication filtering check ok.
Tue Aug 8 17:52:38 2017 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Tue Aug 8 17:52:38 2017 - [info] Checking SSH publickey authentication settings on the current master..
Tue Aug 8 17:52:38 2017 - [info] HealthCheck: SSH to MHA-M1 is reachable.
Tue Aug 8 17:52:38 2017 - [info]
MHA-M1(10.180.2.163:3306) (current master)
 +--MHA-S1(10.180.2.164:3306)
 +--MHA-S2(10.180.2.165:3306)
Tue Aug 8 17:52:38 2017 - [warning] master_ip_failover_script is not defined.
Tue Aug 8 17:52:38 2017 - [warning] shutdown_script is not defined.
Tue Aug 8 17:52:38 2017 - [info] Set master ping interval 1 seconds.
Tue Aug 8 17:52:38 2017 - [info] Set secondary check script: /usr/local/bin/masterha_secondary_check -s MHA-S1 -s MHA-S2
Tue Aug 8 17:52:38 2017 - [info] Starting ping health check on MHA-M1(10.180.2.163:3306)..
Tue Aug 8 17:52:38 2017 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..

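Configuring the VIP

The VIP can be managed in two ways: with keepalived managing the floating virtual IP, or with a script that brings the virtual IP up (no keepalived, heartbeat, or similar software needed).

Only the script approach is demonstrated here: modify the master_ip_failover script so that it manages the VIP.

Bind the VIP on the current master:

[root@MHA-M1 ~]# /sbin/ifconfig eth2:1 10.180.2.168/19

The script: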

[root@MHA-S2 bin]# cat master_ip_failover
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;

my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
);

# VIP and the interface alias (eth2:$key) it is bound to
my $vip = '10.180.2.168/19';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig eth2:$key $vip";
my $ssh_stop_vip  = "/sbin/ifconfig eth2:$key down";

GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s'  => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i'  => \$new_master_port,
);

exit &main();

sub main {
    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

    if ( $command eq "stop" || $command eq "stopssh" ) {
        # Take the VIP down on the old master
        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {
        # Bring the VIP up on the new master
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}

sub stop_vip() {
    return 0 unless ($ssh_user);
    `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
    print "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

In app1.cnf, uncomment the master_ip_failover_script line that was commented out earlier, and test:

Run the MHA replication check again:
[root@MHA-S2 bin]# masterha_check_repl --conf=/etc/mha/app1.cnf 
Wed Aug 9 10:49:42 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Aug 9 10:49:42 2017 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Wed Aug 9 10:49:42 2017 - [info] Reading server configuration from /etc/mha/app1.cnf..
Wed Aug 9 10:49:42 2017 - [info] MHA::MasterMonitor version 0.57.
Wed Aug 9 10:49:43 2017 - [info] GTID failover mode = 1
Wed Aug 9 10:49:43 2017 - [info] Dead Servers:
Wed Aug 9 10:49:43 2017 - [info] Alive Servers:
Wed Aug 9 10:49:43 2017 - [info] MHA-M1(10.180.2.163:3306)
Wed Aug 9 10:49:43 2017 - [info] MHA-S1(10.180.2.164:3306)
Wed Aug 9 10:49:43 2017 - [info] MHA-S2(10.180.2.165:3306)
Wed Aug 9 10:49:43 2017 - [info] Alive Slaves:
Wed Aug 9 10:49:43 2017 - [info] MHA-S1(10.180.2.164:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Wed Aug 9 10:49:43 2017 - [info] GTID ON
Wed Aug 9 10:49:43 2017 - [info] Replicating from MHA-M1(10.180.2.163:3306)
Wed Aug 9 10:49:43 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Aug 9 10:49:43 2017 - [info] MHA-S2(10.180.2.165:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Wed Aug 9 10:49:43 2017 - [info] GTID ON
Wed Aug 9 10:49:43 2017 - [info] Replicating from MHA-M1(10.180.2.163:3306)
Wed Aug 9 10:49:43 2017 - [info] Current Alive Master: MHA-M1(10.180.2.163:3306)
Wed Aug 9 10:49:43 2017 - [info] Checking slave configurations..
Wed Aug 9 10:49:43 2017 - [info] Checking replication filtering settings..
Wed Aug 9 10:49:43 2017 - [info] binlog_do_db= , binlog_ignore_db=
Wed Aug 9 10:49:43 2017 - [info] Replication filtering check ok.
Wed Aug 9 10:49:43 2017 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Wed Aug 9 10:49:43 2017 - [info] Checking SSH publickey authentication settings on the current master..
Wed Aug 9 10:49:43 2017 - [info] HealthCheck: SSH to MHA-M1 is reachable.
Wed Aug 9 10:49:43 2017 - [info]
MHA-M1(10.180.2.163:3306) (current master)
 +--MHA-S1(10.180.2.164:3306)
 +--MHA-S2(10.180.2.165:3306)
Wed Aug 9 10:49:43 2017 - [info] Checking replication health on MHA-S1..
Wed Aug 9 10:49:43 2017 - [info] ok.
Wed Aug 9 10:49:43 2017 - [info] Checking replication health on MHA-S2..
Wed Aug 9 10:49:43 2017 - [info] ok.
Wed Aug 9 10:49:43 2017 - [info] Checking master_ip_failover_script status:
Wed Aug 9 10:49:43 2017 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=MHA-M1 --orig_master_ip=10.180.2.163 --orig_master_port=3306

IN SCRIPT TEST====/sbin/ifconfig eth2:1 down==/sbin/ifconfig eth2:1 10.180.2.168/19===

Checking the Status of the script.. OK

Wed Aug 9 10:49:43 2017 - [info] OK.
Wed Aug 9 10:49:43 2017 - [warning] shutdown_script is not defined.
Wed Aug 9 10:49:43 2017 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

That completes the MHA installation and configuration. A simple test follows.

(1) Failover test

Kill the mysqld process on the master by hand, then check the switchover status:

[root@MHA-S2 tmp]# more manager.log 
Wed Aug 9 17:47:11 2017 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Wed Aug 9 17:47:11 2017 - [info] Executing secondary network check script: /usr/local/bin/masterha_secondary_check -s MHA-S1 -s MHA-S2 --user=root --master_host=MHA-M1 --master_ip=10.180.2.163 --master_port=3306 --master_user=root --master_password=123456 --ping_type=SELECT
Wed Aug 9 17:47:11 2017 - [info] Executing SSH check script: exit 0
Wed Aug 9 17:47:11 2017 - [info] HealthCheck: SSH to MHA-M1 is reachable.
Monitoring server MHA-S1 is reachable, Master is not reachable from MHA-S1. OK.
Monitoring server MHA-S2 is reachable, Master is not reachable from MHA-S2. OK.
Wed Aug 9 17:47:11 2017 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Wed Aug 9 17:47:12 2017 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Wed Aug 9 17:47:12 2017 - [warning] Connection failed 2 time(s)..
Wed Aug 9 17:47:13 2017 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Wed Aug 9 17:47:13 2017 - [warning] Connection failed 3 time(s)..
Wed Aug 9 17:47:14 2017 - [warning] Got error on MySQL connect: 2013 (Lost connection to MySQL server at 'reading initial communication packet', system error: 111)
Wed Aug 9 17:47:14 2017 - [warning] Connection failed 4 time(s)..
Wed Aug 9 17:47:14 2017 - [warning] Master is not reachable from health checker!
Wed Aug 9 17:47:14 2017 - [warning] Master MHA-M1(10.180.2.163:3306) is not reachable!
Wed Aug 9 17:47:14 2017 - [warning] SSH is reachable.
Wed Aug 9 17:47:14 2017 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mha/app1.cnf again, and trying to connect to all servers to check server status..
Wed Aug 9 17:47:14 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Aug 9 17:47:14 2017 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Wed Aug 9 17:47:14 2017 - [info] Reading server configuration from /etc/mha/app1.cnf..
Wed Aug 9 17:47:14 2017 - [info] GTID failover mode = 1
Wed Aug 9 17:47:14 2017 - [info] Dead Servers:
Wed Aug 9 17:47:14 2017 - [info] MHA-M1(10.180.2.163:3306)
Wed Aug 9 17:47:14 2017 - [info] Alive Servers:
Wed Aug 9 17:47:14 2017 - [info] MHA-S1(10.180.2.164:3306)
Wed Aug 9 17:47:14 2017 - [info] MHA-S2(10.180.2.165:3306)
Wed Aug 9 17:47:14 2017 - [info] Alive Slaves:
Wed Aug 9 17:47:14 2017 - [info] MHA-S1(10.180.2.164:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Wed Aug 9 17:47:14 2017 - [info] GTID ON
Wed Aug 9 17:47:14 2017 - [info] Replicating from MHA-M1(10.180.2.163:3306)
Wed Aug 9 17:47:14 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Aug 9 17:47:14 2017 - [info] MHA-S2(10.180.2.165:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Wed Aug 9 17:47:14 2017 - [info] GTID ON
Wed Aug 9 17:47:14 2017 - [info] Replicating from MHA-M1(10.180.2.163:3306)
Wed Aug 9 17:47:14 2017 - [info] Checking slave configurations..
Wed Aug 9 17:47:14 2017 - [info] Checking replication filtering settings..
Wed Aug 9 17:47:14 2017 - [info] Replication filtering check ok.
Wed Aug 9 17:47:14 2017 - [info] Master is down!
Wed Aug 9 17:47:14 2017 - [info] Terminating monitoring script.
Wed Aug 9 17:47:14 2017 - [info] Got exit code 20 (Master dead).
Wed Aug 9 17:47:14 2017 - [info] MHA::MasterFailover version 0.57.
Wed Aug 9 17:47:14 2017 - [info] Starting master failover.
Wed Aug 9 17:47:14 2017 - [info] 
Wed Aug 9 17:47:14 2017 - [info] * Phase 1: Configuration Check Phase..
Wed Aug 9 17:47:14 2017 - [info] 
Wed Aug 9 17:47:14 2017 - [info] GTID failover mode = 1
Wed Aug 9 17:47:14 2017 - [info] Dead Servers:
Wed Aug 9 17:47:14 2017 - [info] MHA-M1(10.180.2.163:3306)
Wed Aug 9 17:47:14 2017 - [info] Checking master reachability via MySQL(double check)...
Wed Aug 9 17:47:14 2017 - [info] ok.
Wed Aug 9 17:47:14 2017 - [info] Alive Servers:
Wed Aug 9 17:47:14 2017 - [info] MHA-S1(10.180.2.164:3306)
Wed Aug 9 17:47:14 2017 - [info] MHA-S2(10.180.2.165:3306)
Wed Aug 9 17:47:14 2017 - [info] Alive Slaves:
Wed Aug 9 17:47:14 2017 - [info] MHA-S1(10.180.2.164:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Wed Aug 9 17:47:14 2017 - [info] GTID ON
Wed Aug 9 17:47:14 2017 - [info] Replicating from MHA-M1(10.180.2.163:3306)
Wed Aug 9 17:47:14 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Aug 9 17:47:14 2017 - [info] MHA-S2(10.180.2.165:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Wed Aug 9 17:47:14 2017 - [info] GTID ON
Wed Aug 9 17:47:14 2017 - [info] Replicating from MHA-M1(10.180.2.163:3306)
Wed Aug 9 17:47:14 2017 - [info] Starting GTID based failover.
Wed Aug 9 17:47:14 2017 - [info] 
Wed Aug 9 17:47:14 2017 - [info] ** Phase 1: Configuration Check Phase completed.
Wed Aug 9 17:47:14 2017 - [info] 
Wed Aug 9 17:47:14 2017 - [info] * Phase 2: Dead Master Shutdown Phase..
Wed Aug 9 17:47:14 2017 - [info] 
Wed Aug 9 17:47:14 2017 - [info] Forcing shutdown so that applications never connect to the current master..
Wed Aug 9 17:47:14 2017 - [info] Executing master IP deactivation script:
Wed Aug 9 17:47:14 2017 - [info] /usr/local/bin/master_ip_failover --orig_master_host=MHA-M1 --orig_master_ip=10.180.2.163 --orig_master_port=3306 --command=stopssh --ssh_user=root 
IN SCRIPT TEST====/sbin/ifconfig eth2:1 down==/sbin/ifconfig eth2:1 10.180.2.168/24===
Disabling the VIP on old master: MHA-M1 
SIOCSIFFLAGS: Cannot assign requested address
Wed Aug 9 17:47:14 2017 - [info] done.
Wed Aug 9 17:47:14 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Wed Aug 9 17:47:14 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Wed Aug 9 17:47:14 2017 - [info] 
Wed Aug 9 17:47:14 2017 - [info] * Phase 3: Master Recovery Phase..
Wed Aug 9 17:47:14 2017 - [info] 
Wed Aug 9 17:47:14 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Wed Aug 9 17:47:14 2017 - [info] 
Wed Aug 9 17:47:14 2017 - [info] The latest binary log file/position on all slaves is 3306-binlog.000003:194
Wed Aug 9 17:47:14 2017 - [info] Retrieved Gtid Set: a5757eae-7981-11e7-82c7-005056b662d3:6-32210
Wed Aug 9 17:47:14 2017 - [info] Latest slaves (Slaves that received relay log files to the latest):
Wed Aug 9 17:47:14 2017 - [info] MHA-S1(10.180.2.164:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Wed Aug 9 17:47:14 2017 - [info] GTID ON
Wed Aug 9 17:47:14 2017 - [info] Replicating from MHA-M1(10.180.2.163:3306)
Wed Aug 9 17:47:14 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Aug 9 17:47:14 2017 - [info] MHA-S2(10.180.2.165:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Wed Aug 9 17:47:14 2017 - [info] GTID ON
Wed Aug 9 17:47:14 2017 - [info] Replicating from MHA-M1(10.180.2.163:3306)
Wed Aug 9 17:47:14 2017 - [info] The oldest binary log file/position on all slaves is 3306-binlog.000003:194
Wed Aug 9 17:47:14 2017 - [info] Retrieved Gtid Set: a5757eae-7981-11e7-82c7-005056b662d3:6-32210
Wed Aug 9 17:47:14 2017 - [info] Oldest slaves:
Wed Aug 9 17:47:14 2017 - [info] MHA-S1(10.180.2.164:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Wed Aug 9 17:47:14 2017 - [info] GTID ON
Wed Aug 9 17:47:14 2017 - [info] Replicating from MHA-M1(10.180.2.163:3306)
Wed Aug 9 17:47:14 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Aug 9 17:47:14 2017 - [info] MHA-S2(10.180.2.165:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Wed Aug 9 17:47:14 2017 - [info] GTID ON
Wed Aug 9 17:47:14 2017 - [info] Replicating from MHA-M1(10.180.2.163:3306)
Wed Aug 9 17:47:14 2017 - [info] 
Wed Aug 9 17:47:14 2017 - [info] * Phase 3.3: Determining New Master Phase..
Wed Aug 9 17:47:14 2017 - [info] 
Wed Aug 9 17:47:14 2017 - [info] Searching new master from slaves..
Wed Aug 9 17:47:14 2017 - [info] Candidate masters from the configuration file:
Wed Aug 9 17:47:14 2017 - [info] MHA-S1(10.180.2.164:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Wed Aug 9 17:47:14 2017 - [info] GTID ON
Wed Aug 9 17:47:14 2017 - [info] Replicating from MHA-M1(10.180.2.163:3306)
Wed Aug 9 17:47:14 2017 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Aug 9 17:47:14 2017 - [info] Non-candidate masters:
Wed Aug 9 17:47:14 2017 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Wed Aug 9 17:47:14 2017 - [info] New master is MHA-S1(10.180.2.164:3306)
Wed Aug 9 17:47:14 2017 - [info] Starting master failover..
Wed Aug 9 17:47:14 2017 - [info] 
From:
MHA-M1(10.180.2.163:3306) (current master)
 +--MHA-S1(10.180.2.164:3306)
 +--MHA-S2(10.180.2.165:3306)

To:
MHA-S1(10.180.2.164:3306) (new master)
 +--MHA-S2(10.180.2.165:3306)
Wed Aug 9 17:47:14 2017 - [info] 
Wed Aug 9 17:47:14 2017 - [info] * Phase 3.3: New Master Recovery Phase..
Wed Aug 9 17:47:14 2017 - [info] 
Wed Aug 9 17:47:14 2017 - [info] Waiting all logs to be applied.. 
Wed Aug 9 17:47:14 2017 - [info] done.
Wed Aug 9 17:47:14 2017 - [info] Getting new master's binlog name and position..
Wed Aug 9 17:47:14 2017 - [info] 3306-binlog.000003:61944788
Wed Aug 9 17:47:14 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='MHA-S1 or 10.180.2.164', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx'
Wed Aug 9 17:47:14 2017 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: 3306-binlog.000003, 61944788, 1c2dc99f-7b57-11e7-a280-005056b665cb:1-2,
a5757eae-7981-11e7-82c7-005056b662d3:1-32210
Wed Aug 9 17:47:14 2017 - [info] Executing master IP activate script:
Wed Aug 9 17:47:14 2017 - [info] /usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=MHA-M1 --orig_master_ip=10.180.2.163 --orig_master_port=3306 --new_master_host=MHA-S1 --new_master_ip=10.180.2.164 --new_master_port=3306 --new_master_user='root' --new_master_password=xxx
Unknown option: new_master_user
Unknown option: new_master_password
IN SCRIPT TEST====/sbin/ifconfig eth2:1 down==/sbin/ifconfig eth2:1 10.180.2.168/24===
Enabling the VIP - 10.180.2.168/24 on the new master - MHA-S1 
Wed Aug 9 17:47:14 2017 - [info] OK.
Wed Aug 9 17:47:14 2017 - [info] Setting read_only=0 on MHA-S1(10.180.2.164:3306)..
Wed Aug 9 17:47:14 2017 - [info] ok.
Wed Aug 9 17:47:14 2017 - [info] ** Finished master recovery successfully.
Wed Aug 9 17:47:14 2017 - [info] * Phase 3: Master Recovery Phase completed.
Wed Aug 9 17:47:14 2017 - [info] 
Wed Aug 9 17:47:14 2017 - [info] * Phase 4: Slaves Recovery Phase..
Wed Aug 9 17:47:14 2017 - [info] 
Wed Aug 9 17:47:14 2017 - [info] 
Wed Aug 9 17:47:14 2017 - [info] * Phase 4.1: Starting Slaves in parallel..
Wed Aug 9 17:47:14 2017 - [info] 
Wed Aug 9 17:47:14 2017 - [info] -- Slave recovery on host MHA-S2(10.180.2.165:3306) started, pid: 18757. Check tmp log /var/log/masterha/app1/MHA-S2_3306_20170809174714.log if it takes time..
Wed Aug 9 17:47:15 2017 - [info] 
Wed Aug 9 17:47:15 2017 - [info] Log messages from MHA-S2 ...
Wed Aug 9 17:47:15 2017 - [info] 
Wed Aug 9 17:47:14 2017 - [info] Resetting slave MHA-S2(10.180.2.165:3306) and starting replication from the new master MHA-S1(10.180.2.164:3306)..
Wed Aug 9 17:47:14 2017 - [info] Executed CHANGE MASTER.
Wed Aug 9 17:47:15 2017 - [info] Slave started.
Wed Aug 9 17:47:15 2017 - [info] gtid_wait(1c2dc99f-7b57-11e7-a280-005056b665cb:1-2,
a5757eae-7981-11e7-82c7-005056b662d3:1-32210) completed on MHA-S2(10.180.2.165:3306). Executed 0 events.
Wed Aug 9 17:47:15 2017 - [info] End of log messages from MHA-S2.
Wed Aug 9 17:47:15 2017 - [info] -- Slave on host MHA-S2(10.180.2.165:3306) started.
Wed Aug 9 17:47:15 2017 - [info] All new slave servers recovered successfully.
Wed Aug 9 17:47:15 2017 - [info] 
Wed Aug 9 17:47:15 2017 - [info] * Phase 5: New master cleanup phase..
Wed Aug 9 17:47:15 2017 - [info] 
Wed Aug 9 17:47:15 2017 - [info] Resetting slave info on the new master..
Wed Aug 9 17:47:15 2017 - [info] MHA-S1: Resetting slave info succeeded.
Wed Aug 9 17:47:15 2017 - [info] Master failover to MHA-S1(10.180.2.164:3306) completed successfully.
Wed Aug 9 17:47:15 2017 - [info] Deleted server1 entry from /etc/mha/app1.cnf .
Wed Aug 9 17:47:15 2017 - [info] 
----- Failover Report -----
app1: MySQL Master failover MHA-M1(10.180.2.163:3306) to MHA-S1(10.180.2.164:3306) succeeded
Master MHA-M1(10.180.2.163:3306) is down!
Check MHA Manager logs at MHA-S2:/var/log/masterha/app1/manager.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on MHA-M1(10.180.2.163:3306)
Selected MHA-S1(10.180.2.164:3306) as a new master.
MHA-S1(10.180.2.164:3306): OK: Applying all logs succeeded.
MHA-S1(10.180.2.164:3306): OK: Activated master IP address.
MHA-S2(10.180.2.165:3306): OK: Slave started, replicating from MHA-S1(10.180.2.164:3306)
MHA-S1(10.180.2.164:3306): Resetting slave info succeeded.
Master failover to MHA-S1(10.180.2.164:3306) completed successfully.
Wed Aug 9 17:47:15 2017 - [info] Sending mail..
Unknown option: conf

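The above is the complete failover log. From it we can see that an MHA failover goes through the following steps:

1. Configuration check phase: the configuration of the whole cluster is checked.

2. Dead master handling: this includes removing the virtual IP and shutting down the host (the shutdown is not implemented here yet and needs more study).

3. Copy the log events by which the dead master is ahead of the latest slave, and save them to a working directory on the MHA Manager.

4. Identify the slave that has the latest updates.

5. Apply the binary log events (binlog events) saved from the master.

6. Promote one slave to be the new master.

7. Point the other slaves at the new master and resume replication.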
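Step 4 can be pictured with a small sketch. Assume each slave's Master_Log_File and Read_Master_Log_Pos (from show slave status) have been collected into lines of the form "host file position". The slave that has read furthest is then the one with the highest file name and, within that file, the highest position. The function name pick_latest_slave and the sample positions are hypothetical. MHA does this comparison internally, so the sketch only illustrates the idea.

```shell
#!/bin/sh
# Hypothetical helper: read "host master_log_file read_master_log_pos" lines
# on stdin and print the host that has read the most from the old master.
pick_latest_slave() {
    # Sort by binlog file name, then numerically by position; the last
    # line is the most advanced slave. LC_ALL=C keeps the ordering bytewise.
    LC_ALL=C sort -k2,2 -k3,3n | tail -n 1 | awk '{ print $1 }'
}

# Example with made-up positions:
# printf '%s\n' 'MHA-S1 3306-binlog.000004 194' 'MHA-S2 3306-binlog.000004 120' | pick_latest_slave
```

Sorting on the zero-padded binlog file name first and the numeric position second is enough here because all slaves replicate from the same master.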

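Notes:

1. After the failover completes you will find that the MHA Manager monitoring process has exited on its own. The official documentation explains this and suggests a workaround:

Running MHA Manager from daemontools: Currently MHA Manager process does not run as a daemon. If failover completed successfully or the master process was killed by accident, the manager stops working. To run as a daemon, daemontools or any external daemon program can be used. Here is an example to run from daemontools.

Here we start the manager from a shell script with nohup, so that it keeps running in the background after the terminal session ends. Note that the manager still exits once a failover has completed, so it has to be started again afterwards:

[root@MHA-S2 bin]# more manager.sh
#!/bin/sh
nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 &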
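If daemontools is not available, the "external daemon program" the documentation mentions can be approximated with a small restart loop. This is a minimal sketch under stated assumptions, not an MHA feature: the function name run_supervised, the restart limit and the delay are all hypothetical, and the masterha_manager options in the commented usage line are the ones used in manager.sh above.

```shell
#!/bin/sh
# Hypothetical supervisor: run a command, and start it again each time it
# exits, up to MAX_RESTARTS restarts with DELAY seconds between runs.
# Usage: run_supervised MAX_RESTARTS DELAY COMMAND [ARGS...]
run_supervised() {
    max_restarts=$1
    delay=$2
    shift 2
    runs=0
    while :; do
        "$@"
        status=$?
        runs=$((runs + 1))
        echo "run $runs exited with status $status" >&2
        if [ "$runs" -gt "$max_restarts" ]; then
            echo "restart limit reached, giving up" >&2
            return "$status"
        fi
        sleep "$delay"
    done
}

# Example (not executed here):
# run_supervised 3 30 masterha_manager --conf=/etc/mha/app1.cnf \
#     --remove_dead_master_conf --ignore_last_failover
```

The restart limit is there so that a manager failing on a configuration error does not loop forever. Whether an automatic restart right after a failover is wanted depends on the environment.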

2. Once the dead master has been repaired, it can be added back to the existing two-node MHA setup.

Old master:

root@localhost:mysql3306.sock [tt] show master status\G
*************************** 1. row ***************************
 File: 3306-binlog.000004
 Position: 194
 Binlog_Do_DB: 
 Binlog_Ignore_DB: 
Executed_Gtid_Set: a5757eae-7981-11e7-82c7-005056b662d3:1-32210
1 row in set (0.00 sec)

Current master:

root@localhost:mysql3306.sock [tt] show master status\G
*************************** 1. row ***************************
 File: 3306-binlog.000003
 Position: 61945043
 Binlog_Do_DB: 
 Binlog_Ignore_DB: 
Executed_Gtid_Set: 1c2dc99f-7b57-11e7-a280-005056b665cb:1-3,
a5757eae-7981-11e7-82c7-005056b662d3:1-32210
1 row in set (0.00 sec)
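Comparing the two outputs, every GTID executed on the old master is also contained in the current master's set. That is what allows the old master to rejoin with master_auto_position=1. A minimal sketch of automating the check follows, assuming MySQL's built-in GTID_SUBSET() function. The helper name gtid_subset_sql is hypothetical. It only builds the statement, which would then be piped to the mysql client with whatever credentials apply.

```shell
#!/bin/sh
# Hypothetical helper: print a query that returns 1 when every GTID in
# the first set is also contained in the second set.
gtid_subset_sql() {
    # Executed_Gtid_Set may span several lines; strip spaces and newlines.
    old_set=$(printf '%s' "$1" | tr -d ' \n')
    new_set=$(printf '%s' "$2" | tr -d ' \n')
    printf "SELECT GTID_SUBSET('%s', '%s') AS old_is_subset;\n" "$old_set" "$new_set"
}

# Values taken from the two show master status outputs above.
OLD_SET='a5757eae-7981-11e7-82c7-005056b662d3:1-32210'
NEW_SET='1c2dc99f-7b57-11e7-a280-005056b665cb:1-3,
a5757eae-7981-11e7-82c7-005056b662d3:1-32210'

gtid_subset_sql "$OLD_SET" "$NEW_SET"
# To run it against a server: gtid_subset_sql "$OLD_SET" "$NEW_SET" | mysql -N
```

A result of 1 means the old master holds no transactions the new master lacks. A result of 0 would point to extra transactions on the old master that need to be resolved before changing master.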

Because GTID is enabled, the old master can be pointed at the new master with a plain change master. First compare the data:

Old master:

root@localhost:mysql3306.sock [tt] select * from t1;
+----+------+
| id | c1 |
+----+------+
| 1 | a1 |
| 2 | a2 |
| 3 | a3 |
| 4 | a4 |
+----+------+
4 rows in set (0.02 sec)

New master:

root@localhost:mysql3306.sock [tt] select * from t1;
+----+------+
| id | c1 |
+----+------+
| 1 | a1 |
| 2 | a2 |
| 3 | a3 |
| 4 | a4 |
| 5 | a5 |
+----+------+

On the old master, run change master to directly:

change master to master_host='MHA-S1',master_user='repl',master_password='123456',master_port=3306,master_auto_position=1;

Run start slave and check the output:

root@localhost:mysql3306.sock [tt] show slave status\G 
*************************** 1. row *************************** 
 Slave_IO_State: Waiting for master to send event 
 Master_Host: MHA-S1 
 Master_User: repl 
 Master_Port: 3306 
 Connect_Retry: 60 
 Master_Log_File: 3306-binlog.000003 
 Read_Master_Log_Pos: 61945043 
 Relay_Log_File: MHA-M1-relay-bin.000004 
 Relay_Log_Pos: 715 
 Relay_Master_Log_File: 3306-binlog.000003 
 Slave_IO_Running: Yes 
 Slave_SQL_Running: Yes

Check whether the missing data has been filled in:

root@localhost:mysql3306.sock [tt] select * from t1;
+----+------+
| id | c1 |
+----+------+
| 1 | a1 |
| 2 | a2 |
| 3 | a3 |
| 4 | a4 |
| 5 | a5 |
+----+------+

The missing row has been filled in, so rejoining replication works.

Finally, edit app1.cnf and add server1 back:

[server1]
hostname=MHA-M1
port=3306
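The same edit can be scripted so that running it twice does not create a duplicate entry. The sketch below is a plain-shell illustration with a hypothetical function name, add_server_block. MHA also ships masterha_conf_host for adding or removing server entries, and that may be the better choice in practice.

```shell
#!/bin/sh
# Hypothetical helper: append a [block] section with hostname and port to
# an MHA application config, unless a section with that name already exists.
# Usage: add_server_block CONF BLOCK HOSTNAME PORT
add_server_block() {
    conf=$1
    block=$2
    host=$3
    port=$4
    # -x matches the whole line, -F treats "[server1]" as a fixed string.
    if grep -qxF "[$block]" "$conf"; then
        echo "[$block] already present in $conf, nothing to do" >&2
        return 0
    fi
    printf '\n[%s]\nhostname=%s\nport=%s\n' "$block" "$host" "$port" >> "$conf"
}

# Example (not executed here):
# add_server_block /etc/mha/app1.cnf server1 MHA-M1 3306
```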

Restart the monitoring program and check the MHA status:

[root@MHA-S2 tmp]# masterha_check_repl --conf=/etc/mha/app1.cnf 
 Sat Aug 12 20:37:01 2017 - [info] Replication filtering check ok.
 Sat Aug 12 20:37:01 2017 - [error][/usr/local/share/perl5/MHA/Server.pm, ln398] MHA-M1(10.180.2.163:3306): User repl does not exist or does not have REPLICATION SLAVE privilege! Other slaves can not start replication from this host.
 Sat Aug 12 20:37:01 2017 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. at /usr/local/share/perl5/MHA/ServerManager.pm line 1403
 Sat Aug 12 20:37:01 2017 - [error][/usr/local/share/perl5/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
 Sat Aug 12 20:37:01 2017 - [info] Got exit code 1 (Not master dead).

The replication user is missing its privilege on MHA-M1, so fix that first:

On MHA-M1:

set session sql_log_bin=OFF;
grant replication slave on *.* to 'repl'@'%' identified by '123456';
set session sql_log_bin=ON;

Run the MHA status check again:

masterha_check_repl --conf=/etc/mha/app1.cnf
Sat Aug 12 20:41:14 2017 - [info] Checking replication health on MHA-M1..
Sat Aug 12 20:41:14 2017 - [info] ok.
Sat Aug 12 20:41:14 2017 - [info] Checking replication health on MHA-S2..
Sat Aug 12 20:41:14 2017 - [info] ok.
Sat Aug 12 20:41:14 2017 - [info] Checking master_ip_failover_script status:
Sat Aug 12 20:41:14 2017 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=MHA-S1 --orig_master_ip=10.180.2.164 --orig_master_port=3306
IN SCRIPT TEST====/sbin/ifconfig eth2:1 down==/sbin/ifconfig eth2:1 10.180.2.168/24===Checking the Status of the script.. OK 
Sat Aug 12 20:41:15 2017 - [info] OK.
Sat Aug 12 20:41:15 2017 - [warning] shutdown_script is not defined.
Sat Aug 12 20:41:15 2017 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.

Finally, start the monitoring program:

[root@MHA-S2 bin]# nohup monitor.sh &
[root@MHA-S2 bin]# masterha_check_status --conf=/etc/mha/app1.cnf 
app1 (pid:32084) is running(0:PING_OK), master:MHA-S1

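(2) Manual online switchover test

In many situations the current master has to be moved to another server: the master has a hardware fault, the RAID controller needs to be rebuilt, or the master should move to a more powerful machine. Maintenance on the master degrades performance and causes downtime during which, at a minimum, no data can be written. In addition, blocking or killing the currently running sessions can leave the old and new master inconsistent. MHA provides a fast switchover with graceful write blocking. The switch itself takes only about 0.5-2 seconds, during which writes are blocked. In many cases a 0.5-2 second write outage is acceptable, so switching the master does not require a scheduled maintenance window.

The online switchover roughly proceeds as follows:
1. Check the replication settings and identify the current master
2. Determine the new master
3. Block writes on the current master
4. Wait for all slaves to catch up with replication
5. Enable writes on the new master
6. Repoint the slaves to the new master

Note that the application architecture has to deal with two issues during an online switchover:

1. Automatically recognising which machine is the master and which are slaves (the master role can move between machines). Using a VIP largely solves this.

2. Load balancing: define the approximate read/write ratio and the share of load each machine can carry, and consider what happens when a machine leaves the cluster.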

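To guarantee full data consistency and finish the switch as quickly as possible, an MHA online switchover only succeeds when all of the following conditions hold; otherwise it fails.

1. The IO thread is running on every slave.

2. The SQL thread is running on every slave.

3. Seconds_Behind_Master in the show slave status output of every slave is less than or equal to running_updates_limit seconds. If running_updates_limit is not specified for the switch, it defaults to 1 second.

4. On the master, show processlist shows no update that has been running for longer than running_updates_limit seconds.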
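The first three conditions can be checked ahead of time from the show slave status\G output of each slave. The sketch below assumes that output is piped in on stdin. The function name slave_ready_for_switch is hypothetical, and the check only approximates what MHA verifies on its own before switching.

```shell
#!/bin/sh
# Hypothetical pre-check: read "show slave status\G" text on stdin and
# report whether the slave satisfies conditions 1-3 for the given limit.
# Usage: slave_ready_for_switch [RUNNING_UPDATES_LIMIT]   (default 1 second)
slave_ready_for_switch() {
    limit=${1:-1}
    awk -v limit="$limit" '
        $1 == "Slave_IO_Running:"      { io  = $2 }
        $1 == "Slave_SQL_Running:"     { sql = $2 }
        $1 == "Seconds_Behind_Master:" { lag = $2; seen = 1 }
        END {
            if (io != "Yes")  { print "IO thread not running";  exit 1 }
            if (sql != "Yes") { print "SQL thread not running"; exit 1 }
            if (!seen || lag == "NULL") { print "replication lag unknown"; exit 1 }
            if (lag + 0 > limit + 0) {
                print "lag " lag "s exceeds limit " limit "s"
                exit 1
            }
            print "ok"
        }'
}

# Example (not executed here):
# mysql -e 'show slave status\G' | slave_ready_for_switch 1
```

The function returns a non-zero status when any condition fails, so it can gate a wrapper script around masterha_master_switch.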

The online switchover steps are as follows:

First stop the monitoring program:

[root@MHA-S2 app1]# masterha_stop --conf=/etc/mha/app1.cnf 
Stopped app1 successfully.

Modify the master_ip_online_change script as follows:

[root@MHA-S2 bin]# more master_ip_online_change
#!/usr/bin/env perl
# Copyright (C) 2011 DeNA Co.,Ltd.
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';
use Getopt::Long;
use MHA::DBHelper;
use MHA::NodeUtil;
use Time::HiRes qw( sleep gettimeofday tv_interval );
use Data::Dumper;

my $_tstart;
my $_running_interval = 0.1;
my (
  $command,          $orig_master_host, $orig_master_ip,
  $orig_master_port, $orig_master_user,
  $new_master_host,  $new_master_ip,    $new_master_port,
  $new_master_user,
);
my $vip = '10.180.2.168/19';    # Virtual IP
my $key = "1";
my $ssh_start_vip = "/sbin/ifconfig eth2:$key $vip";
my $ssh_stop_vip  = "/sbin/ifconfig eth2:$key down";
my $ssh_user = "root";
my $new_master_password  = '123456';
my $orig_master_password = '123456';

GetOptions(
  'command=s' => \$command,
  #'ssh_user=s' => \$ssh_user,
  'orig_master_host=s' => \$orig_master_host,
  'orig_master_ip=s'   => \$orig_master_ip,
  'orig_master_port=i' => \$orig_master_port,
  'orig_master_user=s' => \$orig_master_user,
  #'orig_master_password=s' => \$orig_master_password,
  'new_master_host=s' => \$new_master_host,
  'new_master_ip=s'   => \$new_master_ip,
  'new_master_port=i' => \$new_master_port,
  'new_master_user=s' => \$new_master_user,
  #'new_master_password=s' => \$new_master_password,
);

exit &main();

sub current_time_us {
  my ( $sec, $microsec ) = gettimeofday();
  my $curdate = localtime($sec);
  return $curdate . " " . sprintf( "%06d", $microsec );
}

sub sleep_until {
  my $elapsed = tv_interval($_tstart);
  if ( $_running_interval > $elapsed ) {
    sleep( $_running_interval - $elapsed );
  }
}

sub get_threads_util {
  my $dbh                    = shift;
  my $my_connection_id       = shift;
  my $running_time_threshold = shift;
  my $type                   = shift;
  $running_time_threshold = 0 unless ($running_time_threshold);
  $type                   = 0 unless ($type);
  my @threads;

  my $sth = $dbh->prepare("SHOW PROCESSLIST");
  $sth->execute();

  while ( my $ref = $sth->fetchrow_hashref() ) {
    my $id         = $ref->{Id};
    my $user       = $ref->{User};
    my $host       = $ref->{Host};
    my $command    = $ref->{Command};
    my $state      = $ref->{State};
    my $query_time = $ref->{Time};
    my $info       = $ref->{Info};
    $info =~ s/^\s*(.*?)\s*$/$1/ if defined($info);
    next if ( $my_connection_id == $id );
    next if ( defined($query_time) && $query_time < $running_time_threshold );
    next if ( defined($command)    && $command eq "Binlog Dump" );
    next if ( defined($user)       && $user eq "system user" );
    next
      if ( defined($command)
      && $command eq "Sleep"
      && defined($query_time)
      && $query_time >= 1 );

    if ( $type >= 1 ) {
      next if ( defined($command) && $command eq "Sleep" );
      next if ( defined($command) && $command eq "Connect" );
    }

    if ( $type >= 2 ) {
      next if ( defined($info) && $info =~ m/^select/i );
      next if ( defined($info) && $info =~ m/^show/i );
    }

    push @threads, $ref;
  }
  return @threads;
}

sub main {
  if ( $command eq "stop" ) {
    ## Gracefully killing connections on the current master
    # 1. Set read_only= 1 on the new master
    # 2. DROP USER so that no app user can establish new connections
    # 3. Set read_only= 1 on the current master
    # 4. Kill current queries
    # * Any database access failure will result in script die.
    my $exit_code = 1;
    eval {
      ## Setting read_only=1 on the new master (to avoid accident)
      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error(die_on_error)_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );
      print current_time_us() . " Set read_only on the new master.. ";
      $new_master_handler->enable_read_only();
      if ( $new_master_handler->is_read_only() ) {
        print "ok.\n";
      }
      else {
        die "Failed!\n";
      }
      $new_master_handler->disconnect();

      # Connecting to the orig master, die if any database error happens
      my $orig_master_handler = new MHA::DBHelper();
      $orig_master_handler->connect( $orig_master_ip, $orig_master_port,
        $orig_master_user, $orig_master_password, 1 );

      ## Drop application user so that nobody can connect. Disabling per-session binlog beforehand
      #$orig_master_handler->disable_log_bin_local();
      #print current_time_us() . " Drpping app user on the orig master..\n";
      #FIXME_xxx_drop_app_user($orig_master_handler);

      ## Waiting for N * 100 milliseconds so that current connections can exit
      my $time_until_read_only = 15;
      $_tstart = [gettimeofday];
      my @threads = get_threads_util( $orig_master_handler->{dbh},
        $orig_master_handler->{connection_id} );
      while ( $time_until_read_only > 0 && $#threads >= 0 ) {
        if ( $time_until_read_only % 5 == 0 ) {
          printf
"%s Waiting all running %d threads are disconnected.. (max %d milliseconds)\n",
            current_time_us(), $#threads + 1, $time_until_read_only * 100;
          if ( $#threads < 5 ) {
            print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
              foreach (@threads);
          }
        }
        sleep_until();
        $_tstart = [gettimeofday];
        $time_until_read_only--;
        @threads = get_threads_util( $orig_master_handler->{dbh},
          $orig_master_handler->{connection_id} );
      }

      ## Setting read_only=1 on the current master so that nobody(except SUPER) can write
      print current_time_us() . " Set read_only=1 on the orig master.. ";
      $orig_master_handler->enable_read_only();
      if ( $orig_master_handler->is_read_only() ) {
        print "ok.\n";
      }
      else {
        die "Failed!\n";
      }

      ## Waiting for M * 100 milliseconds so that current update queries can complete
      my $time_until_kill_threads = 5;
      @threads = get_threads_util( $orig_master_handler->{dbh},
        $orig_master_handler->{connection_id} );
      while ( $time_until_kill_threads > 0 && $#threads >= 0 ) {
        if ( $time_until_kill_threads % 5 == 0 ) {
          printf
"%s Waiting all running %d queries are disconnected.. (max %d milliseconds)\n",
            current_time_us(), $#threads + 1, $time_until_kill_threads * 100;
          if ( $#threads < 5 ) {
            print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
              foreach (@threads);
          }
        }
        sleep_until();
        $_tstart = [gettimeofday];
        $time_until_kill_threads--;
        @threads = get_threads_util( $orig_master_handler->{dbh},
          $orig_master_handler->{connection_id} );
      }

      print "Disabling the VIP on old master: $orig_master_host \n";
      &stop_vip();

      ## Terminating all threads
      print current_time_us() . " Killing all application threads..\n";
      $orig_master_handler->kill_threads(@threads) if ( $#threads >= 0 );
      print current_time_us() . " done.\n";
      #$orig_master_handler->enable_log_bin_local();
      $orig_master_handler->disconnect();

      ## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "start" ) {
    ## Activating master ip on the new master
    # 1. Create app user with write privileges
    # 2. Moving backup script if needed
    # 3. Register new master's ip to the catalog database

    # We don't return error even though activating updatable accounts/ip failed so that we don't interrupt slaves' recovery.
    # If exit code is 0 or 10, MHA does not abort
    my $exit_code = 10;
    eval {
      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );

      ## Set read_only=0 on the new master
      #$new_master_handler->disable_log_bin_local();
      print current_time_us() . " Set read_only=0 on the new master.\n";
      $new_master_handler->disable_read_only();

      ## Creating an app user on the new master
      #print current_time_us() . " Creating app user on the new master..\n";
      #FIXME_xxx_create_app_user($new_master_handler);
      #$new_master_handler->enable_log_bin_local();
      $new_master_handler->disconnect();

      ## Update master ip on the catalog database, etc
      print "Enabling the VIP - $vip on the new master - $new_master_host \n";
      &start_vip();
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "status" ) {

    # do nothing
    exit 0;
  }
  else {
    &usage();
    exit 1;
  }
}

# A simple system call that enable the VIP on the new master
sub start_vip() {
  `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}

# A simple system call that disable the VIP on the old_master
sub stop_vip() {
  `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
  print
"Usage: master_ip_online_change --command=start|stop|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
  die;
}

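Run the switchover:

[root@MHA-S2 tmp]# masterha_master_switch --conf=/etc/mha/app1.cnf --master_state=alive --new_master_host=MHA-M1 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000

The options mean the following:

--orig_master_is_new_slave: with this option the original master becomes a slave of the new master after the switch. Without it, the original master is not started as a slave.

--running_updates_limit=10000: if the candidate master is lagging during the switch, MHA refuses to switch. This option allows the switch as long as the lag stays within the given limit (in seconds). How long the switch takes still depends on the size of the relay logs that have to be applied during recovery.

Check the state of each machine after the switch:

S2: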

root@localhost:mysql3306.sock [tt] show slave status\G 
*************************** 1. row *************************** 
 Slave_IO_State: Waiting for master to send event 
 Master_Host: MHA-M1 
 Master_User: repl 
 Master_Port: 3306 
 Connect_Retry: 60 
 Master_Log_File: 3306-binlog.000004 
 Read_Master_Log_Pos: 748 
 Relay_Log_File: MHA-S2-relay-bin.000002 
 Relay_Log_Pos: 420 
 Relay_Master_Log_File: 3306-binlog.000004 
 Slave_IO_Running: Yes 
 Slave_SQL_Running: Yes

S1:

root@localhost:mysql3306.sock [tt] show slave status\G
*************************** 1. row ***************************
 Slave_IO_State: Waiting for master to send event
 Master_Host: MHA-M1
 Master_User: repl
 Master_Port: 3306
 Connect_Retry: 60
 Master_Log_File: 3306-binlog.000004
 Read_Master_Log_Pos: 748
 Relay_Log_File: MHA-S1-relay-bin.000002
 Relay_Log_Pos: 420
 Relay_Master_Log_File: 3306-binlog.000004
 Slave_IO_Running: Yes
 Slave_SQL_Running: Yes

M1:

root@localhost:mysql3306.sock [tt] show slave status\G
Empty set (0.00 sec)

Log of the online switchover:

[root@MHA-S2 tmp]# more sw.log 
[root@MHA-S2 bin]# masterha_master_switch --conf=/etc/mha/app1.cnf --master_state=alive --new_master_host=MHA-M1 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=10000
Sat Aug 12 21:34:54 2017 - [info] MHA::MasterRotate version 0.57.
Sat Aug 12 21:34:54 2017 - [info] Starting online master switch..
Sat Aug 12 21:34:54 2017 - [info] 
Sat Aug 12 21:34:54 2017 - [info] * Phase 1: Configuration Check Phase..
Sat Aug 12 21:34:54 2017 - [info] 
Sat Aug 12 21:34:54 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sat Aug 12 21:34:54 2017 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Sat Aug 12 21:34:54 2017 - [info] Reading server configuration from /etc/mha/app1.cnf..
Sat Aug 12 21:34:54 2017 - [info] GTID failover mode = 1
Sat Aug 12 21:34:54 2017 - [info] Current Alive Master: MHA-S1(10.180.2.164:3306)
Sat Aug 12 21:34:54 2017 - [info] Alive Slaves:
Sat Aug 12 21:34:54 2017 - [info] MHA-M1(10.180.2.163:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Sat Aug 12 21:34:54 2017 - [info] GTID ON
Sat Aug 12 21:34:54 2017 - [info] Replicating from MHA-S1(10.180.2.164:3306)
Sat Aug 12 21:34:54 2017 - [info] MHA-S2(10.180.2.165:3306) Version=5.7.18-log (oldest major version between slaves) log-bin:enabled
Sat Aug 12 21:34:54 2017 - [info] GTID ON
Sat Aug 12 21:34:54 2017 - [info] Replicating from MHA-S1(10.180.2.164:3306)
It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on MHA-S1(10.180.2.164:3306)? (YES/no): yes
Sat Aug 12 21:35:07 2017 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Sat Aug 12 21:35:07 2017 - [info] ok.
Sat Aug 12 21:35:07 2017 - [info] Checking MHA is not monitoring or doing failover..
Sat Aug 12 21:35:07 2017 - [info] Checking replication health on MHA-M1..
Sat Aug 12 21:35:07 2017 - [info] ok.
Sat Aug 12 21:35:07 2017 - [info] Checking replication health on MHA-S2..
Sat Aug 12 21:35:07 2017 - [info] ok.
Sat Aug 12 21:35:07 2017 - [info] MHA-M1 can be new master.
Sat Aug 12 21:35:07 2017 - [info] 
From:
MHA-S1(10.180.2.164:3306) (current master)
 +--MHA-M1(10.180.2.163:3306)
 +--MHA-S2(10.180.2.165:3306)

To:
MHA-M1(10.180.2.163:3306) (new master)
 +--MHA-S2(10.180.2.165:3306)
 +--MHA-S1(10.180.2.164:3306)
Starting master switch from MHA-S1(10.180.2.164:3306) to MHA-M1(10.180.2.163:3306)? (yes/NO): yes
Sat Aug 12 21:35:15 2017 - [info] Checking whether MHA-M1(10.180.2.163:3306) is ok for the new master..
Sat Aug 12 21:35:15 2017 - [info] ok.
Sat Aug 12 21:35:15 2017 - [info] MHA-S1(10.180.2.164:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Sat Aug 12 21:35:15 2017 - [info] MHA-S1(10.180.2.164:3306): Resetting slave pointing to the dummy host.
Sat Aug 12 21:35:15 2017 - [info] ** Phase 1: Configuration Check Phase completed.
Sat Aug 12 21:35:15 2017 - [info] 
Sat Aug 12 21:35:15 2017 - [info] * Phase 2: Rejecting updates Phase..
Sat Aug 12 21:35:15 2017 - [info] 
Sat Aug 12 21:35:15 2017 - [info] Executing master ip online change script to disable write on the current master:
Sat Aug 12 21:35:15 2017 - [info] /usr/local/bin/master_ip_online_change --command=stop --orig_master_host=MHA-S1 --orig_master_ip=10.180.2.164 --orig_master_port=3306 --orig_master_user='root' --new_master_host=MHA-M1 --new_master_ip=10.180.2.163 --new_master_port=3306 --new_master_user='root' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave --orig_master_password=xxx --new_master_password=xxx
Unknown option: orig_master_ssh_user
Unknown option: new_master_ssh_user
Unknown option: orig_master_is_new_slave
Unknown option: orig_master_password
Unknown option: new_master_password
Sat Aug 12 21:35:15 2017 568580 Set read_only on the new master.. ok.
Sat Aug 12 21:35:15 2017 573508 Waiting all running 2 threads are disconnected.. (max 1500 milliseconds)
{'Time' => '272878','Command' => 'Binlog Dump GTID','db' => undef,'Id' => '40','Info' => undef,'User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Host' => 'MHA-S2:46970'}
{'Time' => '3738','Command' => 'Binlog Dump GTID','db' => undef,'Id' => '55','Info' => undef,'User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Host' => 'MHA-M1:51506'}
Sat Aug 12 21:35:16 2017 075020 Waiting all running 2 threads are disconnected.. (max 1000 milliseconds)
{'Time' => '272879','Command' => 'Binlog Dump GTID','db' => undef,'Id' => '40','Info' => undef,'User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Host' => 'MHA-S2:46970'}
{'Time' => '3739','Command' => 'Binlog Dump GTID','db' => undef,'Id' => '55','Info' => undef,'User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Host' => 'MHA-M1:51506'}
Sat Aug 12 21:35:16 2017 576059 Waiting all running 2 threads are disconnected.. (max 500 milliseconds)
{'Time' => '272879','Command' => 'Binlog Dump GTID','db' => undef,'Id' => '40','Info' => undef,'User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Host' => 'MHA-S2:46970'}
{'Time' => '3739','Command' => 'Binlog Dump GTID','db' => undef,'Id' => '55','Info' => undef,'User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Host' => 'MHA-M1:51506'}
Sat Aug 12 21:35:17 2017 076940 Set read_only=1 on the orig master.. ok.
Sat Aug 12 21:35:17 2017 079645 Waiting all running 2 queries are disconnected.. (max 500 milliseconds)
{'Time' => '272880','Command' => 'Binlog Dump GTID','db' => undef,'Id' => '40','Info' => undef,'User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Host' => 'MHA-S2:46970'}
{'Time' => '3740','Command' => 'Binlog Dump GTID','db' => undef,'Id' => '55','Info' => undef,'User' => 'repl','State' => 'Master has sent all binlog to slave; waiting for more updates','Host' => 'MHA-M1:51506'}
Disabling the VIP on old master: MHA-S1 
Sat Aug 12 21:35:17 2017 683769 Killing all application threads..
Sat Aug 12 21:35:17 2017 686090 done.
Sat Aug 12 21:35:17 2017 - [info] ok.
Sat Aug 12 21:35:17 2017 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Sat Aug 12 21:35:17 2017 - [info] Executing FLUSH TABLES WITH READ LOCK..
Sat Aug 12 21:35:17 2017 - [info] ok.
Sat Aug 12 21:35:17 2017 - [info] Orig master binlog:pos is 3306-binlog.000003:61945043.
Sat Aug 12 21:35:17 2017 - [info] Waiting to execute all relay logs on MHA-M1(10.180.2.163:3306)..
Sat Aug 12 21:35:17 2017 - [info] master_pos_wait(3306-binlog.000003:61945043) completed on MHA-M1(10.180.2.163:3306). Executed 0 events.
Sat Aug 12 21:35:17 2017 - [info] done.
Sat Aug 12 21:35:17 2017 - [info] Getting new master's binlog name and position..
Sat Aug 12 21:35:17 2017 - [info] 3306-binlog.000004:748
Sat Aug 12 21:35:17 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='MHA-M1 or 10.180.2.163', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Sat Aug 12 21:35:17 2017 - [info] Executing master ip online change script to allow write on the new master:
Sat Aug 12 21:35:17 2017 - [info] /usr/local/bin/master_ip_online_change --command=start --orig_master_host=MHA-S1 --orig_master_ip=10.180.2.164 --orig_master_port=3306 --orig_master_user='root' --new_master_host=MHA-M1 --new_master_ip=10.180.2.163 --new_master_port=3306 --new_master_user='root' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave --orig_master_password=xxx --new_master_password=xxx
Unknown option: orig_master_ssh_user
Unknown option: new_master_ssh_user
Unknown option: orig_master_is_new_slave
Unknown option: orig_master_password
Unknown option: new_master_password
Sat Aug 12 21:35:17 2017 865209 Set read_only=0 on the new master.
Enabling the VIP - 10.180.2.168/19 on the new master - MHA-M1 
Sat Aug 12 21:35:17 2017 - [info] ok.
Sat Aug 12 21:35:17 2017 - [info] 
Sat Aug 12 21:35:17 2017 - [info] * Switching slaves in parallel..
Sat Aug 12 21:35:17 2017 - [info] 
Sat Aug 12 21:35:17 2017 - [info] -- Slave switch on host MHA-S2(10.180.2.165:3306) started, pid: 2327
Sat Aug 12 21:35:17 2017 - [info] 
Sat Aug 12 21:35:18 2017 - [info] Log messages from MHA-S2 ...
Sat Aug 12 21:35:18 2017 - [info] 
Sat Aug 12 21:35:18 2017 - [info] Waiting to execute all relay logs on MHA-S2(10.180.2.165:3306)..
Sat Aug 12 21:35:18 2017 - [info] master_pos_wait(3306-binlog.000003:61945043) completed on MHA-S2(10.180.2.165:3306). Executed 0 events.
Sat Aug 12 21:35:18 2017 - [info] done.
Sat Aug 12 21:35:18 2017 - [info] Resetting slave MHA-S2(10.180.2.165:3306) and starting replication from the new master MHA-M1(10.180.2.163:3306)..
Sat Aug 12 21:35:18 2017 - [info] Executed CHANGE MASTER.
Sat Aug 12 21:35:18 2017 - [info] Slave started.
Sat Aug 12 21:35:18 2017 - [info] End of log messages from MHA-S2 ...
Sat Aug 12 21:35:18 2017 - [info] 
Sat Aug 12 21:35:18 2017 - [info] -- Slave switch on host MHA-S2(10.180.2.165:3306) succeeded.
Sat Aug 12 21:35:18 2017 - [info] Unlocking all tables on the orig master:
Sat Aug 12 21:35:18 2017 - [info] Executing UNLOCK TABLES..
Sat Aug 12 21:35:18 2017 - [info] ok.
Sat Aug 12 21:35:18 2017 - [info] Starting orig master as a new slave..
Sat Aug 12 21:35:18 2017 - [info] Resetting slave MHA-S1(10.180.2.164:3306) and starting replication from the new master MHA-M1(10.180.2.163:3306)..
Sat Aug 12 21:35:18 2017 - [info] Executed CHANGE MASTER.
Sat Aug 12 21:35:19 2017 - [info] Slave started.
Sat Aug 12 21:35:19 2017 - [info] All new slave servers switched successfully.
Sat Aug 12 21:35:19 2017 - [info] 
Sat Aug 12 21:35:19 2017 - [info] * Phase 5: New master cleanup phase..
Sat Aug 12 21:35:19 2017 - [info] 
Sat Aug 12 21:35:19 2017 - [info] MHA-M1: Resetting slave info succeeded.
Sat Aug 12 21:35:19 2017 - [info] Switching master to MHA-M1(10.180.2.163:3306) completed successfully.
