Oracle Cluster Heartbeats and Their Parameters: An Analysis of misscount/disktimeout/reboottime


This article explains the Oracle cluster heartbeats and the misscount, disktimeout, and reboottime parameters that control them.

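1. OCSSD and CSS
OCSSD is a Linux or Unix process that manages and provides Cluster Synchronization Services (CSS). It runs as the oracle user and provides node membership management; if the process fails, the node reboots. CSS provides two heartbeat mechanisms: a network heartbeat and a disk heartbeat. Each heartbeat has a maximum allowed delay. For the network heartbeat it is called MC (misscount); for the disk heartbeat it is called IOT (I/O timeout).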

Both parameters are expressed in seconds. By default, misscount < disktimeout.

The two heartbeat mechanisms are described in turn below.

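2. Network heartbeat
As the name suggests, the network heartbeat checks node state over the private network. If a hardware or software fault on the private network keeps the cluster nodes from communicating with each other for a certain period, split-brain results. Because storage in a cluster is shared, the faulty node must then be isolated from the cluster to avoid data corruption. The network heartbeat behaves as follows: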
Every second, a sending thread in the cssd sends a network TCP heartbeat to itself and all nodes. The receiving thread of the ocssd.bin receives the heartbeat.
If a network packet is dropped or has an error, the error correction mechanism in TCP retransmits the packet.
Oracle does not retransmit. In the ocssd.log, you will see a WARNING message about a missing heartbeat if a node does not receive a heartbeat from another node for 15 seconds (50% of misscount). Another warning is reported in ocssd.log if the same node is missing for 22 seconds (75% of misscount), and another if it is still missing at 27 seconds (90% of misscount). When the heartbeat has been missing for 100% of misscount, 30 seconds, the node is evicted.

The timeout for this network heartbeat is called misscount. It can be queried and changed with the crsctl tool.
[grid@Linux-01 ~]$ crsctl get css misscount
CRS-4678: Successful get misscount 30 for Cluster Synchronization Services.

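The query result above means that if the private interconnect between cluster nodes fails to deliver heartbeats for more than 30s, Oracle considers that split-brain has occurred between the nodes and that the faulty node must be evicted from the cluster.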
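The warning points quoted from the support text can be derived from misscount with simple arithmetic. The following is a minimal sketch, not Oracle's implementation: the function name `heartbeat_checkpoints` and the dictionary keys are hypothetical, and rounding down to whole seconds is an assumption chosen because it matches the quoted 22-second figure for 75% of 30.

```python
def heartbeat_checkpoints(misscount: int) -> dict[str, int]:
    """Return the number of seconds of missed network heartbeats at which
    each warning, and finally the eviction, is described as occurring.

    Percentages (50/75/90/100) come from the quoted text; rounding down
    to whole seconds is an assumption made for this illustration.
    """
    return {
        "warning_50": misscount * 50 // 100,
        "warning_75": misscount * 75 // 100,
        "warning_90": misscount * 90 // 100,
        "eviction": misscount,
    }


if __name__ == "__main__":
    # Use the value reported by `crsctl get css misscount` in the article.
    for event, seconds in heartbeat_checkpoints(30).items():
        print(f"{event}: {seconds}s")
```

With misscount set to 30, the sketch reproduces the 15, 22, 27, and 30 second marks from the quoted description.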

How is the faulty node identified? Oracle decides with a voting algorithm. The following example description of the algorithm is based on the book 大話 Oracle RAC.
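The nodes in a cluster use the heartbeat mechanism to report their health to one another. Assume that each report received from a node counts as one vote. In a three-node cluster running normally, every node holds 3 votes. If node A's heartbeat fails while node A itself is still running, the cluster splits into 2 smaller partitions: node A is one, and the remaining 2 nodes are the other. One partition must then be removed to keep the cluster healthy. In this three-node cluster, after A's heartbeat fails, B and C form a partition with 2 votes, while A has only 1 vote. Under the voting algorithm, the partition formed by B and C takes control and A is evicted.

With only 2 nodes, the voting algorithm breaks down, because each node holds just 1 vote. A third device is then needed: the quorum device. The quorum device is usually a shared disk, also called the quorum disk, and it also counts as one vote. When the heartbeat between the 2 nodes fails, both nodes race for the quorum disk's vote, and the request that arrives first is served first. The node that reaches the quorum disk first therefore holds 2 votes, and the other node is evicted.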
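The vote counting described above can be sketched in a few lines. This is an illustrative model of the textbook description only, not Oracle's actual algorithm; the function name `surviving_partition` and its parameters are hypothetical.

```python
def surviving_partition(partitions, quorum_winner=None):
    """Return the partition (a set of node names) that keeps control.

    partitions: list of sets; each set holds nodes that can still hear
        each other's heartbeats, so each member contributes one vote.
    quorum_winner: name of the node that reached the quorum disk first,
        or None when no quorum disk is involved. The quorum disk adds
        one vote to that node's partition.
    """
    votes = [len(p) + (1 if quorum_winner in p else 0) for p in partitions]
    best = max(votes)
    leaders = [p for p, v in zip(partitions, votes) if v == best]
    if len(leaders) != 1:
        # Tie: without an extra vote, plain voting cannot decide.
        raise ValueError("tie: no partition holds the most votes")
    return leaders[0]


if __name__ == "__main__":
    # Three nodes, A loses its heartbeat: B and C win with 2 votes to 1.
    print(surviving_partition([{"A"}, {"B", "C"}]))
    # Two nodes: the node that reaches the quorum disk first wins 2 to 1.
    print(surviving_partition([{"A"}, {"B"}], quorum_winner="B"))
```

Calling it for the two-node case without a `quorum_winner` raises an error, which mirrors the point that voting alone cannot decide between two single-vote partitions.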

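Once a node has been isolated, releases before 11gR2 generally reboot the faulty node. In 11gR2, Clusterware first tries to shut down all resources on that node and to clean up the failed components in the cluster, that is, to restart the failed components. If that cleanup does not succeed, the node is rebooted to force the cleanup.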

3. Disk heartbeat
A thread in ocssd.bin updates the voting disk every second.
If a node does not update the voting disks for 200 seconds, it is evicted.
However, the ocssd.bin on the local node has the logic that it will bring down the node if it has I/O errors on a majority of the voting disks. Also, a CRS reconfiguration takes place when the missed heartbeat count reaches 27 seconds, and the local node is rebooted. As a result, you rarely see an eviction due to failure of the voting disk on 10.2.0.4 (this is more common in 10.2.0.1), because the ocssd.bin will abort the node before it gets evicted by another node if writing to the voting disk is the problem.
As described above, each node updates the voting disk once per second. The shared voting disks are used to check the disk heartbeat.

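If the ocssd process takes longer than 200s, the value set by disktimeout, to update a voting disk, Oracle considers that voting disk offline and writes a voting-disk-offline record to the Clusterware alert log. If the number of offline voting disks on the current node is smaller than the number of online voting disks, the node survives. If the number of offline voting disks is greater than or equal to the number of online voting disks, Clusterware considers the disk heartbeat to have failed; the faulty node is evicted from the cluster and the automatic repair process runs.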

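For example, suppose there are 3 voting disks and one of node A's voting disks goes offline. Now offline disks (1) < online disks (2), so Clusterware writes an offline record to the alert log but takes no further action. If 2 or more voting disks go offline for the current node, then offline disks (2) > online disks (1), and node A is evicted from the cluster.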
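The survival rule in the two paragraphs above reduces to a single comparison. A minimal sketch, with the hypothetical function name `node_survives`, assuming the rule exactly as stated in this article:

```python
def node_survives(total_voting_disks: int, offline: int) -> bool:
    """Apply the rule from the text: a node survives only while its
    offline voting disks are fewer than its online voting disks."""
    if not 0 <= offline <= total_voting_disks:
        raise ValueError("offline must be between 0 and total_voting_disks")
    online = total_voting_disks - offline
    return offline < online


if __name__ == "__main__":
    print(node_survives(3, 1))  # 1 offline < 2 online: node stays
    print(node_survives(3, 2))  # 2 offline > 1 online: node is evicted
```

Under this rule an even number of voting disks gains nothing over the next lower odd number: with 2 disks, losing 1 leaves offline equal to online, so the node is evicted.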

4. The RebootTime parameter
The RebootTime parameter is also important. Its default value is 3s.
Default 3 seconds: the amount of time allowed for a node to complete a reboot after the CSS daemon has been evicted.
crsctl get css reboottime

5. Adjusting the heartbeat parameters
1) Procedure for versions 10.2.0.2 to 11.1.0.7
a) Shut down CRS on all but one node. For exact steps use note 309542.1
b) Execute crsctl as root to modify the misscount:
$CRS_HOME/bin/crsctl set css misscount n #### where n is the maximum private network latency in seconds
$CRS_HOME/bin/crsctl set css reboottime r [-force] #### (r is seconds)
$CRS_HOME/bin/crsctl set css disktimeout d [-force] #### (d is seconds)
c) Reboot the node where adjustment was made
d) Start all other nodes that were shut down in step a
e) Execute crsctl as root to confirm the change:
$CRS_HOME/bin/crsctl get css misscount
$CRS_HOME/bin/crsctl get css reboottime
$CRS_HOME/bin/crsctl get css disktimeout

2) Procedure for 11gR2
With 11gR2, these settings can be changed online without taking any node down:

a) Execute crsctl as root to modify the misscount:
$CRS_HOME/bin/crsctl set css misscount n #### where n is the maximum private network latency in seconds
$CRS_HOME/bin/crsctl set css reboottime r [-force] #### (r is seconds)
$CRS_HOME/bin/crsctl set css disktimeout d [-force] #### (d is seconds)
b) Execute crsctl as root to confirm the change:
$CRS_HOME/bin/crsctl get css misscount
$CRS_HOME/bin/crsctl get css reboottime
$CRS_HOME/bin/crsctl get css disktimeout
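When confirming the change, the output of the three `crsctl get css` commands can be checked with a small script. The sketch below parses lines in the format shown earlier for misscount (`CRS-4678: Successful get misscount 30 ...`). It is an assumption that reboottime and disktimeout are reported in the same format, and the names `parse_css_get` and `misscount_below_disktimeout` are hypothetical helpers, not part of any Oracle tool.

```python
import re

# Pattern taken from the sample misscount output shown in this article.
# Assumption: other CSS parameters are reported in the same format.
_GET_RE = re.compile(
    r"CRS-4678: Successful get (\w+) (\d+) for Cluster Synchronization Services"
)


def parse_css_get(line: str) -> tuple[str, int]:
    """Extract (parameter name, value in seconds) from one output line."""
    match = _GET_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognized crsctl output: {line!r}")
    return match.group(1), int(match.group(2))


def misscount_below_disktimeout(settings: dict[str, int]) -> bool:
    """Check the default relationship stated in section 1."""
    return settings["misscount"] < settings["disktimeout"]


if __name__ == "__main__":
    sample = "CRS-4678: Successful get misscount 30 for Cluster Synchronization Services."
    name, value = parse_css_get(sample)
    print(name, value)
```

Feeding it the captured output of each `get` command gives a dictionary of settings that can be compared against the intended values before the change is considered complete.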
