This article looks at what to do when repeatedly adding and removing the same OSD leaves that OSD unable to come up.
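### Environment
A pre-production system; the crushmap was set up by hand and already specifies the location of osd.139.
The cluster has noout enabled (ceph osd set noout).
Ceph version: 0.94.5
The OSDs are configured with osd crush update on start = false, so that an OSD does not modify the crushmap when it starts.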
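
The two settings above can be verified before running the experiment. The following is a minimal sketch, assuming that `ceph osd dump` prints a `flags` line with a comma-separated flag list and that the OSD's admin socket is reachable on the local host; the helper name `has_noout` is hypothetical and not part of Ceph.

```shell
# has_noout: reads `ceph osd dump` output on stdin and succeeds (exit 0)
# if the cluster-wide noout flag appears on the "flags" line.
# Hypothetical helper name, used only for illustration.
has_noout() {
    awk '$1 == "flags" {
             n = split($2, f, ",")
             for (i = 1; i <= n; i++) if (f[i] == "noout") found = 1
         }
         END { exit (found ? 0 : 1) }'
}

# Usage against a live cluster (not executed here):
#   ceph osd dump | has_noout && echo "noout is set"
#   ceph daemon osd.139 config get osd_crush_update_on_start
```

The second usage line queries the running daemon through its admin socket, so it has to be run on the host where osd.139 lives.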
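### Symptoms
While simulating a single-node failure, the same OSD was manually added and removed several times (only its data and keyring were deleted; the crushmap was left untouched). In the end, the newly added OSD process started and its startup log showed no errors, yet it never entered the up state.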
2016-04-01 11:19:16.868837 7fee3654b900 0 ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43), process ceph-osd, pid 104255
.....
2016-04-01 11:19:19.295992 7fee3654b900 0 osd.139 12789 crush map has features 2200130813952, adjusting msgr requires for clients
2016-04-01 11:19:19.296008 7fee3654b900 0 osd.139 12789 crush map has features 2200130813952 was 8705, adjusting msgr requires for mons
2016-04-01 11:19:19.296016 7fee3654b900 0 osd.139 12789 crush map has features 2200130813952, adjusting msgr requires for osds
2016-04-01 11:19:19.296052 7fee3654b900 0 osd.139 12789 load_pgs
2016-04-01 11:19:19.296094 7fee3654b900 0 osd.139 12789 load_pgs opened 0 pgs
2016-04-01 11:19:19.296878 7fee3654b900 -1 osd.139 12789 log_to_monitors {default=true}
2016-04-01 11:19:19.305091 7fee246f1700 0 osd.139 12789 ignoring osdmap until we have initialized
2016-04-01 11:19:19.305239 7fee246f1700 0 osd.139 12789 ignoring osdmap until we have initialized
2016-04-01 11:19:19.305425 7fee3654b900 0 osd.139 12789 done with init, starting boot process
With debug osd=20 enabled, the log shows the OSD repeating the following operations indefinitely:
2016-04-01 11:46:23.300790 7f9219d15700 20 osd.139 12813 update_osd_stat osd_stat(538 MB used, 3723 GB avail, 3724 GB total, peers []/[] op hist [])
2016-04-01 11:46:23.300821 7f9219d15700 5 osd.139 12813 heartbeat: osd_stat(538 MB used, 3723 GB avail, 3724 GB total, peers []/[] op hist [])
2016-04-01 11:46:25.200613 7f9231e86700 5 osd.139 12813 tick
2016-04-01 11:46:25.200644 7f9231e86700 10 osd.139 12813 do_waiters -- start
2016-04-01 11:46:25.200648 7f9231e86700 10 osd.139 12813 do_waiters -- finish
2016-04-01 11:46:25.600974 7f9219d15700 20 osd.139 12813 update_osd_stat osd_stat(538 MB used, 3723 GB avail, 3724 GB total, peers []/[] op hist [])
2016-04-01 11:46:25.601002 7f9219d15700 5 osd.139 12813 heartbeat: osd_stat(538 MB used, 3723 GB avail, 3724 GB total, peers []/[] op hist [])
2016-04-01 11:46:26.200759 7f9231e86700 5 osd.139 12813 tick
2016-04-01 11:46:26.200784 7f9231e86700 10 osd.139 12813 do_waiters -- start
2016-04-01 11:46:26.200788 7f9231e86700 10 osd.139 12813 do_waiters -- finish
2016-04-01 11:46:27.200867 7f9231e86700 5 osd.139 12813 tick
2016-04-01 11:46:27.200892 7f9231e86700 10 osd.139 12813 do_waiters -- start
2016-04-01 11:46:27.200895 7f9231e86700 10 osd.139 12813 do_waiters -- finish
2016-04-01 11:46:28.201002 7f9231e86700 5 osd.139 12813 tick
2016-04-01 11:46:28.201022 7f9231e86700 10 osd.139 12813 do_waiters -- start
2016-04-01 11:46:28.201030 7f9231e86700 10 osd.139 12813 do_waiters -- finish
2016-04-01 11:46:29.101147 7f9219d15700 20 osd.139 12813 update_osd_stat osd_stat(538 MB used, 3723 GB avail, 3724 GB total, peers []/[] op hist [])
2016-04-01 11:46:29.101180 7f9219d15700 5 osd.139 12813 heartbeat: osd_stat(538 MB used, 3723 GB avail, 3724 GB total, peers []/[] op hist [])
2016-04-01 11:46:29.201115 7f9231e86700 5 osd.139 12813 tick
2016-04-01 11:46:29.201128 7f9231e86700 10 osd.139 12813 do_waiters -- start
2016-04-01 11:46:29.201132 7f9231e86700 10 osd.139 12813 do_waiters -- finish
2016-04-01 11:46:30.201237 7f9231e86700 5 osd.139 12813 tick
2016-04-01 11:46:30.201267 7f9231e86700 10 osd.139 12813 do_waiters -- start
2016-04-01 11:46:30.201271 7f9231e86700 10 osd.139 12813 do_waiters -- finish
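
The heartbeat lines above report empty peer lists (`peers []/[]`) while the OSD keeps ticking, so it helps to confirm from the monitor side how the OSD is recorded in the osdmap. The following is a rough sketch, assuming each OSD line in `ceph osd dump` starts with `osd.<id>` followed by `up` or `down`; the helper name `osd_state` is hypothetical.

```shell
# osd_state: reads `ceph osd dump` output on stdin and prints the up/down
# state recorded for the given OSD id. Exits non-zero if the OSD is not listed.
# Hypothetical helper name, used only for illustration.
osd_state() {
    awk -v name="osd.$1" '
        $1 == name { print $2; found = 1 }
        END { exit (found ? 0 : 1) }'
}

# Usage against a live cluster (not executed here):
#   ceph osd dump | osd_state 139
#   ceph osd tree        # shows where osd.139 sits in the CRUSH hierarchy
```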
### Solution
1. Remove the OSD's entry from the CRUSH map.
ceph osd crush remove osd.139 # note: this may trigger data migration
2. Start the OSD service and add the OSD back into the crushmap.
ceph osd crush add 139 1.0 host=xxx
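
The two steps can be wrapped in a small script so the commands can be reviewed before they touch the cluster. This is only a sketch under stated assumptions: the function names `run_or_print` and `readd_osd_to_crush` and the `DRY_RUN` variable are hypothetical, the dry-run default is an added precaution rather than part of the original procedure, and `host=xxx` is the placeholder from the command above. The command for starting the OSD service depends on the init system and is deliberately left out.

```shell
# run_or_print: print the command when DRY_RUN=1 (the default), otherwise run it.
# Hypothetical helper, not part of Ceph.
run_or_print() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# readd_osd_to_crush <osd-id> <weight> <crush-location>
# Removes the OSD's CRUSH entry, then adds it back at the given location.
readd_osd_to_crush() {
    osd_id=$1
    osd_weight=$2
    osd_location=$3

    # Step 1: drop the stale CRUSH entry (this may trigger data migration).
    run_or_print ceph osd crush remove "osd.$osd_id" || return 1

    # Step 2: start the OSD service here (init-system specific, not shown),
    # then put the OSD back into the crushmap.
    run_or_print ceph osd crush add "$osd_id" "$osd_weight" "$osd_location"
}

# Usage (prints the commands; set DRY_RUN=0 to execute them):
#   readd_osd_to_crush 139 1.0 host=xxx
```

Because `osd crush update on start = false` is set in this environment, the OSD does not re-register its own CRUSH location at startup, which is why the explicit `ceph osd crush add` is needed.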