What Happens to Pod Status When a Node Fails in Kubernetes

This article explains what happens to Pod status when a Node fails in Kubernetes. The explanation is simple and clear; follow the 丸趣 TV editor's reasoning to work through the question step by step.

Pod status changes when the kubelet process fails

Suppose a node is running Pods and we stop the kubelet process on it. Will the Pods on that node be killed? Will they be recreated on other nodes?

Conclusion:

(1) The Node status changes to NotReady.
(2) For the first 5 minutes the Pod status does not change (5 minutes is the default value of kube-controller-manager's --pod-eviction-timeout). After 5 minutes: DaemonSet Pods change to NodeLost, while Deployment, StatefulSet, and Static Pods first change to NodeLost and then almost immediately to Unknown. Deployment Pods are recreated, but if the Deployment uses a node selector that pins it to the node whose kubelet was stopped, the recreated Pods stay Pending. Static Pods and StatefulSet Pods stay Unknown.
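The timeline above can be modeled with a small Go sketch. It assumes the behavior of the Kubernetes versions described here: once the 5-minute eviction timeout has passed, the node controller sets the reason NodeLost on every Pod of the node and marks every non-DaemonSet Pod for deletion (sets its deletionTimestamp). The `Pod` type, `evict`, and `display` are hypothetical; `display` only approximates how kubectl turns those fields into the STATUS column (a Pod being deleted with reason NodeLost is printed as Unknown).

```go
package main

import (
	"fmt"
	"time"
)

// podEvictionTimeout is assumed to be kube-controller-manager's default
// --pod-eviction-timeout (5m) in the Kubernetes versions described above.
const podEvictionTimeout = 5 * time.Minute

// Pod is a hypothetical, simplified view of the fields involved.
type Pod struct {
	Name      string
	DaemonSet bool   // managed by a DaemonSet
	Reason    string // status.reason
	Deleting  bool   // true once metadata.deletionTimestamp is set
}

// evict models the node controller acting on a node that has been NotReady
// for notReadyFor: nothing happens before the timeout; afterwards every Pod
// gets reason NodeLost and every non-DaemonSet Pod is marked for deletion.
func evict(pods []*Pod, notReadyFor time.Duration) {
	if notReadyFor < podEvictionTimeout {
		return
	}
	for _, p := range pods {
		p.Reason = "NodeLost"
		if !p.DaemonSet {
			p.Deleting = true
		}
	}
}

// display approximates the STATUS column printed by kubectl (assumed logic).
// It assumes the Pod was Running before the node failed.
func display(p *Pod) string {
	switch {
	case p.Deleting && p.Reason == "NodeLost":
		return "Unknown"
	case p.Deleting:
		return "Terminating"
	case p.Reason != "":
		return p.Reason
	default:
		return "Running"
	}
}

func main() {
	pods := []*Pod{{Name: "ds-pod", DaemonSet: true}, {Name: "web-0"}}
	evict(pods, 3*time.Minute)
	fmt.Println(display(pods[0]), display(pods[1])) // Running Running
	evict(pods, 6*time.Minute)
	fmt.Println(display(pods[0]), display(pods[1])) // NodeLost Unknown
}
```

The replacement Pod a Deployment creates elsewhere is not modeled; the sketch only shows why the same event yields NodeLost for DaemonSet Pods and Unknown for the others.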

Pod behavior when the kubelet recovers

If the kubelet comes back up 10 minutes later, what happens to the node and its Pods?

Conclusion:

(1) The Node status changes back to Ready.
(2) DaemonSet Pods are not recreated; the old Pods go straight back to Running.
(3) For the Deployment, the old Pods on the node whose kubelet was stopped are deleted. (Possibly because, when the old Pods' status changed in the cluster, the Deployment already had enough Pod replicas, so the old Pods were deleted.)
(4) StatefulSet Pods are recreated.
(5) Static Pods are not restarted, but their running time is reset to 0 when the kubelet comes back.

After the kubelet stops, StatefulSet Pods become NodeLost and then Unknown, but they are not restarted; only after the kubelet comes back up are the StatefulSet Pods recreated.

One more point: Static Pods should not actually have been restarted when the kubelet restarted, yet when their status is queried from the cluster, their running time has changed.

Why StatefulSet Pods are not recreated when a Node fails

After a Node goes down, its StatefulSet Pods are not recreated. Why?

Looking at the node controller, we found that it calls the delete Pod API for every Pod on the node except DaemonSet Pods.

However, calling the delete Pod API does not remove the Pod object from apiserver/etcd right away; it only sets the Pod's deletionTimestamp, marking the Pod for deletion. The actual deletion is done by the kubelet: after gracefully terminating the Pod, the kubelet deletes the Pod object for real. Only then does the StatefulSet controller notice that a replica is missing and recreate the Pod.

In this case, however, the kubelet is down and cannot communicate with the master, so the Pod object is never removed from etcd. If the Pod object could be removed, the Pod could be recreated on another Node.
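The two-phase deletion described above, and why it stalls while the kubelet is down, can be sketched as a toy Go simulation. Everything here is hypothetical: `Store` stands in for the apiserver/etcd view of Pods, `MarkedForDeletion` stands in for a set deletionTimestamp, and `KubeletFinalize` and `ReconcileStatefulSet` are simplified stand-ins for the kubelet and the StatefulSet controller. The sketch also assumes that a grace period of 0 removes the object immediately.

```go
package main

import "fmt"

// Pod is a hypothetical, heavily simplified Pod object as stored in etcd.
type Pod struct {
	Name              string
	Node              string
	MarkedForDeletion bool // stands in for metadata.deletionTimestamp being set
}

// Store is a toy stand-in for the apiserver/etcd view of Pods.
type Store struct {
	pods map[string]*Pod
}

func NewStore() *Store { return &Store{pods: map[string]*Pod{}} }

// Delete models the delete Pod API: a non-zero grace period only marks the
// object for deletion, while a grace period of 0 removes it at once.
func (s *Store) Delete(name string, gracePeriod int64) {
	p, ok := s.pods[name]
	if !ok {
		return
	}
	if gracePeriod == 0 {
		delete(s.pods, name)
		return
	}
	p.MarkedForDeletion = true
}

// KubeletFinalize models the kubelet on node removing the objects it has
// gracefully terminated. A kubelet that is down cannot do this.
func (s *Store) KubeletFinalize(node string, kubeletUp bool) {
	if !kubeletUp {
		return
	}
	for name, p := range s.pods {
		if p.Node == node && p.MarkedForDeletion {
			delete(s.pods, name)
		}
	}
}

// ReconcileStatefulSet models the StatefulSet controller: it creates a Pod for
// an ordinal only when no object with that name exists, even one being deleted.
// node stands in for wherever the scheduler would place the new Pod.
func (s *Store) ReconcileStatefulSet(set string, replicas int, node string) []string {
	var created []string
	for i := 0; i < replicas; i++ {
		name := fmt.Sprintf("%s-%d", set, i)
		if _, exists := s.pods[name]; !exists {
			s.pods[name] = &Pod{Name: name, Node: node}
			created = append(created, name)
		}
	}
	return created
}

func main() {
	s := NewStore()
	s.pods["web-0"] = &Pod{Name: "web-0", Node: "node-a"}

	// node-a's kubelet is down: the node controller's delete only marks the Pod.
	s.Delete("web-0", 30)
	s.KubeletFinalize("node-a", false)
	fmt.Println(s.ReconcileStatefulSet("web", 1, "node-b")) // []

	// Once the kubelet is back it removes the object, and web-0 is recreated.
	s.KubeletFinalize("node-a", true)
	fmt.Println(s.ReconcileStatefulSet("web", 1, "node-b")) // [web-0]
}
```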

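Also note that the StatefulSet controller only deletes a Pod (so that it can recreate it) when isFailed is true for that Pod, but these Pods are in the Unknown state. The relevant branch of the controller: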

// delete and recreate failed pods
if isFailed(replicas[i]) {
	ssc.recorder.Eventf(set, v1.EventTypeWarning, "RecreatingFailedPod",
		"StatefulSetPlus %s/%s is recreating failed Pod %s",
		set.Namespace,
		set.Name,
		replicas[i].Name)
	if err := ssc.podControl.DeleteStatefulPlusPod(set, replicas[i]); err != nil {
		return &status, err
	}
	if getPodRevision(replicas[i]) == currentRevision.Name {
		status.CurrentReplicas--
	}
	if getPodRevision(replicas[i]) == updateRevision.Name {
		status.UpdatedReplicas--
	}
	status.Replicas--
	replicas[i] = newVersionedStatefulSetPlusPod(
		currentSet,
		updateSet,
		currentRevision.Name,
		updateRevision.Name,
		i)
}
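To see why Pods shown as Unknown are never picked up by this branch, here is a minimal Go sketch of the check. It assumes that, like upstream StatefulSet's `isFailed`, the helper looks only at `status.phase == Failed`, and that a Pod on a lost node keeps its last reported phase (Running), with NodeLost being only a status reason and Unknown only what kubectl displays. `PodStatus` and `ordinalsToRecreate` are hypothetical.

```go
package main

import "fmt"

// PodStatus is a hypothetical, simplified view of a Pod's status.
type PodStatus struct {
	Phase  string // status.phase
	Reason string // status.reason
}

// isFailed mirrors the assumed helper: only the Failed phase counts.
func isFailed(s PodStatus) bool {
	return s.Phase == "Failed"
}

// ordinalsToRecreate returns the replica ordinals that the branch above
// would delete and recreate.
func ordinalsToRecreate(replicas []PodStatus) []int {
	var out []int
	for i, r := range replicas {
		if isFailed(r) {
			out = append(out, i)
		}
	}
	return out
}

func main() {
	replicas := []PodStatus{
		{Phase: "Running", Reason: "NodeLost"}, // on the failed node, shown as Unknown
		{Phase: "Failed", Reason: "Evicted"},   // e.g. evicted by the kubelet
		{Phase: "Running"},
	}
	fmt.Println(ordinalsToRecreate(replicas)) // [1]
}
```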

Improving StatefulSet Pod behavior

So, to protect stateful (non-quorum) applications when a node fails, the following behavior should be added:

Monitor the node's network, kubelet process, operating system, and so on, and handle each kind of failure differently.

For example, if the network has failed and the Pod can no longer serve traffic, force-delete the Pod from etcd with kubectl delete pod <pod-name> --grace-period=0 --force.

After the forced deletion, the StatefulSet controller automatically recreates the Pod on another Node.
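The policy above can be sketched in Go. The text does not say how the checks are performed, so `NodeHealth` and `shouldForceDelete` are hypothetical, and the rule they encode (force-delete only when the node's network is down, following the example) is an assumption. `forceDeleteArgs` only builds the arguments for `kubectl delete pod <name> --grace-period=0 --force`; actually running them, for example with `os/exec`, is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// NodeHealth is a hypothetical result of separate checks on a node.
type NodeHealth struct {
	NetworkOK bool
	KubeletOK bool
	OSOK      bool
}

// shouldForceDelete encodes one possible policy (an assumption): Pods are
// force-deleted only when the node's network is down, so they cannot serve
// traffic. A kubelet-only failure is left alone, since its Pods may still run.
func shouldForceDelete(h NodeHealth) bool {
	return !h.NetworkOK
}

// forceDeleteArgs builds the kubectl arguments that remove a Pod object from
// etcd immediately instead of waiting for the kubelet.
func forceDeleteArgs(namespace, pod string) []string {
	return []string{"delete", "pod", pod, "--namespace", namespace, "--grace-period=0", "--force"}
}

func main() {
	h := NodeHealth{NetworkOK: false, KubeletOK: true, OSOK: true}
	if shouldForceDelete(h) {
		fmt.Println("kubectl " + strings.Join(forceDeleteArgs("default", "web-0"), " "))
	}
}
```

Forced deletion skips the kubelet's confirmation that the containers have stopped, so if the old Pod is in fact still running on a partitioned node, two copies of the same StatefulSet Pod may briefly coexist; that is why the sketch does not force-delete on every kind of failure.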

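Or, more bluntly, give up on graceful deletion altogether: if a StatefulSet Pod's GracePeriodSeconds is nil or 0, the object is deleted from etcd directly. The relevant logic lives in the apiserver's BeforeDelete function: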

// BeforeDelete tests whether the object can be gracefully deleted.
// If graceful is set, the object should be gracefully deleted. If gracefulPending
// is set, the object has already been gracefully deleted (and the provided grace
// period is longer than the time to deletion). An error is returned if the
// condition cannot be checked or the gracePeriodSeconds is invalid. The options
// argument may be updated with default values if graceful is true. Second place
// where we set deletionTimestamp is pkg/registry/generic/registry/store.go.
// This function is responsible for setting deletionTimestamp during gracefulDeletion,
// other one for cascading deletions.
func BeforeDelete(strategy RESTDeleteStrategy, ctx context.Context, obj runtime.Object, options *metav1.DeleteOptions) (graceful, gracefulPending bool, err error) {
	objectMeta, gvk, kerr := objectMetaAndKind(strategy, obj)
	if kerr != nil {
		return false, false, kerr
	}
	if errs := validation.ValidateDeleteOptions(options); len(errs) > 0 {
		return false, false, errors.NewInvalid(schema.GroupKind{Group: metav1.GroupName, Kind: "DeleteOptions"}, "", errs)
	}
	// Checking the Preconditions here to fail early. They'll be enforced later on when we actually do the deletion, too.
	if options.Preconditions != nil && options.Preconditions.UID != nil && *options.Preconditions.UID != objectMeta.GetUID() {
		return false, false, errors.NewConflict(schema.GroupResource{Group: gvk.Group, Resource: gvk.Kind}, objectMeta.GetName(), fmt.Errorf("the UID in the precondition (%s) does not match the UID in record (%s). The object might have been deleted and then recreated", *options.Preconditions.UID, objectMeta.GetUID()))
	}
	gracefulStrategy, ok := strategy.(RESTGracefulDeleteStrategy)
	if !ok {
		// If we're not deleting gracefully there's no point in updating Generation, as we won't update
		// the object before deleting it.
		return false, false, nil
	}
	// if the object is already being deleted, no need to update generation.
	if objectMeta.GetDeletionTimestamp() != nil {
		// if we are already being deleted, we may only shorten the deletion grace period
		// this means the object was gracefully deleted previously but deletionGracePeriodSeconds was not set,
		// so we force deletion immediately
		// IMPORTANT:
		// The deletion operation happens in two phases.
		// 1. Update to set DeletionGracePeriodSeconds and DeletionTimestamp
		// 2. Delete the object from storage.
		// If the update succeeds, but the delete fails (network error, internal storage error, etc.),
		// a resource was previously left in a state that was non-recoverable. We
		// check if the existing stored resource has a grace period as 0 and if so
		// attempt to delete immediately in order to recover from this scenario.
		if objectMeta.GetDeletionGracePeriodSeconds() == nil || *objectMeta.GetDeletionGracePeriodSeconds() == 0 {
			return false, false, nil
		}
		// ...
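The grace-period rule behind this can be sketched as follows. This is a simplified model, not the apiserver code: it assumes, based on the Pod registry's graceful-delete check, that an explicit `GracePeriodSeconds` in the delete options wins, that otherwise the Pod's `terminationGracePeriodSeconds` is used (or 0 if unset), and that unscheduled or already-terminated Pods get 0. A resulting period of 0 means the object is removed from storage right away instead of just getting a deletionTimestamp. `effectiveGracePeriod`, `deletesImmediately`, and `ptr` are hypothetical helpers.

```go
package main

import "fmt"

// effectiveGracePeriod is a simplified model of how a Pod's deletion grace
// period is chosen. requested is DeleteOptions.GracePeriodSeconds and
// specDefault is spec.terminationGracePeriodSeconds; either may be nil.
func effectiveGracePeriod(requested, specDefault *int64, scheduled, terminated bool) int64 {
	period := int64(0)
	if requested != nil {
		period = *requested
	} else if specDefault != nil {
		period = *specDefault
	}
	// Unscheduled or already-terminated Pods are deleted immediately.
	if !scheduled || terminated {
		period = 0
	}
	return period
}

// deletesImmediately reports whether the object is removed from storage at
// once rather than only being marked with a deletionTimestamp.
func deletesImmediately(period int64) bool {
	return period == 0
}

func ptr(v int64) *int64 { return &v }

func main() {
	// Default 30s grace period, e.g. the node controller's delete call.
	p := effectiveGracePeriod(nil, ptr(30), true, false)
	fmt.Println(p, deletesImmediately(p)) // 30 false
	// Pod spec with terminationGracePeriodSeconds: 0.
	p = effectiveGracePeriod(nil, ptr(0), true, false)
	fmt.Println(p, deletesImmediately(p)) // 0 true
	// Forced deletion with an explicit grace period of 0.
	p = effectiveGracePeriod(ptr(0), ptr(30), true, false)
	fmt.Println(p, deletesImmediately(p)) // 0 true
}
```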

Thanks for reading. That is all for "What Happens to Pod Status When a Node Fails in Kubernetes". We hope this article has given you a deeper understanding of the question; how it applies in your own environment still needs to be verified in practice. This is 丸趣 TV, and the 丸趣 TV editor will keep publishing articles on related topics. Stay tuned!
