This article looks at how the Kubernetes Node Controller is started: how kube-controller-manager creates the NodeController instance, and what each of its configuration fields controls.
Starting the Node Controller
if ctx.IsControllerEnabled(nodeControllerName) {
	// Parse the cluster CIDR: the CIDR range for Pods in the cluster.
	_, clusterCIDR, err := net.ParseCIDR(s.ClusterCIDR)
	// Parse the service CIDR: the CIDR range for Services in the cluster.
	_, serviceCIDR, err := net.ParseCIDR(s.ServiceCIDR)
	// Create the NodeController instance.
	nodeController, err := nodecontroller.NewNodeController(
		sharedInformers.Core().V1().Pods(),
		sharedInformers.Core().V1().Nodes(),
		sharedInformers.Extensions().V1beta1().DaemonSets(),
		cloud,
		clientBuilder.ClientOrDie("node-controller"),
		s.PodEvictionTimeout.Duration,
		s.NodeEvictionRate,
		s.SecondaryNodeEvictionRate,
		s.LargeClusterSizeThreshold,
		s.UnhealthyZoneThreshold,
		s.NodeMonitorGracePeriod.Duration,
		s.NodeStartupGracePeriod.Duration,
		s.NodeMonitorPeriod.Duration,
		clusterCIDR,
		serviceCIDR,
		int(s.NodeCIDRMaskSize),
		s.AllocateNodeCIDRs,
		s.EnableTaintManager,
		utilfeature.DefaultFeatureGate.Enabled(features.TaintBasedEvictions),
	)
	// Call Run to start the controller.
	nodeController.Run()
	// Sleep for a random duration of
	// "ControllerStartInterval + rand.Float64()*1.0*float64(ControllerStartInterval)",
	// where ControllerStartInterval is set via the kube-controller-manager
	// flag --controller-start-interval.
	time.Sleep(wait.Jitter(s.ControllerStartInterval.Duration, ControllerStartJitter))
}
The key, then, clearly lies in the following two steps:
nodeController, err := nodecontroller.NewNodeController(...) creates the NodeController instance.
nodeController.Run() starts the controller.
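The time.Sleep(wait.Jitter(...)) at the end of the snippet staggers controller startups inside kube-controller-manager. A minimal standalone sketch of what wait.Jitter does, assuming an illustrative 5s ControllerStartInterval (the flag defaults to 0s; ControllerStartJitter is the 1.0 factor):

package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

func main() {
	// Illustrative value; --controller-start-interval defaults to 0s.
	interval := 5 * time.Second
	// wait.Jitter(d, 1.0) returns d + rand.Float64()*1.0*float64(d),
	// i.e. a random duration in [interval, 2*interval).
	sleep := wait.Jitter(interval, 1.0)
	fmt.Printf("sleeping %v before starting the next controller\n", sleep)
	time.Sleep(sleep)
}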
The definition of NodeController
Before analyzing how NodeController works, it is worth seeing how it is defined. The complete definition is as follows:
type NodeController struct {
allocateNodeCIDRs bool
cloud cloudprovider.Interface
clusterCIDR *net.IPNet
serviceCIDR *net.IPNet
knownNodeSet map[string]*v1.Node
kubeClient clientset.Interface
// Method for easy mocking in unittest.
lookupIP func(host string) ([]net.IP, error)
// Value used if sync_nodes_status=False. NodeController will not proactively
// sync node status in this case, but will monitor node status updated from kubelet. If
// it doesn't receive update for this amount of time, it will start posting "NodeReady==
// ConditionUnknown". The amount of time before which NodeController starts evicting pods
// is controlled via flag "pod-eviction-timeout".
// Note: be cautious when changing the constant, it must work with nodeStatusUpdateFrequency
// in kubelet. There are several constraints:
// 1. nodeMonitorGracePeriod must be N times more than nodeStatusUpdateFrequency, where
// N means number of retries allowed for kubelet to post node status. It is pointless
// to make nodeMonitorGracePeriod be less than nodeStatusUpdateFrequency, since there
// will only be fresh values from Kubelet at an interval of nodeStatusUpdateFrequency.
// The constant must be less than podEvictionTimeout.
// 2. nodeMonitorGracePeriod can't be too large for user experience - larger value takes
// longer for user to see up-to-date node status.
nodeMonitorGracePeriod time.Duration
// Value controlling NodeController monitoring period, i.e. how often does NodeController
// check node status posted from kubelet. This value should be lower than nodeMonitorGracePeriod.
// TODO: Change node status monitor to watch based.
nodeMonitorPeriod time.Duration
// Value used if sync_nodes_status=False, only for node startup. When node
// is just created, e.g. cluster bootstrap or node creation, we give a longer grace period.
nodeStartupGracePeriod time.Duration
// per Node map storing last observed Status together with a local time when it was observed.
// This timestamp is to be used instead of LastProbeTime stored in Condition. We do this
// to avoid the problem with time skew across the cluster.
nodeStatusMap map[string]nodeStatusData
now func() metav1.Time
// Lock to access evictor workers
evictorLock sync.Mutex
// workers that evict pods from unresponsive nodes.
zonePodEvictor map[string]*RateLimitedTimedQueue
// workers that are responsible for tainting nodes.
zoneNotReadyOrUnreachableTainer map[string]*RateLimitedTimedQueue
podEvictionTimeout time.Duration
// The maximum duration before a pod evicted from a node can be forcefully terminated.
maximumGracePeriod time.Duration
recorder record.EventRecorder
nodeLister corelisters.NodeLister
nodeInformerSynced cache.InformerSynced
daemonSetStore extensionslisters.DaemonSetLister
daemonSetInformerSynced cache.InformerSynced
podInformerSynced cache.InformerSynced
// allocate/recycle CIDRs for node if allocateNodeCIDRs == true
cidrAllocator CIDRAllocator
// manages taints
taintManager *NoExecuteTaintManager
forcefullyDeletePod func(*v1.Pod) error
nodeExistsInCloudProvider func(types.NodeName) (bool, error)
computeZoneStateFunc func(nodeConditions []*v1.NodeCondition) (int, zoneState)
enterPartialDisruptionFunc func(nodeNum int) float32
enterFullDisruptionFunc func(nodeNum int) float32
zoneStates map[string]zoneState
evictionLimiterQPS float32
secondaryEvictionLimiterQPS float32
largeClusterThreshold int32
unhealthyZoneThreshold float32
// if set to true NodeController will start TaintManager that will evict Pods from
// tainted nodes, if they're not tolerated.
runTaintManager bool
// if set to true NodeController will taint Nodes with TaintNodeNotReady and TaintNodeUnreachable
// taints instead of evicting Pods itself.
useTaintBasedEvictions bool
}
Configuring NodeController behavior
The NodeController struct is quite complex, with more than 30 fields. We will focus on the following:
clusterCIDR – set via --cluster-cidr; the CIDR range for Pods in the cluster.
serviceCIDR – set via --service-cluster-ip-range; the CIDR range for Services in the cluster.
knownNodeSet – the set of nodes that NodeController has observed.
nodeMonitorGracePeriod – set via --node-monitor-grace-period, default 40s; a Node is allowed to be unresponsive for this long before it is marked unhealthy.
nodeMonitorPeriod – set via --node-monitor-period, default 5s; the period at which NodeController syncs NodeStatus.
nodeStatusMap – records the most recently observed Status of each Node.
zonePodEvictor – workers that evict pods from unresponsive nodes.
zoneNotReadyOrUnreachableTainer – workers that are responsible for tainting nodes.
podEvictionTimeout – set via --pod-eviction-timeout, default 5min; the maximum pod eviction time allowed when force-deleting pods.
maximumGracePeriod – the maximum duration before a pod evicted from a node can be forcefully terminated; not configurable, hard-coded to 5min.
nodeLister – the interface used to read Node data.
daemonSetStore – the interface used to read DaemonSet data; when deleting pods via eviction, all pods on the Node that belong to a DaemonSet are skipped.
taintManager – a NoExecuteTaintManager object. When runTaintManager (default true) is true:
after the PodInformer and NodeInformer observe PodAdd, PodDelete, PodUpdate and NodeAdd, NodeDelete, NodeUpdate events,
they trigger the corresponding NoExecuteTaintManager.PodUpdated and NoExecuteTaintManager.NodeUpdated methods,
which put the events into the corresponding queues (podUpdateQueue and nodeUpdateQueue); the TaintController consumes these queues,
calling handlePodUpdate and handleNodeUpdate respectively.
The detailed TaintController logic will be analyzed separately later; the enqueue/consume pattern is sketched below.
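A minimal, self-contained sketch of that enqueue/consume pattern (the item types and plumbing are simplified stand-ins, not the real Kubernetes types):

package main

import (
	"fmt"
	"time"
)

// Simplified stand-ins for the real queue item types.
type podUpdateItem struct{ name string }
type nodeUpdateItem struct{ name string }

type taintManager struct {
	podUpdateQueue  chan podUpdateItem
	nodeUpdateQueue chan nodeUpdateItem
}

// PodUpdated/NodeUpdated are invoked from informer event handlers;
// they only enqueue work and return immediately.
func (tm *taintManager) PodUpdated(name string)  { tm.podUpdateQueue <- podUpdateItem{name} }
func (tm *taintManager) NodeUpdated(name string) { tm.nodeUpdateQueue <- nodeUpdateItem{name} }

// Run consumes both queues and dispatches to the per-kind handlers,
// mirroring the handlePodUpdate/handleNodeUpdate split described above.
func (tm *taintManager) Run(stop <-chan struct{}) {
	for {
		select {
		case item := <-tm.podUpdateQueue:
			fmt.Println("handlePodUpdate:", item.name)
		case item := <-tm.nodeUpdateQueue:
			fmt.Println("handleNodeUpdate:", item.name)
		case <-stop:
			return
		}
	}
}

func main() {
	tm := &taintManager{
		podUpdateQueue:  make(chan podUpdateItem, 16),
		nodeUpdateQueue: make(chan nodeUpdateItem, 16),
	}
	stop := make(chan struct{})
	go tm.Run(stop)
	tm.PodUpdated("nginx-0")
	tm.NodeUpdated("node-1")
	time.Sleep(100 * time.Millisecond) // let the consumer drain
	close(stop)
}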
forcefullyDeletePod – used by NodeController to force-delete a Pod through the apiserver. It deletes pods scheduled onto Nodes whose kubelet version is older than v1.1.0, because kubelets before v1.1.0 do not support graceful termination.
computeZoneStateFunc – returns the number of NotReady nodes in a Zone together with that zone's state (see the sketch below):
if not a single node is Ready, the zone state is FullDisruption;
if the fraction of unhealthy nodes is greater than or equal to unhealthyZoneThreshold, the zone state is PartialDisruption;
otherwise the zone state is Normal.
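A self-contained sketch of those rules; for simplicity a []bool of per-node readiness stands in for the real []*v1.NodeCondition:

package main

import "fmt"

type zoneState string

const (
	stateNormal            = zoneState("Normal")
	stateFullDisruption    = zoneState("FullDisruption")
	statePartialDisruption = zoneState("PartialDisruption")
)

// computeZoneState mirrors the decision rules described above.
// ready holds each node's Ready condition; threshold is
// unhealthyZoneThreshold (default 0.55).
func computeZoneState(ready []bool, threshold float32) (int, zoneState) {
	readyNodes, notReadyNodes := 0, 0
	for _, r := range ready {
		if r {
			readyNodes++
		} else {
			notReadyNodes++
		}
	}
	switch {
	// No Ready node at all: the whole zone is disrupted.
	case readyNodes == 0 && notReadyNodes > 0:
		return notReadyNodes, stateFullDisruption
	// At least 3 unhealthy nodes, and their fraction >= threshold.
	case notReadyNodes > 2 && float32(notReadyNodes)/float32(notReadyNodes+readyNodes) >= threshold:
		return notReadyNodes, statePartialDisruption
	default:
		return notReadyNodes, stateNormal
	}
}

func main() {
	// 2 of 5 nodes unhealthy: neither rule fires, zone stays Normal.
	n, s := computeZoneState([]bool{true, true, true, false, false}, 0.55)
	fmt.Println(n, s)
}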
enterPartialDisruptionFunc – compares the current node count against largeClusterThreshold (see the sketch below):
if nodeNum > largeClusterThreshold, it returns secondaryEvictionLimiterQPS (default 0.01);
otherwise it returns 0, which stops evictions altogether.
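As a sketch, the selection boils down to the following (the function name reducedQPS is illustrative; the parameters correspond to the struct fields shown earlier):

package main

import "fmt"

// reducedQPS mirrors enterPartialDisruptionFunc as described above:
// above largeClusterThreshold the secondary rate applies; otherwise
// eviction stops entirely.
func reducedQPS(nodeNum int, largeClusterThreshold int32, secondaryQPS float32) float32 {
	if int32(nodeNum) > largeClusterThreshold {
		return secondaryQPS
	}
	return 0
}

func main() {
	fmt.Println(reducedQPS(100, 50, 0.01)) // 0.01: one node every 100s
	fmt.Println(reducedQPS(30, 50, 0.01))  // 0: evictions stop
}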
enterFullDisruptionFunc – returns evictionLimiterQPS (default 0.1); evictionLimiterQPS is explained below.
zoneStates – the state of each zone; the possible values are:
Initial;
Normal;
FullDisruption;
PartialDisruption.
evictionLimiterQPS – set via --node-eviction-rate, default 0.1; the number of nodes to evict per second when a Zone is healthy, i.e. one node every 10s.
secondaryEvictionLimiterQPS – set via --secondary-node-eviction-rate, default 0.01; the number of nodes to evict per second when a Zone is unhealthy, i.e. one node every 100s.
largeClusterThreshold – set via --large-cluster-size-threshold, default 50; when the cluster of healthy nodes is no larger than 50, secondary-node-eviction-rate is set to 0.
unhealthyZoneThreshold – set via --unhealthy-zone-threshold, default 0.55; when the fraction of unhealthy nodes in a Zone (at least 3 of them) reaches 0.55, that Zone is considered unhealthy.
runTaintManager – set via --enable-taint-manager, default true. If true, NodeController starts the TaintManager, which evicts Pods from Nodes whose taints the Pods do not tolerate.
useTaintBasedEvictions – set via --feature-gates, default TaintBasedEvictions=false; still an Alpha feature. If true, Pods are evicted by tainting Nodes rather than by NodeController evicting them directly.
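Putting these flags together, an illustrative kube-controller-manager invocation that makes the defaults discussed above explicit might look like this (the CIDR values are placeholders, and unrelated required flags are omitted):

kube-controller-manager \
  --cluster-cidr=10.244.0.0/16 \
  --service-cluster-ip-range=10.96.0.0/12 \
  --node-monitor-period=5s \
  --node-monitor-grace-period=40s \
  --pod-eviction-timeout=5m0s \
  --node-eviction-rate=0.1 \
  --secondary-node-eviction-rate=0.01 \
  --large-cluster-size-threshold=50 \
  --unhealthy-zone-threshold=0.55 \
  --enable-taint-manager=true \
  --feature-gates=TaintBasedEvictions=false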
This concludes the walkthrough of how the Kubernetes Node Controller starts.