This article looks at how the Kubernetes Node Controller is started: how kube-controller-manager creates the NodeController instance, and what each of its configuration fields controls.
Starting the Node Controller
if ctx.IsControllerEnabled(nodeControllerName) {
	// Parse the cluster CIDR: the CIDR range for Pods in the cluster.
	_, clusterCIDR, err := net.ParseCIDR(s.ClusterCIDR)
	// Parse the service CIDR: the CIDR range for Services in the cluster.
	_, serviceCIDR, err := net.ParseCIDR(s.ServiceCIDR)
	// Create the NodeController instance.
	nodeController, err := nodecontroller.NewNodeController(
		sharedInformers.Core().V1().Pods(),
		sharedInformers.Core().V1().Nodes(),
		sharedInformers.Extensions().V1beta1().DaemonSets(),
		cloud,
		clientBuilder.ClientOrDie("node-controller"),
		s.PodEvictionTimeout.Duration,
		s.NodeEvictionRate,
		s.SecondaryNodeEvictionRate,
		s.LargeClusterSizeThreshold,
		s.UnhealthyZoneThreshold,
		s.NodeMonitorGracePeriod.Duration,
		s.NodeStartupGracePeriod.Duration,
		s.NodeMonitorPeriod.Duration,
		clusterCIDR,
		serviceCIDR,
		int(s.NodeCIDRMaskSize),
		s.AllocateNodeCIDRs,
		s.EnableTaintManager,
		utilfeature.DefaultFeatureGate.Enabled(features.TaintBasedEvictions),
	)
	// Call Run to start the controller.
	nodeController.Run()
	// Sleep for a random duration of
	// "ControllerStartInterval + rand.Float64()*1.0*float64(ControllerStartInterval)",
	// where ControllerStartInterval is set via the kube-controller-manager
	// flag --controller-start-interval.
	time.Sleep(wait.Jitter(s.ControllerStartInterval.Duration, ControllerStartJitter))
}
The key, then, clearly lies in the following two steps:
nodeController, err := nodecontroller.NewNodeController(...) creates the NodeController instance.
nodeController.Run() starts the controller.
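The time.Sleep(wait.Jitter(...)) at the end of the snippet staggers controller startups inside kube-controller-manager. A minimal standalone sketch of what wait.Jitter does, assuming an illustrative 5s ControllerStartInterval (the flag defaults to 0s; ControllerStartJitter is the 1.0 factor):

package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

func main() {
	// Illustrative value; --controller-start-interval defaults to 0s.
	interval := 5 * time.Second
	// wait.Jitter(d, 1.0) returns d + rand.Float64()*1.0*float64(d),
	// i.e. a random duration in [interval, 2*interval).
	sleep := wait.Jitter(interval, 1.0)
	fmt.Printf("sleeping %v before starting the next controller\n", sleep)
	time.Sleep(sleep)
}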
The definition of NodeController
Before analyzing how NodeController works, it is worth seeing how it is defined. The complete definition is as follows:
type NodeController struct {
allocateNodeCIDRs bool
cloud cloudprovider.Interface
clusterCIDR *net.IPNet
serviceCIDR *net.IPNet
knownNodeSet map[string]*v1.Node
kubeClient clientset.Interface
// Method for easy mocking in unittest.
lookupIP func(host string) ([]net.IP, error)
// Value used if sync_nodes_status=False. NodeController will not proactively
// sync node status in this case, but will monitor node status updated from kubelet. If
// it doesn't receive update for this amount of time, it will start posting "NodeReady==
// ConditionUnknown". The amount of time before which NodeController starts evicting pods
// is controlled via flag "pod-eviction-timeout".
// Note: be cautious when changing the constant, it must work with nodeStatusUpdateFrequency
// in kubelet. There are several constraints:
// 1. nodeMonitorGracePeriod must be N times more than nodeStatusUpdateFrequency, where
// N means number of retries allowed for kubelet to post node status. It is pointless
// to make nodeMonitorGracePeriod be less than nodeStatusUpdateFrequency, since there
// will only be fresh values from Kubelet at an interval of nodeStatusUpdateFrequency.
// The constant must be less than podEvictionTimeout.
// 2. nodeMonitorGracePeriod can't be too large for user experience - larger value takes
// longer for user to see up-to-date node status.
nodeMonitorGracePeriod time.Duration
// Value controlling NodeController monitoring period, i.e. how often does NodeController
// check node status posted from kubelet. This value should be lower than nodeMonitorGracePeriod.
// TODO: Change node status monitor to watch based.
nodeMonitorPeriod time.Duration
// Value used if sync_nodes_status=False, only for node startup. When node
// is just created, e.g. cluster bootstrap or node creation, we give a longer grace period.
nodeStartupGracePeriod time.Duration
// per Node map storing last observed Status together with a local time when it was observed.
// This timestamp is to be used instead of LastProbeTime stored in Condition. We do this
// to avoid the problem with time skew across the cluster.
nodeStatusMap map[string]nodeStatusData
now func() metav1.Time
// Lock to access evictor workers
evictorLock sync.Mutex
// workers that evict pods from unresponsive nodes.
zonePodEvictor map[string]*RateLimitedTimedQueue
// workers that are responsible for tainting nodes.
zoneNotReadyOrUnreachableTainer map[string]*RateLimitedTimedQueue
podEvictionTimeout time.Duration
// The maximum duration before a pod evicted from a node can be forcefully terminated.
maximumGracePeriod time.Duration
recorder record.EventRecorder
nodeLister corelisters.NodeLister
nodeInformerSynced cache.InformerSynced
daemonSetStore extensionslisters.DaemonSetLister
daemonSetInformerSynced cache.InformerSynced
podInformerSynced cache.InformerSynced
// allocate/recycle CIDRs for node if allocateNodeCIDRs == true
cidrAllocator CIDRAllocator
// manages taints
taintManager *NoExecuteTaintManager
forcefullyDeletePod func(*v1.Pod) error
nodeExistsInCloudProvider func(types.NodeName) (bool, error)
computeZoneStateFunc func(nodeConditions []*v1.NodeCondition) (int, zoneState)
enterPartialDisruptionFunc func(nodeNum int) float32
enterFullDisruptionFunc func(nodeNum int) float32
zoneStates map[string]zoneState
evictionLimiterQPS float32
secondaryEvictionLimiterQPS float32
largeClusterThreshold int32
unhealthyZoneThreshold float32
// if set to true NodeController will start TaintManager that will evict Pods from
// tainted nodes, if they're not tolerated.
runTaintManager bool
// if set to true NodeController will taint Nodes with TaintNodeNotReady and TaintNodeUnreachable
// taints instead of evicting Pods itself.
useTaintBasedEvictions bool
}
Configuring NodeController behavior
The NodeController struct is quite complex, with more than 30 fields. We will focus on the following:
clusterCIDR – set via --cluster-cidr; the CIDR range for Pods in the cluster.
serviceCIDR – set via --service-cluster-ip-range; the CIDR range for Services in the cluster.
knownNodeSet – the set of nodes that NodeController has observed.
nodeMonitorGracePeriod – set via --node-monitor-grace-period, default 40s; a Node is allowed to be unresponsive for this long before it is marked unhealthy.
nodeMonitorPeriod – set via --node-monitor-period, default 5s; the period at which NodeController syncs NodeStatus.
nodeStatusMap – records the most recently observed Status of each Node.
zonePodEvictor – workers that evict pods from unresponsive nodes.
zoneNotReadyOrUnreachableTainer – workers that are responsible for tainting nodes.
podEvictionTimeout – set via --pod-eviction-timeout, default 5min; the maximum pod eviction time allowed when force-deleting pods.
maximumGracePeriod – the maximum duration before a pod evicted from a node can be forcefully terminated; not configurable, hard-coded to 5min.
nodeLister – the interface used to read Node data.
daemonSetStore – the interface used to read DaemonSet data; when deleting pods via eviction, all pods on the Node that belong to a DaemonSet are skipped.
taintManager – a NoExecuteTaintManager object. When runTaintManager (default true) is true:
after the PodInformer and NodeInformer observe PodAdd, PodDelete, PodUpdate and NodeAdd, NodeDelete, NodeUpdate events,
they trigger the corresponding NoExecuteTaintManager.PodUpdated and NoExecuteTaintManager.NodeUpdated methods,
which put the events into the corresponding queues (podUpdateQueue and nodeUpdateQueue); the TaintController consumes these queues,
calling handlePodUpdate and handleNodeUpdate respectively.
The detailed TaintController logic will be analyzed separately later; the enqueue/consume pattern is sketched below.
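A minimal, self-contained sketch of that enqueue/consume pattern (the item types and plumbing are simplified stand-ins, not the real Kubernetes types):

package main

import (
	"fmt"
	"time"
)

// Simplified stand-ins for the real queue item types.
type podUpdateItem struct{ name string }
type nodeUpdateItem struct{ name string }

type taintManager struct {
	podUpdateQueue  chan podUpdateItem
	nodeUpdateQueue chan nodeUpdateItem
}

// PodUpdated/NodeUpdated are invoked from informer event handlers;
// they only enqueue work and return immediately.
func (tm *taintManager) PodUpdated(name string)  { tm.podUpdateQueue <- podUpdateItem{name} }
func (tm *taintManager) NodeUpdated(name string) { tm.nodeUpdateQueue <- nodeUpdateItem{name} }

// Run consumes both queues and dispatches to the per-kind handlers,
// mirroring the handlePodUpdate/handleNodeUpdate split described above.
func (tm *taintManager) Run(stop <-chan struct{}) {
	for {
		select {
		case item := <-tm.podUpdateQueue:
			fmt.Println("handlePodUpdate:", item.name)
		case item := <-tm.nodeUpdateQueue:
			fmt.Println("handleNodeUpdate:", item.name)
		case <-stop:
			return
		}
	}
}

func main() {
	tm := &taintManager{
		podUpdateQueue:  make(chan podUpdateItem, 16),
		nodeUpdateQueue: make(chan nodeUpdateItem, 16),
	}
	stop := make(chan struct{})
	go tm.Run(stop)
	tm.PodUpdated("nginx-0")
	tm.NodeUpdated("node-1")
	time.Sleep(100 * time.Millisecond) // let the consumer drain
	close(stop)
}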
forcefullyDeletePod – used by NodeController to force-delete a Pod through the apiserver. It deletes pods scheduled onto Nodes whose kubelet version is older than v1.1.0, because kubelets before v1.1.0 do not support graceful termination.
computeZoneStateFunc – returns the number of NotReady nodes in a Zone together with that zone's state (see the sketch below):
if not a single node is Ready, the zone state is FullDisruption;
if the fraction of unhealthy nodes is greater than or equal to unhealthyZoneThreshold, the zone state is PartialDisruption;
otherwise the zone state is Normal.
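A self-contained sketch of those rules; for simplicity a []bool of per-node readiness stands in for the real []*v1.NodeCondition:

package main

import "fmt"

type zoneState string

const (
	stateNormal            = zoneState("Normal")
	stateFullDisruption    = zoneState("FullDisruption")
	statePartialDisruption = zoneState("PartialDisruption")
)

// computeZoneState mirrors the decision rules described above.
// ready holds each node's Ready condition; threshold is
// unhealthyZoneThreshold (default 0.55).
func computeZoneState(ready []bool, threshold float32) (int, zoneState) {
	readyNodes, notReadyNodes := 0, 0
	for _, r := range ready {
		if r {
			readyNodes++
		} else {
			notReadyNodes++
		}
	}
	switch {
	// No Ready node at all: the whole zone is disrupted.
	case readyNodes == 0 && notReadyNodes > 0:
		return notReadyNodes, stateFullDisruption
	// At least 3 unhealthy nodes, and their fraction >= threshold.
	case notReadyNodes > 2 && float32(notReadyNodes)/float32(notReadyNodes+readyNodes) >= threshold:
		return notReadyNodes, statePartialDisruption
	default:
		return notReadyNodes, stateNormal
	}
}

func main() {
	// 2 of 5 nodes unhealthy: neither rule fires, zone stays Normal.
	n, s := computeZoneState([]bool{true, true, true, false, false}, 0.55)
	fmt.Println(n, s)
}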
enterPartialDisruptionFunc – compares the current node count against largeClusterThreshold (see the sketch below):
if nodeNum > largeClusterThreshold, it returns secondaryEvictionLimiterQPS (default 0.01);
otherwise it returns 0, which stops evictions altogether.
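As a sketch, the selection boils down to the following (the function name reducedQPS is illustrative; the parameters correspond to the struct fields shown earlier):

package main

import "fmt"

// reducedQPS mirrors enterPartialDisruptionFunc as described above:
// above largeClusterThreshold the secondary rate applies; otherwise
// eviction stops entirely.
func reducedQPS(nodeNum int, largeClusterThreshold int32, secondaryQPS float32) float32 {
	if int32(nodeNum) > largeClusterThreshold {
		return secondaryQPS
	}
	return 0
}

func main() {
	fmt.Println(reducedQPS(100, 50, 0.01)) // 0.01: one node every 100s
	fmt.Println(reducedQPS(30, 50, 0.01))  // 0: evictions stop
}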
enterFullDisruptionFunc – returns evictionLimiterQPS (default 0.1); evictionLimiterQPS is explained below.
zoneStates – the state of each zone; the possible values are:
Initial;
Normal;
FullDisruption;
PartialDisruption.
evictionLimiterQPS – set via --node-eviction-rate, default 0.1; the number of nodes to evict per second when a Zone is healthy, i.e. one node every 10s.
secondaryEvictionLimiterQPS – set via --secondary-node-eviction-rate, default 0.01; the number of nodes to evict per second when a Zone is unhealthy, i.e. one node every 100s.
largeClusterThreshold – set via --large-cluster-size-threshold, default 50; when the cluster of healthy nodes is no larger than 50, secondary-node-eviction-rate is set to 0.
unhealthyZoneThreshold – set via --unhealthy-zone-threshold, default 0.55; when the fraction of unhealthy nodes in a Zone (at least 3 of them) reaches 0.55, that Zone is considered unhealthy.
runTaintManager – set via --enable-taint-manager, default true. If true, NodeController starts the TaintManager, which evicts Pods from Nodes whose taints the Pods do not tolerate.
useTaintBasedEvictions – set via --feature-gates, default TaintBasedEvictions=false; still an Alpha feature. If true, Pods are evicted by tainting Nodes rather than by NodeController evicting them directly.
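Putting these flags together, an illustrative kube-controller-manager invocation that makes the defaults discussed above explicit might look like this (the CIDR values are placeholders, and unrelated required flags are omitted):

kube-controller-manager \
  --cluster-cidr=10.244.0.0/16 \
  --service-cluster-ip-range=10.96.0.0/12 \
  --node-monitor-period=5s \
  --node-monitor-grace-period=40s \
  --pod-eviction-timeout=5m0s \
  --node-eviction-rate=0.1 \
  --secondary-node-eviction-rate=0.01 \
  --large-cluster-size-threshold=50 \
  --unhealthy-zone-threshold=0.55 \
  --enable-taint-manager=true \
  --feature-gates=TaintBasedEvictions=false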
This concludes the walkthrough of how the Kubernetes Node Controller starts.