What Are Predicates Policies For?


This article explains what the Predicates Policies of the Kubernetes scheduler are for, walking through how they and the related Priorities Policies are implemented.

## Predicates Policies Analysis

The following predicate (pre-selection) policies are implemented in /plugin/pkg/scheduler/algorithm/predicates.go:

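NoDiskConflict: checks whether placing the Pod on this host would cause a volume conflict. If the host already has a volume mounted, other Pods that use the same volume cannot be scheduled onto it. The rules for GCE, Amazon EBS, and Ceph RBD are as follows:

GCE allows the same volume to be mounted multiple times, as long as every mount is read-only.

Amazon EBS does not allow different Pods to mount the same volume.

Ceph RBD does not allow any two Pods to share the same monitor, pool, and image, unless all of them mount the image read-only.

NoVolumeZoneConflict: checks whether placing the Pod on this host would cause a volume conflict under the given zone restrictions. Some volumes may carry zone scheduling constraints, so VolumeZonePredicate evaluates whether a Pod fits based on the requirements of the volumes themselves. The requirement is that every zone label on a volume must exactly match the corresponding zone label on the node. A node may carry additional zone-label constraints (for example, a hypothetical replicated volume might allow region-wide access). Currently this is supported only for PersistentVolumeClaims, and labels are looked up only on the PersistentVolume. Handling volumes defined directly in the Pod spec (that is, without a PersistentVolume) is likely to be harder, because determining the volume's zone at scheduling time would probably require calling the cloud provider.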
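The zone-label rule can be sketched as follows. This is a minimal illustration, not the scheduler's code: the label keys in `zoneKeys` and the function `volumeZoneFits` are hypothetical names chosen for the example, and labels are modeled as plain maps.

```go
package main

import "fmt"

// zoneKeys lists the label keys treated as zone constraints.
// These key names are hypothetical and used for illustration only.
var zoneKeys = []string{"example/zone", "example/region"}

// volumeZoneFits reports whether every zone label carried by the
// volumes has exactly the same value on the node. Volumes without a
// zone label place no constraint on the node.
func volumeZoneFits(nodeLabels map[string]string, volumeLabels []map[string]string) bool {
	for _, labels := range volumeLabels {
		for _, key := range zoneKeys {
			want, constrained := labels[key]
			if !constrained {
				continue
			}
			if nodeLabels[key] != want {
				return false
			}
		}
	}
	return true
}

func main() {
	node := map[string]string{"example/zone": "zone-a"}
	sameZone := []map[string]string{{"example/zone": "zone-a"}}
	otherZone := []map[string]string{{"example/zone": "zone-b"}}
	fmt.Println(volumeZoneFits(node, sameZone))  // true
	fmt.Println(volumeZoneFits(node, otherZone)) // false
}
```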

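PodFitsResources: checks whether the host has enough resources to satisfy the Pod's requests. Scheduling is based on the amount of resources already allocated to (requested by) Pods on the host, not on the amount actually in use.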
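The distinction between allocated and actually used resources can be illustrated with a small sketch. The `Resources` type and the `fitsResources` function are hypothetical simplifications for this example and are not the scheduler's own types.

```go
package main

import "fmt"

// Resources is a hypothetical, simplified resource vector.
type Resources struct {
	MilliCPU int64
	Memory   int64 // bytes
}

// fitsResources compares the pod's request against what is left after
// subtracting the requests of pods already on the node. Actual usage
// is deliberately not an input.
func fitsResources(allocatable, requested, podRequest Resources) bool {
	return requested.MilliCPU+podRequest.MilliCPU <= allocatable.MilliCPU &&
		requested.Memory+podRequest.Memory <= allocatable.Memory
}

func main() {
	node := Resources{MilliCPU: 4000, Memory: 8 << 30}
	requested := Resources{MilliCPU: 3500, Memory: 2 << 30}
	fmt.Println(fitsResources(node, requested, Resources{MilliCPU: 400, Memory: 1 << 30})) // true
	fmt.Println(fitsResources(node, requested, Resources{MilliCPU: 600, Memory: 1 << 30})) // false
}
```

Even if the Pods on the node were idle, the second Pod would be rejected, because 3500m + 600m exceeds the 4000m that can be allocated.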

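PodFitsHostPorts: checks whether any HostPort required by a container in the Pod is already taken by another container on the host. If any required HostPort is unavailable, the Pod cannot be scheduled onto this host.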
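A rough sketch of this check, assuming the ports already in use on the host are available as a set. The function `hostPortsFit` is a hypothetical name, and treating port 0 as "no host port requested" is an assumption of this example.

```go
package main

import "fmt"

// hostPortsFit reports whether none of the wanted host ports is
// already in use on the node. Port 0 is treated as "not requested"
// (an assumption of this sketch).
func hostPortsFit(wanted []int, used map[int]bool) bool {
	for _, port := range wanted {
		if port == 0 {
			continue
		}
		if used[port] {
			return false
		}
	}
	return true
}

func main() {
	used := map[int]bool{80: true, 443: true}
	fmt.Println(hostPortsFit([]int{8080}, used))      // true
	fmt.Println(hostPortsFit([]int{8080, 443}, used)) // false
}
```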

HostName: checks whether the host's name matches the host name specified by the Pod.

MatchNodeSelector: checks whether the host's labels satisfy the Pod's nodeSelector.
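For a plain nodeSelector, the check amounts to requiring every key/value pair of the selector to be present on the node. The sketch below models labels as maps; `matchesNodeSelector` is a hypothetical helper, not the scheduler's function.

```go
package main

import "fmt"

// matchesNodeSelector reports whether the node carries every label
// listed in the selector with exactly the same value.
func matchesNodeSelector(nodeLabels, selector map[string]string) bool {
	for key, want := range selector {
		if got, ok := nodeLabels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	node := map[string]string{"disktype": "ssd", "env": "prod"}
	fmt.Println(matchesNodeSelector(node, map[string]string{"disktype": "ssd"})) // true
	fmt.Println(matchesNodeSelector(node, map[string]string{"disktype": "hdd"})) // false
	fmt.Println(matchesNodeSelector(node, map[string]string{"gpu": "true"}))     // false
}
```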

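MaxEBSVolumeCount: ensures that the number of attached EBS volumes does not exceed the configured maximum. The default is 39. It counts both volumes used directly and PVCs that are indirectly backed by this volume type. The count is taken over distinct volumes; if placing the new Pod would push the count above the maximum, the Pod cannot be scheduled onto this host.

MaxGCEPDVolumeCount: ensures that the number of attached GCE volumes does not exceed the configured maximum. The default is 16. The same rules apply as for MaxEBSVolumeCount.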
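Because the count is taken over distinct volumes, a volume the new Pod shares with Pods already on the host is counted only once. A minimal sketch, assuming volumes are identified by string IDs; `fitsVolumeLimit` is a hypothetical name for this example.

```go
package main

import "fmt"

// fitsVolumeLimit counts the distinct volume IDs across the volumes
// already attached to the node and those the new pod would add, and
// compares the total with the configured maximum.
func fitsVolumeLimit(existing, incoming []string, maxVolumes int) bool {
	unique := make(map[string]bool)
	for _, id := range existing {
		unique[id] = true
	}
	for _, id := range incoming {
		unique[id] = true
	}
	return len(unique) <= maxVolumes
}

func main() {
	existing := []string{"vol-1", "vol-2"}
	incoming := []string{"vol-2", "vol-3"} // vol-2 is shared, so 3 distinct volumes in total
	fmt.Println(fitsVolumeLimit(existing, incoming, 3)) // true
	fmt.Println(fitsVolumeLimit(existing, incoming, 2)) // false
}
```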

Below is the implementation of NoDiskConflict. The other Predicates Policies are implemented similarly, and all of them follow this function prototype: type FitPredicate func(pod *v1.Pod, meta interface{}, nodeInfo *schedulercache.NodeInfo) (bool, []PredicateFailureReason, error)

    // NoDiskConflict rejects the node if any volume of the new pod
    // conflicts with a volume of a pod already running on the node.
    func NoDiskConflict(pod *v1.Pod, meta interface{}, nodeInfo *schedulercache.NodeInfo) (bool, []algorithm.PredicateFailureReason, error) {
        for _, v := range pod.Spec.Volumes {
            for _, ev := range nodeInfo.Pods() {
                if isVolumeConflict(v, ev) {
                    return false, []algorithm.PredicateFailureReason{ErrDiskConflict}, nil
                }
            }
        }
        return true, nil, nil
    }

    func isVolumeConflict(volume v1.Volume, pod *v1.Pod) bool {
        // fast path if there is no conflict checking targets.
        if volume.GCEPersistentDisk == nil && volume.AWSElasticBlockStore == nil && volume.RBD == nil && volume.ISCSI == nil {
            return false
        }

        // Only the Ceph RBD check is shown in this excerpt.
        for _, existingVolume := range pod.Spec.Volumes {
            if volume.RBD != nil && existingVolume.RBD != nil {
                mon, pool, image := volume.RBD.CephMonitors, volume.RBD.RBDPool, volume.RBD.RBDImage
                emon, epool, eimage := existingVolume.RBD.CephMonitors, existingVolume.RBD.RBDPool, existingVolume.RBD.RBDImage
                // two RBDs images are the same if they share the same Ceph monitor, are in the same RADOS Pool, and have the same image name
                // only one read-write mount is permitted for the same RBD image.
                // same RBD image mounted by multiple Pods conflicts unless all Pods mount the image read-only
                if haveSame(mon, emon) && pool == epool && image == eimage && !(volume.RBD.ReadOnly && existingVolume.RBD.ReadOnly) {
                    return true
                }
            }
        }
        return false
    }

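## Priorities Policies Analysis

The currently supported priority functions include the following:

LeastRequestedPriority: when a new Pod is to be assigned to a node, the node's priority is determined by the ratio of its free capacity to its total capacity, that is, (total capacity - sum of the capacity requested by Pods on the node - capacity requested by the new Pod) / total capacity. CPU and memory are weighted equally, and the node with the largest ratio scores highest. Note that this priority function has the effect of spreading Pods across nodes according to resource consumption. The formula is: (cpu((capacity - sum(requested)) * 10 / capacity) + memory((capacity - sum(requested)) * 10 / capacity)) / 2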
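The formula can be tried out with a few lines of code. This sketch uses integer arithmetic and hypothetical function names (`unusedScore`, `leastRequestedScore`); returning 0 when capacity is zero or the requests exceed capacity is an assumption of the example. The requested amounts are taken to already include the new Pod.

```go
package main

import "fmt"

// unusedScore maps the free share of one resource onto 0..10.
// Returning 0 for zero capacity or over-committed nodes is an
// assumption of this sketch.
func unusedScore(requested, capacity int64) int64 {
	if capacity == 0 || requested > capacity {
		return 0
	}
	return (capacity - requested) * 10 / capacity
}

// leastRequestedScore averages the CPU and memory scores, giving both
// resources equal weight.
func leastRequestedScore(cpuReq, cpuCap, memReq, memCap int64) int64 {
	return (unusedScore(cpuReq, cpuCap) + unusedScore(memReq, memCap)) / 2
}

func main() {
	// CPU: (4000-2000)*10/4000 = 5; memory: (8192-2048)*10/8192 = 7 (integer division)
	fmt.Println(leastRequestedScore(2000, 4000, 2048, 8192)) // 6
	// An empty node gets the maximum score.
	fmt.Println(leastRequestedScore(0, 4000, 0, 8192)) // 10
}
```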

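BalancedResourceAllocation: prefers hosts whose resource usage would be more balanced after the Pod is placed. BalancedResourceAllocation cannot be used on its own and must be used together with LeastRequestedPriority. It computes the fraction of CPU and the fraction of memory requested on the host, and the host's score is determined by the "distance" between the two fractions. The formula is: score = 10 - abs(cpuFraction - memoryFraction) * 10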
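A short sketch of the formula follows. The function name `balancedScore` is hypothetical, and returning 0 when a capacity is not positive or a fraction reaches 1 is an assumption of the example rather than something stated in the article.

```go
package main

import (
	"fmt"
	"math"
)

// balancedScore implements score = 10 - abs(cpuFraction-memoryFraction)*10.
// The guards that return 0 are assumptions of this sketch.
func balancedScore(cpuReq, cpuCap, memReq, memCap float64) int {
	if cpuCap <= 0 || memCap <= 0 {
		return 0
	}
	cpuFraction := cpuReq / cpuCap
	memoryFraction := memReq / memCap
	if cpuFraction >= 1 || memoryFraction >= 1 {
		return 0
	}
	return int(10 - math.Abs(cpuFraction-memoryFraction)*10)
}

func main() {
	// CPU is 50% requested, memory 25%: 10 - 0.25*10 = 7.5, truncated to 7.
	fmt.Println(balancedScore(2000, 4000, 2, 8)) // 7
	// Both resources at 50%: perfectly balanced.
	fmt.Println(balancedScore(2000, 4000, 4, 8)) // 10
}
```

The score says nothing about how full the host is, only how evenly CPU and memory are consumed, which is consistent with the requirement to combine it with LeastRequestedPriority.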

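SelectorSpreadPriority: spreads Pods that belong to the same service or replication controller across different hosts as far as possible. If zones are specified, it also tries to spread the Pods across different hosts in different zones. When scheduling a Pod, it first finds the Pod's corresponding service or replication controller, then finds the Pods that already belong to it; the fewer of those existing Pods a host is running, the higher the host scores.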
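One way to turn "fewer matching Pods means a higher score" into numbers is to normalize against the busiest host. The sketch below does that on a 0 to 10 scale and ignores zones; the function `spreadScores` and the exact normalization are assumptions for illustration, not a statement of the scheduler's formula.

```go
package main

import "fmt"

// spreadScores takes the number of matching pods already running on
// each host and returns a 0..10 score per host. The host with the most
// matching pods scores 0 and a host with none scores 10.
func spreadScores(matchingPods map[string]int) map[string]int {
	maxCount := 0
	for _, count := range matchingPods {
		if count > maxCount {
			maxCount = count
		}
	}
	scores := make(map[string]int, len(matchingPods))
	for host, count := range matchingPods {
		score := 10 // no matching pods anywhere: every host is equally good
		if maxCount > 0 {
			score = 10 * (maxCount - count) / maxCount
		}
		scores[host] = score
	}
	return scores
}

func main() {
	scores := spreadScores(map[string]int{"node-a": 0, "node-b": 2, "node-c": 4})
	fmt.Println(scores) // map[node-a:10 node-b:5 node-c:0]
}
```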

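CalculateAntiAffinityPriority: spreads Pods that belong to the same service across different hosts that carry a specified label, as far as possible.

ImageLocalityPriority: scores a host according to whether it already has what the Pod needs to run. It checks whether the container images required by the Pod already exist on the host and returns a score from 0 to 10 based on the size of the images present. If none of the required images are on the host, it returns 0; if some are present, the score depends on their total size: the larger the images, the higher the score.

NodeAffinityPriority (a new experimental feature in Kubernetes 1.2): the affinity mechanism in Kubernetes scheduling. Node selectors, which restrict a Pod to particular nodes at scheduling time, support several operators (In, NotIn, Exists, DoesNotExist, Gt, Lt) and are not limited to exact matches on node labels. Kubernetes also supports two kinds of selector. The "hard" selector (requiredDuringSchedulingIgnoredDuringExecution) guarantees that the chosen host satisfies all of the Pod's rules for hosts. It behaves much like the earlier nodeSelector, with a more expressive syntax added on top. The "soft" selector (preferredDuringSchedulingIgnoredDuringExecution) acts as a hint to the scheduler, which tries to satisfy all of the selector's requirements but does not guarantee it.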
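To make the six operators concrete, here is a simplified matcher for a single expression. The `Requirement` type and the `matches` function are hypothetical stand-ins for this example. Two details are assumptions of the sketch: NotIn also matches when the label is absent, and Gt/Lt compare integer values against exactly one operand.

```go
package main

import (
	"fmt"
	"strconv"
)

// Requirement is a hypothetical, simplified node selector expression.
type Requirement struct {
	Key      string
	Operator string // In, NotIn, Exists, DoesNotExist, Gt, Lt
	Values   []string
}

func contains(values []string, v string) bool {
	for _, candidate := range values {
		if candidate == v {
			return true
		}
	}
	return false
}

// matches evaluates one requirement against a node's labels.
func matches(labels map[string]string, r Requirement) bool {
	value, present := labels[r.Key]
	switch r.Operator {
	case "In":
		return present && contains(r.Values, value)
	case "NotIn":
		return !present || !contains(r.Values, value)
	case "Exists":
		return present
	case "DoesNotExist":
		return !present
	case "Gt", "Lt":
		if !present || len(r.Values) != 1 {
			return false
		}
		have, errHave := strconv.ParseInt(value, 10, 64)
		want, errWant := strconv.ParseInt(r.Values[0], 10, 64)
		if errHave != nil || errWant != nil {
			return false
		}
		if r.Operator == "Gt" {
			return have > want
		}
		return have < want
	}
	return false
}

func main() {
	node := map[string]string{"zone": "a", "cores": "16"}
	fmt.Println(matches(node, Requirement{"zone", "In", []string{"a", "b"}})) // true
	fmt.Println(matches(node, Requirement{"cores", "Gt", []string{"8"}}))     // true
	fmt.Println(matches(node, Requirement{"gpu", "Exists", nil}))             // false
}
```

Under a "hard" selector every expression must match for the host to be eligible; under a "soft" selector, matching expressions only raise the host's score.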

Below is the implementation of ImageLocalityPriority. The other Priorities Policies are implemented similarly, and all of them follow this function prototype: type PriorityMapFunction func(pod *v1.Pod, meta interface{}, nodeInfo *schedulercache.NodeInfo) (schedulerapi.HostPriority, error)

    // ImageLocalityPriorityMap scores a node by the total size of the
    // pod's container images that are already present on it.
    func ImageLocalityPriorityMap(pod *v1.Pod, meta interface{}, nodeInfo *schedulercache.NodeInfo) (schedulerapi.HostPriority, error) {
        node := nodeInfo.Node()
        if node == nil {
            return schedulerapi.HostPriority{}, fmt.Errorf("node not found")
        }

        var sumSize int64
        for i := range pod.Spec.Containers {
            sumSize += checkContainerImageOnNode(node, &pod.Spec.Containers[i])
        }

        return schedulerapi.HostPriority{
            Host:  node.Name,
            Score: calculateScoreFromSize(sumSize),
        }, nil
    }

    func calculateScoreFromSize(sumSize int64) int {
        var score int
        switch {
        case sumSize == 0 || sumSize < minImgSize:
            // score == 0 means none of the images required by this pod are present on this
            // node or the total size of the images present is too small to be taken into further consideration.
            score = 0
        // If existing images' total size is larger than max, just make it highest priority.
        case sumSize >= maxImgSize:
            score = 10
        default:
            score = int((10 * (sumSize - minImgSize) / (maxImgSize - minImgSize)) + 1)
        }
        // Return which bucket the given size belongs to
        return score
    }

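For a sumSize between the two bounds, the Score of each Node is computed as: score = int((10 * (sumSize - minImgSize) / (maxImgSize - minImgSize)) + 1)

Here minImgSize int64 = 23 * mb, maxImgSize int64 = 1000 * mb, and sumSize is the total size of the container images required by the Pod that are already present on the Node.

So the larger the total size of the Pod's required container images already on a Node, the higher that Node scores and the more likely it is to be chosen as the target Node.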
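Plugging a few sizes into the formula shows how the buckets fall out. This is a standalone sketch with a renamed function (`scoreFromSize`); the value `mb = 1024 * 1024` is an assumption, since the article does not define `mb`.

```go
package main

import "fmt"

const (
	mb         int64 = 1024 * 1024 // assumed definition of mb
	minImgSize int64 = 23 * mb
	maxImgSize int64 = 1000 * mb
)

// scoreFromSize mirrors the scoring rule described above: 0 below the
// minimum, 10 at or above the maximum, and a linear bucket in between.
func scoreFromSize(sumSize int64) int {
	switch {
	case sumSize == 0 || sumSize < minImgSize:
		return 0
	case sumSize >= maxImgSize:
		return 10
	default:
		return int((10 * (sumSize - minImgSize) / (maxImgSize - minImgSize)) + 1)
	}
}

func main() {
	for _, size := range []int64{10 * mb, 23 * mb, 500 * mb, 1000 * mb} {
		fmt.Println(size/mb, scoreFromSize(size))
	}
	// Output:
	// 10 0
	// 23 1
	// 500 5
	// 1000 10
}
```

For 500 MB the intermediate value is 10 * 477 / 977, which integer division truncates to 4 before the + 1 is applied.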
