- Kubernetes uses a controller architecture built on the list-watch mechanism, which decouples interaction between components.
- Each component watches the resources it is responsible for; when those resources change, kube-apiserver notifies the component. The process is similar to publish/subscribe.
Main attributes in a Pod spec that affect scheduling
How resource limits affect Pod scheduling

// Help
[root@master ~]# kubectl explain deploy.spec.template.spec.containers.resources
KIND:     Deployment
VERSION:  apps/v1
RESOURCE: resources
Container resource limits (the maximum a container may use):
- resources.limits.cpu
- resources.limits.memory
Container's minimum resource requirement, used as the basis for resource allocation when the container is scheduled:
- resources.requests.cpu
- resources.requests.memory
CPU units: either m (millicores) or a decimal number, e.g. 0.5 = 500m, 1 = 1000m.
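As a minimal sketch of the two notations (values are arbitrary, chosen only for illustration), these fields express the same quantities:

```yaml
resources:
  requests:
    cpu: "0.5"   # decimal form: half a core, identical to "500m"
  limits:
    cpu: "1"     # one full core, identical to "1000m"
```

Either spelling is accepted anywhere a CPU quantity appears in a Pod spec.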
Example // K8s uses the request values to find a Node with enough free resources and schedules the Pod there
[root@master ~]# cat test.yml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: default
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
[root@master ~]# kubectl apply -f test.yml
pod/nginx created
[root@master ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx 1/1 Running 0 40s 10.244.1.55 node1
[root@master ~]# kubectl describe node node1
Name: node1
Roles:
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
disk=haproxy
kubernetes.io/arch=amd64
kubernetes.io/hostname=node1
kubernetes.io/os=linux
Annotations: flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"2a:ad:12:4f:11:45"}
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: true
flannel.alpha.coreos.com/public-ip: 192.168.47.120
kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Sat, 18 Dec 2021 02:08:33 +0800
Taints:
Unschedulable: false
Lease:
HolderIdentity: node1
AcquireTime:
RenewTime: Fri, 24 Dec 2021 01:19:22 +0800
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Fri, 24 Dec 2021 01:04:42 +0800 Fri, 24 Dec 2021 01:04:42 +0800 FlannelIsUp Flannel is running on this node
MemoryPressure False Fri, 24 Dec 2021 01:14:44 +0800 Sat, 18 Dec 2021 02:08:33 +0800 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Fri, 24 Dec 2021 01:14:44 +0800 Sat, 18 Dec 2021 02:08:33 +0800 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Fri, 24 Dec 2021 01:14:44 +0800 Sat, 18 Dec 2021 02:08:33 +0800 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Fri, 24 Dec 2021 01:14:44 +0800 Sat, 18 Dec 2021 13:43:35 +0800 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 192.168.47.120
Hostname: node1
Capacity:
cpu: 2
ephemeral-storage: 17394Mi
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 1843864Ki
pods: 110
Allocatable:
cpu: 2
ephemeral-storage: 16415037823
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 1741464Ki
pods: 110
System Info:
Machine ID: b1c77fc62d3a44bfa60bfc8a24ad7c9f
System UUID: d6114d56-cbc6-b63c-2f90-808558da550e
Boot ID: eae7e22f-84f3-4b1e-bb44-110cb2f9ef7d
Kernel Version: 4.18.0-193.el8.x86_64
OS Image: Red Hat Enterprise Linux 8.2 (Ootpa)
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://20.10.12
Kubelet Version: v1.20.0
Kube-Proxy Version: v1.20.0
PodCIDR: 10.244.1.0/24
PodCIDRs: 10.244.1.0/24
Non-terminated Pods: (3 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
default nginx 250m (12%) 500m (25%) 64Mi (3%) 128Mi (7%) 69s
kube-system kube-flannel-ds-v7chx 100m (5%) 100m (5%) 50Mi (2%) 50Mi (2%) 5d23h
kube-system kube-proxy-slkcg 0 (0%) 0 (0%) 0 (0%) 0 (0%) 5d23h
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 350m (17%) 600m (30%)
memory 114Mi (6%) 178Mi (10%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 15m kubelet Starting kubelet.
Normal NodeAllocatableEnforced 15m kubelet Updated Node Allocatable limit across pods
Normal NodeHasSufficientMemory 15m (x5 over 15m) kubelet Node node1 status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 15m (x5 over 15m) kubelet Node node1 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 15m (x5 over 15m) kubelet Node node1 status is now: NodeHasSufficientPID
Warning Rebooted 14m kubelet Node node1 has been rebooted, boot id: eae7e22f-84f3-4b1e-bb44-110cb2f9ef7d
Normal Starting 14m kube-proxy Starting kube-proxy.
nodeSelector & nodeAffinity
nodeSelector: schedules the Pod onto a Node whose labels match; if no Node carries a matching label, scheduling fails and the Pod stays Pending.
Purpose:
- Constrain a Pod to run on specific nodes, by exact match against node labels
Use cases:
- Dedicated nodes: group and manage Nodes by business line
- Special hardware: some Nodes have SSDs or GPUs
Example: ensure the Pod is assigned to a node that has an SSD
Format:  kubectl label nodes <node> <key>=<value>
Example: kubectl label nodes node2 app=nginx
Verify:  kubectl get nodes node2 --show-labels
Delete:  kubectl label nodes node2 app-
Verify:  kubectl get pod -o wide
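A minimal sketch of the SSD use case above (the label pair `disktype=ssd` is an assumed convention, not something set elsewhere in these notes):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ssd-pod
spec:
  containers:
  - name: app
    image: nginx
  nodeSelector:
    disktype: ssd   # only Nodes labeled disktype=ssd are scheduling candidates
```

Label a node first with `kubectl label nodes <node> disktype=ssd`; otherwise the Pod stays Pending.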
Successful scheduling example
[root@master ~]# kubectl get nodes node2 --show-labels
NAME    STATUS   ROLES   AGE     VERSION   LABELS
node2   Ready            5d23h   v1.20.0   app=nginx,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node2,kubernetes.io/os=linux
[root@master ~]# vi test.yml
[root@master ~]# cat test.yml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: default
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  nodeSelector:
    app: nginx
[root@master ~]# kubectl apply -f test.yml
pod/nginx created
[root@master ~]# kubectl get pod -o wide
NAME    READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
nginx   1/1     Running   0          4s    10.244.2.50   node2
Failed scheduling example
// Remove the label
[root@master ~]# kubectl delete -f test.yml
pod "nginx" deleted
[root@master ~]# kubectl label nodes node2 app-
node/node2 labeled
[root@master ~]# kubectl get nodes node2 --show-labels
NAME    STATUS   ROLES   AGE     VERSION   LABELS
node2   Ready            5d23h   v1.20.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node2,kubernetes.io/os=linux
// The Pod now waits in Pending: it keeps waiting until some node carries app=nginx, and is scheduled to that node as soon as one does
[root@master ~]# kubectl apply -f test.yml
pod/nginx created
[root@master ~]# kubectl get pod -o wide
NAME    READY   STATUS    RESTARTS   AGE   IP       NODE   NOMINATED NODE   READINESS GATES
nginx   0/1     Pending   0          5s
// Now add the label back to node2
[root@master ~]# kubectl get nodes node2 --show-labels
NAME    STATUS   ROLES   AGE     VERSION   LABELS
node2   Ready            5d23h   v1.20.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node2,kubernetes.io/os=linux
[root@master ~]# kubectl label nodes node2 app=nginx
node/node2 labeled
// The Pod is scheduled automatically
[root@master ~]# kubectl get pod -o wide
NAME    READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
nginx   1/1     Running   0          71s   10.244.2.51   node2
nodeAffinity: node affinity, serving the same purpose as nodeSelector but more flexible, supporting richer conditions
- Matching supports more logical combinations, not just exact string equality
- Scheduling rules come in hard and soft flavors, rather than only hard requirements
  - Hard (required): must be satisfied
  - Soft (preferred): try to satisfy, but no guarantee
- Operators: In, NotIn, Exists, DoesNotExist, Gt, Lt
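The examples that follow only exercise In; as a hedged sketch, the other operators look like this (the label keys `gpu`, `cpu-count`, and the value `legacy` are hypothetical):

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: gpu            # node must carry this label key, any value
          operator: Exists    # Exists/DoesNotExist take no values field
        - key: cpu-count      # numeric comparison: label value > 4
          operator: Gt
          values: ["4"]       # Gt/Lt take a single integer, as a string
        - key: app            # node must NOT have app=legacy
          operator: NotIn
          values: ["legacy"]
```

All expressions within one matchExpressions list must hold simultaneously (logical AND); multiple nodeSelectorTerms entries are ORed.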
// Help
[root@master ~]# kubectl explain pod.spec.affinity.nodeAffinity
KIND:     Pod
VERSION:  v1
RESOURCE: nodeAffinity
Examples
Case 1 (will only land on node1)
node1 gets two labels (app=nginx, gpu=nvdia)
node2 gets one label (app=nginx)
- required: must be satisfied
- preferred: try to satisfy, but no guarantee
// node1: two labels (app=nginx gpu=nvdia)
[root@master ~]# kubectl label nodes node1 app=nginx gpu=nvdia
node/node1 labeled
// node2: one label (app=nginx)
[root@master ~]# kubectl label nodes node2 app=nginx
node/node2 labeled
// Check
[root@master ~]# kubectl get nodes node1 node2 --show-labels
NAME    STATUS   ROLES   AGE     VERSION   LABELS
node1   Ready            5d23h   v1.20.0   app=nginx,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disk=haproxy,gpu=nvdia,kubernetes.io/arch=amd64,kubernetes.io/hostname=node1,kubernetes.io/os=linux
node2   Ready            5d23h   v1.20.0   app=nginx,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node2,kubernetes.io/os=linux
[root@master ~]# cat test.yml
apiVersion: v1
kind: Pod
metadata:
  name: test
  namespace: default
spec:
  containers:
  - name: b1
    image: busybox
    imagePullPolicy: IfNotPresent
    command: ["/bin/sh","-c","sleep 45"]
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: app
            operator: In
            values:
            - nginx
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 3
        preference:
          matchExpressions:
          - key: gpu
            operator: In
            values:
            - nvdia
[root@master ~]# kubectl apply -f test.yml
pod/test created
[root@master ~]# kubectl get pod -o wide
NAME   READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
test   1/1     Running   0          18s   10.244.1.57   node1
Case 2 (default rules apply; nodes compete on equal footing)
node1 gets one label (app=nginx)
node2 gets one label (app=nginx)
- required: must be satisfied
- preferred: try to satisfy, but no guarantee
// node1: one label (app=nginx)
[root@master ~]# kubectl label nodes node1 app=nginx
node/node1 labeled
// node2: one label (app=nginx)
[root@master ~]# kubectl label nodes node2 app=nginx
node/node2 labeled
[root@master ~]# kubectl get nodes node1 node2 --show-labels
NAME    STATUS   ROLES   AGE    VERSION   LABELS
node1   Ready            5d5h   v1.20.0   app=nginx,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node1,kubernetes.io/os=linux
node2   Ready            5d5h   v1.20.0   app=nginx,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node2,kubernetes.io/os=linux
[root@master ~]# cat test.yml
apiVersion: v1
kind: Pod
metadata:
  name: test
  namespace: default
spec:
  containers:
  - name: b1
    image: busybox
    imagePullPolicy: IfNotPresent
    command: ["/bin/sh","-c","sleep 45"]
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: app
            operator: In
            values:
            - nginx
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 3
        preference:
          matchExpressions:
          - key: gpu
            operator: In
            values:
            - nvdia
[root@master ~]# kubectl delete -f test.yml
pod "test" deleted
[root@master ~]# kubectl apply -f test.yml
pod/test created
[root@master ~]# kubectl get pod -o wide
NAME   READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
test   1/1     Running   0          5s    10.244.1.58   node1
Taints & Tolerations
Taints: keep Pods off particular Nodes
Tolerations: allow a Pod to be scheduled onto a Node that carries matching Taints
Use cases:
- Dedicated nodes: group and manage Nodes by business line; by default nothing is scheduled onto them, and only Pods with a matching toleration may be placed there
- Special hardware: some Nodes have SSDs or GPUs; by default nothing is scheduled onto them, and only Pods with a matching toleration may be placed there
- Taint-based eviction
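For the dedicated-node scenario, note that a taint only repels Pods; it does not attract them. A hedged sketch of the usual combination (the `team=payments` taint/label pair is hypothetical) pairs a toleration with a nodeSelector:

```yaml
# Node prepared beforehand with:
#   kubectl taint node node1 team=payments:NoSchedule
#   kubectl label nodes node1 team=payments
apiVersion: v1
kind: Pod
metadata:
  name: payments-app
spec:
  containers:
  - name: app
    image: nginx
  tolerations:            # lets the Pod onto the tainted node
  - key: "team"
    operator: "Equal"
    value: "payments"
    effect: "NoSchedule"
  nodeSelector:           # pins the Pod to that node group
    team: payments
```

Without the nodeSelector, the toleration merely permits the node; the Pod could still land elsewhere.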
Adding a taint to a node
Format:  kubectl taint node [node] key=value:[effect]
Example: kubectl taint node node1 gpu=yes:NoSchedule
Verify:  kubectl describe node node1 | grep Taint
Remove:  kubectl taint node [node] key:[effect]-
// Check current taints
[root@master ~]# kubectl describe node node1 node2 master | grep -i taint
Taints:
Taints:
Taints:             node-role.kubernetes.io/master:NoSchedule
Possible values for [effect]:
- NoSchedule: new Pods will never be scheduled here
- PreferNoSchedule: try not to schedule here; a toleration is not strictly required
- NoExecute: not only blocks new scheduling, but also evicts Pods already running on the Node
Add a tolerations field to the Pod spec:
apiVersion: v1
kind: Pod
metadata:
  name: pod-taints
spec:
  containers:
  - name: pod-taints
    image: busybox:latest
  tolerations:
  - key: "gpu"
    operator: "Equal"
    value: "yes"
    effect: "NoSchedule"
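Besides Equal, the toleration operator also supports Exists, which matches any value of the key; as a sketch:

```yaml
tolerations:
- key: "gpu"
  operator: "Exists"   # tolerates gpu=<anything>:NoSchedule; no value field needed
  effect: "NoSchedule"
```

An Exists toleration with no key at all tolerates every taint, which is how some system DaemonSets run on every node.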
Examples
Case 1 (NoSchedule)
- the Pod cannot be scheduled there
// Taint node1
[root@master ~]# kubectl taint node node1 node1:NoSchedule
node/node1 tainted
[root@master ~]# kubectl describe node node1 | grep -i taint
Taints: node1:NoSchedule
[root@master ~]# cat test.yml
apiVersion: v1
kind: Pod
metadata:
  name: test
  namespace: default
spec:
  containers:
  - name: b1
    image: busybox
    imagePullPolicy: IfNotPresent
    command: ["/bin/sh","-c","sleep 45"]
[root@master ~]# kubectl apply -f test.yml
pod/test created
[root@master ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
test 1/1 Running 0 11s 10.244.2.52 node2
// Remove the taint
[root@master ~]# kubectl taint node node1 node1:NoSchedule-
node/node1 untainted
[root@master ~]# kubectl describe node node1 | grep -i taint
Taints:
Case 2 (PreferNoSchedule)
- try not to schedule there, but scheduling is still possible
[root@master ~]# kubectl taint node node1 node1:PreferNoSchedule
node/node1 tainted
[root@master ~]# kubectl describe node node1 | grep -i taint
Taints: node1:PreferNoSchedule
[root@master ~]# vi test.yml
[root@master ~]# kubectl delete -f test.yml
pod "test" deleted
[root@master ~]# cat test.yml
apiVersion: v1
kind: Pod
metadata:
  name: test
  namespace: default
spec:
  containers:
  - name: b1
    image: busybox
    imagePullPolicy: IfNotPresent
    command: ["/bin/sh","-c","sleep 45"]
[root@master ~]# kubectl apply -f test.yml
pod/test created
[root@master ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx 1/1 Running 0 11m 10.244.2.51 node2
test 1/1 Running 0 14s 10.244.2.53 node2
[root@master ~]# kubectl taint node node1 node1:PreferNoSchedule-
node/node1 untainted
[root@master ~]# kubectl describe node node1 | grep -i taint
Taints:
Case 3 (NoExecute)
- eviction
- not only blocks new scheduling, but also evicts Pods already running on the Node
[root@master ~]# kubectl get pod -o wide
NAME                   READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
test-d6cbf6c57-btwf7   1/1     Running   1          80s   10.244.2.55   node2
web1-99dd54ccd-c5bff   1/1     Running   2          45h   10.244.2.49   node2
web2-d9c9695cf-4gw66   1/1     Running   2          45h   10.244.2.48   node2
// Taint node2
[root@master ~]# kubectl taint node node2 node2:NoExecute
node/node2 tainted
[root@master ~]# kubectl describe node node2 | grep -i taint
Taints:             node2:NoExecute
// The Pods that were on node2 are evicted and recreated on node1
[root@master ~]# kubectl get pod -o wide
NAME                   READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
test-d6cbf6c57-pdhwh   1/1     Running   1          53s   10.244.1.62   node1
web1-99dd54ccd-dmlgx   1/1     Running   0          53s   10.244.1.60   node1
web2-d9c9695cf-pl9kd   1/1     Running   0          53s   10.244.1.61   node1



