- nodeSelector (node selector)
- nodeAffinity (node affinity)
- Taints and Tolerations
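nodeSelector: schedules a Pod onto a Node whose labels match the selector. If no node carries a matching label, scheduling fails and the Pod stays Pending.
What it does:
- Constrains a Pod to run on particular nodes
- Matches node labels exactly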
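The exact-match rule can be pictured with a small model. This is only an illustrative sketch, not the scheduler's implementation: the function names `matches_node_selector` and `schedulable_nodes` and the two-node cluster are assumptions made for the example.

```python
def matches_node_selector(node_labels, node_selector):
    """Return True when every selector key/value pair is present on the node."""
    return all(node_labels.get(key) == value
               for key, value in node_selector.items())


def schedulable_nodes(nodes, node_selector):
    """Names of the nodes whose labels satisfy the selector (illustrative only)."""
    return [name for name, labels in nodes.items()
            if matches_node_selector(labels, node_selector)]


# Hypothetical cluster: only node1 carries the disktype=ssd label.
nodes = {
    "node1.example.com": {"kubernetes.io/os": "linux", "disktype": "ssd"},
    "node2.example.com": {"kubernetes.io/os": "linux"},
}

print(schedulable_nodes(nodes, {"disktype": "ssd"}))  # ['node1.example.com']
print(schedulable_nodes(nodes, {"disktype": "hdd"}))  # [] -> the Pod would stay Pending
```

Every key in the selector must be present with exactly the same value; extra labels on the node do not matter.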
Use cases:
- Dedicated nodes: group Nodes by business line and manage them separately
- Special hardware: some Nodes are fitted with SSDs or GPUs
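Syntax for labelling a node:
Add a label: kubectl label nodes <node-name> <label-key>=<label-value>
Verify: kubectl get nodes --show-labels
Remove a label: kubectl label nodes <node-name> <label-key>-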
Example: using nodeSelector
// Label node1
[root@master manifest]# kubectl label nodes node1.example.com disktype=ssd

// Check the label
[root@master manifest]# kubectl get nodes --show-labels | grep disktype
node1.example.com Ready <none> 22h v1.20.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=node1.example.com,kubernetes.io/os=linux

// Manifest
[root@master manifest]# cat test.yml
---
apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  nodeSelector:
    disktype: ssd
  containers:
  - name: test1
    image: busybox
    imagePullPolicy: IfNotPresent
    command: ["/bin/sh","-c","sleep 10000"]

// Test
[root@master manifest]# kubectl apply -f test.yml
pod/test created
[root@master manifest]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
test 1/1 Running 0 27s 10.244.1.2 node1.example.com <none> <none>

// Remove the disktype label
[root@master manifest]# kubectl label nodes node1.example.com disktype-
[root@master manifest]# kubectl get nodes --show-labels | grep node1
node1.example.com Ready <none> 23h v1.20.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node1.example.com,kubernetes.io/os=linux
// The disktype label is gone

// Delete the Pod created earlier, then create it again to see whether it can still be scheduled without the label, and onto which node
[root@master manifest]# kubectl apply -f test.yml
pod/test created
[root@master manifest]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
test 0/1 Pending 0 55s <none> <none> <none> <none>
// The Pod is Pending: no node carries a matching label, so scheduling fails. Adding the label back to a node lets the Pod be scheduled.

[root@master manifest]# kubectl label nodes node1.example.com disktype=ssd
node/node1.example.com labeled
[root@master manifest]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
test 1/1 Running 0 2m54s 10.244.1.3 node1.example.com <none> <none>
// With the label back in place, the Pod is Running
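nodeAffinity: node affinity serves the same purpose as nodeSelector but is more flexible and can express more conditions, for example:
- Matching supports richer logical combinations, not just exact string equality
- Rules are split into hard and soft policies, instead of every rule being a hard requirement
  - Hard (required): must be satisfied
  - Soft (preferred): the scheduler tries to satisfy it, with no guarantee
Operators: In, NotIn, Exists, DoesNotExist, Gt, Lt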
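To make the six operators concrete, here is a rough model of how a single match expression could be evaluated against a node's labels. It is a sketch under assumptions, not the scheduler's code: the function name `match_expression` and the `cpu-count` label are invented for illustration.

```python
def match_expression(labels, key, operator, values=()):
    """Evaluate one match expression against a node's labels (illustrative)."""
    present = key in labels
    if operator == "In":
        return present and labels[key] in values
    if operator == "NotIn":
        # A node without the key also satisfies NotIn.
        return not present or labels[key] not in values
    if operator == "Exists":
        return present
    if operator == "DoesNotExist":
        return not present
    if operator in ("Gt", "Lt"):
        # Gt/Lt interpret the label value and the single given value as integers.
        if not present:
            return False
        try:
            actual, bound = int(labels[key]), int(values[0])
        except (ValueError, IndexError):
            return False
        return actual > bound if operator == "Gt" else actual < bound
    raise ValueError(f"unknown operator: {operator}")


# Hypothetical node labels; "cpu-count" is made up for the Gt example.
labels = {"disktype": "ssd", "cpu-count": "16"}

print(match_expression(labels, "disktype", "In", ["ssd", "nvme"]))  # True
print(match_expression(labels, "gpu", "NotIn", ["nvdia"]))          # True
print(match_expression(labels, "gpu", "Exists"))                    # False
print(match_expression(labels, "cpu-count", "Gt", ["8"]))           # True
```

NotIn and DoesNotExist are what make anti-affinity style rules possible, which plain nodeSelector cannot express.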
Testing nodeAffinity
// Add labels to the nodes
[root@master manifest]# kubectl label nodes node1.example.com disktype=ssd gpu=nvdia
node/node1.example.com labeled
[root@master manifest]# kubectl label nodes node2.example.com disktype=ssd
node/node2.example.com labeled
// Manifest
[root@master manifest]# cat test.yml
---
apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:    # hard requirement
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
      preferredDuringSchedulingIgnoredDuringExecution:   # soft preference
      - weight: 3
        preference:
          matchExpressions:
          - key: gpu
            operator: In
            values:
            - nvdia
  containers:
  - name: test1
    image: busybox
    imagePullPolicy: IfNotPresent
    command: ["/bin/sh","-c","sleep 10000"]
// node1 and node2 both carry the disktype label, but node1 also has the gpu label named in the soft preference; see which node the Pod lands on
[root@master manifest]# kubectl apply -f test.yml
pod/test created
[root@master manifest]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
test 1/1 Running 0 11s 10.244.1.4 node1.example.com <none> <none>
// If the gpu label is removed from node1, both nodes carry the same labels, and a newly created Pod is placed by the scheduler's default rules again
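The result above (node1 wins) follows from filtering on the hard rule and then ranking on the soft one. The sketch below models just that two-step idea with the labels from this test. The names `term_matches` and `pick_node` are assumptions, only the In operator is modelled, and the real scheduler combines this score with other scoring factors.

```python
def term_matches(labels, expressions):
    """All expressions of one term must hold (only the In operator is modelled)."""
    return all(labels.get(e["key"]) in e["values"] for e in expressions)


def pick_node(nodes, required_terms, preferred):
    """Filter nodes by the hard terms, then rank survivors by summed soft weights."""
    # Hard rule: the terms are ORed, so a node needs to satisfy at least one.
    feasible = {name: labels for name, labels in nodes.items()
                if any(term_matches(labels, term) for term in required_terms)}
    if not feasible:
        return None, {}  # corresponds to a Pod that stays Pending
    # Soft rule: add up the weight of every preference the node satisfies.
    scores = {name: sum(p["weight"] for p in preferred
                        if term_matches(labels, p["expressions"]))
              for name, labels in feasible.items()}
    # On a tie this simply returns the first node; a real cluster uses other scores.
    return max(scores, key=scores.get), scores


nodes = {
    "node1.example.com": {"disktype": "ssd", "gpu": "nvdia"},
    "node2.example.com": {"disktype": "ssd"},
}
required = [[{"key": "disktype", "values": ["ssd"]}]]
preferred = [{"weight": 3, "expressions": [{"key": "gpu", "values": ["nvdia"]}]}]

best, scores = pick_node(nodes, required, preferred)
print(best)    # node1.example.com
print(scores)  # {'node1.example.com': 3, 'node2.example.com': 0}
```

Both nodes pass the hard filter, so the weight of 3 on the gpu preference is what separates them.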
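Taints and Tolerations
Taints: keep Pods from being scheduled onto particular Nodes
Tolerations: allow a Pod to be scheduled onto a Node that carries a Taint
Use cases:
- Dedicated nodes: Nodes are grouped by business line; by default nothing is scheduled onto them, and only Pods configured with a matching toleration may be placed there
- Special hardware: some Nodes are fitted with SSDs or GPUs; by default nothing is scheduled onto them, and only Pods configured with a matching toleration may be placed there
- Taint-based eviction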
Viewing taints:
// The master node carries a taint by default
[root@master manifest]# kubectl describe node master.example.com | grep -i taint
Taints: node-role.kubernetes.io/master:NoSchedule    // Pods are not scheduled here

// Worker nodes have no taint by default
[root@master manifest]# kubectl describe node node1.example.com | grep -i taint
Taints: <none>
[root@master manifest]# kubectl describe node node2.example.com | grep -i taint
Taints: <none>
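Syntax for tainting a node
Add a taint: kubectl taint node [node-name] key=value:[effect]
Verify: kubectl describe node [node-name] | grep -i taint
Remove a taint: kubectl taint node [node-name] key:[effect]-

[effect] takes one of:
- NoSchedule: Pods that do not tolerate the taint are never scheduled onto the node
- PreferNoSchedule: the scheduler tries to avoid the node but may still use it; a toleration is not strictly required
- NoExecute: Pods are not scheduled onto the node, and Pods already running there that do not tolerate the taint are evicted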
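As a reading aid for the key=value:[effect] notation, the following sketch splits a taint string into its parts and looks up what each effect means for a Pod without a matching toleration. The helper names `parse_taint` and `EFFECT_MEANING` are hypothetical and are not part of kubectl or any Kubernetes client library.

```python
# What each effect means for a Pod that does not tolerate the taint.
EFFECT_MEANING = {
    "NoSchedule": "new Pods are not scheduled onto the node",
    "PreferNoSchedule": "the scheduler avoids the node when it can",
    "NoExecute": "new Pods are not scheduled and running Pods are evicted",
}


def parse_taint(spec):
    """Split 'key=value:effect' (the '=value' part is optional) into a dict."""
    key_value, sep, effect = spec.rpartition(":")
    if not sep or effect not in EFFECT_MEANING:
        raise ValueError(f"expected key[=value]:effect, got {spec!r}")
    key, _, value = key_value.partition("=")
    return {"key": key, "value": value, "effect": effect}


taint = parse_taint("node1=yes:NoSchedule")
print(taint)                            # {'key': 'node1', 'value': 'yes', 'effect': 'NoSchedule'}
print(EFFECT_MEANING[taint["effect"]])  # new Pods are not scheduled onto the node
print(parse_taint("node2:NoExecute"))   # {'key': 'node2', 'value': '', 'effect': 'NoExecute'}
```

A taint written without `=value`, such as `node2:NoExecute`, simply has an empty value.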
Testing taints
// Add a taint to node1
[root@master manifest]# kubectl taint node node1.example.com node1:NoSchedule
node/node1.example.com tainted
[root@master manifest]# kubectl describe node node1.example.com | grep -i taint
Taints: node1:NoSchedule

// Create a Pod
[root@master manifest]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
test 1/1 Running 0 16s 10.244.2.9 node2.example.com <none> <none>
// node1 is tainted, so the new Pod has to run on another node

// Change the taint to PreferNoSchedule
[root@master manifest]# kubectl taint node node1.example.com node1-
node/node1.example.com untainted
[root@master manifest]# kubectl taint node node1.example.com node1:PreferNoSchedule
node/node1.example.com tainted
[root@master manifest]# kubectl describe node node1.example.com | grep -i taint
Taints: node1:PreferNoSchedule

// Create a new Pod and see which node it lands on
[root@master manifest]# kubectl apply -f test.yml
pod/test created
[root@master manifest]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
test 1/1 Running 0 3s 10.244.2.10 node2.example.com <none> <none>
// The scheduler avoids node1 where it can and uses it only when no other node is schedulable

// Test NoExecute: besides blocking scheduling, it evicts Pods already running on the node
// Start a Pod first
[root@master manifest]# kubectl apply -f test.yml
pod/test created
[root@master manifest]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
test 1/1 Running 0 4s 10.244.2.11 node2.example.com <none> <none>

// Add a NoExecute taint to node2
[root@master manifest]# kubectl taint node node2.example.com node2:NoExecute
node/node2.example.com tainted
[root@master manifest]# kubectl describe node node2.example.com | grep -i taint
Taints: node2:NoExecute

// The Pod is now Terminating: it is being evicted and will be killed shortly. (This is a bare Pod, so eviction simply removes it; a Pod managed by a Deployment would be replaced by a new one on another node.)
[root@master manifest]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
test 1/1 Terminating 0 2m40s 10.244.2.11 node2.example.com <none> <none>
[root@master manifest]# kubectl get pods -o wide
No resources found in default namespace.
Testing tolerations
// Add a taint to node1
[root@master manifest]# kubectl taint node node1.example.com node1=yes:NoSchedule
node/node1.example.com tainted
[root@master manifest]# kubectl describe node node1.example.com |grep -i taint
Taints: node1=yes:NoSchedule
// Add a toleration to the manifest
[root@master manifest]# cat test.yml
---
apiVersion: v1
kind: Pod
metadata:
  name: test
spec:
  containers:
  - name: test1
    image: busybox
    imagePullPolicy: IfNotPresent
    command: ["/bin/sh","-c","sleep 10000"]
  tolerations:
  - key: "node1"
    operator: "Equal"
    value: "yes"
    effect: "NoSchedule"
// Run the Pod and check whether, with the toleration in place, it can run on node1
[root@master manifest]# kubectl apply -f test.yml
pod/test created
[root@master manifest]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
test 1/1 Running 0 2s 10.244.1.5 node1.example.com <none> <none>
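Why the toleration lets the Pod onto node1 can be sketched as a matching rule between one toleration and one taint. The code below is a simplified model under assumptions: the function names `tolerates` and `can_schedule` are invented for this example and do not come from Kubernetes itself.

```python
def tolerates(toleration, taint):
    """True when a single toleration matches a single taint (illustrative)."""
    # An empty effect in the toleration matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    # An empty key matches every key (meaningful together with Exists).
    if toleration.get("key") and toleration["key"] != taint["key"]:
        return False
    # Exists ignores the value; Equal (the default) compares it.
    if toleration.get("operator", "Equal") == "Exists":
        return True
    return toleration.get("value", "") == taint.get("value", "")


def can_schedule(pod_tolerations, node_taints):
    """A Pod fits when every NoSchedule/NoExecute taint on the node is tolerated."""
    blocking = [t for t in node_taints
                if t["effect"] in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, taint) for tol in pod_tolerations)
               for taint in blocking)


# The taint and toleration used in the test above.
taint = {"key": "node1", "value": "yes", "effect": "NoSchedule"}
toleration = {"key": "node1", "operator": "Equal", "value": "yes",
              "effect": "NoSchedule"}

print(can_schedule([toleration], [taint]))              # True
print(can_schedule([], [taint]))                        # False
print(can_schedule([{"operator": "Exists"}], [taint]))  # True
```

A toleration only permits scheduling onto the tainted node; it does not force the Pod there, so it is often combined with nodeSelector or nodeAffinity when the Pod must land on that node.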



