- Resource scheduling (nodeSelector, nodeAffinity, Taints, Tolerations)
- 1. nodeSelector
- 2. nodeAffinity
- 3. Taints and Tolerations
nodeSelector is the simplest way to constrain scheduling; it is a field of pod.spec.
Use --show-labels to view the labels on a given node:
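As a minimal sketch of where the field sits (the pod name and image here are illustrative, not from this lab):

```yaml
# Hypothetical Pod: nodeSelector sits directly under spec
apiVersion: v1
kind: Pod
metadata:
  name: nginx-ssd          # illustrative name
spec:
  nodeSelector:
    disktype: ssd          # only nodes carrying this label are candidates
  containers:
  - name: nginx
    image: nginx
```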
[root@master haproxy]# kubectl get node node1 --show-labels
NAME    STATUS   ROLES    AGE     VERSION   LABELS
node1   Ready    <none>   4d12h   v1.20.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node1,kubernetes.io/os=linux
[root@master haproxy]#
If no extra labels have been added, only the default labels shown above appear. Labels can be added to a node with kubectl label node:
[root@master haproxy]# kubectl label node node1 disktype=ssd
node/node1 labeled
[root@master haproxy]# kubectl get node node1 --show-labels    // the new label now shows up
NAME    STATUS   ROLES    AGE     VERSION   LABELS
node1   Ready    <none>   4d12h   v1.20.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=node1,kubernetes.io/os=linux
[root@master haproxy]#
Labels can likewise be removed with kubectl label node:
[root@master haproxy]# kubectl label node node1 disktype-
node/node1 labeled
[root@master haproxy]# kubectl get node node1 --show-labels
NAME    STATUS   ROLES    AGE     VERSION   LABELS
node1   Ready    <none>   4d12h   v1.20.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node1,kubernetes.io/os=linux
[root@master haproxy]#
Create a pod and bind it to a node with the nodeSelector field:
[root@master haproxy]# kubectl label node node1 disktype=ssd
node/node1 labeled
[root@master haproxy]# kubectl get node node1 --show-labels
NAME    STATUS   ROLES    AGE     VERSION   LABELS
node1   Ready    <none>   4d12h   v1.20.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=node1,kubernetes.io/os=linux
[root@master haproxy]#
[root@master haproxy]# cat test.yml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: httpd2
  name: httpd2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: httpd2
  template:
    metadata:
      labels:
        app: httpd2
    spec:
      nodeSelector:
        disktype: ssd        # bind the pod to nodes labeled disktype=ssd
      containers:
      - image: 3199560936/httpd:v0.4
        name: httpd2
---
apiVersion: v1
kind: Service
metadata:
  name: httpd2
spec:
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: httpd2
[root@master haproxy]# kubectl create -f test.yml
deployment.apps/httpd2 created
service/httpd2 created
[root@master haproxy]#
Check which node the pod was scheduled to:
[root@master haproxy]# kubectl get pod -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
httpd2-fd86fb676-l7gk4   1/1     Running   0          61s   10.244.1.45   node1   <none>           <none>
[root@master haproxy]#
As you can see, the pod was forced onto the node carrying the disktype=ssd label.
2. nodeAffinity
nodeAffinity means node-affinity scheduling; it is the newer policy intended to replace nodeSelector. Two kinds of node affinity are currently supported:
- requiredDuringSchedulingIgnoredDuringExecution: the specified rules must be satisfied before a pod can be scheduled onto the node; a hard constraint.
- preferredDuringSchedulingIgnoredDuringExecution: the scheduler tries to place the pod on a node that satisfies the rules but does not insist; a soft constraint. Multiple preference rules can be weighted to define their order of precedence.
IgnoredDuringExecution means: if the labels of a pod's node change while the pod is running, so that the node no longer satisfies the pod's affinity rules, the system ignores the label change and the pod keeps running on that node.
The operators supported by nodeAffinity are:
- In: the label's value is in the given list
- NotIn: the label's value is not in the given list
- Exists: the label exists
- DoesNotExist: the label does not exist
- Gt: the label's value is greater than the given value
- Lt: the label's value is less than the given value
Notes on nodeAffinity rules:
- If both nodeSelector and nodeAffinity are defined, both conditions must be met before the pod can run on the node.
- If nodeAffinity specifies multiple nodeSelectorTerms, matching any one of them is sufficient.
- If a nodeSelectorTerms entry contains multiple matchExpressions, a node must satisfy all of them to run the pod.
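A hedged sketch of those last two rules (the label keys zone and gpu are made up for illustration): the two nodeSelectorTerms below are ORed, while the matchExpressions inside the first term are ANDed:

```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:        # term 1: disktype=ssd AND zone not in (zone-c)
        - key: disktype
          operator: In
          values: ["ssd"]
        - key: zone
          operator: NotIn
          values: ["zone-c"]
      - matchExpressions:        # term 2, ORed with term 1: a gpu label merely exists
        - key: gpu
          operator: Exists
```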
[root@master haproxy]# cat test.yml
apiVersion: v1
kind: Pod
metadata:
  name: test1
  labels:
    app: nginx
spec:
  containers:
  - name: test1
    image: nginx
    imagePullPolicy: IfNotPresent
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 10
        preference:
          matchExpressions:
          - key: name
            operator: In
            values:
            - test
[root@master haproxy]#
Give node2 the disktype=ssd label as well:
[root@master haproxy]# kubectl label node node2 disktype=ssd
node/node2 labeled
[root@master haproxy]# kubectl get node node2 --show-labels
NAME    STATUS   ROLES    AGE     VERSION   LABELS
node2   Ready    <none>   4d12h   v1.20.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=node2,kubernetes.io/os=linux
[root@master haproxy]#
Test
Label node1 with name=sb:
[root@master ~]# kubectl label node node1 name=sb
node/node1 labeled
[root@master ~]# kubectl get node node1 --show-labels
NAME    STATUS   ROLES    AGE     VERSION   LABELS
node1   Ready    <none>   4d12h   v1.20.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=node1,kubernetes.io/os=linux,name=sb
[root@master ~]#
Create the pod and check the result:
[root@master haproxy]# cat httpd.yml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: httpd2
  name: httpd2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: httpd2
  template:
    metadata:
      labels:
        app: httpd2
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: disktype          # hard requirement: disktype=ssd
                operator: In
                values:
                - ssd
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 10
            preference:
              matchExpressions:
              - key: name              # soft preference: name=sb
                operator: In
                values:
                - sb
      containers:
      - image: 3199560936/httpd:v0.4
        name: httpd2
---
apiVersion: v1
kind: Service
metadata:
  name: httpd2
spec:
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: httpd2
[root@master haproxy]#
[root@master haproxy]# kubectl apply -f httpd.yml
deployment.apps/httpd2 created
service/httpd2 created
[root@master haproxy]#
[root@master haproxy]# kubectl get pod -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
httpd2-fd86fb676-b8pqx   1/1     Running   0          13s   10.244.1.46   node1   <none>           <none>
[root@master haproxy]#
Remove the name=sb label and test again:
[root@master haproxy]# kubectl label node node1 name-
node/node1 labeled
[root@master haproxy]# kubectl get node node1 --show-labels
NAME    STATUS   ROLES    AGE     VERSION   LABELS
node1   Ready    <none>   4d12h   v1.20.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=node1,kubernetes.io/os=linux
[root@master haproxy]#
[root@master haproxy]# kubectl apply -f haproxy.yml
deployment.apps/haproxy created
service/haproxy created
[root@master haproxy]# kubectl get pod -o wide
NAME                       READY   STATUS              RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
haproxy-74f8f5c6cf-6pf9w   0/1     ContainerCreating   0          8s    <none>        node2   <none>           <none>
httpd2-fd86fb676-xggxk     1/1     Running             0          65s   10.244.1.47   node1   <none>           <none>
[root@master haproxy]#
The pod above must run on a node carrying the disktype=ssd label; if several nodes carry that label, the node also labeled name=sb is preferred.
3. Taints and Tolerations
Taints: keep pods away from particular nodes.
Tolerations: allow pods to be scheduled onto nodes that carry taints.
Typical use cases:
- Dedicated nodes: group nodes by business line; nothing is scheduled there by default, and only pods that tolerate the taint may be placed on them.
- Nodes with special hardware: some nodes have SSDs or GPUs; nothing is scheduled there by default, and only pods that tolerate the taint may be placed on them.
- Taint-based eviction.
The effect field
A taint's effect can take one of the following values:
- NoSchedule: a pod that does not declare a toleration for the taint will not be scheduled onto a node carrying it.
- PreferNoSchedule: the soft version of NoSchedule; the scheduler tries to avoid placing a non-tolerating pod on the node, but this is not guaranteed.
- NoExecute: defines eviction behavior, for example in response to node failure. A NoExecute taint affects pods already running on the node as follows:
  - pods without a matching toleration are evicted immediately
  - pods with a matching toleration but no tolerationSeconds value stay on the node indefinitely
  - pods with a matching toleration and a tolerationSeconds value are evicted after that many seconds
- Starting with Kubernetes 1.6, an alpha feature marks node problems as taints (currently only node unreachable and node not ready, i.e. the NodeCondition "Ready" being Unknown or False). With TaintBasedEvictions enabled (add TaintBasedEvictions=true to the --feature-gates parameter), the NodeController sets these taints automatically, and the previous eviction logic for nodes whose "Ready" condition changes is disabled. Note that to preserve the existing rate limits on pod eviction during node failures, the system taints nodes gradually in a rate-limited fashion; this prevents mass evictions in scenarios such as a temporarily unreachable master. The feature works together with tolerationSeconds, which lets a pod define how long it remains on a failed node before being evicted.
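That combination can be sketched as a pod-spec fragment (a hedged example, not taken from this lab): the pod tolerates the not-ready taint for five minutes before the NoExecute effect evicts it:

```yaml
tolerations:
- key: node.kubernetes.io/not-ready
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 300   # evicted 300s after the node goes NotReady
```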
Taint
[root@master ~]# kubectl describe node master
...(truncated)...
Annotations:        volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sat, 18 Dec 2021 22:07:52 -0500
Taints:             node-role.kubernetes.io/master:NoSchedule   // keeps ordinary pods off this node
Unschedulable:      false
Tolerations
[root@master ~]# kubectl describe pod httpd2-fd86fb676-xnrcc
...(truncated)...
Tolerations:  node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
              node.kubernetes.io/unreachable:NoExecute op=Exists for 300s   // the pod may stay on a tainted node for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  57s   default-scheduler  Successfully assigned default/httpd2-fd86fb676-xnrcc to node1
  Normal  Pulled     56s   kubelet            Container image "3199560936/httpd:v0.4" already present on machine
  Normal  Created    56s   kubelet            Created container httpd2
  Normal  Started    56s   kubelet            Started container httpd2
[root@master ~]#
Adding a taint to a node
Syntax: kubectl taint node [node] key=value:[effect]
where [effect] can be:
- NoSchedule: pods will never be scheduled here
- PreferNoSchedule: scheduling here is avoided when possible, without strictly requiring a toleration
- NoExecute: nothing new is scheduled here, and pods already running on the node are evicted
Add a tolerations field to the Pod configuration.
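The source does not show this step, so here is a hedged sketch of what the fragment could look like in a pod template, assuming the key-only taint disktype:NoSchedule that is added below:

```yaml
tolerations:
- key: disktype
  operator: Exists     # the taint carries no value, so match on key existence alone
  effect: NoSchedule
```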
Add a disktype taint:
[root@master ~]# kubectl taint node node1 disktype:NoSchedule
node/node1 tainted
[root@master ~]#
Check:
[root@master ~]# kubectl describe node node1
...(truncated)...
Annotations:        volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sat, 18 Dec 2021 22:10:36 -0500
Taints:             disktype:NoSchedule   // the taint was added successfully
Unschedulable:      false
...(truncated)...
[root@master ~]#
Create a container to test:
[root@master haproxy]# cat haproxy.yml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: haproxy
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: haproxy
  template:
    metadata:
      labels:
        app: haproxy
    spec:
      containers:
      - image: 93quan/haproxy:v1-alpine
        imagePullPolicy: Always
        env:
        - name: RSIP
          value: "10.106.56.19 10.96.149.182"
        name: haproxy
        ports:
        - containerPort: 80
          hostPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: haproxy
  namespace: default
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: haproxy
  type: NodePort
[root@master haproxy]#
[root@master haproxy]# kubectl create -f haproxy.yml
deployment.apps/haproxy created
service/haproxy created
Check:
[root@master haproxy]# kubectl get pods -o wide   # node1 is tainted, so the new pod lands on node2
NAME                       READY   STATUS              RESTARTS   AGE     IP            NODE    NOMINATED NODE   READINESS GATES
haproxy-74f8f5c6cf-k8867   0/1     ContainerCreating   0          2m47s   <none>        node2   <none>           <none>
httpd2-fd86fb676-xnrcc     1/1     Running             0          17m     10.244.1.44   node1   <none>           <none>
[root@master haproxy]#
Removing a taint
Syntax: kubectl taint node [node] key:[effect]-
[root@master haproxy]# kubectl taint node node1 disktype-
node/node1 untainted
[root@master haproxy]#
Check:
[root@master haproxy]# kubectl describe node node1
...(truncated)...
Annotations:        volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sat, 18 Dec 2021 22:10:36 -0500
Taints:             <none>   // the taint has been removed
Unschedulable:      false



