Symptom 1: one node stays NotReady
[root@master ~]# kubectl get nodes
NAME     STATUS     ROLES    AGE   VERSION
master   Ready      master   60m   v1.17.0
node1    NotReady   <none>   30m   v1.17.0
node2    Ready      <none>   29m   v1.17.0
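On a cluster with more nodes, the affected rows are easier to spot with a filter. The following is a minimal sketch, assuming the default column layout of the output above (node name first, status second); the function name not_ready_nodes is hypothetical.

```shell
# Hypothetical helper: reads `kubectl get nodes` output on stdin and prints
# the names of nodes whose STATUS column is not exactly "Ready".
# Nodes with a compound status such as "Ready,SchedulingDisabled" are
# printed as well, because the comparison is an exact string match.
not_ready_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1 }'
}
```

Possible usage on the master node:
kubectl get nodes | not_ready_nodes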
Symptom 2: one flannel pod shows ImagePullBackOff
[root@master ~]# kubectl get pods -n kube-system
NAME                             READY   STATUS                  RESTARTS   AGE
coredns-9d85f5447-r2dx6          1/1     Running                 0          64m
coredns-9d85f5447-zskjc          1/1     Running                 0          64m
etcd-master                      1/1     Running                 0          64m
kube-apiserver-master            1/1     Running                 0          64m
kube-controller-manager-master   1/1     Running                 0          64m
kube-flannel-ds-7bknh            1/1     Running                 0          33m
kube-flannel-ds-9xwsr            0/1     Init:ImagePullBackOff   1          35m
kube-flannel-ds-tspl2            1/1     Running                 0          44m
kube-proxy-ggd7p                 1/1     Running                 1          35m
kube-proxy-m8ljk                 1/1     Running                 0          64m
kube-proxy-xrt7c                 1/1     Running                 0          33m
kube-scheduler-master            1/1     Running                 0          64m
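The failing pod can also be picked out mechanically. This is a rough sketch, assuming the five-column layout shown above (NAME, READY, STATUS, RESTARTS, AGE); unhealthy_pods is a hypothetical name.

```shell
# Hypothetical helper: reads `kubectl get pods` output on stdin and prints
# "<pod> <status>" for every pod that is not fully ready or not Running.
# Pods of finished Jobs (STATUS "Completed") would be listed too.
unhealthy_pods() {
  awk 'NR > 1 {
    split($2, ready, "/")              # READY looks like "0/1"
    if (ready[1] != ready[2] || $3 != "Running") print $1, $3
  }'
}
```

Possible usage on the master node:
kubectl get pods -n kube-system | unhealthy_pods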
Symptom 3: describing kube-flannel-ds-9xwsr shows that the image pull timed out
[root@master ~]# kubectl describe pod -n kube-system kube-flannel-ds-9xwsr
Name:         kube-flannel-ds-9xwsr
Namespace:    kube-system
Priority:     2000001000
.
.
Events:
  Type     Reason          Age                  From               Message
  ----     ------          ----                 ----               -------
  Normal   Scheduled       46m                  default-scheduler  Successfully assigned kube-system/kube-flannel-ds-9xwsr to node1
  Normal   Pulling         46m                  kubelet, node1     Pulling image "rancher/mirrored-flannelcni-flannel-cni-plugin:v1.0.0"
  Normal   Pulled          46m                  kubelet, node1     Successfully pulled image "rancher/mirrored-flannelcni-flannel-cni-plugin:v1.0.0"
  Normal   Created         46m                  kubelet, node1     Created container install-cni-plugin
  Normal   Started         46m                  kubelet, node1     Started container install-cni-plugin
  Normal   Pulling         46m                  kubelet, node1     Pulling image "rancher/mirrored-flannelcni-flannel:v0.16.1"
  Normal   SandboxChanged  17m                  kubelet, node1     Pod sandbox changed, it will be killed and re-created.
  Normal   Started         17m                  kubelet, node1     Started container install-cni-plugin
  Normal   Created         17m                  kubelet, node1     Created container install-cni-plugin
  Normal   Pulled          17m                  kubelet, node1     Container image "rancher/mirrored-flannelcni-flannel-cni-plugin:v1.0.0" already present on machine
  Normal   Pulling         10m (x4 over 17m)    kubelet, node1     Pulling image "rancher/mirrored-flannelcni-flannel:v0.16.1"
  Warning  Failed          9m23s (x4 over 15m)  kubelet, node1     Error: ErrImagePull
  Warning  Failed          9m11s (x5 over 15m)  kubelet, node1     Error: ImagePullBackOff
  Warning  Failed          6m24s (x5 over 15m)  kubelet, node1     Failed to pull image "rancher/mirrored-flannelcni-flannel:v0.16.1": rpc error: code = Unknown desc = context canceled
  Normal   BackOff         112s (x23 over 15m)  kubelet, node1     Back-off pulling image "rancher/mirrored-flannelcni-flannel:v0.16.1"

Resolution process:
Actions tried on the faulty node:
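1. Rebooted the faulty node (did not fix it)
2. Tried to start the stopped containers, but they would not start
3. Reloaded the systemd daemon and restarted docker (did not fix it)
4. Stopped the running containers and deleted the containers that were not running, which fixed the fault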
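Step 4 can be sketched as one small function. This is only an illustration under stated assumptions: it expects the docker CLI on the faulty node, it relies on kubelet recreating the containers afterwards (as happened in this case), and the name cleanup_node_containers is hypothetical. It removes only containers in the exited or created state, so containers that kubelet has already restarted are left alone.

```shell
# Hypothetical helper for step 4: stop whatever is running, then remove the
# containers that are no longer running. Meant to be run as root on the
# faulty node.
cleanup_node_containers() {
  running=$(docker ps -q)
  if [ -n "$running" ]; then
    # Word splitting is intentional here: one argument per container ID.
    docker stop $running
  fi

  # Select only exited/created containers for removal.
  stopped=$(docker ps -aq --filter status=exited --filter status=created)
  if [ -n "$stopped" ]; then
    docker rm $stopped
  fi
}
```

Possible usage on the faulty node:
cleanup_node_containers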
The detailed steps are shown below.
#Check the container status (recorded for comparison after the restart; there are 6 containers at this point)
[root@node1 ~]# docker ps -a
CONTAINER ID   IMAGE                                               COMMAND                  CREATED          STATUS                      PORTS   NAMES
b8a649455c1c   7d54289267dc                                        "/usr/local/bin/kube…"   2 minutes ago    Up 2 minutes                        k8s_kube-proxy_kube-proxy-ggd7p_kube-system_1a7d6907-415b-40cf-ae6a-9297ffad8e3e_2
46ba6b578ebf   registry.aliyuncs.com/google_containers/pause:3.1   "/pause"                 2 minutes ago    Up 2 minutes                        k8s_POD_kube-proxy-ggd7p_kube-system_1a7d6907-415b-40cf-ae6a-9297ffad8e3e_2
12326131af76   cd5235cd7dc2                                        "cp -f /flannel /opt…"   9 minutes ago    Exited (0) 2 minutes ago            k8s_install-cni-plugin_kube-flannel-ds-9xwsr_kube-system_45e1f94a-28e2-4f8b-89bc-a0cec89f3352_1
2f7b7ceec68e   7d54289267dc                                        "/usr/local/bin/kube…"   10 minutes ago   Exited (2) 2 minutes ago            k8s_kube-proxy_kube-proxy-ggd7p_kube-system_1a7d6907-415b-40cf-ae6a-9297ffad8e3e_1
268dc494222a   registry.aliyuncs.com/google_containers/pause:3.1   "/pause"                 10 minutes ago   Up 9 minutes                        k8s_POD_kube-flannel-ds-9xwsr_kube-system_45e1f94a-28e2-4f8b-89bc-a0cec89f3352_1
b1ac093e353f   registry.aliyuncs.com/google_containers/pause:3.1   "/pause"                 10 minutes ago   Exited (0) 2 minutes ago            k8s_POD_kube-proxy-ggd7p_kube-system_1a7d6907
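Since the container counts are compared before and after each step, a per-state tally can save counting rows by hand. This is a small sketch, assuming the status strings look like those in the STATUS column above; count_by_state is a hypothetical name.

```shell
# Hypothetical helper: reads one container status per line on stdin
# (for example "Up 2 minutes" or "Exited (0) 2 minutes ago") and prints
# how many containers share each leading word, sorted by that word.
count_by_state() {
  awk '{ n[$1]++ } END { for (s in n) print s, n[s] }' | sort
}
```

Possible usage on the node:
docker ps -a --format '{{.Status}}' | count_by_state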
#Reload the systemd daemon and restart the docker service
#Reload the daemon
[root@node1 ~]# systemctl daemon-reload
#Restart the docker service
[root@node1 ~]# systemctl restart docker
#Check the container status again: there are 2 more Exited containers and 1 Created container, 9 containers in total
[root@node1 ~]# docker ps -a
CONTAINER ID   IMAGE                                               COMMAND                  CREATED          STATUS                      PORTS   NAMES
6a742f121b59   cd5235cd7dc2                                        "cp -f /flannel /opt…"   10 seconds ago   Exited (0) 9 seconds ago            k8s_install-cni-plugin_kube-flannel-ds-9xwsr_kube-system_45e1f94a-28e2-4f8b-89bc-a0cec89f3352_0
36e69e7a3dcd   7d54289267dc                                        "/usr/local/bin/kube…"   10 seconds ago   Up 9 seconds                        k8s_kube-proxy_kube-proxy-ggd7p_kube-system_1a7d6907-415b-40cf-ae6a-9297ffad8e3e_3
95b0547dbf88   registry.aliyuncs.com/google_containers/pause:3.1   "/pause"                 10 seconds ago   Up 10 seconds                       k8s_POD_kube-proxy-ggd7p_kube-system_1a7d6907-415b-40cf-ae6a-9297ffad8e3e_3
cd90ee8a56cf   registry.aliyuncs.com/google_containers/pause:3.1   "/pause"                 10 seconds ago   Up 9 seconds                        k8s_POD_kube-flannel-ds-9xwsr_kube-system_45e1f94a-28e2-4f8b-89bc-a0cec89f3352_3
b1237af848cf   registry.aliyuncs.com/google_containers/pause:3.1   "/pause"                 11 seconds ago   Created                             k8s_POD_kube-flannel-ds-9xwsr_kube-system_45e1f94a-28e2-4f8b-89bc-a0cec89f3352_2
b8a649455c1c   7d54289267dc                                        "/usr/local/bin/kube…"   16 minutes ago   Exited (2) 11 seconds ago           k8s_kube-proxy_kube-proxy-ggd7p_kube-system_1a7d6907-415b-40cf-ae6a-9297ffad8e3e_2
46ba6b578ebf   registry.aliyuncs.com/google_containers/pause:3.1   "/pause"                 16 minutes ago   Exited (0) 11 seconds ago           k8s_POD_kube-proxy-ggd7p_kube-system_1a7d6907-415b-40cf-ae6a-9297ffad8e3e_2
268dc494222a   registry.aliyuncs.com/google_containers/pause:3.1   "/pause"                 23 minutes ago   Exited (0) 11 seconds ago           k8s_POD_kube-flannel-ds-9xwsr_kube-system_45e1f94a-28e2-4f8b-89bc-a0cec89f3352_1
b1ac093e353f   registry.aliyuncs.com/google_containers/pause:3.1   "/pause"                 23 minutes ago   Exited (0) 16 minutes ago           k8s_POD_kube-proxy-ggd7p_kube-system_1a7d690
#After a while, the three extra containers were gone and the total was back to 6
[root@node1 ~]# docker ps -a
CONTAINER ID   IMAGE                                               COMMAND                  CREATED          STATUS                      PORTS   NAMES
6a742f121b59   cd5235cd7dc2                                        "cp -f /flannel /opt…"   50 seconds ago   Exited (0) 49 seconds ago           k8s_install-cni-plugin_kube-flannel-ds-9xwsr_kube-system_45e1f94a-28e2-4f8b-89bc-a0cec89f3352_0
36e69e7a3dcd   7d54289267dc                                        "/usr/local/bin/kube…"   50 seconds ago   Up 49 seconds                       k8s_kube-proxy_kube-proxy-ggd7p_kube-system_1a7d6907-415b-40cf-ae6a-9297ffad8e3e_3
95b0547dbf88   registry.aliyuncs.com/google_containers/pause:3.1   "/pause"                 50 seconds ago   Up 49 seconds                       k8s_POD_kube-proxy-ggd7p_kube-system_1a7d6907-415b-40cf-ae6a-9297ffad8e3e_3
cd90ee8a56cf   registry.aliyuncs.com/google_containers/pause:3.1   "/pause"                 50 seconds ago   Up 49 seconds                       k8s_POD_kube-flannel-ds-9xwsr_kube-system_45e1f94a-28e2-4f8b-89bc-a0cec89f3352_3
b8a649455c1c   7d54289267dc                                        "/usr/local/bin/kube…"   16 minutes ago   Exited (2) 51 seconds ago           k8s_kube-proxy_kube-proxy-ggd7p_kube-system_1a7d6907-415b-40cf-ae6a-9297ffad8e3e_2
46ba6b578ebf   registry.aliyuncs.com/google_containers/pause:3.1   "/pause"                 16 minutes ago   Exited (0) 51 seconds ago           k8s_POD_kube-proxy-ggd7p_kube-system_1a7d690
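To see exactly which containers vanished between two listings, the ID lists can be diffed instead of compared by eye. This is a minimal sketch, assuming each listing was saved as a file with one container ID per line; gone_ids and the file names in the usage lines are hypothetical.

```shell
# Hypothetical helper: prints the IDs listed in the "before" file that are
# absent from the "after" file (one container ID per line in each file).
gone_ids() {
  before=$1
  after=$2
  # The first file given to awk is "after": remember its IDs, then print
  # every ID from "before" that was not seen.
  awk 'FILENAME == ARGV[1] { seen[$1] = 1; next }
       !($1 in seen)      { print $1 }' "$after" "$before"
}
```

Possible usage on the node:
docker ps -aq > before.txt
docker ps -aq > after.txt     (run again a little later)
gone_ids before.txt after.txt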
#Stop the running containers
[root@node1 ~]# docker stop 36e69e7a3dcd 95b0547dbf88 cd90ee8a56cf
36e69e7a3dcd
95b0547dbf88
cd90ee8a56cf
#Delete all containers; 4 of them cannot be deleted because they are reported as running
[root@node1 ~]# docker container rm $(docker ps -qa)
11aaee411eb8
e2efad2fb393
a7ad16a4f86e
36e69e7a3dcd
95b0547dbf88
cd90ee8a56cf
Error response from daemon: You cannot remove a running container 79d354753c5d143c1b2bd95d1aa52ca48fa861e530da967c86b6537e52895647. Stop the container before attempting removal or force remove
Error response from daemon: You cannot remove a running container 74f74b6b43e5dc02110aec24d38489412c4833e18e4ee860d8019bfcdae4aad8. Stop the container before attempting removal or force remove
Error response from daemon: You cannot remove a running container 30f4881c1809d107a2bf717c45a780a9b04ae8cc9f40278723a74686ef3f72f2. Stop the container before attempting removal or force remove
Error response from daemon: You cannot remove a running container 9a82cbc9374c73288bc0e3bc8205c3c1d298b4409483c38a8d2edcf7682100ec. Stop the container before attempting removal or force remove
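The IDs of the containers that refused removal can be pulled out of the error text for cross-checking against the next listing. This is a rough sketch, assuming the error wording shown above; running_ids_from_errors is a hypothetical name.

```shell
# Hypothetical helper: reads docker error output on stdin and prints the
# full 64-character ID from each "You cannot remove a running container"
# line. Every other line is ignored.
running_ids_from_errors() {
  sed -n 's/.*You cannot remove a running container \([0-9a-f]\{64\}\)\..*/\1/p'
}
```

Possible usage on the node (2>&1 merges stderr into the pipe so any error text is seen by the filter):
docker container rm $(docker ps -qa) 2>&1 | running_ids_from_errors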
#Check again: 4 containers are indeed running, and all of them are newly created
[root@node1 ~]# docker ps -a
CONTAINER ID   IMAGE                                               COMMAND                  CREATED          STATUS                      PORTS   NAMES
79d354753c5d   7d54289267dc                                        "/usr/local/bin/kube…"   26 seconds ago   Up 26 seconds                       k8s_kube-proxy_kube-proxy-ggd7p_kube-system_1a7d6907-415b-40cf-ae6a-9297ffad8e3e_4
74f74b6b43e5   404fc3ab6749                                        "/opt/bin/flanneld -…"   39 seconds ago   Up 38 seconds                       k8s_kube-flannel_kube-flannel-ds-9xwsr_kube-system_45e1f94a-28e2-4f8b-89bc-a0cec89f3352_1
30f4881c1809   registry.aliyuncs.com/google_containers/pause:3.1   "/pause"                 41 seconds ago   Up 40 seconds                       k8s_POD_kube-flannel-ds-9xwsr_kube-system_45e1f94a-28e2-4f8b-89bc-a0cec89f3352_4
9a82cbc9374c   registry.aliyuncs.com/google_containers/pause:3.1   "/pause"                 42 seconds ago   Up 41 seconds                       k8s_POD_kube-proxy-ggd7p_kube-system_1a7d6907-415b-4
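The long container names become easier to compare once split into their parts. The sketch below assumes the names follow the pattern k8s_<container>_<pod>_<namespace>_<pod UID>_<attempt>. That pattern is inferred from the listings here, where the trailing number appears to count up each time a container is recreated. The name split_k8s_names is hypothetical.

```shell
# Hypothetical helper: reads container names on stdin and prints
# "<container> <pod> <namespace> <attempt>" for names of the form
# k8s_<container>_<pod>_<namespace>_<pod UID>_<attempt>.
# The layout is inferred from the listings, not taken from a spec;
# names that do not have exactly six "_"-separated parts are skipped.
split_k8s_names() {
  awk -F_ '$1 == "k8s" && NF == 6 { print $2, $3, $4, $6 }'
}
```

Possible usage on the node:
docker ps -a --format '{{.Names}}' | split_k8s_names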
#Back on the master node, the faulty node has recovered and is Ready
[root@master ~]# kubectl get nodes
NAME     STATUS   ROLES    AGE   VERSION
master   Ready    master   96m   v1.17.0
node1    Ready    <none>   67m   v1.17.0
node2    Ready    <none>   65m   v1.17.0
#The faulty flannel pod kube-flannel-ds-9xwsr has also recovered
[root@master ~]# kubectl get pods -n kube-system
NAME                             READY   STATUS    RESTARTS   AGE
coredns-9d85f5447-r2dx6          1/1     Running   0          96m
coredns-9d85f5447-zskjc          1/1     Running   0          96m
etcd-master                      1/1     Running   0          96m
kube-apiserver-master            1/1     Running   0          96m
kube-controller-manager-master   1/1     Running   0          96m
kube-flannel-ds-7bknh            1/1     Running   0          65m
kube-flannel-ds-9xwsr            1/1     Running   1          67m
kube-flannel-ds-tspl2            1/1     Running   0          76m
kube-proxy-ggd7p                 1/1     Running   4          67m
kube-proxy-m8ljk                 1/1     Running   0          96m
kube-proxy-xrt7c                 1/1     Running   0          65m
kube-scheduler-master            1/1     Running   0          96m
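After a recovery like this, the RESTARTS column shows which pods were affected along the way. This is a small sketch, assuming the restart count is the fourth column as in the output above; restarted_pods is a hypothetical name.

```shell
# Hypothetical helper: reads `kubectl get pods` output on stdin and prints
# "<pod> <restarts>" for every pod whose RESTARTS count is above zero.
restarted_pods() {
  awk 'NR > 1 && $4 + 0 > 0 { print $1, $4 }'
}
```

Possible usage on the master node:
kubectl get pods -n kube-system | restarted_pods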



