Troubleshooting an UnknownHostException in a Service Deployed on K8S

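1. Background

A plain Spring Boot service is deployed on K8S. When the service calls Baidu (this is only a test, not a crawler; in the real case a different URL produced the same error), it fails with: java.net.UnknownHostException: www.baidu.com: Temporary failure in name resolution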

1.1 The test request code is as follows:

It is just an ordinary HTTP request.
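
The original test code is not reproduced here. As a rough stand-in, the sketch below isolates the step that actually fails: the hostname lookup an HTTP client performs before it can connect. The class name DnsCheck and the method check are made up for illustration and are not the service's real code.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not the original service code.
class DnsCheck {

    /** Returns "OK <addresses>" if the name resolves, otherwise "FAIL <exception message>". */
    public static String check(String host) {
        try {
            // Same lookup the JDK performs before opening an HTTP connection.
            InetAddress[] addresses = InetAddress.getAllByName(host);
            StringBuilder sb = new StringBuilder("OK");
            for (InetAddress address : addresses) {
                sb.append(' ').append(address.getHostAddress());
            }
            return sb.toString();
        } catch (UnknownHostException e) {
            return "FAIL " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Defaults to the host used in this write-up; pass another name as the first argument.
        String host = args.length > 0 ? args[0] : "www.baidu.com";
        System.out.println(check(host));
    }
}
```

Run inside the affected pod, a failed lookup prints FAIL followed by the exception message. Run on the host, the same name should resolve, which matches the curl result described later.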

1.2 The error message is as follows:

2. Troubleshooting

2.1 At first I tried the fixes suggested elsewhere online, for example:

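1. Edit /etc/resolv.conf and add nameserver 8.8.8.8;
2. Edit /etc/sysconfig/network-scripts/ifcfg-eth0 and add the domain name;
3. Edit the hosts file and add the IP-to-domain mapping.
None of these worked (at least not in my environment). They do not address the root cause: running curl against Baidu directly on the server succeeds, so the host's own network is fine.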
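
One likely reason host-level changes make no difference: with the default ClusterFirst DNS policy, the kubelet writes each pod's own /etc/resolv.conf and points it at the cluster DNS service instead of the node's resolver. To see which resolver a pod is actually configured with, something like the following sketch can be run inside the container. ResolvConf and nameservers are hypothetical names, and the parsing only handles plain nameserver lines.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for inspecting the resolver configuration seen by the pod.
class ResolvConf {

    /** Extracts the addresses from "nameserver <address>" lines, skipping comments and other options. */
    public static List<String> nameservers(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#") || line.startsWith(";")) {
                continue; // blank line or comment
            }
            String[] parts = line.split("\\s+");
            if (parts.length >= 2 && parts[0].equals("nameserver")) {
                result.add(parts[1]);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Inside a pod this normally lists the cluster DNS service IP, not the node's DNS servers.
        System.out.println(nameservers(Files.readAllLines(Paths.get("/etc/resolv.conf"))));
    }
}
```

If the address printed inside the pod is the cluster DNS service, every lookup from the service depends on the coredns pods being healthy, whatever the host's resolver settings say.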

2.2 In the end I turned to the K8S environment itself:

1. Check the status of the pods in the kube-system namespace.
As shown below, the coredns-xxx pods are not ready:

[root@nb001 ~]# kubectl get pods -n kube-system
NAME                            READY   STATUS             RESTARTS   AGE
coredns-7d75679df-5j2xz         0/1     Running            5520       17d
coredns-7d75679df-zz729         0/1     CrashLoopBackOff   5517       17d
etcd-nb001                      1/1     Running            0          17d
kube-apiserver-nb001            1/1     Running            0          17d
kube-controller-manager-nb001   1/1     Running            0          17d
kube-flannel-ds-68hv5           1/1     Running            0          17d
kube-flannel-ds-k882f           1/1     Running            0          17d
kube-proxy-dnzpp                1/1     Running            0          17d
kube-proxy-lqdf7                1/1     Running            0          17d
kube-scheduler-nb001            1/1     Running            0          17d
[root@nb001 ~]#

2. Inspect coredns-7d75679df-5j2xz with kubectl describe:
This shows the problem. The readiness probe keeps failing, so the pod never becomes ready. The events end with messages such as: Readiness probe failed: Get "http://10.100.0.2:8181/ready": dial tcp 10.100.0.2:8181: i/o timeout (Client.Timeout exceeded while awaiting headers)
The full output is:

[root@nb001 ~]# kubectl describe pod coredns-7d75679df-5j2xz  -n kube-system 
Name:                 coredns-7d75679df-5j2xz
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 nb001/172.28.242.13
Start Time:           Fri, 10 Dec 2021 09:56:24 +0800
Labels:               k8s-app=kube-dns
                      pod-template-hash=7d75679df
Annotations:          
Status:               Running
IP:                   10.100.0.2
IPs:
  IP:           10.100.0.2
Controlled By:  ReplicaSet/coredns-7d75679df
Containers:
  coredns:
    Container ID:  docker://b5e6b653345a99653e6aad37615bc05a5d466b84441c1c32a279e98e37ad0fd2
    Image:         swr.cn-east-2.myhuaweicloud.com/coredns/coredns:1.8.0
    Image ID:      docker-pullable://swr.cn-east-2.myhuaweicloud.com/coredns/coredns@sha256:10ecc12177735e5a6fd6fa0127202776128d860ed7ab0341780ddaeb1f6dfe61
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 27 Dec 2021 11:01:26 +0800
      Finished:     Mon, 27 Dec 2021 11:03:14 +0800
    Ready:          False
    Restart Count:  5525
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vgx8k (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  kube-api-access-vgx8k:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 CriticalAddonsOnly op=Exists
                             node-role.kubernetes.io/control-plane:NoSchedule
                             node-role.kubernetes.io/master:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                     From     Message
  ----     ------     ----                    ----     -------
  Warning  Unhealthy  30m (x42490 over 16d)   kubelet  Readiness probe failed: Get "http://10.100.0.2:8181/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy  20m (x19658 over 16d)   kubelet  Readiness probe failed: Get "http://10.100.0.2:8181/ready": dial tcp 10.100.0.2:8181: connect: no route to host
  Normal   Pulled     15m (x5523 over 17d)    kubelet  Container image "swr.cn-east-2.myhuaweicloud.com/coredns/coredns:1.8.0" already present on machine
  Warning  Unhealthy  5m44s (x7016 over 16d)  kubelet  Readiness probe failed: Get "http://10.100.0.2:8181/ready": dial tcp 10.100.0.2:8181: i/o timeout (Client.Timeout exceeded while awaiting headers)
  Warning  BackOff    37s (x67579 over 16d)   kubelet  Back-off restarting failed container
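
The Readiness line in this output says the kubelet sends an HTTP GET to :8181/ready with a 1s timeout. To repeat roughly the same check by hand from the node, a sketch along these lines could be used. ReadyProbe and probe are hypothetical names; the default URL and timeout are copied from the describe output, and the 200 to 399 success range is how the kubelet treats HTTP probe responses.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical re-creation of the kubelet's HTTP readiness check, for manual testing only.
class ReadyProbe {

    /** Sends one GET with the given timeout and reports the outcome as a short string. */
    public static String probe(String url, int timeoutMillis) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            conn.setRequestMethod("GET");
            int code = conn.getResponseCode();
            // The kubelet counts any status from 200 up to (but not including) 400 as success.
            boolean ok = code >= 200 && code < 400;
            return (ok ? "ready" : "not ready") + " (HTTP " + code + ")";
        } catch (IOException e) {
            // Timeouts, "no route to host" and connection refusals all end up here.
            return "probe failed: " + e;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        // Pod IP, port, path and 1s timeout taken from the describe output.
        String url = args.length > 0 ? args[0] : "http://10.100.0.2:8181/ready";
        System.out.println(probe(url, 1000));
    }
}
```

If this also times out or reports no route to host when run on nb001, that would suggest the problem lies on the path between the node and the pod network rather than inside CoreDNS itself.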

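3. Fix: comment out the block similar to the one below, then restart coredns. After that the service was healthy again.

Service healthy:

2.3 Retest

The logs show that the request now succeeds, as follows: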

END
