kubectl version --short
kubectl cluster-info
kubectl get componentstatus
kubectl api-resources -o wide --sort-by name
kubectl get events -A
kubectl get nodes -o wide
kubectl get pods -A -o wide
kubectl describe pod
kubectl logs
kubectl exec -it

1、kubectl version --short
# kubectl version --short
Client Version: v1.21.0
Server Version: v1.21.0
Use this command to see which versions the client and server are running. It helps when searching for errors, reading changelogs, and checking whether there are version-compatibility problems between components.
2、kubectl cluster-info

# kubectl cluster-info
Kubernetes control plane is running at https://127.0.0.1:6443
CoreDNS is running at https://127.0.0.1:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
KubeDNSUpstream is running at https://127.0.0.1:6443/api/v1/namespaces/kube-system/services/kube-dns-upstream:dns/proxy
kubernetes-dashboard is running at https://127.0.0.1:6443/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy
Metrics-server is running at https://127.0.0.1:6443/api/v1/namespaces/kube-system/services/https:metrics-server:/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
Next, find out where the cluster is running and whether CoreDNS is up.
In this example the cluster is local; the dashboard is deployed, as is metrics-server, the resource-metrics collector.
You can log in to the dashboard to inspect the cluster state further.
3、kubectl get componentstatus

# kubectl get componentstatus
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-2 Healthy {"health":"true"}
etcd-1 Healthy {"health":"true"}
etcd-0 Healthy {"health":"true"}
This command reports whether the scheduler, the controller manager, and the etcd nodes are healthy.
Another health-check command: kubectl get --raw '/healthz?verbose'
# kubectl get --raw '/healthz?verbose'
[+]ping ok
[+]log ok
[+]etcd ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/priority-and-fairness-config-consumer ok
[+]poststarthook/priority-and-fairness-filter ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/bootstrap-controller ok
[+]poststarthook/rbac/bootstrap-roles ok
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/priority-and-fairness-config-producer ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/aggregator-reload-proxy-client-cert ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
healthz check passed

4、kubectl api-resources -o wide --sort-by name
# kubectl api-resources -o wide --sort-by name
NAME SHORTNAMES APIVERSION NAMESPACED KIND VERBS
alertmanagerconfigs monitoring.coreos.com/v1alpha1 true alertmanagerConfig [delete deletecollection get list patch create update watch]
alertmanagers monitoring.coreos.com/v1 true alertmanager [delete deletecollection get list patch create update watch]
apiservices apiregistration.k8s.io/v1 false APIService [create delete deletecollection get list patch update watch]
bindings v1 true Binding [create]
certificatesigningrequests csr certificates.k8s.io/v1 false CertificateSigningRequest [create delete deletecollection get list patch update watch]
clusterrolebindings rbac.authorization.k8s.io/v1 false ClusterRoleBinding [create delete deletecollection get list patch update watch]
clusterroles rbac.authorization.k8s.io/v1 false ClusterRole [create delete deletecollection get list patch update watch]
componentstatuses cs v1 false ComponentStatus [get list]
configmaps cm v1 true ConfigMap [create delete deletecollection get list patch update watch]
controllerrevisions apps/v1 true ControllerRevision [create delete deletecollection get list patch update watch]
cronjobs cj batch/v1 true CronJob [create delete deletecollection get list patch update watch]
csidrivers storage.k8s.io/v1 false CSIDriver [create delete deletecollection get list patch update watch]
csinodes storage.k8s.io/v1 false CSINode [create delete deletecollection get list patch update watch]
csistoragecapacities storage.k8s.io/v1beta1 true CSIStorageCapacity [create delete deletecollection get list patch update watch]
customresourcedefinitions crd,crds apiextensions.k8s.io/v1 false CustomResourceDefinition [create delete deletecollection get list patch update watch]
daemonsets ds apps/v1 true DaemonSet [create delete deletecollection get list patch update watch]
deployments deploy apps/v1 true Deployment [create delete deletecollection get list patch update watch]
endpoints ep v1 true Endpoints [create delete deletecollection get list patch update watch]
endpointslices discovery.k8s.io/v1 true EndpointSlice [create delete deletecollection get list patch update watch]
events ev v1 true Event [create delete deletecollection get list patch update watch]
events ev events.k8s.io/v1 true Event [create delete deletecollection get list patch update watch]
flowschemas flowcontrol.apiserver.k8s.io/v1beta1 false FlowSchema [create delete deletecollection get list patch update watch]
horizontalpodautoscalers hpa autoscaling/v1 true HorizontalPodAutoscaler [create delete deletecollection get list patch update watch]
ingressclasses networking.k8s.io/v1 false IngressClass [create delete deletecollection get list patch update watch]
ingresses ing networking.k8s.io/v1 true Ingress [create delete deletecollection get list patch update watch]
ingresses ing extensions/v1beta1 true Ingress [create delete deletecollection get list patch update watch]
jobs batch/v1 true Job [create delete deletecollection get list patch update watch]
leases coordination.k8s.io/v1 true Lease [create delete deletecollection get list patch update watch]
limitranges limits v1 true LimitRange [create delete deletecollection get list patch update watch]
localsubjectaccessreviews authorization.k8s.io/v1 true LocalSubjectAccessReview [create]
mutatingwebhookconfigurations admissionregistration.k8s.io/v1 false MutatingWebhookConfiguration [create delete deletecollection get list patch update watch]
namespaces ns v1 false Namespace [create delete get list patch update watch]
networkpolicies netpol networking.k8s.io/v1 true NetworkPolicy [create delete deletecollection get list patch update watch]
nodes no v1 false Node [create delete deletecollection get list patch update watch]
nodes metrics.k8s.io/v1beta1 false NodeMetrics [get list]
persistentvolumeclaims pvc v1 true PersistentVolumeClaim [create delete deletecollection get list patch update watch]
persistentvolumes pv v1 false PersistentVolume [create delete deletecollection get list patch update watch]
poddisruptionbudgets pdb policy/v1 true PodDisruptionBudget [create delete deletecollection get list patch update watch]
podmonitors monitoring.coreos.com/v1 true PodMonitor [delete deletecollection get list patch create update watch]
pods po v1 true Pod [create delete deletecollection get list patch update watch]
pods metrics.k8s.io/v1beta1 true PodMetrics [get list]
podsecuritypolicies psp policy/v1beta1 false PodSecurityPolicy [create delete deletecollection get list patch update watch]
podtemplates v1 true PodTemplate [create delete deletecollection get list patch update watch]
priorityclasses pc scheduling.k8s.io/v1 false PriorityClass [create delete deletecollection get list patch update watch]
prioritylevelconfigurations flowcontrol.apiserver.k8s.io/v1beta1 false PriorityLevelConfiguration [create delete deletecollection get list patch update watch]
probes monitoring.coreos.com/v1 true Probe [delete deletecollection get list patch create update watch]
prometheuses monitoring.coreos.com/v1 true Prometheus [delete deletecollection get list patch create update watch]
prometheusrules monitoring.coreos.com/v1 true PrometheusRule [delete deletecollection get list patch create update watch]
replicasets rs apps/v1 true ReplicaSet [create delete deletecollection get list patch update watch]
replicationcontrollers rc v1 true ReplicationController [create delete deletecollection get list patch update watch]
resourcequotas quota v1 true ResourceQuota [create delete deletecollection get list patch update watch]
rolebindings rbac.authorization.k8s.io/v1 true RoleBinding [create delete deletecollection get list patch update watch]
roles rbac.authorization.k8s.io/v1 true Role [create delete deletecollection get list patch update watch]
runtimeclasses node.k8s.io/v1 false RuntimeClass [create delete deletecollection get list patch update watch]
secrets v1 true Secret [create delete deletecollection get list patch update watch]
selfsubjectaccessreviews authorization.k8s.io/v1 false SelfSubjectAccessReview [create]
selfsubjectrulesreviews authorization.k8s.io/v1 false SelfSubjectRulesReview [create]
serviceaccounts sa v1 true ServiceAccount [create delete deletecollection get list patch update watch]
servicemonitors monitoring.coreos.com/v1 true ServiceMonitor [delete deletecollection get list patch create update watch]
services svc v1 true Service [create delete get list patch update watch]
statefulsets sts apps/v1 true StatefulSet [create delete deletecollection get list patch update watch]
storageclasses sc storage.k8s.io/v1 false StorageClass [create delete deletecollection get list patch update watch]
subjectaccessreviews authorization.k8s.io/v1 false SubjectAccessReview [create]
thanosrulers monitoring.coreos.com/v1 true ThanosRuler [delete deletecollection get list patch create update watch]
tokenreviews authentication.k8s.io/v1 false TokenReview [create]
validatingwebhookconfigurations admissionregistration.k8s.io/v1 false ValidatingWebhookConfiguration [create delete deletecollection get list patch update watch]
volumeattachments storage.k8s.io/v1 false VolumeAttachment [create delete deletecollection get list patch update watch]
--sort-by name sorts the output by resource name.
-o wide shows the verbs available on each resource (create delete deletecollection get list patch update watch).
This command shows which custom resources are installed in the cluster, along with the API version of each resource.
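To narrow the list down, `kubectl api-resources` also accepts filters such as `--namespaced=true`, `--verbs=list`, and `--api-group=<group>`. The idea can be sketched offline on a few sample rows from the table above (the SHORTNAMES column is omitted here for simplicity, so the script runs without a cluster):

```shell
# Sample rows in "NAME APIVERSION NAMESPACED KIND" form.
sample='alertmanagers monitoring.coreos.com/v1 true Alertmanager
configmaps v1 true ConfigMap
prometheuses monitoring.coreos.com/v1 true Prometheus
nodes v1 false Node'

# Keep only resources served by the monitoring.coreos.com API group;
# the live equivalent is: kubectl api-resources --api-group=monitoring.coreos.com
echo "$sample" | awk '$2 ~ /^monitoring\.coreos\.com\//'
```

This prints only the alertmanagers and prometheuses rows, i.e. the custom resources installed by the Prometheus operator.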
5、kubectl get events -A

# kubectl get events -A
NAMESPACE LAST SEEN TYPE REASON OBJECT MESSAGE
default 75m Normal RegisteredNode node/10.2.33.100 Node 10.2.33.100 event: Registered Node 10.2.33.100 in Controller
default 51m Normal RegisteredNode node/10.2.33.100 Node 10.2.33.100 event: Registered Node 10.2.33.100 in Controller
default 28m Normal RegisteredNode node/10.2.33.100 Node 10.2.33.100 event: Registered Node 10.2.33.100 in Controller
default 5m38s Normal RegisteredNode node/10.2.33.100 Node 10.2.33.100 event: Registered Node 10.2.33.100 in Controller
default 75m Normal RegisteredNode node/10.2.33.101 Node 10.2.33.101 event: Registered Node 10.2.33.101 in Controller
default 51m Normal RegisteredNode node/10.2.33.101 Node 10.2.33.101 event: Registered Node 10.2.33.101 in Controller
default 28m Normal RegisteredNode node/10.2.33.101 Node 10.2.33.101 event: Registered Node 10.2.33.101 in Controller
default 5m38s Normal RegisteredNode node/10.2.33.101 Node 10.2.33.101 event: Registered Node 10.2.33.101 in Controller
default 75m Normal RegisteredNode node/10.2.33.94 Node 10.2.33.94 event: Registered Node 10.2.33.94 in Controller
default 51m Normal RegisteredNode node/10.2.33.94 Node 10.2.33.94 event: Registered Node 10.2.33.94 in Controller
default 28m Normal RegisteredNode node/10.2.33.94 Node 10.2.33.94 event: Registered Node 10.2.33.94 in Controller
default 5m38s Normal RegisteredNode node/10.2.33.94 Node 10.2.33.94 event: Registered Node 10.2.33.94 in Controller
default 75m Normal RegisteredNode node/10.2.33.95 Node 10.2.33.95 event: Registered Node 10.2.33.95 in Controller
default 51m Normal RegisteredNode node/10.2.33.95 Node 10.2.33.95 event: Registered Node 10.2.33.95 in Controller
default 28m Normal RegisteredNode node/10.2.33.95 Node 10.2.33.95 event: Registered Node 10.2.33.95 in Controller
default 5m38s Normal RegisteredNode node/10.2.33.95 Node 10.2.33.95 event: Registered Node 10.2.33.95 in Controller
default 75m Normal RegisteredNode node/10.2.33.96 Node 10.2.33.96 event: Registered Node 10.2.33.96 in Controller
default 51m Normal RegisteredNode node/10.2.33.96 Node 10.2.33.96 event: Registered Node 10.2.33.96 in Controller
default 28m Normal RegisteredNode node/10.2.33.96 Node 10.2.33.96 event: Registered Node 10.2.33.96 in Controller
default 5m38s Normal RegisteredNode node/10.2.33.96 Node 10.2.33.96 event: Registered Node 10.2.33.96 in Controller
default 75m Normal RegisteredNode node/10.2.33.97 Node 10.2.33.97 event: Registered Node 10.2.33.97 in Controller
default 51m Normal RegisteredNode node/10.2.33.97 Node 10.2.33.97 event: Registered Node 10.2.33.97 in Controller
default 28m Normal RegisteredNode node/10.2.33.97 Node 10.2.33.97 event: Registered Node 10.2.33.97 in Controller
default 5m38s Normal RegisteredNode node/10.2.33.97 Node 10.2.33.97 event: Registered Node 10.2.33.97 in Controller
default 75m Normal RegisteredNode node/10.2.33.98 Node 10.2.33.98 event: Registered Node 10.2.33.98 in Controller
default 51m Normal RegisteredNode node/10.2.33.98 Node 10.2.33.98 event: Registered Node 10.2.33.98 in Controller
default 28m Normal RegisteredNode node/10.2.33.98 Node 10.2.33.98 event: Registered Node 10.2.33.98 in Controller
default 5m38s Normal RegisteredNode node/10.2.33.98 Node 10.2.33.98 event: Registered Node 10.2.33.98 in Controller
default 75m Normal RegisteredNode node/10.2.33.99 Node 10.2.33.99 event: Registered Node 10.2.33.99 in Controller
default 51m Normal RegisteredNode node/10.2.33.99 Node 10.2.33.99 event: Registered Node 10.2.33.99 in Controller
default 28m Normal RegisteredNode node/10.2.33.99 Node 10.2.33.99 event: Registered Node 10.2.33.99 in Controller
default 5m38s Normal RegisteredNode node/10.2.33.99 Node 10.2.33.99 event: Registered Node 10.2.33.99 in Controller
kube-system 75m Normal LeaderElection endpoints/k8s-sigs.io-nfs-subdir-external-provisioner nfs-client-provisioner-5db449f657-97jsh_6caa59c9-8c2e-435d-99d4-d59f46704b5b became leader
kube-system 52m Normal LeaderElection endpoints/k8s-sigs.io-nfs-subdir-external-provisioner nfs-client-provisioner-5db449f657-97jsh_5d324cbb-3752-4e07-a742-1dcdf8781ef6 became leader
kube-system 34m Normal LeaderElection endpoints/k8s-sigs.io-nfs-subdir-external-provisioner nfs-client-provisioner-5db449f657-97jsh_5bfac8da-8530-407b-a715-b42f24c3070f became leader
kube-system 28m Normal LeaderElection endpoints/k8s-sigs.io-nfs-subdir-external-provisioner nfs-client-provisioner-5db449f657-97jsh_5facaf45-85c8-4750-b9da-42c933a8ff38 became leader
kube-system 5m47s Normal LeaderElection endpoints/k8s-sigs.io-nfs-subdir-external-provisioner nfs-client-provisioner-5db449f657-97jsh_5bdd527d-effe-49ce-8164-973aeb18417d became leader
kube-system 75m Normal LeaderElection lease/kube-controller-manager localhost.localdomain_3e4b128c-a93c-4ec8-b0bb-ee2cbead1b15 became leader
kube-system 52m Normal LeaderElection lease/kube-controller-manager localhost.localdomain_b915f124-2b75-41b3-8925-856538c2dc83 became leader
kube-system 28m Normal LeaderElection lease/kube-controller-manager localhost.localdomain_5170c3fa-e369-4962-96e1-6624ea248a65 became leader
kube-system 5m50s Normal LeaderElection lease/kube-controller-manager localhost.localdomain_04429131-7c12-440a-8adb-592ce6f8f933 became leader
kube-system 75m Normal LeaderElection lease/kube-scheduler localhost.localdomain_dad4a51f-4954-4a78-9fbd-2147778743ef became leader
kube-system 52m Normal LeaderElection lease/kube-scheduler localhost.localdomain_130dba72-4f71-44b5-aed1-7f1677c6d243 became leader
kube-system 35m Normal LeaderElection lease/kube-scheduler localhost.localdomain_30d37560-8849-4128-a713-137a686d2a97 became leader
kube-system 29m Normal LeaderElection lease/kube-scheduler localhost.localdomain_e0fb2509-c923-4b12-9002-4cfb6d53833d became leader
kube-system 5m47s Normal LeaderElection lease/kube-scheduler localhost.localdomain_5ae8cd66-a2bc-48b1-abba-a62c3fe2a2ff became leader
kube-system 6m5s Normal Pulled pod/nfs-client-provisioner-5db449f657-97jsh Container image "easzlab/nfs-subdir-external-provisioner:v4.0.1" already present on machine
kube-system 6m5s Normal Created pod/nfs-client-provisioner-5db449f657-97jsh Created container nfs-client-provisioner
kube-system 6m5s Normal Started pod/nfs-client-provisioner-5db449f657-97jsh Started container nfs-client-provisioner
kube-system 29m Warning BackOff pod/nfs-client-provisioner-5db449f657-97jsh Back-off restarting failed container
Once you know what is running in the cluster, look at recent failures: the cluster events show what happened before and after a failure. To inspect a specific namespace, add -n <namespace>.
In the output, focus on the TYPE, REASON, and OBJECT columns; these three let you narrow down the search.
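Since Normal events like the RegisteredNode entries above dominate the listing, it helps to surface only Warning events; on a live cluster, `kubectl get events -A --field-selector type=Warning --sort-by=.lastTimestamp` does this server-side. The same filtering on the TYPE column can be sketched offline with two sample rows:

```shell
# Sample rows (NAMESPACE LAST-SEEN TYPE REASON OBJECT MESSAGE), abridged
# from the listing above.
sample='default 75m Normal RegisteredNode node/10.2.33.100 Registered Node 10.2.33.100 in Controller
kube-system 29m Warning BackOff pod/nfs-client-provisioner-5db449f657-97jsh Back-off restarting failed container'

# Keep only rows whose TYPE column (field 3) is Warning.
echo "$sample" | awk '$3 == "Warning"'
```

Here only the BackOff row survives, immediately pointing at the crash-looping nfs-client-provisioner pod.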
6、kubectl get nodes -o wide

# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
10.2.33.100 Ready,SchedulingDisabled master 40d v1.21.0 10.2.33.100 CentOS Linux 7 (Core) 3.10.0-862.el7.x86_64 docker://20.10.5
10.2.33.101 Ready,SchedulingDisabled master 39d v1.21.0 10.2.33.101 CentOS Linux 7 (Core) 3.10.0-862.el7.x86_64 docker://20.10.5
10.2.33.94 Ready node 39d v1.21.0 10.2.33.94 CentOS Linux 7 (Core) 3.10.0-862.el7.x86_64 docker://20.10.5
10.2.33.95 Ready node 40d v1.21.0 10.2.33.95 CentOS Linux 7 (Core) 3.10.0-862.el7.x86_64 docker://20.10.5
10.2.33.96 Ready node 40d v1.21.0 10.2.33.96 CentOS Linux 7 (Core) 3.10.0-862.el7.x86_64 docker://20.10.5
10.2.33.97 Ready node 40d v1.21.0 10.2.33.97 CentOS Linux 7 (Core) 3.10.0-862.el7.x86_64 docker://20.10.5
10.2.33.98 Ready,SchedulingDisabled master 40d v1.21.0 10.2.33.98 CentOS Linux 7 (Core) 3.10.0-862.el7.x86_64 docker://20.10.5
10.2.33.99 Ready,SchedulingDisabled master 40d v1.21.0 10.2.33.99 CentOS Linux 7 (Core) 3.10.0-862.el7.x86_64 docker://20.10.5
This shows the state of every node. -o wide adds details such as the OS image and IP addresses. Check that each node is in the Ready state.
Check node age to see whether there is any correlation between status and uptime: perhaps only new nodes have problems because something in the node image changed. The VERSION column quickly reveals kubelet version skew, and whether there are known bugs caused by a version mismatch between the kubelet and the API server.
The internal IP is useful when you spot addresses outside your subnet: a node may have started with an incorrect static IP, leaving your CNI unable to route traffic to workloads.
OS image, kernel version, and container runtime are all important markers of differences that can cause problems; you may hit issues only on a particular OS or runtime. This information helps you zero in on potential problems quickly and know where to dig deeper into the logs.
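The version-skew check described above can be mechanized: count the distinct kubelet versions across nodes and investigate whenever there is more than one. A sketch over sample rows (VERSION is the fifth column of `kubectl get nodes -o wide`; one node is deliberately made older here for illustration):

```shell
# Sample rows: NAME STATUS ROLES AGE VERSION (hypothetical mixed versions).
sample='10.2.33.100 Ready,SchedulingDisabled master 40d v1.21.0
10.2.33.94 Ready node 39d v1.21.0
10.2.33.95 Ready node 40d v1.20.7'

# Count distinct values of the VERSION column; more than one means
# kubelet version skew worth investigating.
echo "$sample" | awk '!($5 in seen) { seen[$5] = 1; n++ } END { print "distinct kubelet versions: " n }'
# prints: distinct kubelet versions: 2
```

The same one-liner works for the OS-IMAGE or CONTAINER-RUNTIME columns by changing the field number.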
7、kubectl get pods -A -o wide

# kubectl get pods -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
default test-pod 0/1 Completed 0 28d 172.20.6.3 10.2.33.94
ingress-nginx nginx-ingress-controller-6f4c78bb57-fshh7 1/1 Running 5 40d 172.20.4.3 10.2.33.96
kube-system coredns-74c56d8f8d-k8ggm 1/1 Running 0 40d 172.20.4.2 10.2.33.96
kube-system dashboard-metrics-scraper-856586f554-fft8q 1/1 Running 1 40d 172.20.5.2 10.2.33.97
kube-system kube-flannel-ds-amd64-7fmqs 1/1 Running 0 39d 10.2.33.94 10.2.33.94
kube-system kube-flannel-ds-amd64-cvt77 1/1 Running 0 40d 10.2.33.97 10.2.33.97
kube-system kube-flannel-ds-amd64-d5rzz 1/1 Running 0 40d 10.2.33.100 10.2.33.100
kube-system kube-flannel-ds-amd64-gncjz 1/1 Running 0 39d 10.2.33.101 10.2.33.101
kube-system kube-flannel-ds-amd64-jfkx2 1/1 Running 0 40d 10.2.33.96 10.2.33.96
kube-system kube-flannel-ds-amd64-mltxw 1/1 Running 0 40d 10.2.33.95 10.2.33.95
kube-system kube-flannel-ds-amd64-vlghf 1/1 Running 0 40d 10.2.33.99 10.2.33.99
kube-system kube-flannel-ds-amd64-xmzz7 1/1 Running 0 40d 10.2.33.98 10.2.33.98
kube-system kubernetes-dashboard-c4ff5556c-pcmtw 1/1 Running 31 40d 172.20.5.3 10.2.33.97
kube-system metrics-server-8568cf894b-4925q 1/1 Running 0 40d 172.20.3.2 10.2.33.95
kube-system nfs-client-provisioner-5db449f657-97jsh 1/1 Running 714 28d 172.20.6.2 10.2.33.94
kube-system node-local-dns-4bcdm 1/1 Running 0 40d 10.2.33.98 10.2.33.98
kube-system node-local-dns-bq5j5 1/1 Running 0 40d 10.2.33.97 10.2.33.97
kube-system node-local-dns-d6xr5 1/1 Running 0 40d 10.2.33.100 10.2.33.100
kube-system node-local-dns-hlc7t 1/1 Running 0 39d 10.2.33.101 10.2.33.101
kube-system node-local-dns-k9lqg 1/1 Running 0 40d 10.2.33.96 10.2.33.96
kube-system node-local-dns-ntf59 1/1 Running 0 40d 10.2.33.99 10.2.33.99
kube-system node-local-dns-qs6rw 1/1 Running 0 39d 10.2.33.94 10.2.33.94
kube-system node-local-dns-qxt7m 1/1 Running 0 40d 10.2.33.95 10.2.33.95
monitor alertmanager-prometheus-kube-prometheus-alertmanager-0 2/2 Running 0 18d 172.20.5.5 10.2.33.97
monitor prometheus-grafana-55c5f574d9-sgvr4 2/2 Running 0 18d 172.20.6.5 10.2.33.94
monitor prometheus-kube-prometheus-operator-5f6774b747-zvffc 1/1 Running 0 18d 172.20.6.6 10.2.33.94
monitor prometheus-kube-state-metrics-5f89586745-lfwr2 1/1 Running 0 18d 172.20.3.5 10.2.33.95
monitor prometheus-prometheus-kube-prometheus-prometheus-0 2/2 Running 0 18d 172.20.4.4 10.2.33.96
monitor prometheus-prometheus-node-exporter-2cccr 1/1 Running 0 18d 10.2.33.95 10.2.33.95
monitor prometheus-prometheus-node-exporter-66r8q 1/1 Running 0 18d 10.2.33.98 10.2.33.98
monitor prometheus-prometheus-node-exporter-86x9l 1/1 Running 0 18d 10.2.33.100 10.2.33.100
monitor prometheus-prometheus-node-exporter-f8mpk 1/1 Running 0 18d 10.2.33.94 10.2.33.94
monitor prometheus-prometheus-node-exporter-g8mng 1/1 Running 0 18d 10.2.33.99 10.2.33.99
monitor prometheus-prometheus-node-exporter-k5r2j 1/1 Running 0 18d 10.2.33.101 10.2.33.101
monitor prometheus-prometheus-node-exporter-pjbl5 1/1 Running 0 18d 10.2.33.97 10.2.33.97
monitor prometheus-prometheus-node-exporter-s7z8c 1/1 Running 0 18d 10.2.33.96 10.2.33.96
test-tengine plat-tengine-649d486499-w68bx 1/1 Running 0 40d 172.20.3.4 10.2.33.95
-A lists pods across all namespaces; -o wide shows the details. To inspect a specific namespace, use -n <namespace>.
Use the STATUS column to locate which namespace or node the problem is on.
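Besides STATUS, the RESTARTS column is worth scanning: the nfs-client-provisioner above has restarted 714 times, which matches the BackOff events seen earlier. (On a live cluster, `kubectl get pods -A --field-selector=status.phase!=Running` is also handy for listing non-Running pods.) A quick offline sketch over sample rows, where RESTARTS is the fifth column and the threshold is arbitrary:

```shell
# Sample rows: NAMESPACE NAME READY STATUS RESTARTS AGE (from the listing above).
sample='kube-system coredns-74c56d8f8d-k8ggm 1/1 Running 0 40d
kube-system kubernetes-dashboard-c4ff5556c-pcmtw 1/1 Running 31 40d
kube-system nfs-client-provisioner-5db449f657-97jsh 1/1 Running 714 28d'

# Flag pods whose restart count exceeds an (arbitrary) threshold of 10.
echo "$sample" | awk '$5 > 10 { print $2 " restarted " $5 " times" }'
```

This prints the dashboard and nfs-client-provisioner pods, the two with suspicious restart counts.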
8、kubectl describe pod

# kubectl describe pod plat-tengine-649d486499-w68bx -n test-tengine
Name: plat-tengine-649d486499-w68bx
Namespace: test-tengine
Priority: 0
Node: 10.2.33.95/10.2.33.95
Start Time: Wed, 20 Oct 2021 17:54:50 +0800
Labels: app=tengine-labels
pod-template-hash=649d486499
Annotations:
Status: Running
IP: 172.20.3.4
IPs:
IP: 172.20.3.4
Controlled By: ReplicaSet/plat-tengine-649d486499
Containers:
plat-tengine:
Container ID: docker://7f1240b861a15d7011ab8a40285a46441179657c8946e1900be259e16c38e080
Image: registry.tengine.tv/zxltest/tengine:25
Image ID: docker-pullable://registry.tengine.tv/zxltest/tengine@sha256:441d952bcf039c6921b0f860ae1bc86159b9ef8a2368f7964b7f88d643f82e5f
Port: 8108/TCP
Host Port: 0/TCP
State: Running
Started: Wed, 20 Oct 2021 17:54:52 +0800
Ready: True
Restart Count: 0
Environment:
Mounts:
/etc/localtime from plat-time (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hdrgk (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
plat-time:
Type: HostPath (bare host directory volume)
Path: /usr/share/zoneinfo/Asia/Shanghai
HostPathType:
kube-api-access-hdrgk:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Now that you know the pod name and the namespace with the problem, look at that pod's details directly.
Check the output for error information and address the problem accordingly.
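For long describe dumps, the Events section at the bottom is usually what matters, and it can be cut out with sed. The sample text below stands in for a real `kubectl describe pod <name> -n <ns>` pipeline, so the snippet runs without a cluster:

```shell
# Abridged describe output for a failing pod (hypothetical Events content).
sample='Name: plat-tengine-649d486499-w68bx
Status: Running
Events:
  Warning  BackOff  restarting failed container'

# Print everything from the "Events:" header to the end of the dump;
# live equivalent: kubectl describe pod <name> -n <ns> | sed -n '/^Events:/,$p'
echo "$sample" | sed -n '/^Events:/,$p'
```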
9、kubectl logs

# kubectl logs -f plat-tengine-649d486499-w68bx -n test-tengine
2021-11-28 11:25:47,540 INFO exited: tengine (exit status 1; not expected)
2021-11-28 11:25:48,543 INFO spawned: 'tengine' with pid 19128
2021-11-28 11:25:49,554 INFO success: tengine entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-11-28 11:25:51,055 INFO exited: tengine (exit status 1; not expected)
2021-11-28 11:25:52,058 INFO spawned: 'tengine' with pid 19129
2021-11-28 11:25:53,069 INFO success: tengine entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-11-28 11:25:54,570 INFO exited: tengine (exit status 1; not expected)
2021-11-28 11:25:55,574 INFO spawned: 'tengine' with pid 19130
Where describe shows you the events happening around the pod, logs gives you the detailed output of the application running inside it.
You can use grep to filter out irrelevant lines or to pick out specific events.
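For example, the supervisord output above interleaves normal spawned/success lines with the actual failures, and grep isolates the latter. (`kubectl logs --previous` is also useful here: after a restart it shows the log of the previous container instance.) The sample lines below stand in for the live pipeline:

```shell
# Sample log lines taken from the output above (quotes around tengine dropped
# to keep the shell quoting simple).
sample='2021-11-28 11:25:47,540 INFO exited: tengine (exit status 1; not expected)
2021-11-28 11:25:48,543 INFO spawned: tengine with pid 19128
2021-11-28 11:25:49,554 INFO success: tengine entered RUNNING state'

# Live equivalent:
#   kubectl logs plat-tengine-649d486499-w68bx -n test-tengine | grep 'not expected'
echo "$sample" | grep 'not expected'
```

Only the abnormal-exit line survives, confirming tengine itself is crashing on startup.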
10、kubectl exec

# kubectl exec -it plat-tengine-649d486499-w68bx -n test-tengine /bin/bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
[root@plat-tengine-649d486499-w68bx /]# free
total used free shared buff/cache available
Mem: 16257696 778408 10021860 279332 5457428 14775780
Swap: 0 0 0
# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 Oct20 ? 01:21:38 /usr/bin/python /usr/bin/supervisord -n -c /etc/supervisord.conf
root 10 1 0 Oct20 ? 00:00:00 nginx: master process /data/server/tengine/bin/nginx -p /data/server/tengine -c /data/server/tengine/conf/tengin
www 11 10 0 Oct20 ? 00:15:08 nginx: worker process
www 12 10 0 Oct20 ? 00:00:00 nginx: worker process
www 13 10 0 Oct20 ? 00:15:15 nginx: worker process
www 14 10 0 Oct20 ? 00:15:15 nginx: worker process
www 15 10 0 Oct20 ? 00:15:34 nginx: worker process
www 16 10 0 Oct20 ? 00:15:09 nginx: worker process
www 17 10 0 Oct20 ? 00:15:18 nginx: worker process
www 18 10 0 Oct20 ? 00:15:29 nginx: worker process
root 22139 0 0 18:31 pts/0 00:00:00 /bin/bash
root 22170 1 0 18:32 ? 00:00:00 /data/server/tengine/bin/nginx -p /data/server/tengine -c /data/server/tengine/conf/tengine.conf -s start
root 22171 22139 0 18:32 pts/0 00:00:00 ps -ef
If the logs do not reveal the problem, the last resort is to enter the container and inspect the processes and service logs there to pin down the specific issue.
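An interactive shell is not always necessary: `kubectl exec <pod> -n <ns> -- <command>` runs a single command and exits, which is handy in scripts (note the `--` separating kubectl's flags from the container command, as the deprecation warning above suggests). As a sketch, the nginx worker count from the ps output above can be checked offline like this:

```shell
# Abridged sample `ps -ef` lines from inside the container.
sample='root 1 0 supervisord
www 11 10 nginx: worker process
www 12 10 nginx: worker process'

# Live equivalent:
#   kubectl exec plat-tengine-649d486499-w68bx -n test-tengine -- ps -ef | grep -c 'nginx: worker'
echo "$sample" | grep -c 'nginx: worker'
# prints: 2
```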