
Kubernetes (k8s) reports the etcd component as unhealthy


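1. After logging in to the cluster, the console page reported that the etcd component was unhealthy, although every node was shown as normal.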

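Checking etcd from the backend showed that one etcd member was missing from the endpoint status list (only two endpoints were listed):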

+--------------------------+------------------+---------+---------+-----------+-----------+------------+
|         ENDPOINT         |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+--------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://192.168.0.3:2379 | 78b2261e379e9a4c |  3.3.15 |   19 MB |      true |     16537 |   14306144 |
| https://192.168.0.2:2379 | d5a8f8671df6bb3b |  3.3.15 |   18 MB |     false |     16537 |   14306144 |

2. An endpoint health check showed that this etcd member (192.168.0.4) was unhealthy:

Sangfor:PaaS/private-master-01-a7aeca ~ x docker exec -e ETCDCTL_ENDPOINTS=$(docker exec etcd /bin/sh -c "etcdctl member list | cut -d, -f5 | sed -e 's/ //g' | paste -sd ','") etcd etcdctl endpoint health
{"level":"warn","ts":"2022-02-11T07:55:41.631Z","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-9aae1d5c-2c6b-4474-87e4-35cca2d5cba7/192.168.0.4:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = "transport: Error while dialing dial tcp 192.168.0.4:2379: connect: connection refused""}
https://192.168.0.3:2379 is healthy: successfully committed proposal: took = 13.843404ms
https://192.168.0.2:2379 is healthy: successfully committed proposal: took = 13.728998ms
https://192.168.0.4:2379 is unhealthy: failed to commit proposal: context deadline exceeded
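
With several endpoints and retry warnings mixed together, the failing member is easy to overlook. As a minimal sketch, the output of `etcdctl endpoint health` can be filtered down to the unhealthy endpoints. The function name `unhealthy_endpoints` is hypothetical and not part of etcdctl; it only assumes the line format shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the endpoints that `etcdctl endpoint health`
# reported as unhealthy. Reads the command output on stdin and matches
# lines of the form:
#   https://192.168.0.4:2379 is unhealthy: failed to commit proposal: ...
unhealthy_endpoints() {
    awk '$2 == "is" && $3 == "unhealthy:" { print $1 }'
}

# Example usage (not executed here; container name "etcd" as in this article):
#   docker exec etcd etcdctl endpoint health 2>&1 | unhealthy_endpoints
```

The `2>&1` in the usage comment is a precaution, in case etcdctl writes the unhealthy lines to stderr rather than stdout.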

3. The etcd log on the corresponding etcd host contained errors:

2022-01-14 21:11:44.560318 W | etcdserver: failed to reach the peerURL(https://192.168.0.2:2380) of member d5a8f8671df6bb3b (Get https://192.168.0.2:2380/version: dial tcp 192.168.0.2:2380: connect: no route to host)
2022-01-14 21:11:44.560368 W | etcdserver: cannot get the version of member d5a8f8671df6bb3b (Get https://192.168.0.2:2380/version: dial tcp 192.168.0.2:2380: connect: no route to host)
2022-01-14 21:11:47.267996 W | rafthttp: health check for peer d5a8f8671df6bb3b could not connect: dial tcp 192.168.0.2:2380: connect: no route to host (prober "ROUND_TRIPPER_SNAPSHOT")
2022-01-14 21:11:47.296968 W | rafthttp: health check for peer d5a8f8671df6bb3b could not connect: dial tcp 192.168.0.2:2380: connect: no route to host (prober "ROUND_TRIPPER_RAFT_MESSAGE")
2022-01-14 21:11:50.573183 W | etcdserver: failed to reach the peerURL(https://192.168.0.2:2380) of member d5a8f8671df6bb3b (Get https://192.168.0.2:2380/version: dial tcp 192.168.0.2:2380: connect: no route to host)
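
The same peer address repeats many times in a log like this one. A rough sketch for condensing it into a unique list of peers that failed with "no route to host" is shown below. The function name `unreachable_peers` is hypothetical, and the pattern assumes the etcd 3.3 log wording shown above.

```shell
#!/bin/sh
# Hypothetical helper: read etcd log lines on stdin and print each
# peer address (host:port) that failed with "no route to host", once.
unreachable_peers() {
    sed -n 's/.*dial tcp \([0-9.]*:[0-9]*\): connect: no route to host.*/\1/p' | sort -u
}

# Example usage (not executed here):
#   docker logs etcd 2>&1 | unreachable_peers
```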
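
4. One post attributes this kind of error to the following cause. In the startup unit /etc/systemd/system/etcd.service on etcd1, ETCD_INITIAL_CLUSTER_STATE is set to new, while ETCD_INITIAL_CLUSTER lists the IP:PORT of etcd2 and etcd3. etcd1 therefore tries to connect to etcd2 and etcd3, but the etcd service on those two nodes has not started yet. The fix described there is to start the etcd service on etcd2 and etcd3 first, and then start etcd1.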
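
To illustrate the startup-order point, the sketch below splits an `ETCD_INITIAL_CLUSTER` value into peer addresses and reports which peer ports are not accepting connections yet. The function names are hypothetical, the member names and addresses in the usage comment are illustrative only (the article does not give the actual value), and the probe assumes an `nc` that supports `-z` and `-w`. IPv6 peer URLs are not handled.

```shell
#!/bin/sh
# Hypothetical helper: split an ETCD_INITIAL_CLUSTER value such as
#   etcd1=https://192.168.0.2:2380,etcd2=https://192.168.0.3:2380
# into one "host port" pair per line.
initial_cluster_peers() {
    printf '%s\n' "$1" | tr ',' '\n' |
        sed -n 's|^[^=]*=https\{0,1\}://\([^:/]*\):\([0-9]*\).*$|\1 \2|p'
}

# Hypothetical helper: print every peer whose port refuses TCP connections.
# Assumes `nc` with the -z (scan only) and -w (timeout) options is installed.
peers_not_listening() {
    initial_cluster_peers "$1" | while read -r host port; do
        nc -z -w 2 "$host" "$port" >/dev/null 2>&1 || echo "$host:$port"
    done
}

# Example usage (not executed here; the value is illustrative):
#   peers_not_listening 'etcd2=https://192.168.0.3:2380,etcd3=https://192.168.0.4:2380'
```

An empty result from `peers_not_listening` suggests the other members are already listening on their peer ports.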

5. With that in mind, I tried restarting etcd on the affected node. It worked, and the cluster returned to normal.

Sangfor:PaaS/private-master-03-e672f6 ~ o docker restart etcd
etcd
Sangfor:PaaS/private-master-03-e672f6 ~ o docker ps -a | grep etcd
5e5c47f5b434        10.113.67.53/multi-arch/library/sangforpaas/coreos-etcd:v3.3.15-sangfor1       "/usr/local/bin/etcd…"   4 weeks ago         Up 3 seconds                                   etcd
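
Right after a restart the container shows "Up 3 seconds", but the member may still need a moment to rejoin the cluster. A minimal sketch of a generic retry loop for polling the health check follows. The function name `wait_until_ok` is hypothetical, and the usage comment assumes that `etcdctl endpoint health` exits with a non-zero status while the endpoint is unhealthy.

```shell
#!/bin/sh
# Hypothetical helper: run a command repeatedly until it succeeds
# or the attempts run out.
#   $1 = maximum number of attempts
#   $2 = seconds to sleep after each failed attempt
#   remaining arguments = the command to run
wait_until_ok() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example usage (not executed here):
#   docker restart etcd && wait_until_ok 10 3 docker exec etcd etcdctl endpoint health
```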

6. Checking etcd health again showed no problems:

Sangfor:PaaS/private-master-01-a7aeca ~ o docker exec -e ETCDCTL_ENDPOINTS=$(docker exec etcd /bin/sh -c "etcdctl member list | cut -d, -f5 | sed -e 's/ //g' | paste -sd ','") etcd etcdctl endpoint health
https://192.168.0.2:2379 is healthy: successfully committed proposal: took = 17.85983ms
https://192.168.0.3:2379 is healthy: successfully committed proposal: took = 17.603519ms
https://192.168.0.4:2379 is healthy: successfully committed proposal: took = 19.782918ms
Sangfor:PaaS/private-master-01-a7aeca ~ o docker exec -e ETCDCTL_ENDPOINTS=$(docker exec etcd /bin/sh -c "etcdctl member list | cut -d, -f5 | sed -e 's/ //g' | paste -sd ','") etcd etcdctl endpoint status --write-out table
+--------------------------+------------------+---------+---------+-----------+-----------+------------+
|         ENDPOINT         |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+--------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://192.168.0.4:2379 | 2bfd771eb734cc67 |  3.3.15 |   18 MB |     false |     16563 |   14309175 |
| https://192.168.0.3:2379 | 78b2261e379e9a4c |  3.3.15 |   19 MB |     false |     16563 |   14309175 |
| https://192.168.0.2:2379 | d5a8f8671df6bb3b |  3.3.15 |   18 MB |      true |     16563 |   14309175 |
+--------------------------+------------------+---------+---------+-----------+-----------+------------+
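
In the table above all three members report the same raft term and index, and exactly one of them is the leader. As a small sketch, the leader can be pulled out of the `--write-out table` output like this. The function name `leader_endpoint` is hypothetical, and the column position assumes the etcd 3.3 table layout shown in this article.

```shell
#!/bin/sh
# Hypothetical helper: read `etcdctl endpoint status --write-out table`
# output on stdin and print the endpoint whose IS LEADER column is true.
# With "|" as separator, field 2 is ENDPOINT and field 6 is IS LEADER.
leader_endpoint() {
    awk -F'|' '$6 ~ /true/ { gsub(/ /, "", $2); print $2 }'
}

# Example usage (not executed here):
#   docker exec etcd etcdctl endpoint status --write-out table | leader_endpoint
```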
 

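This article draws on the following post:

"kubernetes 二进制安装 遇到 etcd 不能启动报错 处理【附源码】" (handling etcd startup errors in a binary Kubernetes installation, with source code), by 安享落幕, 51CTO blog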


 
