Handling a Kafka Cluster Failure on Huawei Cloud

Symptoms:

    The producer logs were full of timeouts:

    2022-02-17 09:29:41,692 [kafka-producer-network-thread | monolith-rule-engine-xm2m-IOT-0003] WARN  o.t.s.q.k.TbKafkaProducerTemplate - Producer template failure: Expiring 2 record(s) for tb_rule_engine.main.0-0:120000 ms has passed since batch creation
org.apache.kafka.common.errors.TimeoutException: Expiring 2 record(s) for tb_rule_engine.main.0-0:120000 ms has passed since batch creation
2022-02-17 09:29:41,692 [kafka-producer-network-thread | monolith-rule-engine-xm2m-IOT-0003] WARN  o.t.s.q.k.TbKafkaProducerTemplate - Producer template failure: Expiring 2 record(s) for tb_rule_engine.main.0-0:120000 ms has passed since batch creation
org.apache.kafka.common.errors.TimeoutException: Expiring 2 record(s) for tb_rule_engine.main.0-0:120000 ms has passed since batch creation
2022-02-17 09:29:42,167 [tb-rule-engine-consumer-29-thread-3] INFO  o.a.k.clients.FetchSessionHandler - [Consumer clientId=re-Main-consumer-xm2m-IOT-0003, groupId=re-Main-consumer-xm2m-IOT-0003] Error sending fetch request (sessionId=1512270209, epoch=INITIAL) to node 2: org.apache.kafka.common.errors.DisconnectException.
2022-02-17 09:29:51,395 [kafka-producer-network-thread | monolith-transport-api-producer-xm2m-IOT-0003] WARN  o.t.s.q.k.TbKafkaProducerTemplate - Producer template failure: Expiring 4 record(s) for tb_transport.api.responses.xm2m_transport_01-0:120000 ms has passed since batch creation
org.apache.kafka.common.errors.TimeoutException: Expiring 4 record(s) for tb_transport.api.responses.xm2m_transport_01-0:120000 ms has passed since batch creation
2022-02-17 09:29:51,395 [kafka-producer-network-thread | monolith-transport-api-producer-xm2m-IOT-0003] WARN  o.t.s.q.k.TbKafkaProducerTemplate - Producer template failure: Expiring 4 record(s) for tb_transport.api.responses.xm2m_transport_01-0:120000 ms has passed since batch creation
org.apache.kafka.common.errors.TimeoutException: Expiring 4 record(s) for tb_transport.api.responses.xm2m_transport_01-0:120000 ms has passed since batch creation
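
The 120000 ms in these messages appears to match the producer's default delivery.timeout.ms, which would mean the batches waited the full two minutes without being delivered. To see which topic partitions are affected, the log can be summarized with a short script. This is a minimal sketch: the function name summarize_expired is hypothetical, and it assumes the log lines have exactly the format shown above. Only the exception lines are counted, so the WARN line that repeats the same message is not counted twice.

```python
import re
from collections import Counter

# Matches the producer timeout message as it appears in the log excerpt.
EXPIRY_RE = re.compile(
    r"Expiring (\d+) record\(s\) for (\S+?):(\d+) ms has passed since batch creation"
)

EXCEPTION_PREFIX = "org.apache.kafka.common.errors.TimeoutException"


def summarize_expired(lines):
    """Return {topic-partition: number of expired records}."""
    totals = Counter()
    for line in lines:
        # Skip everything except the exception lines to avoid double counting.
        if not line.startswith(EXCEPTION_PREFIX):
            continue
        match = EXPIRY_RE.search(line)
        if match:
            totals[match.group(2)] += int(match.group(1))
    return dict(totals)
```

Feeding it the lines of the producer log (for example via open(path) on a hypothetical log path) returns a dictionary keyed by names such as tb_rule_engine.main.0-0.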

  There was also this line in the logs:

  [2022-02-17 09:20:18,494] ERROR Error while creating ephemeral at /brokers/ids/0, node already exists and owner '179866866520031379' does not match current session '251925893726535682' (kafka.zk.KafkaZkClient$CheckedEphemeral)

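Analysis:

  1. Running kafka-topics.sh --list showed nothing wrong.

  2. I suspected that a broker had gone down, but checking the processes showed nothing wrong either.

  3. That left the configuration file, where I found: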

# The address the socket server listens on. It will get the value returned from 
# java.net.InetAddress.getCanonicalHostName() if not configured.
#   FORMAT:
#     listeners = listener_name://host_name:port
#   EXAMPLE:
#     listeners = PLAINTEXT://your.host.name:9092
listeners=PLAINTEXT://192.168.0.227:9092

# Hostname and port the broker will advertise to producers and consumers. If not set, 
# it uses the value for "listeners" if configured.  Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
advertised.listeners=PLAINTEXT://120.13.124.213:9092

listeners and advertised.listeners do not match.

One is a private (intranet) address, the other is a public address.
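
This kind of mismatch can be spotted mechanically. The sketch below parses the text of server.properties and reports every listener whose advertised host differs from its bind host, along with whether each address is private. The function names are hypothetical, and the parser assumes IPv4 addresses or hostnames with an explicit port. A difference is not an error in itself (advertising another address is legitimate behind NAT, for example), so the result is only a hint about where to look.

```python
import ipaddress


def parse_properties(text):
    """Parse key=value lines, ignoring blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def listener_hosts(value):
    """Map listener name -> host for 'NAME://host:port,NAME2://host:port'."""
    hosts = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        name, _, rest = entry.partition("://")
        hosts[name] = rest.rsplit(":", 1)[0]
    return hosts


def is_private(host):
    """True/False for IP literals, None for hostnames (not resolved here)."""
    try:
        return ipaddress.ip_address(host).is_private
    except ValueError:
        return None


def check_listeners(text):
    """Return (name, bind_host, bind_private, adv_host, adv_private) tuples."""
    props = parse_properties(text)
    bound = listener_hosts(props.get("listeners", ""))
    advertised = listener_hosts(props.get("advertised.listeners", ""))
    findings = []
    for name, adv_host in advertised.items():
        bind_host = bound.get(name)
        # An empty bind host means "all interfaces", so there is nothing to compare.
        if bind_host and bind_host != adv_host:
            findings.append(
                (name, bind_host, is_private(bind_host), adv_host, is_private(adv_host))
            )
    return findings
```

For the configuration in this article, check_listeners reports the PLAINTEXT listener with 192.168.0.227 marked as private and 120.13.124.213 marked as not private.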

Pinging between the nodes over the public address configured in advertised.listeners showed a high packet loss rate.
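
Ping only tests ICMP. As a complementary check, the sketch below opens repeated TCP connections to the advertised host and port and counts the failures. Note that this swaps in a different technique from the ping test described above: it measures whether the Kafka port itself is reachable, not ICMP packet loss. The function name probe and its default values are assumptions made for illustration.

```python
import socket
import time


def probe(host, port, attempts=5, timeout=2.0):
    """Try TCP connects; return (failures, connect times in seconds)."""
    failures = 0
    latencies = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            # create_connection raises OSError on refusal, timeout, or no route.
            with socket.create_connection((host, port), timeout=timeout):
                latencies.append(time.monotonic() - start)
        except OSError:
            failures += 1
    return failures, latencies
```

Run from one broker against another broker's advertised address, for example probe("120.13.124.213", 9092), a high failure count or widely varying connect times would point in the same direction as the ping result.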

So I changed advertised.listeners to the private address.
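
With several brokers, the same edit has to be made in each node's configuration file. The sketch below rewrites the host of advertised.listeners in the text of a server.properties file and leaves every other line untouched. The function name set_advertised_host is hypothetical. It assumes every entry has an explicit port and that all advertised listeners should use the same host, which may not hold in a setup with separate internal and external listeners.

```python
def set_advertised_host(text, new_host):
    """Return properties text with the advertised.listeners host replaced."""
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        key, sep, value = stripped.partition("=")
        is_setting = sep and not stripped.startswith("#")
        if is_setting and key.strip() == "advertised.listeners":
            entries = []
            for entry in value.split(","):
                name, _, rest = entry.strip().partition("://")
                port = rest.rsplit(":", 1)[1]  # assumes "host:port"
                entries.append(f"{name}://{new_host}:{port}")
            line = "advertised.listeners=" + ",".join(entries)
        out.append(line)
    # Keep the trailing newline if the input had one.
    return "\n".join(out) + ("\n" if text.endswith("\n") else "")
```

On the node shown above, calling it with new_host set to 192.168.0.227 turns the advertised.listeners line into PLAINTEXT://192.168.0.227:9092; each broker would use its own private address.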

Then I restarted Kafka on every node.

The problem was resolved.
