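The machine hosting the Kafka cluster blue-screened while Kafka was running and went down unexpectedly. After a restart, Kafka starts and then shuts itself down again.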
Error log:
[2021-09-28 20:21:52,487] WARN [RequestSendThread controllerId=0] Controller 0's connection to broker hadoop103:9092 (id: 1 rack: null) was unsuccessful (kafka.controller.RequestSendThread)
java.io.IOException: Connection to hadoop103:9092 (id: 1 rack: null) failed.
    at org.apache.kafka.clients.NetworkClientUtils.awaitReady(NetworkClientUtils.java:71)
    at kafka.controller.RequestSendThread.brokerReady(ControllerChannelManager.scala:296)
    at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:250)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)
[2021-09-28 20:21:52,610] WARN [RequestSendThread controllerId=0] Controller 0's connection to broker hadoop103:9092 (id: 1 rack: null) was unsuccessful (kafka.controller.RequestSendThread)
java.io.IOException: Connection to hadoop103:9092 (id: 1 rack: null) failed.
    at org.apache.kafka.clients.NetworkClientUtils.awaitReady(NetworkClientUtils.java:71)
    at kafka.controller.RequestSendThread.brokerReady(ControllerChannelManager.scala:296)
    at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:250)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)
[2021-09-28 20:21:52,841] WARN [RequestSendThread controllerId=0] Controller 0's connection to broker hadoop103:9092 (id: 1 rack: null) was unsuccessful (kafka.controller.RequestSendThread)
java.io.IOException: Connection to hadoop103:9092 (id: 1 rack: null) failed.
    at org.apache.kafka.clients.NetworkClientUtils.awaitReady(NetworkClientUtils.java:71)
    at kafka.controller.RequestSendThread.brokerReady(ControllerChannelManager.scala:296)
    at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:250)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)
[2021-09-28 20:21:52,944] WARN [RequestSendThread controllerId=0] Controller 0's connection to broker hadoop103:9092 (id: 1 rack: null) was unsuccessful (kafka.controller.RequestSendThread)
java.io.IOException: Connection to hadoop103:9092 (id: 1 rack: null) failed.
    at org.apache.kafka.clients.NetworkClientUtils.awaitReady(NetworkClientUtils.java:71)
    at kafka.controller.RequestSendThread.brokerReady(ControllerChannelManager.scala:296)
    at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:250)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)
[2021-09-28 20:21:53,047] WARN [RequestSendThread controllerId=0] Controller 0's connection to broker hadoop103:9092 (id: 1 rack: null) was unsuccessful (kafka.controller.RequestSendThread)
java.io.IOException: Connection to hadoop103:9092 (id: 1 rack: null) failed.
    at org.apache.kafka.clients.NetworkClientUtils.awaitReady(NetworkClientUtils.java:71)
    at kafka.controller.RequestSendThread.brokerReady(ControllerChannelManager.scala:296)
    at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:250)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)
[2021-09-28 20:21:53,117] DEBUG The stop replica request (delete = false) sent to broker 1 is StopReplicaRequestInfo([Topic=__consumer_offsets,Partition=40,Replica=1],false),StopReplicaRequestInfo([Topic=topic_log,Partition=1,Replica=1],false),StopReplicaRequestInfo([Topic=topic_log,Partition=0,Replica=1],false),StopReplicaRequestInfo([Topic=__consumer_offsets,Partition=22,Replica=1],false),StopReplicaRequestInfo([Topic=__consumer_offsets,Partition=10,Replica=1],false),StopReplicaRequestInfo([Topic=topic_source_sensor,Partition=0,Replica=1],false),StopReplicaRequestInfo([Topic=__consumer_offsets,Partition=7,Replica=1],false),StopReplicaRequestInfo([Topic=GMALL_EVENT,Partition=0,Replica=1],false),StopReplicaRequestInfo([Topic=sensor,Partition=0,Replica=1],false),StopReplicaRequestInfo([Topic=dwm_user_jump_detail,Partition=0,Replica=1],false),StopReplicaRequestInfo([Topic=__consumer_offsets,Partition=46,Replica=1],false),StopReplicaRequestInfo([Topic=dwd_comment_info,Partition=0,Replica=1],false),StopReplicaRequestInfo([Topic=__consumer_offsets,Partition=31,Replica=1],false),StopReplicaRequestInfo([Topic=GMALL_STARTUP,Partition=0,Replica=1],false),StopReplicaRequestInfo([Topic=__consumer_offsets,Partition=49,Replica=1],false),StopReplicaRequestInfo([Topic=__consumer_offsets,Partition=4,Replica=1],false),StopReplicaRequestInfo([Topic=__consumer_offsets,Partition=34,Replica=1],false),StopReplicaRequestInfo([Topic=dwd_start_log,Partition=0,Replica=1],false),StopReplicaRequestInfo([Topic=__consumer_offsets,Partition=13,Replica=1],false),StopReplicaRequestInfo([Topic=dwd_favor_info,Partition=0,Replica=1],false),StopReplicaRequestInfo([Topic=ods_base_db_c,Partition=0,Replica=1],false),StopReplicaRequestInfo([Topic=dwd_refund_payment,Partition=0,Replica=1],false),StopReplicaRequestInfo([Topic=__consumer_offsets,Partition=1,Replica=1],false),StopReplicaRequestInfo([Topic=__consumer_offsets,Partition=25,Replica=1],false),StopReplicaRequestInfo([Topic=__consumer_offsets,Partition=43,Replica=1],false),StopReplicaRequestInfo([Topic=__consumer_offsets,Partition=19,Replica=1],false),StopReplicaRequestInfo([Topic=__consumer_offsets,Partition=28,Replica=1],false),StopReplicaRequestInfo([Topic=__consumer_offsets,Partition=37,Replica=1],false),StopReplicaRequestInfo([Topic=-list,Partition=0,Replica=1],false),StopReplicaRequestInfo([Topic=__consumer_offsets,Partition=16,Replica=1],false),StopReplicaRequestInfo([Topic=ods_base_log,Partition=0,Replica=1],false),StopReplicaRequestInfo([Topic=GMALL_EVENT,Partition=1,Replica=1],false) (kafka.controller.ControllerBrokerRequestBatch)
[2021-09-28 20:21:53,129] INFO [Controller id=0] Processing automatic preferred replica leader election (kafka.controller.KafkaController)
[2021-09-28 20:21:53,130] TRACE [Controller id=0] Checking need to trigger auto leader balancing (kafka.controller.KafkaController)
[2021-09-28 20:21:53,155] WARN [RequestSendThread controllerId=0] Controller 0's connection to broker hadoop103:9092 (id: 1 rack: null) was unsuccessful (kafka.controller.RequestSendThread)
java.io.IOException: Connection to hadoop103:9092 (id: 1 rack: null) failed.
    at org.apache.kafka.clients.NetworkClientUtils.awaitReady(NetworkClientUtils.java:71)
    at kafka.controller.RequestSendThread.brokerReady(ControllerChannelManager.scala:296)
    at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:250)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)
[2021-09-28 20:21:53,160] DEBUG [Controller id=0] Preferred replicas by broker Map(2 -> Map(ods_base_db_m-0 -> Vector(2), __consumer_offsets-8 -> Vector(2), dwd_order_detail-0 -> Vector(2), __consumer_offsets-35 -> Vector(2), __consumer_offsets-41 -> Vector(2), __consumer_offsets-23 -> Vector(2), __consumer_offsets-47 -> Vector(2), topic_sink_sensor-0 -> Vector(2), first-0 -> Vector(2), bigdata-0 -> Vector(2), __consumer_offsets-38 -> Vector(2), __consumer_offsets-17 -> Vector(2), __consumer_offsets-11 -> Vector(2), GMALL_EVENT-1 -> Vector(2, 1), __consumer_offsets-2 -> Vector(2), __consumer_offsets-14 -> Vector(2), GMALL_STARTUP-0 -> Vector(2, 1), __consumer_offsets-20 -> Vector(2), __consumer_offsets-44 -> Vector(2), dwm_unique_visit-0 -> Vector(2), topic_log-0 -> Vector(2, 1), dwd_page_log-0 -> Vector(2), __consumer_offsets-5 -> Vector(2), __consumer_offsets-26 -> Vector(2), __consumer_offsets-29 -> Vector(2), dwd_order_refund_info-0 -> Vector(2), __consumer_offsets-32 -> Vector(2), testTopic-0 -> Vector(2)), 1 -> Map(__consumer_offsets-22 -> Vector(1), dwd_favor_info-0 -> Vector(1), __consumer_offsets-4 -> Vector(1), __consumer_offsets-7 -> Vector(1), __consumer_offsets-46 -> Vector(1), dwd_start_log-0 -> Vector(1), __consumer_offsets-25 -> Vector(1), __consumer_offsets-49 -> Vector(1), __consumer_offsets-16 -> Vector(1), __consumer_offsets-28 -> Vector(1), topic_source_sensor-0 -> Vector(1), __consumer_offsets-31 -> Vector(1), sensor-0 -> Vector(1), __consumer_offsets-37 -> Vector(1), ods_base_db_c-0 -> Vector(1), ods_base_log-0 -> Vector(1), dwm_user_jump_detail-0 -> Vector(1), __consumer_offsets-19 -> Vector(1), __consumer_offsets-13 -> Vector(1), __consumer_offsets-43 -> Vector(1), dwd_comment_info-0 -> Vector(1), dwd_refund_payment-0 -> Vector(1), -list-0 -> Vector(1), topic_log-1 -> Vector(1, 0), GMALL_EVENT-0 -> Vector(1, 0), __consumer_offsets-1 -> Vector(1), __consumer_offsets-34 -> Vector(1), __consumer_offsets-10 -> Vector(1), __consumer_offsets-40 -> Vector(1)), 0 -> Map(__consumer_offsets-30 -> Vector(0), ods_base_db-0 -> Vector(0), __consumer_offsets-21 -> Vector(0), __consumer_offsets-27 -> Vector(0), __consumer_offsets-9 -> Vector(0), __consumer_offsets-33 -> Vector(0), __consumer_offsets-36 -> Vector(0), dwd_cart_info-0 -> Vector(0), __consumer_offsets-42 -> Vector(0), __consumer_offsets-3 -> Vector(0), __consumer_offsets-18 -> Vector(0), __consumer_offsets-15 -> Vector(0), __consumer_offsets-24 -> Vector(0), GMALL_STARTUP-1 -> Vector(0, 2), __consumer_offsets-48 -> Vector(0), dwd_display_log-0 -> Vector(0), testTopic-1 -> Vector(0), __consumer_offsets-6 -> Vector(0), dwd_order_info_update-0 -> Vector(0), dwd_payment_info-0 -> Vector(0), __consumer_offsets-0 -> Vector(0), __consumer_offsets-39 -> Vector(0), __consumer_offsets-12 -> Vector(0), dim_base_trademark-0 -> Vector(0), __consumer_offsets-45 -> Vector(0), dwd_order_info-0 -> Vector(0), topic_log-2 -> Vector(0, 2))) (kafka.controller.KafkaController)
[2021-09-28 20:21:53,214] DEBUG [Controller id=0] Topics not in preferred replica for broker 2 Map(GMALL_EVENT-1 -> Vector(2, 1), GMALL_STARTUP-0 -> Vector(2, 1), topic_log-0 -> Vector(2, 1)) (kafka.controller.KafkaController)
[2021-09-28 20:21:53,215] TRACE [Controller id=0] Leader imbalance ratio for broker 2 is 0.10714285714285714 (kafka.controller.KafkaController)
[2021-09-28 20:21:53,217] INFO [Controller id=0] Starting replica leader election (PREFERRED) for partitions GMALL_EVENT-1,GMALL_STARTUP-0,topic_log-0 triggered by AutoTriggered (kafka.controller.KafkaController)
[2021-09-28 20:21:53,290] ERROR [Controller id=0] Error completing replica leader election (PREFERRED) for partition GMALL_STARTUP-0 (kafka.controller.KafkaController)
kafka.common.StateChangeFailedException: Failed to elect leader for partition GMALL_STARTUP-0 under strategy PreferredReplicaPartitionLeaderElectionStrategy
    at kafka.controller.ZkPartitionStateMachine$$anonfun$doElectLeaderForPartitions$2.apply(PartitionStateMachine.scala:427)
    at kafka.controller.ZkPartitionStateMachine$$anonfun$doElectLeaderForPartitions$2.apply(PartitionStateMachine.scala:424)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at kafka.controller.ZkPartitionStateMachine.doElectLeaderForPartitions(PartitionStateMachine.scala:424)
    at kafka.controller.ZkPartitionStateMachine.electLeaderForPartitions(PartitionStateMachine.scala:335)
    at kafka.controller.ZkPartitionStateMachine.doHandleStateChanges(PartitionStateMachine.scala:233)
    at kafka.controller.ZkPartitionStateMachine.handleStateChanges(PartitionStateMachine.scala:154)
    at kafka.controller.KafkaController.kafka$controller$KafkaController$$onReplicaElection(KafkaController.scala:761)
    at kafka.controller.KafkaController$$anonfun$checkAndTriggerAutoLeaderRebalance$3.apply(KafkaController.scala:1135)
    at kafka.controller.KafkaController$$anonfun$checkAndTriggerAutoLeaderRebalance$3.apply(KafkaController.scala:1116)
    at scala.collection.immutable.Map$Map3.foreach(Map.scala:161)
    at kafka.controller.KafkaController.checkAndTriggerAutoLeaderRebalance(KafkaController.scala:1116)
    at kafka.controller.KafkaController.processAutoPreferredReplicaLeaderElection(KafkaController.scala:1144)
    at kafka.controller.KafkaController.process(KafkaController.scala:1864)
    at kafka.controller.QueuedEvent.process(ControllerEventManager.scala:53)
    at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:137)
    at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:137)
    at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:137)
    at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
    at kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:136)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)
[2021-09-28 20:21:53,290] ERROR [Controller id=0] Error completing replica leader election (PREFERRED) for partition topic_log-0 (kafka.controller.KafkaController)
kafka.common.StateChangeFailedException: Failed to elect leader for partition topic_log-0 under strategy PreferredReplicaPartitionLeaderElectionStrategy
    at kafka.controller.ZkPartitionStateMachine$$anonfun$doElectLeaderForPartitions$2.apply(PartitionStateMachine.scala:427)
    at kafka.controller.ZkPartitionStateMachine$$anonfun$doElectLeaderForPartitions$2.apply(PartitionStateMachine.scala:424)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at kafka.controller.ZkPartitionStateMachine.doElectLeaderForPartitions(PartitionStateMachine.scala:424)
    at kafka.controller.ZkPartitionStateMachine.electLeaderForPartitions(PartitionStateMachine.scala:335)
    at kafka.controller.ZkPartitionStateMachine.doHandleStateChanges(PartitionStateMachine.scala:233)
    at kafka.controller.ZkPartitionStateMachine.handleStateChanges(PartitionStateMachine.scala:154)
    at kafka.controller.KafkaController.kafka$controller$KafkaController$$onReplicaElection(KafkaController.scala:761)
    at kafka.controller.KafkaController$$anonfun$checkAndTriggerAutoLeaderRebalance$3.apply(KafkaController.scala:1135)
    at kafka.controller.KafkaController$$anonfun$checkAndTriggerAutoLeaderRebalance$3.apply(KafkaController.scala:1116)
    at scala.collection.immutable.Map$Map3.foreach(Map.scala:161)
    at kafka.controller.KafkaController.checkAndTriggerAutoLeaderRebalance(KafkaController.scala:1116)
    at kafka.controller.KafkaController.processAutoPreferredReplicaLeaderElection(KafkaController.scala:1144)
    at kafka.controller.KafkaController.process(KafkaController.scala:1864)
    at kafka.controller.QueuedEvent.process(ControllerEventManager.scala:53)
    at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:137)
    at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:137)
    at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:137)
    at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
    at kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:136)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)
[2021-09-28 20:21:53,290] ERROR [Controller id=0] Error completing replica leader election (PREFERRED) for partition GMALL_EVENT-1 (kafka.controller.KafkaController)
kafka.common.StateChangeFailedException: Failed to elect leader for partition GMALL_EVENT-1 under strategy PreferredReplicaPartitionLeaderElectionStrategy
    at kafka.controller.ZkPartitionStateMachine$$anonfun$doElectLeaderForPartitions$2.apply(PartitionStateMachine.scala:427)
    at kafka.controller.ZkPartitionStateMachine$$anonfun$doElectLeaderForPartitions$2.apply(PartitionStateMachine.scala:424)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at kafka.controller.ZkPartitionStateMachine.doElectLeaderForPartitions(PartitionStateMachine.scala:424)
    at kafka.controller.ZkPartitionStateMachine.electLeaderForPartitions(PartitionStateMachine.scala:335)
    at kafka.controller.ZkPartitionStateMachine.doHandleStateChanges(PartitionStateMachine.scala:233)
    at kafka.controller.ZkPartitionStateMachine.handleStateChanges(PartitionStateMachine.scala:154)
    at kafka.controller.KafkaController.kafka$controller$KafkaController$$onReplicaElection(KafkaController.scala:761)
    at kafka.controller.KafkaController$$anonfun$checkAndTriggerAutoLeaderRebalance$3.apply(KafkaController.scala:1135)
    at kafka.controller.KafkaController$$anonfun$checkAndTriggerAutoLeaderRebalance$3.apply(KafkaController.scala:1116)
    at scala.collection.immutable.Map$Map3.foreach(Map.scala:161)
    at kafka.controller.KafkaController.checkAndTriggerAutoLeaderRebalance(KafkaController.scala:1116)
    at kafka.controller.KafkaController.processAutoPreferredReplicaLeaderElection(KafkaController.scala:1144)
    at kafka.controller.KafkaController.process(KafkaController.scala:1864)
    at kafka.controller.QueuedEvent.process(ControllerEventManager.scala:53)
    at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply$mcV$sp(ControllerEventManager.scala:137)
    at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:137)
    at kafka.controller.ControllerEventManager$ControllerEventThread$$anonfun$doWork$1.apply(ControllerEventManager.scala:137)
    at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:31)
    at kafka.controller.ControllerEventManager$ControllerEventThread.doWork(ControllerEventManager.scala:136)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)

Solution:
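The blue screen hit while Kafka was running, which left the offsets of the replicas and the leader inconsistent. In a non-production environment, the quickest fix is to delete the affected topics from Kafka and the matching topic nodes from ZooKeeper (using zkCli.sh), then restart Kafka. This discards the data in those topics, so it is not suitable for production.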
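
To decide which topics to clean up, it can help to pull the failing partitions out of the controller log first. The following is a minimal sketch, not part of the original fix: the function names `failed_partitions` and `zookeeper_paths` are hypothetical, and the ZooKeeper paths assume a ZooKeeper-based cluster with the default layout and no chroot prefix.

```python
import re

# Matches the exception line the controller logs when a preferred-leader
# election fails, e.g.
# "Failed to elect leader for partition topic_log-0 under strategy ...".
FAILED_ELECTION = re.compile(
    r"Failed to elect leader for partition (\S+)-(\d+) under strategy"
)


def failed_partitions(log_text):
    """Return the sorted, de-duplicated (topic, partition) pairs whose election failed."""
    return sorted(
        {(m.group(1), int(m.group(2))) for m in FAILED_ELECTION.finditer(log_text)}
    )


def zookeeper_paths(topics):
    """List the ZooKeeper nodes that hold each topic's metadata.

    Assumption: default Kafka layout in ZooKeeper, no chroot prefix.
    """
    paths = []
    for topic in sorted(set(topics)):
        paths.append(f"/brokers/topics/{topic}")
        paths.append(f"/config/topics/{topic}")
    return paths


if __name__ == "__main__":
    # Three exception lines taken from the log above.
    sample = "\n".join(
        "kafka.common.StateChangeFailedException: Failed to elect leader for "
        f"partition {name} under strategy PreferredReplicaPartitionLeaderElectionStrategy"
        for name in ("GMALL_STARTUP-0", "topic_log-0", "GMALL_EVENT-1")
    )
    partitions = failed_partitions(sample)
    for path in zookeeper_paths(topic for topic, _ in partitions):
        print(path)
```

Each printed path can then be inspected with `ls` in zkCli.sh and removed with `deleteall` (or `rmr` on older ZooKeeper releases). The script only reads text and prints paths; it deletes nothing itself.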



