Kafka Cluster Setup: Host-Machine Version

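Contents

Kafka Cluster Setup: Host-Machine Version
Preface
I. What is Kafka?
II. Cluster
- 1. Cluster
- 2. Load balancing
- 3. Scaling out
- 4. ZooKeeper leader election
Kafka architecture
Cluster setup
- Download and install ZooKeeper
- Basic Kafka usage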

Preface

If you don't need a cluster, see this article instead: https://www.cnblogs.com/luzhanshi/p/13369834.html

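Kafka is a high-throughput message broker. Why is it so fast?
1. Sequential disk reads and writes
2. Zero-copy transfer
3. Batched sends and batched acknowledgements
4. Partitioning and replication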

A highly available message queue requires a cluster.

I. What is Kafka?

For Kafka theory, see:
From Beginner to Buried 04-2: Kafka Theory (Interview Edition):
https://blog.csdn.net/qq_14919677/article/details/119867555

II. Cluster

1. Cluster

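Cluster: a group of servers working as a whole is called a cluster, and this whole is transparent to producers and consumers. Adding a server to or removing one from a messaging cluster goes unnoticed by producers and consumers. Consumers, in turn, are organized into consumer groups. If you add a consumer to an existing group, Kafka reassigns (rebalances) the group's partitions among its members. For example, if consumer c1 alone reads a topic whose two partitions live on servers s1 and s2, adding consumer c2 to the same group triggers a rebalance so that c1 reads the partition on s1 and c2 reads the one on s2. A consumer in a different group is not part of this rebalance; it receives all partitions on its own.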
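To make the rebalance concrete, here is a minimal bash sketch that hands a topic's partitions to the consumers of one group in round-robin order. It is a simplified stand-in (the function name `assign_partitions` is hypothetical); real clients use configurable assignment strategies such as range or round-robin, so actual assignments may differ.

```bash
# assign_partitions NUM_PARTITIONS CONSUMER...
# Simplified round-robin: partition p goes to consumer number (p mod consumer count).
# A consumer left without partitions is reported as "idle".
assign_partitions() {
  local num_partitions=$1
  shift
  local consumers=("$@")
  local n=${#consumers[@]} i p list
  for (( i = 0; i < n; i++ )); do
    list=""
    # Collect partitions i, i+n, i+2n, ... for consumer i
    for (( p = i; p < num_partitions; p += n )); do
      list+="p$p "
    done
    list=${list% }   # drop the trailing space
    echo "${consumers[i]}: ${list:-idle}"
  done
}
```

For example, `assign_partitions 2 c1` prints `c1: p0 p1`; after a second consumer joins the same group, `assign_partitions 2 c1 c2` prints `c1: p0` and `c2: p1`. A third consumer would be idle, because within one group each partition is read by at most one consumer.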

2. Load balancing

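Load balancing: a messaging system receives requests from large numbers of producers and consumers, and it must spread these requests so that every server carries a balanced share. Otherwise most requests land on one or a few servers, which then run at high load or overload and, in severe cases, stop serving or crash.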
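In Kafka, much of this balancing comes from spreading each topic's partition leaders and replicas across the brokers. The bash sketch below (hypothetical `replica_layout`) staggers the replicas so that each broker leads one partition. It only approximates Kafka's placement, which also starts from a random broker and can take racks into account.

```bash
# replica_layout NUM_PARTITIONS REPLICATION_FACTOR BROKER...
# Prints, per partition, the preferred leader and the replica list.
replica_layout() {
  local partitions=$1 rf=$2
  shift 2
  local brokers=("$@")
  local n=${#brokers[@]} p k replicas
  if (( n == 0 || rf < 1 || rf > n )); then
    echo "need 1 <= replication factor <= broker count (got rf=$rf, brokers=$n)" >&2
    return 1
  fi
  for (( p = 0; p < partitions; p++ )); do
    replicas=""
    # Replica k of partition p lives on broker (p + k) mod n
    for (( k = 0; k < rf; k++ )); do
      replicas+="${brokers[(p + k) % n]},"
    done
    # The first replica in the list is the preferred leader
    echo "p$p leader=${brokers[p % n]} replicas=${replicas%,}"
  done
}
```

`replica_layout 3 3 0 1 2` matches the three-broker setup in this article (broker.id 0, 1, 2): each broker ends up as the leader of one partition and holds a copy of the other two, so client traffic is spread evenly.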

3. Scaling out

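Scaling out: many companies require dynamic scaling (adding capacity while the system keeps running). Without it, scaling means stopping the service, which many companies cannot accept.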

4. ZooKeeper leader election

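Election: leader election works from the ISR (In-Sync Replicas), the list of replicas that are fully caught up with the leader. When a leader fails, the new leader is chosen from this list (in ZooKeeper-based clusters, the choice is made by the controller broker, which is itself elected through ZooKeeper).
To keep processing efficient, all reads and writes for a partition go to a single replica by default. That replica is the leader; the other replicas are followers, which continuously fetch data from the leader to stay in sync.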
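Whether a follower counts as in sync was originally judged both by how long it had lagged and by how many messages it was behind; since Kafka 0.9 only the time-based criterion (replica.lag.time.max.ms) remains.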
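The choice of a new leader can be sketched in a few lines of bash. The function `elect_leader` below is hypothetical and simplified: it walks the partition's replica list in order and picks the first broker that is both in the ISR and still alive, which mirrors the basic rule Kafka's controller applies. If no in-sync replica survives, the partition stays offline unless unclean leader election (`unclean.leader.election.enable`) is turned on.

```bash
# elect_leader "REPLICAS" "ISR" "DEAD"
# Each argument is a space-separated list of broker IDs; REPLICAS is in assignment order.
elect_leader() {
  local replicas=$1 isr=" $2 " dead=" $3 " r
  for r in $replicas; do
    # Pad with spaces so that broker 1 does not match broker 11
    if [[ $isr == *" $r "* && $dead != *" $r "* ]]; then
      echo "$r"
      return 0
    fi
  done
  echo "none"   # no live in-sync replica: the partition goes offline
  return 1
}
```

`elect_leader "0 1 2" "0 2" "0"` prints 2: broker 0 is dead, and broker 1 is alive but has fallen out of the ISR, so it is skipped.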

Kafka architecture

The architecture consists of producers that produce messages, the Kafka cluster, and consumers that fetch messages.

Messages in a Kafka cluster are organized by topic, as shown in the figure below:

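There is also the concept of the high watermark (HW): a consumer can only read up to the smallest offset that has been replicated to all in-sync replicas. In the figure above, if the partitions were instead replicas of one partition, you could consume at most up to 8; the HW is 8.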
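Kafka defines the high watermark as the smallest log end offset (LEO, the offset of the next message to be written) among the in-sync replicas, and consumers can read only messages whose offset is below it. A minimal bash sketch with hypothetical function names:

```bash
# high_watermark LEO...: prints the smallest log end offset among the in-sync replicas
high_watermark() {
  local hw=$1 leo
  shift
  for leo in "$@"; do
    if (( leo < hw )); then
      hw=$leo
    fi
  done
  echo "$hw"
}

# is_visible OFFSET LEO...: succeeds if a consumer may already read the message at OFFSET
is_visible() {
  local offset=$1
  shift
  (( offset < $(high_watermark "$@") ))
}
```

With replicas whose LEOs are 10, 8, and 9, `high_watermark 10 8 9` prints 8, so offsets 0 to 7 are readable and offset 8 is not yet: one in-sync replica has not copied it.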

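1. Topic: a topic is a category, like sports, entertainment, or education in a news site; in practice, a project usually uses one topic per business function.
2. Partition: the messages of a topic are organized into multiple partitions. The partition is the smallest unit of organization in Kafka, and each partition can be seen as a FIFO (First In, First Out) queue.
Partitions are the key to Kafka's performance: when a cluster's throughput is too low, a common remedy is to add partitions to the topic. Within a partition, messages are stored in order from oldest to newest: producers append messages at the tail, and each consumer reads forward from its own offset.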
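The append-only behavior can be imitated with a plain text file in which line N holds the message at offset N. This is only a toy model (hypothetical helper names, one message per line): a real partition is split into segment files with an index, and reading does not remove anything, so other consumers can still read the same messages from their own offsets.

```bash
# partition_append FILE MESSAGE: appends one message at the tail and prints its offset
partition_append() {
  local file=$1 msg=$2 offset
  [ -f "$file" ] || : > "$file"
  # The next offset equals the number of messages already stored
  offset=$(( $(wc -l < "$file") ))
  printf '%s\n' "$msg" >> "$file"
  echo "$offset"
}

# partition_read FILE OFFSET: prints every message from OFFSET to the end, oldest first
partition_read() {
  tail -n "+$(( $2 + 1 ))" "$1"
}
```

Appending a, b, and c to a new file prints offsets 0, 1, and 2; a consumer positioned at offset 1 then gets b and c from `partition_read "$file" 1`.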

Cluster setup

Download and install ZooKeeper

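I set up three servers (CentOS 8.1):
192.168.1.41, 192.168.1.42, and 192.168.1.43.
If you don't want to type raw IP addresses during setup, map hostnames to these IPs in /etc/hosts on every server; configuring passwordless SSH between the servers also makes it easier to copy files and run commands across them.
Passwordless login guide: https://www.cnblogs.com/luzhanshi/p/13369797.html
1. First download ZooKeeper (a distributed coordination service that manages our cluster) from the official site: http://zookeeper.apache.org/releases.html. For version 3.5 and later, take the archive whose name ends in -bin.tar.gz; the other one contains only source code.
Then upload it to the servers with MobaXterm. Put it in the same location on all three servers, creating the same directories.
2. Go into the conf directory and copy zoo_sample.cfg to zoo.cfg in the same directory. If the file is already named zoo.cfg, just edit it.
3. The configuration file: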

# Edit the file:
    vim zoo.cfg
    
----------------------------------------------------------------------------
# The number of milliseconds of each tick
# Base time unit in milliseconds; all ZooKeeper times are configured as multiples of it (2000 = 2 seconds)
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
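# Maximum time, in ticks, a follower may take at startup to connect to the leader and sync the latest data (10 * 2000 ms = 20 s); increase it for large clusters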
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
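# Maximum time, in ticks, for the leader's heartbeat checks with the other servers (5 * 2000 ms = 10 s); a follower that does not respond within this time is considered offline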
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
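# Data directory for snapshots (and for the transaction log unless dataLogDir is set). *** Be sure to set this; using the same path on all three servers makes it easier to manage and find. ***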
dataDir=/public/kafka/zk/data
# the port at which the clients will connect
# Port that clients connect to
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
# Number of snapshots to keep (3 is the default)
autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
# How often, in hours, old transaction logs and snapshots are purged; 1 hour here
autopurge.purgeInterval=1

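# Ensemble members. N in server.N must match the number in that server's myid file. Of the two ports, 2888 is used by followers to connect to the leader (data sync and communication) and 3888 is used for leader election.
server.1=hostname1:2888:3888
server.2=hostname2:2888:3888
server.3=hostname3:2888:3888
# If you have not set up hostnames, use IP addresses instead (configure only one of these two sets):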
server.1=192.168.1.41:2888:3888
server.2=192.168.1.42:2888:3888
server.3=192.168.1.43:2888:3888
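Since the three server.N lines must be identical on every node and N must match each node's myid, it can help to generate them. A small bash sketch (`zk_server_lines` is a hypothetical helper) that numbers the hosts from 1 in the order given:

```bash
# zk_server_lines HOST...
# Prints one server.N=HOST:2888:3888 line per host, numbering from 1.
# 2888: follower-to-leader port, 3888: leader election port.
zk_server_lines() {
  local id=1 host
  for host in "$@"; do
    echo "server.$id=$host:2888:3888"
    id=$(( id + 1 ))
  done
}
```

`zk_server_lines 192.168.1.41 192.168.1.42 192.168.1.43 >> zoo.cfg` appends the same three lines as the IP-based example above; use the same host order on every server so the numbering stays consistent.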

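4. In the data directory (the dataDir set above), create a file named myid containing 1, 2, and 3 on the three servers respectively. Keep zoo.cfg identical on all three servers, then start ZooKeeper.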
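Writing the myid file is easy to get wrong by hand, so here is a bash sketch under the assumptions of this article (dataDir `/public/kafka/zk/data`, hosts listed in the same order as the server.N lines). The helpers `myid_for_host` and `write_myid` are hypothetical; the 1 to 255 range check follows the ZooKeeper administrator's guide.

```bash
# myid_for_host THIS_HOST HOST...: prints the 1-based position of THIS_HOST in the host list
myid_for_host() {
  local me=$1 id=1 host
  shift
  for host in "$@"; do
    if [[ $host == "$me" ]]; then
      echo "$id"
      return 0
    fi
    id=$(( id + 1 ))
  done
  return 1   # host not found in the list
}

# write_myid DATA_DIR ID: creates DATA_DIR if needed and writes ID into DATA_DIR/myid
write_myid() {
  local data_dir=$1 id=$2
  # Reject empty, non-numeric, zero-padded, or out-of-range IDs
  if ! [[ $id =~ ^[1-9][0-9]*$ ]] || (( id > 255 )); then
    echo "myid must be an integer from 1 to 255, got '$id'" >&2
    return 1
  fi
  mkdir -p "$data_dir" && echo "$id" > "$data_dir/myid"
}
```

On 192.168.1.42, for example, `write_myid /public/kafka/zk/data "$(myid_for_host 192.168.1.42 192.168.1.41 192.168.1.42 192.168.1.43)"` writes 2. If the host is not in the list, the ID comes out empty and `write_myid` refuses to write anything.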
5. Start the ZooKeeper cluster

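# Start (run on each of the three servers):
 /public/kafka/zk/zookeeper3.7/bin/zkServer.sh start
# Stop:
 /public/kafka/zk/zookeeper3.7/bin/zkServer.sh stop
# Check the cluster status (one node should report Mode: leader, the others Mode: follower):
    /public/kafka/zk/zookeeper3.7/bin/zkServer.sh status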
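With passwordless SSH in place, the status of all three nodes can be collected in one go. The sketch below assumes that `zkServer.sh status` prints a line starting with `Mode: ` (leader, follower, or standalone), as the stock script does, and that Java is available to the remote non-interactive shell; `zk_mode` and `zk_cluster_status` are hypothetical helpers.

```bash
# zk_mode: reads `zkServer.sh status` output on stdin and prints the value after "Mode: "
zk_mode() {
  sed -n 's/^Mode: //p'
}

# zk_cluster_status ZK_HOME HOST...: prints "HOST: MODE" for each node (assumes passwordless SSH)
zk_cluster_status() {
  local zk_home=$1 host
  shift
  for host in "$@"; do
    # 2>&1: the script writes some of its messages to stderr
    printf '%s: %s\n' "$host" "$(ssh "$host" "$zk_home/bin/zkServer.sh status" 2>&1 | zk_mode)"
  done
}
```

`zk_cluster_status /public/kafka/zk/zookeeper3.7 192.168.1.41 192.168.1.42 192.168.1.43` should print one leader and two followers; an empty mode usually means that node is not running or cannot reach its peers.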

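If something goes wrong, check the log files in the ZooKeeper directory and the firewall settings. From Windows you can test connectivity: ping checks that the IP address is reachable, and telnet <ip> <port> checks that a port is open.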
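On the Linux side, a quick port check can also be scripted without installing telnet, using bash's built-in `/dev/tcp` redirection and the coreutils `timeout` command. A minimal sketch (hypothetical helper names); keep in mind that only the current ZooKeeper leader listens on 2888, so that port shows as closed on the followers even when everything is healthy.

```bash
# port_open HOST PORT [TIMEOUT_SECONDS]: succeeds if a TCP connection can be opened
port_open() {
  # $0 and $1 inside the inner script are the host and port passed after it
  timeout "${3:-2}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# check_cluster_ports HOST...: checks the ZooKeeper ports (2181, 2888, 3888) and Kafka's 9092
check_cluster_ports() {
  local host port
  for host in "$@"; do
    for port in 2181 2888 3888 9092; do
      if port_open "$host" "$port"; then
        echo "$host:$port open"
      else
        echo "$host:$port closed"
      fi
    done
  done
}
```

Running `check_cluster_ports 192.168.1.41 192.168.1.42 192.168.1.43` from any of the servers prints one line per host and port; a closed 2181 or 3888 usually points at the firewall or a node that failed to start.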

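Install the Kafka cluster
Download Kafka
# Kafka website:
http://kafka.apache.org/
http://kafka.apache.org/downloads
Upload and extract
Upload the archive to the servers with MobaXterm. Put it in the same location on all three servers, creating the same directories.
tar -xzvf <file name>.tar.gz
cd into the Kafka directory, then open config/server.properties.
Configure it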
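Note: broker.id must be different on every server. Use 0 on the first, 1 on the second, and 2 on the third (incrementing by 1 keeps it simple).
port=9092
# Broker host address; if you use IP addresses, write the IP, e.g. 192.168.1.41
host.name=server1 (or 192.168.1.41)
port and host.name are deprecated in favor of listeners; the equivalent setting is listeners=PLAINTEXT://192.168.1.41:9092.
Set the directory for Kafka's log (data) files with log.dirs.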

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# see kafka.server.KafkaConfig for additional details and defaults

############################# Server Basics #############################

# The id of the broker. This must be set to a unique integer for each broker.
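# Unique ID of this broker in the cluster; must not be repeated
broker.id=0
# Port (deprecated; only used when listeners is not set)
port=9092
# Broker host address; if you use IP addresses, write the IP, e.g. 192.168.1.41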
host.name=server1

############################# Socket Server Settings #############################

# The address the socket server listens on. It will get the value returned from
# java.net.InetAddress.getCanonicalHostName() if not configured.
# FORMAT:
# listeners = listener_name://host_name:port
# EXAMPLE:
# listeners = PLAINTEXT://your.host.name:9092
#listeners=PLAINTEXT://:9092

# Hostname and port the broker will advertise to producers and consumers. If not set,
# it uses the value for "listeners" if configured. Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
#advertised.listeners=PLAINTEXT://your.host.name:9092

# Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details
#listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL

# The number of threads that the server uses for receiving requests from the network and sending responses to the network
# Number of threads the broker uses to receive requests from and send responses to the network
num.network.threads=3

# The number of threads that the server uses for processing requests, which may include disk I/O
# Number of threads the broker uses to process requests, including disk I/O
num.io.threads=8

# The send buffer (SO_SNDBUF) used by the socket server
# Socket send buffer size
socket.send.buffer.bytes=102400

# The receive buffer (SO_RCVBUF) used by the socket server
# Socket receive buffer size
socket.receive.buffer.bytes=102400

# The maximum size of a request that the socket server will accept (protection against OOM)
# Maximum size of a request the socket server accepts
socket.request.max.bytes=104857600

############################# Log Basics #############################

# A comma separated list of directories under which to store log files
# Directory where Kafka stores its data (the message logs); separate multiple directories with commas
log.dirs=/usr/local/kafka/kafka_2.11-1.0.0/kfk-logs

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
# Default number of partitions per topic
num.partitions=1

# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
# Number of recovery threads per data directory
num.recovery.threads.per.data.dir=1

############################# Internal Topic Settings #############################
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
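# For anything other than development testing, a value greater than 1 is recommended to ensure availability, such as 3.
# Replication factor of the internal topics. This example keeps 1; on a three-broker cluster, Kafka's own defaults (replication factor 3, transaction.state.log.min.isr=2) are the usual choice, so that losing one broker does not break consumer offsets.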
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1

############################# Log Flush Policy #############################

# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
# 1. Durability: Unflushed data may be lost if you are not using replication.
# 2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
# 3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.

# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000

# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000

############################# Log Retention Policy #############################

# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.

# The minimum age of a log file to be eligible for deletion due to age
# Maximum retention time for message logs; 168 hours = 7 days
log.retention.hours=168

# A size-based retention policy for logs. Segments are pruned from the log unless the remaining
# segments drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824

# The maximum size of a log segment file. When this size is reached a new log segment will be created.
# Size of each log segment file; 1 GB here
log.segment.bytes=1073741824

# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
# How often segments are checked against the retention policies (300000 ms = 5 minutes)
log.retention.check.interval.ms=300000

############################# Zookeeper #############################

# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
# ZooKeeper ensemble address
zookeeper.connect=server1:2181,server2:2181,server3:2181

# Timeout in ms for connecting to zookeeper
# ZooKeeper connection timeout (ms)
zookeeper.connection.timeout.ms=6000

############################# Group Coordinator Settings #############################

# The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance.
# The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms.
# The default value for this is 3 seconds.
# We override this to 0 here as it makes for a better out-of-the-box experience for development and testing.
# However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup.
group.initial.rebalance.delay.ms=0
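Because the three copies of server.properties differ only in broker.id and the host address, the per-node changes can be scripted. The bash sketch below (hypothetical `configure_broker`, assuming the stock file layout shown above) rewrites broker.id and host.name and turns the commented `#listeners=` line into an explicit listener on port 9092, keeping a `.bak` backup. It does not touch advertised.listeners.

```bash
# configure_broker PROPERTIES_FILE BROKER_ID HOST
# Edits the file in place (a copy is kept as PROPERTIES_FILE.bak).
configure_broker() {
  local file=$1 id=$2 host=$3
  sed -i.bak \
    -e "s/^broker\.id=.*/broker.id=$id/" \
    -e "s/^host\.name=.*/host.name=$host/" \
    -e "s|^#\{0,1\}listeners=.*|listeners=PLAINTEXT://$host:9092|" \
    "$file"
}
```

For the second server, for example: `configure_broker /public/kafka/kafka_2.13-2.5.0/config/server.properties 1 192.168.1.42`. Running it once per node with ids 0, 1, and 2 gives each broker a unique ID and an explicit listener.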

Once the configuration is done and the ZooKeeper ensemble is running, start the Kafka cluster.

# Run on each of the three nodes: node01/node02/node03
 ## Start Kafka: -daemon runs it as a background service; the argument after it is the config file to use
/public/kafka/kafka_2.13-2.5.0/bin/kafka-server-start.sh -daemon /public/kafka/kafka_2.13-2.5.0/config/server.properties

 ## Stop Kafka
/public/kafka/kafka_2.13-2.5.0/bin/kafka-server-stop.sh

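Use the jps command to check that Kafka is running: the broker appears as Kafka, and ZooKeeper appears as QuorumPeerMain. jps ships with the JDK, so if the command is missing, install a JDK rather than only a JRE.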

Basic Kafka usage

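Connect to the cluster with Kafka Tool; entering a single address is enough (it does not have to be a particular leader), because the tool discovers the rest of the cluster from it.
After that you can connect from C#. In the C# client, list every broker's address and port in the broker list, so that the client can still bootstrap when one broker is down.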
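Before wiring up C#, the command-line tools that ship in Kafka's bin directory are a convenient way to create the topic and check the cluster. The wrapper below is a hypothetical sketch: KAFKA_HOME and BOOTSTRAP default to this article's paths and addresses (adjust them), and DRY_RUN=1 only prints the commands. The `--bootstrap-server` option of kafka-topics.sh requires Kafka 2.2 or later, which the 2.5.0 release used here satisfies.

```bash
KAFKA_HOME=${KAFKA_HOME:-/public/kafka/kafka_2.13-2.5.0}
BOOTSTRAP=${BOOTSTRAP:-192.168.1.41:9092,192.168.1.42:9092,192.168.1.43:9092}

# run CMD...: executes the command, or only prints it when DRY_RUN=1
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# create_topic TOPIC PARTITIONS REPLICATION_FACTOR
create_topic() {
  run "$KAFKA_HOME/bin/kafka-topics.sh" --bootstrap-server "$BOOTSTRAP" \
    --create --topic "$1" --partitions "$2" --replication-factor "$3"
}

# describe_topic TOPIC: shows leader, replicas, and ISR of each partition
describe_topic() {
  run "$KAFKA_HOME/bin/kafka-topics.sh" --bootstrap-server "$BOOTSTRAP" --describe --topic "$1"
}

# describe_group GROUP: shows committed offsets and lag of a consumer group
describe_group() {
  run "$KAFKA_HOME/bin/kafka-consumer-groups.sh" --bootstrap-server "$BOOTSTRAP" --describe --group "$1"
}
```

`create_topic farmzairunloglog1 3 3` creates the topic named in the configuration at the end of this article, with one partition per broker; `describe_topic farmzairunloglog1` then lists the Leader, Replicas, and Isr of each partition, and `describe_group farmzailog1` shows how far the C# consumer has read.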

using MicrosoftExtensions.common;
using MicrosoftExtensions.implement;
using System;
using System.Threading.Tasks;

namespace MicrosoftExtensions
{
    public class KafkaHelper : ConfulentKafka<string>
    {
        string BrokerList = "";
        string TopicName = "";
        string GroupName = "";

        public KafkaHelper(string _BrokerList, string _TopicName, string _GroupName)
        {
            BrokerList = _BrokerList;
            TopicName = _TopicName;
            GroupName = _GroupName;
            this.SetKafkaParameter();
        }

        // Reads the broker list from configuration (KafkaSetting:BrokerList)
        public KafkaHelper(string _TopicName, string _GroupName)
        {
            BrokerList = ConfigHelper.GetNode("KafkaSetting:BrokerList");
            TopicName = _TopicName;
            GroupName = _GroupName;
            this.SetKafkaParameter();
        }

        // Copies the settings into the fields used by the base classes
        public override void SetKafkaParameter()
        {
            brokerList = BrokerList;
            topicName = TopicName;
            groupName = GroupName;
        }

        // Fire-and-forget send; delivery errors are logged inside Produce
        public override void Send(string logs)
        {
            _ = Produce(logs);
        }

        // Awaitable send: completes once the broker has acknowledged the message
        public Task SendAsync(string logs)
        {
            return Produce(logs);
        }

        // Runs the blocking consume loop on a background thread; await the returned task
        // to keep the process alive. The callback receives (key, value, offset).
        public Task Consume(Action<string, string, string> action, bool showConsole)
        {
            return Task.Run(() => Run_Consume((_key, _msg, _offset) =>
            {
                action?.Invoke(_key, _msg, _offset);
                if (showConsole)
                    Console.WriteLine(_offset + "|" + _key + "|" + _msg);
            }));
        }
    }
}

using Confluent.Kafka;
using System;
using System.Threading;
using System.Threading.Tasks;

namespace MicrosoftExtensions.Interface
{
    public abstract class AConfulentKafka<T>
    {
        protected string brokerList, topicName, groupName;

        protected readonly string mode;

        /// <summary>
        /// Sets the broker list address and the topic name; GroupName is needed by consumers.
        /// </summary>
        public abstract void SetKafkaParameter();

        public abstract Task Produce(T content);

        protected AConfulentKafka(string consumerModel = "subscribe")
        {
            mode = consumerModel;
        }

        /// <summary>
        /// Consumes messages and, after the callback has handled each one, commits its offset
        /// to tell Kafka the message has been consumed.
        /// </summary>
        /// <param name="callBackFunction">Receives (key, value, offset) for each message.</param>
        public void Run_Consume(Action<string, string, string> callBackFunction)
        {
            // Wait up to 10 s until both the broker list and the topic have been set
            int i = 0;
            while ((string.IsNullOrEmpty(brokerList) || string.IsNullOrEmpty(topicName)) && i < 10)
            {
                Thread.Sleep(1000);
                i++;
            }

            var config = new ConsumerConfig
            {
                BootstrapServers = brokerList,
                GroupId = groupName,
                EnableAutoCommit = false,   // offsets are committed manually below
                AutoOffsetReset = AutoOffsetReset.Earliest,
                EnablePartitionEof = true,
                // No heartbeat within SessionTimeoutMs: the consumer is considered dead.
                // Consume() not called again within MaxPollIntervalMs (processing too slow):
                // the consumer leaves the group and its partitions go to other consumers.
                // Note: MaxPollIntervalMs must be >= SessionTimeoutMs.
                SessionTimeoutMs = 6000,
                MaxPollIntervalMs = 10000,
            };
            // Offsets can also be committed in batches: with commitPeriod = N, commit every Nth offset
            const int commitPeriod = 1;
            using (var consumer = new ConsumerBuilder<string, string>(config)
                .SetErrorHandler((_, e) => Console.WriteLine($"Error: {e.Reason}"))
                .SetPartitionsAssignedHandler((c, partitions) =>
                {
                    Console.WriteLine("Connected to Kafka.....");
                    Console.WriteLine($"Assigned partitions: [{string.Join(", ", partitions)}]");
                    #region Consuming specific partitions (not implemented)
                    #endregion
                })
                .SetPartitionsRevokedHandler((c, partitions) =>
                {
                    Console.WriteLine($"Revoking assignment: [{string.Join(", ", partitions)}]");
                })
                .Build())
            {
                // Subscribing joins the consumer group: when a new consumer joins the same group,
                // the partitions are reassigned among the members
                consumer.Subscribe(topicName);
                try
                {
                    while (true)
                    {
                        try
                        {
                            var consumeResult = consumer.Consume();
                            if (consumeResult.IsPartitionEOF)
                            {
                                continue;
                            }
                            // Every message is processed; only the commit is batched
                            callBackFunction?.Invoke(
                                consumeResult.Message.Key,
                                consumeResult.Message.Value,
                                consumeResult.Offset.Value.ToString());
                            if (consumeResult.Offset.Value % commitPeriod == 0)
                            {
                                consumer.Commit(consumeResult);
                            }
                        }
                        catch (ConsumeException e)
                        {
                            Console.WriteLine($"Consume error: {e.Error.Reason}");
                        }
                    }
                }
                catch (OperationCanceledException)
                {
                    Console.WriteLine("Closing consumer.");
                    consumer.Close();
                }
            }
        }
    }
}

using Confluent.Kafka;
using MicrosoftExtensions.Interface;
using System;
using System.Threading.Tasks;

namespace MicrosoftExtensions.implement
{
    public abstract class ConfulentKafka<T> : AConfulentKafka<T>
    {
        // One producer per helper, created on first use (after SetKafkaParameter has run).
        // Producers are thread-safe and expensive to create, so reuse one instead of building
        // one per message; flush/dispose it on application shutdown so buffered messages are not lost.
        private readonly Lazy<IProducer<string, T>> producer;

        public ConfulentKafka(string consumerModel = "subscribe") : base(consumerModel)
        {
            producer = new Lazy<IProducer<string, T>>(CreateProducer);
        }

        public abstract void Send(string msg);

        #region Producer
        private IProducer<string, T> CreateProducer()
        {
            var config = new ProducerConfig
            {
                BootstrapServers = brokerList,
                Acks = Acks.All,            // wait until all in-sync replicas have the message
                EnableIdempotence = true,   // retries do not create duplicates
                LingerMs = 3000,            // batch up to 3 s: higher throughput, but a send may wait that long
                MessageSendMaxRetries = 3,  // retry failed sends
                //Partitioner = Partitioner.Random // distribute messages randomly across partitions
            };
            return new ProducerBuilder<string, T>(config).Build();
        }

        /// <summary>
        /// Sends a message to the Kafka brokers.
        /// </summary>
        /// <param name="content">The message value.</param>
        public override async Task Produce(T content)
        {
            try
            {
                // The async API is recommended for performance.
                // The key decides the partition: messages with the same key always go to the same
                // partition and keep their order there. With Confluent.Kafka's default partitioner,
                // messages without a key are spread randomly across partitions.
                // The random key 1-9 below spreads the load but gives up per-key ordering; if the
                // nodes have different resources, the keys could be weighted to match.
                var deliveryReport = await producer.Value.ProduceAsync(topicName,
                    new Message<string, T> { Key = new Random().Next(1, 10).ToString(), Value = content });
                // Console.WriteLine($"delivered to: {deliveryReport.TopicPartitionOffset}");
            }
            catch (ProduceException<string, T> e)
            {
                Console.WriteLine($"failed to deliver message: {e.Message} [{e.Error.Code}]");
            }
        }
        #endregion
    }
}
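To see why a key pins messages to one partition, here is a bash sketch that maps a key to a partition with a hash modulo the partition count. It is an illustration only: it uses POSIX `cksum`, not the hash the Kafka clients actually use (librdkafka's default partitioner hashes keys with CRC32, the Java client uses murmur2), so the concrete partition numbers differ from what Confluent.Kafka produces. `partition_for_key` is a hypothetical name.

```bash
# partition_for_key KEY NUM_PARTITIONS
# Illustration only: hash(key) mod partitions, with cksum standing in for the real hash.
partition_for_key() {
  local sum
  sum=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
  echo $(( sum % $2 ))
}
```

`for k in 1 2 3 4 5 6 7 8 9; do echo "key $k -> partition $(partition_for_key "$k" 3)"; done` mimics the random keys 1 to 9 used by the producer above: the same key always lands on the same partition, and nine distinct keys spread over three partitions only as evenly as the hash allows.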

appsettings.json

{
  "KafkaSetting": {
    "BrokerList": "192.168.1.41:9092,192.168.1.42:9092,192.168.1.43:9092",
    "TopicName": "farmzairunloglog1",
    "GroupName": "farmzailog1",
    "NoLogs": "/api/Health/check;/api/Health/checkPost;"
  }
}

That completes the Kafka cluster setup, and the code is written too. If you spot any mistakes, corrections are welcome. Good luck!
