BookKeeper AutoRecovery

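Background
  1. Version: BookKeeper 4.12.0 (the version bundled with Pulsar 2.7.0).

  2. Scenario: recovery addresses the case where some bookies in the cluster go down unexpectedly and the data stored on them has to be recovered. BookKeeper offers two ways to do this: manual recovery and automatic recovery (AutoRecovery).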

Manual Recovery

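If AutoRecovery is not enabled in the cluster, you can recover the data manually.

Manual recovery comes in two forms: recovering all data of a given bookie, or recovering a single ledger.

  1. Recover the data of a given bookie (the command can be run either on the host of the failed bookie or on any healthy bookie):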

    ./bin/bookkeeper shell recover bookiehostname:3181
    
  2. Recover the data of a given ledger:

    # 192.168.1.10:3181 is the IP and port of the failed bookie;
    # ledgerID is the ID of the ledger you want to recover
    bin/bookkeeper shell recover \
      192.168.1.10:3181 \
      --ledger ledgerID
    
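Introduction to AutoRecovery

AutoRecovery can be deployed in three ways.

  1. Integrated with the bookies. Set autoRecoveryDaemonEnabled to true in each bookie's bookkeeper.conf.
  2. On dedicated recovery nodes. (Note that in this mode autoRecoveryDaemonEnabled must be turned off on the bookies; otherwise the bookies will also take part in replication.)
  3. Mixed: partly on the bookies and partly on dedicated recovery nodes.

Note: Pulsar enables AutoRecovery by default and uses the first deployment mode.

This article covers the first mode, in which AutoRecovery runs as an auxiliary thread inside the bookie. Everything below applies to that mode only. (I have not yet tried the second and third modes.)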

Disabling AutoRecovery

You can disable AutoRecovery for the whole cluster at any time, for example during maintenance. Disabling AutoRecovery ensures that bookies’ data isn’t unnecessarily re-replicated when a bookie is only taken down for a short period of time, for example when the bookie is being updated or its configuration is being changed.

  1. Disable AutoRecovery:
$ bin/bookkeeper shell autorecovery -disable
  2. Enable AutoRecovery:
$ bin/bookkeeper shell autorecovery -enable
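
The disable and enable commands are normally used as a pair around a maintenance task. A minimal sketch of such a wrapper is shown here; the function name with_autorecovery_disabled, the variables BK and maintenance_status, and the script name in the usage comment are hypothetical names introduced for illustration and are not part of BookKeeper.

```shell
# Hypothetical helper (not part of BookKeeper): run one maintenance command
# with AutoRecovery disabled, then re-enable AutoRecovery afterwards.
# BK is an assumed variable holding the path to the bookkeeper CLI script.
BK="${BK:-./bin/bookkeeper}"

with_autorecovery_disabled() {
  # Do not start maintenance if AutoRecovery could not be disabled.
  "$BK" shell autorecovery -disable || return 1

  # Run the maintenance command given as arguments and remember its status.
  maintenance_status=0
  "$@" || maintenance_status=$?

  # Re-enable AutoRecovery whether or not the maintenance command succeeded.
  "$BK" shell autorecovery -enable || return 1
  return "$maintenance_status"
}

# Usage (the script name is a placeholder for your own maintenance step):
#   with_autorecovery_disabled ./my-bookie-maintenance.sh
```

The wrapper returns the exit status of the maintenance command, so a failed maintenance step is still visible to the caller after AutoRecovery has been switched back on.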
Configuration
  1. Make sure autoRecoveryDaemonEnabled is set to true in bookkeeper.conf.
  2. For more configuration options, see the BookKeeper AutoRecovery documentation.
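
To confirm what a node is actually configured with, you can print the uncommented autoRecoveryDaemonEnabled lines from its configuration file. This is only a sketch: the function name show_autorecovery_setting is hypothetical, and the path conf/bookkeeper.conf in the usage comment assumes the layout of a Pulsar installation.

```shell
# Hypothetical helper: print every active (uncommented) assignment of
# autoRecoveryDaemonEnabled in the given properties-style file.
# Exits non-zero when the key is not set at all.
show_autorecovery_setting() {
  grep -E '^[[:space:]]*autoRecoveryDaemonEnabled[[:space:]]*=' "$1"
}

# Usage (path assumed, adjust to your installation):
#   show_autorecovery_setting conf/bookkeeper.conf
```

For the integrated deployment mode the command should print a single line ending in true.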
Testing AutoRecovery
  1. List the bookies currently available in the cluster.
[test@mq5 middleware]$ ./bin/bookkeeper shell listbookies -rw
JMX enabled by default
16:18:19.118 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookies.ListBookiesCommand - ReadWrite Bookies :
16:18:19.133 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookies.ListBookiesCommand - BookieID:mq8:3181, IP:xxx, Port:3181, Hostname:mq8
16:18:19.134 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookies.ListBookiesCommand - BookieID:mq7:3181, IP:xxx, Port:3181, Hostname:mq7
16:18:19.134 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookies.ListBookiesCommand - BookieID:mq5:3181, IP:xxx, Port:3181, Hostname:mq5
16:18:19.134 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookies.ListBookiesCommand - BookieID:mq6:3181, IP:xxx, Port:3181, Hostname:mq6
16:18:19.240 [Thread-1] WARN  org.apache.zookeeper.Login - TGT renewal thread has been interrupted and will exit.


  2. List the ledgers in the cluster (output excerpted).
[test@mq5 middleware]$ ./bin/bookkeeper shell listledgers
# Only part of the output is shown below
...
JMX enabled by default
org.apache.bookkeeper.tools.cli.commands.bookie.ListLedgersCommand - ledgerID: 480
16:21:33.704 [main-EventThread] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.ListLedgersCommand - ledgerID: 481
16:21:33.704 [main-EventThread] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.ListLedgersCommand - ledgerID: 482
16:21:33.704 [main-EventThread] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.ListLedgersCommand - ledgerID: 483
16:21:33.704 [main-EventThread] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.ListLedgersCommand - ledgerID: 484
16:21:33.704 [main-EventThread] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.ListLedgersCommand - ledgerID: 485
...
  3. View the metadata of ledger 485.
[test@mq5 middleware]$ ./bin/bookkeeper shell ledgermetadata -l 485
JMX enabled by default
17:19:47.521 [main] INFO  org.apache.bookkeeper.tools.cli.commands.client.LedgermetaDataCommand - ledgerID: 488
17:19:47.532 [main] INFO  org.apache.bookkeeper.tools.cli.commands.client.LedgermetaDataCommand - Ledgermetadata{formatVersion=3, ensembleSize=2, writeQuorumSize=2, ackQuorumSize=2, state=OPEN, digestType=CRC32C, password=base64:, ensembles={0=[mq8:3181, mq6:3181]}, custommetadata={component=base64:bWFuYWdlZC1sZWRnZXI=, pulsar/managed-ledger=base64:cHVibGljL2RlZmF1bHQvcGVyc2lzdGVudC90ZXN0Mg==, application=base64:cHVsc2Fy}}
17:19:47.639 [Thread-1] WARN  org.apache.zookeeper.Login - TGT renewal thread has been interrupted and will exit.

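  • The metadata shows that the ledger is stored on mq8 and mq6.
  • The ledger's ensemble size (E), write quorum size (W) and ack quorum size (A) are (2, 2, 2).
  • The ledger holds the data of public/default/persistent/test2. This comes from decoding the pulsar/managed-ledger value with echo "cHVibGljL2RlZmF1bHQvcGVyc2lzdGVudC90ZXN0Mg==" | base64 -d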
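
All three custom metadata values in the output are base64-encoded, so the same decoding can be applied to the whole metadata line at once. The sketch below assumes the key=base64:VALUE layout seen in the output above; the function name decode_custom_metadata is hypothetical.

```shell
# Hypothetical helper: read a "Ledgermetadata{...}" line on stdin and print
# each non-empty "key=base64:VALUE" pair with the value decoded.
decode_custom_metadata() {
  # The empty "password=base64:" field is skipped because the pattern
  # requires at least one base64 character after the prefix.
  grep -oE '[A-Za-z/_-]+=base64:[A-Za-z0-9+/=]+' |
  while IFS= read -r pair; do
    key=${pair%%=base64:*}     # text before "=base64:"
    value=${pair#*=base64:}    # text after "=base64:"
    printf '%s=%s\n' "$key" "$(printf '%s' "$value" | base64 -d)"
  done
}

# Usage:
#   ./bin/bookkeeper shell ledgermetadata -l 485 2>&1 | decode_custom_metadata
```

For the metadata shown above this prints component=managed-ledger, pulsar/managed-ledger=public/default/persistent/test2 and application=pulsar, one per line.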
  4. Now stop the bookie on mq8.
./bin/pulsar-daemon stop bookie
  5. Check the metadata of ledger 485 again. It has not changed yet.
[test@mq5 middleware]$ ./bin/bookkeeper shell ledgermetadata -l 485
JMX enabled by default
17:19:47.521 [main] INFO  org.apache.bookkeeper.tools.cli.commands.client.LedgermetaDataCommand - ledgerID: 488
17:19:47.532 [main] INFO  org.apache.bookkeeper.tools.cli.commands.client.LedgermetaDataCommand - Ledgermetadata{formatVersion=3, ensembleSize=2, writeQuorumSize=2, ackQuorumSize=2, state=OPEN, digestType=CRC32C, password=base64:, ensembles={0=[mq8:3181, mq6:3181]}, custommetadata={component=base64:bWFuYWdlZC1sZWRnZXI=, pulsar/managed-ledger=base64:cHVibGljL2RlZmF1bHQvcGVyc2lzdGVudC90ZXN0Mg==, application=base64:cHVsc2Fy}}
17:19:47.639 [Thread-1] WARN  org.apache.zookeeper.Login - TGT renewal thread has been interrupted and will exit.
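  6. List the ledgers that are currently being re-replicated. (If a ledger holds little data, replication finishes quickly and you may not catch it in the list.)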
[test@mq5 middleware]$ ./bin/bookkeeper shell listunderreplicated
JMX enabled by default
15:58:23.932 [main] INFO  org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand - 485
15:58:23.938 [main] INFO  org.apache.bookkeeper.tools.cli.commands.autorecovery.ListUnderReplicatedCommand -    Cti
15:58:24.045 [Thread-1] WARN  org.apache.zookeeper.Login - TGT renewal thread has been interrupted and will exit.
[test@mq5 middleware]$ ./bin/bookkeeper shell listunderreplicated
JMX enabled by default
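
Because a small ledger can leave the under-replicated list within seconds, it can help to count the listed ledgers in a script instead of rerunning the command by hand. The sketch below assumes that each under-replicated ledger appears as a log line whose message is a bare ledger ID, as in the output above; the function name count_underreplicated is hypothetical.

```shell
# Hypothetical helper: read listunderreplicated output on stdin and count
# the log lines whose message part is just a ledger ID (e.g. "... - 485").
# This relies on the log layout shown in this article, which is an assumption.
count_underreplicated() {
  # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
  grep -cE ' - [0-9]+[[:space:]]*$' || true
}

# Usage: poll until the list is empty.
#   while [ "$(./bin/bookkeeper shell listunderreplicated 2>&1 | count_underreplicated)" -gt 0 ]; do
#     sleep 5
#   done
```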
  7. View the metadata of ledger 485 once more.
[test@mq5 middleware]$ ./bin/bookkeeper shell ledgermetadata -l 485
JMX enabled by default
17:22:52.300 [main] INFO  org.apache.bookkeeper.tools.cli.commands.client.LedgermetaDataCommand - ledgerID: 488
17:22:52.311 [main] INFO  org.apache.bookkeeper.tools.cli.commands.client.LedgermetaDataCommand - Ledgermetadata{formatVersion=3, ensembleSize=2, writeQuorumSize=2, ackQuorumSize=2, state=OPEN, digestType=CRC32C, password=base64:, ensembles={0=[mq7:3181, mq6:3181], 13=[mq5:3181, mq6:3181]}, custommetadata={pulsar/managed-ledger=base64:cHVibGljL2RlZmF1bHQvcGVyc2lzdGVudC90ZXN0Mg==, component=base64:bWFuYWdlZC1sZWRnZXI=, application=base64:cHVsc2Fy}}
17:22:52.419 [Thread-1] WARN  org.apache.zookeeper.Login - TGT renewal thread has been interrupted and will exit.

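Let's look at how the placement of the ledger changed.

Before the bookie on mq8 was stopped, the metadata showed the ensemble as:

ensembles={0=[mq8:3181, mq6:3181]}

After it was stopped, the metadata shows the ensembles as: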

ensembles={0=[mq7:3181, mq6:3181], 13=[mq5:3181, mq6:3181]}

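Explanation:

  • Initially, the ledger was placed on mq8 and mq6.
  • After recovery, the entries with entry IDs in [0, 12] are placed on mq7 and mq6, and the entries with entry IDs of 13 and above are placed on mq5 and mq6.
  • Where does the recovery show? Because mq8 went down, the data for entry IDs [0, 12] that used to live there was copied again and placed on mq7.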
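
The mapping from the ensembles field to entry ranges can be read off mechanically: each key is the first entry ID of a fragment, and a fragment ends one entry before the next key. A minimal sketch follows, assuming the ensembles={start=[bookies], ...} layout shown above with keys in ascending order; the function name describe_fragments is hypothetical.

```shell
# Hypothetical helper: read a "Ledgermetadata{...}" line on stdin and print
# the entry range and bookie list of each fragment in its ensembles field.
describe_fragments() {
  # 1) isolate the ensembles field, 2) split it into "start=[bookies]" items,
  # 3) derive each range from the start of the following fragment.
  grep -oE 'ensembles=\{.*\]\}' |
  grep -oE '[0-9]+=\[[^]]*\]' |
  awk -F'=' '
    { first[NR] = $1; bookies[NR] = $2 }
    END {
      for (i = 1; i <= NR; i++) {
        if (i < NR) span = first[i] "-" (first[i + 1] - 1)
        else        span = first[i] "+"   # last fragment has no upper bound here
        print "entries " span ": " bookies[i]
      }
    }'
}

# Usage:
#   ./bin/bookkeeper shell ledgermetadata -l 485 2>&1 | describe_fragments
```

For the metadata after recovery this prints "entries 0-12: [mq7:3181, mq6:3181]" and "entries 13+: [mq5:3181, mq6:3181]". The last fragment is shown as open-ended because the ensembles field alone does not say where the ledger ends.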
References
  1. BookKeeper AutoRecovery documentation
  2. 《深入理解Apache Pulsar》