Raft——分布式共识_数据挖掘与分析

Raft——分布式共识

作用

Raft is a protocol for implementing distributed consensus.

实现分布式共识的协议。

角色划分

基本原理

Leader Election

所有节点最开始均为follower，若没有监听到leader，则会成为candidate。
candidate可向其他节点发起投票，其他节点若非candidate且在本轮次还没有参与投票，则会给该candidate投票，获得大多数票数的candidate成为leader，即至少获得N/2+1票，N为节点数。
candidate可以给自己投票，同时可存在多个candidate。

Log Replication

选举时间超时

Raft算法中有两次时间超时设置用于控制选举，一是election timeout，二是heartbeat timeout。
election timeout是大多数时候follower成为candidate的原因，election timeout的时间随机取150ms~300ms的值，一旦某一个或一些follower发生election timeout，就会变为candidate角色，并为自己投票并向其他节点发出投票信息，其他节点（follower）若本轮次还未投票，就会为该候选者投票并为节点自身（follower）重新设置election timeout；若某一候选者获得N/2+1的选票，就成为leader。
heartbeat timeout：每次follower发出heartbeat后会设置一个定时器，若设定时间内没有继续收到leader发来的heartbeat（比如leader服务器宕机），follower将会变成candidate，开始新一轮的选举。

特例

若总共偶数个节点，其中两个节点同时成为candidate并同时向其他节点发起投票选举，此时这两个candidate将会平分全部节点的选票，两者都无法成为获得大多数选票的一方，此时各个节点会设置定时器，直到出现candidate开启新一轮的选举。
若2个partition（一个partition一个leader），共奇数个节点，假定partition A的节点数多于partition B 的节点数，A和B的leader分别接收来自client的数据。partition B在接收数据后，发现无法成为大多数从而无法 commit log，轮次停留在 t 轮。partition A的leader在接受数据后，更新partition A内的log并commit成功（A为大多数）。partition B的leader发现partition A的轮次高于自己，判断自身信息更新太慢了，就主动下台降级为follower，此时便又只有一个分区，原partition B的节点将会回滚未提交的entry并对齐leader的log。

动画演示

Raftms

Raft——分布式共识