Link: Notes on the paper "Object Storage on CRAQ: High-throughput chain replication for read-mostly workloads" - BrianLeeLXT - 博客园
CRAQ itself relies on ZooKeeper in several places, e.g. for failure recovery and for handling split-brain.
ZooKeeper provides APIs for tracking cluster membership and for building leader election, service discovery, and similar primitives.
For example, Dubbo's service-discovery module is built on ZooKeeper.
CRAQ and Raft are different fault-tolerance mechanisms. CRAQ reduces the write load on any single node by propagating writes along the chain: each node forwards a write only to its direct successor, rather than one leader sending it to every replica.
In CRAQ every node can serve reads, whereas in Raft-style protocols only the leader can serve (linearizable) reads.
Why can CRAQ serve reads from replicas linearizably but Raft/ZooKeeper/&c cannot?
Relies on being a chain, so that *all* nodes see each
write before the write commits, so nodes know about
all writes that might have committed, and thus know when
to ask the tail.
Raft/ZooKeeper can't do this because leader can proceed with a mere
majority, so can commit without all followers seeing a write,
so followers are not aware when they have missed a committed write.
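The read rule above can be sketched as a small simulation (all names here are hypothetical, not the paper's actual code): each replica tracks the versions it has seen for a key and which version is known committed ("clean"); a read of a clean key returns locally, while a read of a dirty key asks the tail which version committed.

```python
# Minimal sketch of CRAQ's per-object read logic at one replica.
# Hypothetical structure; illustrates clean vs. dirty reads only.
from dataclasses import dataclass, field

@dataclass
class ChainNode:
    is_tail: bool = False
    versions: dict = field(default_factory=dict)  # key -> list of (version, value)
    clean: dict = field(default_factory=dict)     # key -> latest committed version
    tail: "ChainNode | None" = None               # reference to the tail replica

    def read(self, key):
        latest_ver, latest_val = self.versions[key][-1]
        if self.is_tail or self.clean.get(key) == latest_ver:
            # Clean: the newest version this node knows is committed,
            # so returning it locally is linearizable.
            return latest_val
        # Dirty: a newer write may or may not have committed yet.
        # Ask the tail which version is the committed one.
        committed_ver = self.tail.latest_committed_version(key)
        for ver, val in self.versions[key]:
            if ver == committed_ver:
                return val

    def latest_committed_version(self, key):
        # At the tail, the newest version is by definition committed.
        assert self.is_tail
        return self.versions[key][-1][0]

# A mid-chain node with an uncommitted version 2 must consult the tail:
tail = ChainNode(is_tail=True)
mid = ChainNode(tail=tail)
tail.versions["x"] = [(1, "a")]
mid.versions["x"] = [(1, "a"), (2, "b")]  # version 2 still dirty at mid
mid.clean["x"] = 1
assert mid.read("x") == "a"               # tail says version 1 committed
tail.versions["x"].append((2, "b"))       # tail commits version 2
assert mid.read("x") == "b"               # now the dirty read resolves to 2
```

Note the chain property doing the work: because the mid node saw version 2 before it could commit, it knows a version query to the tail is sufficient to disambiguate.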
Although CRAQ provides stronger support for reads, this does not mean CRAQ is superior to algorithms like Raft.
Does that mean CRAQ is strictly more powerful than Raft &c? No. All CRAQ replicas have to participate for any write to commit. If a node isn't reachable, CRAQ must wait. So not immediately fault-tolerant in the way that ZK and Raft are. CR has the same limitation.
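The commit rule this paragraph describes can be shown in a few lines (a sketch with assumed names, not real CRAQ code): a write commits only when it has traversed the entire chain to the tail, so a single unreachable node blocks all writes, with no majority escape hatch.

```python
# Sketch: chain writes commit only after *every* node sees them.
def chain_write(chain, key, value):
    """Propagate a write head -> tail; return True iff it committed."""
    for node in chain:                # nodes in chain order
        if not node["alive"]:
            return False              # chain stalls; must wait for reconfiguration
        node["store"][key] = value    # each node records the (dirty) write
    return True                      # reached the tail => committed

chain = [{"alive": True, "store": {}} for _ in range(3)]
assert chain_write(chain, "x", 1)        # all nodes up: commits
chain[1]["alive"] = False
assert not chain_write(chain, "x", 2)    # one node down: no commit, unlike Raft's majority rule
```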
Raft/ZooKeeper and CRAQ can be used together.
How can we safely make use of a replication system that can't handle partition?
A single "configuration manager" must choose head, chain, tail.
Everyone (servers, clients) must obey or stop.
Regardless of who they locally think is alive/dead.
A configuration manager is a common and useful pattern.
It's the essence of how GFS (master) and VMware-FT (test-and-set server) work.
Usually Paxos/Raft/ZK for config service,
data sharded over many replica groups,
CR or something else fast for each replica group.
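The configuration-manager pattern above can be sketched as follows (hypothetical names and structure; in practice the manager's state would itself live in a Raft/ZooKeeper-backed service): one authority assigns head/chain/tail per shard, and every membership change bumps an epoch so servers and clients can reject stale configurations regardless of their local failure suspicions.

```python
# Sketch of a configuration manager for sharded chain replication.
class ConfigManager:
    def __init__(self, shards):
        self.epoch = 0
        self.chains = dict(shards)    # shard -> ordered replica list (head..tail)

    def config(self, shard):
        chain = self.chains[shard]
        return {"epoch": self.epoch, "head": chain[0],
                "tail": chain[-1], "chain": list(chain)}

    def remove_replica(self, shard, replica):
        # Only the manager may declare a node dead; the epoch bump lets
        # everyone else detect and discard stale configurations.
        self.chains[shard] = [r for r in self.chains[shard] if r != replica]
        self.epoch += 1

mgr = ConfigManager({"s1": ["a", "b", "c"]})
assert mgr.config("s1")["head"] == "a" and mgr.config("s1")["tail"] == "c"
mgr.remove_replica("s1", "c")     # tail fails; manager reconfigures
cfg = mgr.config("s1")
assert cfg["tail"] == "b" and cfg["epoch"] == 1
```

This mirrors the division of labor in the notes: the slow, partition-tolerant consensus service holds only the small configuration state, while the fast chain handles the data path.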



