diagram.mmd — flowchart
Raft Consensus Algorithm flowchart diagram

Raft is a distributed consensus algorithm designed to be more understandable than Paxos, providing a mechanism for a cluster of nodes to agree on a sequence of values even when some nodes fail or become unreachable.

Raft divides the consensus problem into three relatively independent sub-problems: leader election, log replication, and safety. At any given time, each node is in one of three states: Follower, Candidate, or Leader. Under normal operation a single Leader exists, followers passively replicate entries, and the cluster processes client requests entirely through the Leader.
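The three roles and the transitions between them can be sketched as a small state type. This is an illustrative sketch, not any particular implementation; the `Node`, `onElectionTimeout`, and `onHeartbeat` names are ours:

```go
package main

import "fmt"

// State is the role a Raft node plays at any given moment.
type State int

const (
	Follower State = iota
	Candidate
	Leader
)

// Node holds the minimal per-node election state for this sketch.
type Node struct {
	state       State
	currentTerm int
}

// onElectionTimeout models a follower whose election timer fired without
// hearing from a leader: it increments its term and becomes a candidate.
func (n *Node) onElectionTimeout() {
	n.currentTerm++
	n.state = Candidate
}

// onHeartbeat models receiving a valid AppendEntries from a leader whose
// term is at least our own: adopt the term and (re)become a follower.
func (n *Node) onHeartbeat(leaderTerm int) {
	if leaderTerm >= n.currentTerm {
		n.currentTerm = leaderTerm
		n.state = Follower
	}
}

func main() {
	n := &Node{state: Follower, currentTerm: 1}
	n.onElectionTimeout()
	fmt.Println(n.state == Candidate, n.currentTerm) // true 2
	n.onHeartbeat(3)
	fmt.Println(n.state == Follower, n.currentTerm) // true 3
}
```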

Leader Election occurs when a follower's election timeout fires without receiving a heartbeat from a leader. The follower increments its current term, transitions to Candidate, votes for itself, and broadcasts RequestVote RPCs to all peers. A candidate becomes leader once it receives votes from a majority of nodes. Raft's randomized election timeouts (150–300 ms) reduce split-vote situations. See Leader Election Algorithm for the sequence-level detail.

Log Replication is the core of Raft. The leader appends the client's command to its own log, then sends AppendEntries RPCs in parallel to all followers. Once a majority acknowledges the entry, the leader commits the entry, advances its commit index, and applies the command to the state machine. Followers commit after they see the leader's updated commit index in the next heartbeat.
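The leader's quorum check can be modeled by sorting the per-node match indexes: the median of the descending sort is the highest index stored on a majority. This is a simplified sketch (real implementations additionally require the entry to be from the leader's current term before committing):

```go
package main

import (
	"fmt"
	"sort"
)

// commitIndex returns the highest log index replicated on a majority of the
// cluster. matchIndex[i] is the highest entry known to be stored on node i;
// the leader's own log counts as one of the entries.
func commitIndex(matchIndex []int) int {
	sorted := append([]int(nil), matchIndex...)
	sort.Sort(sort.Reverse(sort.IntSlice(sorted)))
	// After a descending sort, the value at position len/2 is held by at
	// least len/2+1 nodes, i.e. a strict majority.
	return sorted[len(sorted)/2]
}

func main() {
	// Leader at index 7; followers at 7, 7, 5, and an unreachable node at 2.
	fmt.Println(commitIndex([]int{7, 7, 7, 5, 2})) // 7: three of five nodes hold index 7
	fmt.Println(commitIndex([]int{7, 6, 5, 2, 2})) // 5: only index 5 has majority coverage
}
```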

Safety is ensured by Raft's election restriction: a voter grants its vote only if the candidate's log is at least as up-to-date as its own, so a candidate cannot win without a log at least as up-to-date as a majority of the cluster. "Up-to-date" is determined first by the term of the last log entry, then by log length. This guarantees that every committed entry appears in the log of any future leader, preventing data loss across elections.
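The up-to-date comparison can be written directly from this rule; a minimal illustrative helper (the name `candidateUpToDate` is ours):

```go
package main

import "fmt"

// candidateUpToDate implements Raft's election restriction: the candidate's
// log is at least as up-to-date as the voter's if its last entry has a
// higher term, or the same term with an equal-or-longer log.
func candidateUpToDate(candLastTerm, candLastIndex, voterLastTerm, voterLastIndex int) bool {
	if candLastTerm != voterLastTerm {
		return candLastTerm > voterLastTerm
	}
	return candLastIndex >= voterLastIndex
}

func main() {
	fmt.Println(candidateUpToDate(5, 3, 4, 9)) // true: higher last term wins despite a shorter log
	fmt.Println(candidateUpToDate(4, 8, 4, 9)) // false: same term, shorter log
	fmt.Println(candidateUpToDate(4, 9, 4, 9)) // true: identical logs tie in the candidate's favor
}
```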

Raft is the algorithm behind etcd, CockroachDB, TiKV, and Consul's leader subsystem. Compare it with Paxos Consensus Flow to understand why Raft's explicit leader model simplifies implementation. Cluster membership changes are handled by Raft's joint consensus or single-server change approaches, described alongside the broader Cluster Coordination Architecture.


Frequently asked questions

What is Raft?
Raft is a distributed consensus protocol that ensures a cluster of nodes agrees on an ordered log of commands even when a minority of nodes fail or become unreachable. It decomposes the problem into three sub-problems — leader election, log replication, and safety — each with clear, independently understandable rules.
How does Raft elect a leader?
When a follower's randomized election timeout expires without a heartbeat from the leader, it increments its term, becomes a Candidate, and broadcasts `RequestVote` RPCs. Any node that has not voted in that term and whose log is no more up-to-date than the candidate's grants its vote. The first candidate to collect a strict majority becomes the new Leader.
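The vote-granting side of this exchange can be sketched as a `RequestVote` handler. This is a hypothetical simplification: `VoterState` and `grantVote` are our names, and the log comparison is abstracted into a `logOK` flag:

```go
package main

import "fmt"

// VoterState is the persistent state a node consults when handling RequestVote.
type VoterState struct {
	currentTerm int
	votedFor    string // "" means no vote cast yet in currentTerm
}

// grantVote grants the vote only if the candidate's term is current, the
// voter has not already voted for a different node this term, and the
// candidate's log is at least as up-to-date (summarized here as logOK).
func grantVote(v *VoterState, candidate string, candTerm int, logOK bool) bool {
	if candTerm < v.currentTerm {
		return false // stale candidate from an old term
	}
	if candTerm > v.currentTerm {
		v.currentTerm = candTerm // newer term: forget any vote from an older term
		v.votedFor = ""
	}
	if (v.votedFor == "" || v.votedFor == candidate) && logOK {
		v.votedFor = candidate
		return true
	}
	return false
}

func main() {
	v := &VoterState{currentTerm: 3}
	fmt.Println(grantVote(v, "n2", 4, true)) // true: first vote cast in term 4
	fmt.Println(grantVote(v, "n3", 4, true)) // false: already voted for n2 this term
}
```

Note that a vote is durable for the whole term: once cast, the voter refuses every other candidate until a newer term begins, which is what makes a split majority impossible.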
When should you use Raft?
Raft is the preferred choice when operational simplicity and debuggability matter. Its explicit leader, well-defined term concept, and clear log-matching property make it straightforward to implement correctly. Use Raft when building replicated state machines for databases, coordination services, or distributed queues.
What are common Raft implementation mistakes?
The most frequent mistakes are incorrect log index tracking during snapshot installation, failing to reset the election timer on all valid AppendEntries messages (not just heartbeats), and allowing a new leader to commit entries from previous terms directly rather than committing them via a no-op entry in the current term.
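The last pitfall corresponds to the commit rule in the Raft paper (§5.4.2): a leader may only commit by counting replicas when the entry is from its own term. A sketch of that check, with illustrative names:

```go
package main

import "fmt"

// safeToCommit reports whether a leader may mark an entry committed by
// counting replicas: the entry must be from the leader's current term AND
// be stored on a strict majority. Entries from earlier terms are committed
// indirectly, once a current-term entry (often a no-op appended right after
// election) commits after them in the log.
func safeToCommit(entryTerm, currentTerm, replicas, clusterSize int) bool {
	return entryTerm == currentTerm && replicas > clusterSize/2
}

func main() {
	fmt.Println(safeToCommit(2, 4, 3, 5)) // false: old-term entry, even with a majority
	fmt.Println(safeToCommit(4, 4, 3, 5)) // true: current-term entry on a majority
	fmt.Println(safeToCommit(4, 4, 2, 5)) // false: no quorum yet
}
```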
How does Raft compare with Paxos?
Paxos is a family of protocols that treats leader election as an optimization (Multi-Paxos) rather than a first-class concept, making it flexible but notoriously difficult to implement completely. Raft deliberately restricts design choices — one leader, sequential log commits, simple membership changes — trading generality for clarity. Both provide the same safety guarantees for a majority-quorum cluster.
mermaid
flowchart TD
    A([Client Request]) --> B[Leader receives command]
    B --> C[Append entry to leader log\nterm=T, index=N]
    C --> D{Send AppendEntries RPC\nto all followers}
    D --> E[Follower 1 appends entry]
    D --> F[Follower 2 appends entry]
    D --> G[Follower 3 appends entry]
    D --> H[Follower 4 — unreachable]
    E --> I{Majority\nacknowledged?}
    F --> I
    G --> I
    I -->|No — wait| D
    I -->|Yes — quorum reached| J[Leader commits entry\nadvances commitIndex]
    J --> K[Apply to state machine]
    K --> L([Respond to client])
    J --> M[Broadcast commit via\nnext AppendEntries heartbeat]
    M --> N[Followers apply committed entry]
    subgraph Election ["Leader Election"]
        O[Follower timeout] --> P[Become Candidate\nincrement term]
        P --> Q[Broadcast RequestVote]
        Q --> R{Majority\nvotes received?}
        R -->|Yes| S[Transition to Leader\nsend heartbeats]
        R -->|No — split vote| T[Wait random timeout\nretry election]
    end
    style Election fill:#f0f4ff,stroke:#6366f1