diagram.mmd — sequence
Leader Election Algorithm sequence diagram

A leader election algorithm is a distributed coordination protocol that designates exactly one node as the active coordinator (leader) among a set of candidate nodes, with the cluster automatically repeating the election if the current leader fails.

Many modern distributed systems use a variation of the election mechanism found in Raft. When a follower receives no heartbeat from the current leader within its election timeout, it assumes the leader has failed, becomes a candidate, and starts a new election.

Election Trigger: Each follower runs an independent timer, reset whenever a heartbeat arrives. Randomized timeouts (typically 150–300 ms) reduce the chance of two followers timing out simultaneously and splitting the vote.
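
The timer behaviour described above can be sketched roughly as follows. This is a minimal illustration, not a production implementation: the class name ElectionTimer, its methods, and the choice to pass timestamps in explicitly (rather than reading a real clock) are assumptions made for this example. The 150–300 ms range comes from the text.

```python
import random

# Range taken from the text; real deployments tune these values.
ELECTION_TIMEOUT_MIN_MS = 150
ELECTION_TIMEOUT_MAX_MS = 300


class ElectionTimer:
    """Hypothetical follower-side timer driven by caller-supplied timestamps."""

    def __init__(self, now_ms, rng=None):
        self._rng = rng or random.Random()
        self.deadline_ms = 0.0
        self.reset(now_ms)

    def reset(self, now_ms):
        # Called whenever a heartbeat arrives: draw a fresh random timeout
        # so that followers are unlikely to expire at the same instant.
        timeout = self._rng.uniform(ELECTION_TIMEOUT_MIN_MS, ELECTION_TIMEOUT_MAX_MS)
        self.deadline_ms = now_ms + timeout

    def expired(self, now_ms):
        # True once the deadline has passed without a heartbeat.
        return now_ms >= self.deadline_ms
```

A follower would call reset() on each heartbeat and start an election once expired() returns true.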

Vote Request Phase: The candidate increments its current term number, votes for itself, and broadcasts RequestVote(term, lastLogIndex, lastLogTerm) to every other node. The lastLogIndex and lastLogTerm fields describe the last entry in the candidate's log, so voters can reject candidates whose logs are less up-to-date than their own. This election restriction underpins Raft's safety: a candidate that is missing committed entries cannot win a majority.
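
As a rough sketch of the candidate's side of this phase, the snippet below builds the RequestVote messages without sending them. The Node and RequestVote classes, the representation of the log as a list of entry terms, and the candidate_id field (which the Raft paper includes but the summary above omits) are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class RequestVote:
    term: int
    candidate_id: str      # assumption: not listed in the text above
    last_log_index: int
    last_log_term: int


@dataclass
class Node:
    node_id: str
    peers: List[str]
    current_term: int = 0
    voted_for: Optional[str] = None
    # Assumption: the log is modelled as a list of entry terms, 1-indexed.
    log: List[int] = field(default_factory=list)

    def start_election(self) -> Dict[str, RequestVote]:
        # Increment the term and vote for ourselves.
        self.current_term += 1
        self.voted_for = self.node_id
        # Describe the last log entry (index 0 / term 0 for an empty log).
        last_index = len(self.log)
        last_term = self.log[-1] if self.log else 0
        msg = RequestVote(self.current_term, self.node_id, last_index, last_term)
        # One identical message per peer; actual sending is left to the caller.
        return {peer: msg for peer in self.peers}
```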

Voting Rules: A node grants its vote if (1) it has not already voted in this term, and (2) the candidate's log is at least as up-to-date as the voter's. Each node votes for at most one candidate per term (first-come, first-served).
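
The two rules can be sketched as a small voter-side handler. This is an illustration under assumptions: the Voter class and its method names are hypothetical, and the term handling (rejecting stale terms, clearing the vote when a newer term is seen) follows standard Raft behaviour rather than anything stated above. "At least as up-to-date" is taken to mean a higher last term wins, and equal last terms are compared by last index.

```python
def log_is_up_to_date(cand_last_term, cand_last_index, my_last_term, my_last_index):
    """True if the candidate's log is at least as up-to-date as ours."""
    if cand_last_term != my_last_term:
        return cand_last_term > my_last_term
    return cand_last_index >= my_last_index


class Voter:
    """Hypothetical voter state; a real node would persist term and vote."""

    def __init__(self, current_term=0, last_log_term=0, last_log_index=0):
        self.current_term = current_term
        self.last_log_term = last_log_term
        self.last_log_index = last_log_index
        self.voted_for = None

    def handle_request_vote(self, term, candidate_id, last_log_index, last_log_term):
        if term < self.current_term:
            return False                      # stale candidate
        if term > self.current_term:
            self.current_term = term          # newer term: forget the old vote
            self.voted_for = None
        if self.voted_for not in (None, candidate_id):
            return False                      # rule 1: one vote per term
        if not log_is_up_to_date(last_log_term, last_log_index,
                                 self.last_log_term, self.last_log_index):
            return False                      # rule 2: candidate log is behind
        self.voted_for = candidate_id
        return True
```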

Leader Announcement: Once the candidate holds votes from a strict majority of the N nodes (⌊N/2⌋ + 1, counting its own vote), it transitions to Leader and immediately broadcasts AppendEntries heartbeats to suppress any competing elections and establish its authority for the new term.
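
The majority threshold is simple arithmetic; a minimal sketch follows, with function names chosen only for this example.

```python
def majority(cluster_size):
    """Smallest vote count that is a strict majority: floor(N/2) + 1."""
    return cluster_size // 2 + 1


def has_won(votes_received, cluster_size):
    """votes_received includes the candidate's own vote."""
    return votes_received >= majority(cluster_size)
```

For a five-node cluster the threshold is 3, so a candidate with its own vote plus two others can already become leader. Note that a four-node cluster also needs 3 votes, which is why even cluster sizes add no extra fault tolerance.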

Split Vote Recovery: If no candidate wins a majority, each candidate's election timer expires again and it starts a new election in a higher term. Because every retry draws a fresh random timeout, one candidate usually times out well ahead of the others and wins, so split votes tend to resolve within a few rounds. See Raft Consensus Algorithm for how the elected leader then manages log replication, and Cluster Coordination Architecture for how leaders integrate with the broader cluster control plane.

Frequently asked questions

A distributed leader election algorithm is a coordination protocol by which a set of nodes agrees on exactly one node as the active coordinator, typically by majority vote. The leader handles requests on behalf of the group, and the cluster automatically re-runs the election if the leader fails or becomes unreachable.
When a follower's election timer expires, it increments its own current term, declares itself a Candidate, votes for itself, and requests votes from every peer. Each peer grants its vote only once per term and only to a candidate whose log is at least as up-to-date as its own. The first candidate to collect a strict majority wins and immediately sends heartbeats to suppress competing elections.
Use leader election any time you need a single authoritative coordinator — for example, to serialise writes to a replicated state machine, to own a distributed lock, or to drive a control loop that must run on exactly one node. Systems like etcd, CockroachDB, and Kafka's KRaft controller all rely on it.
Frequent mistakes include using fixed election timeouts (causing repeated split votes under load), not persisting the voted-for field to stable storage (allowing a node to vote twice after a restart), and forgetting to step down as leader when a higher term is observed in any incoming message.
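
The step-down rule from the last answer can be sketched as a check applied to every incoming message before any other handling. The NodeState class and the observe_term name are hypothetical, and the comment about stable storage marks where persistence would go rather than implementing it.

```python
class NodeState:
    """Hypothetical in-memory node state used only for this illustration."""

    def __init__(self, current_term, role="follower"):
        self.current_term = current_term
        self.role = role          # "follower", "candidate", or "leader"
        self.voted_for = None

    def observe_term(self, incoming_term):
        """Run on every incoming request or response; returns True on step-down."""
        if incoming_term > self.current_term:
            # A newer term exists somewhere: adopt it and give up any
            # leadership or candidacy held in the old term.
            self.current_term = incoming_term
            self.voted_for = None
            self.role = "follower"
            # A real node would write current_term and voted_for to stable
            # storage here, before replying, so a restart cannot vote twice.
            return True
        return False
```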
mermaid
sequenceDiagram
    participant N1 as Node 1 (Follower)
    participant N2 as Node 2 (Candidate)
    participant N3 as Node 3 (Follower)
    participant N4 as Node 4 (Follower)
    participant N5 as Node 5 (Follower)
    Note over N1,N5: Current leader has failed, no heartbeats
    Note over N2: Election timeout fires\nterm = 4 → 5, vote for self
    N2->>N1: RequestVote(term=5, lastIndex=42, lastTerm=4)
    N2->>N3: RequestVote(term=5, lastIndex=42, lastTerm=4)
    N2->>N4: RequestVote(term=5, lastIndex=42, lastTerm=4)
    N2->>N5: RequestVote(term=5, lastIndex=42, lastTerm=4)
    N1-->>N2: VoteGranted(term=5)
    N3-->>N2: VoteGranted(term=5)
    N4-->>N2: VoteGranted(term=5)
    N5-->>N2: VoteDenied(term=5), candidate log not up-to-date
    Note over N2: 4 votes including self, majority threshold is 3 of 5\nTransition to LEADER
    N2->>N1: AppendEntries heartbeat(term=5, leaderId=N2)
    N2->>N3: AppendEntries heartbeat(term=5, leaderId=N2)
    N2->>N4: AppendEntries heartbeat(term=5, leaderId=N2)
    N2->>N5: AppendEntries heartbeat(term=5, leaderId=N2)
    Note over N1,N5: All nodes recognize N2 as leader for term 5