Cluster Coordination Architecture flowchart diagram

Cluster coordination architecture describes the set of services and protocols that allow multiple nodes in a distributed system to agree on shared state — including cluster membership, configuration, distributed locks, and leader identity — providing a consistent foundation for higher-level applications.

Coordination is hard because it requires strong consistency in the face of network partitions and node failures. Systems that need cluster coordination typically rely on a dedicated coordination service (ZooKeeper, etcd, Consul) backed by a consensus algorithm like Raft to guarantee linearizability.

Coordination Service: The coordination service (e.g., etcd) runs as a small cluster of 3 or 5 nodes. Its own internal consensus guarantees that writes are durable and reads are linearizable. Application services talk to this cluster through a client library, watching keys for changes with low-latency push notifications.
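The watch mechanism above can be illustrated with a toy in-memory stand-in. This is a sketch of the push-notification pattern only; the names (KVStore, watch, put) are illustrative and not the API of etcd or any real client library, which deliver these events over a gRPC stream instead.

```python
# Toy in-memory stand-in for a coordination service's watch mechanism.
# A real service (etcd, ZooKeeper) replicates writes via consensus and
# pushes change events to watchers over a network stream.
from collections import defaultdict

class KVStore:
    def __init__(self):
        self.data = {}
        self.watchers = defaultdict(list)  # key -> list of callbacks

    def watch(self, key, callback):
        """Register a callback invoked on every change to `key`."""
        self.watchers[key].append(callback)

    def put(self, key, value):
        """Write a value, then push a change event to all watchers."""
        self.data[key] = value
        for cb in self.watchers[key]:
            cb(key, value)

store = KVStore()
events = []
store.watch("config/rate-limit", lambda k, v: events.append((k, v)))
store.put("config/rate-limit", "100")   # watcher is notified on write
```

The key property this models is that clients do not poll: a write to a watched key triggers a notification, which is how routing tables and configuration stay current with low latency.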

Distributed Locking: Services that need mutual exclusion (e.g., only one instance should run a migration at a time) acquire a lease from the coordination service. The lease is associated with a time-to-live (TTL); if the holder crashes, the lock expires automatically. A fencing token (monotonically increasing version number) prevents a slow lock holder from corrupting data after the lease expires. See Distributed Locking for the full protocol.
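The fencing-token check can be sketched as follows. This is a minimal illustration of the idea, assuming the storage layer records the highest token it has seen; the class and method names are hypothetical, not a specific library's API.

```python
# Sketch of fencing-token enforcement at the storage layer. The token is
# a monotonically increasing number issued with each lock grant; storage
# rejects any write carrying a token older than the highest one seen.

class FencedStorage:
    def __init__(self):
        self.highest_token = -1
        self.value = None

    def write(self, token, value):
        if token < self.highest_token:
            # A newer lock holder has already written, so this writer's
            # lease must have expired while it was paused. Reject.
            raise RuntimeError(f"stale fencing token {token}")
        self.highest_token = token
        self.value = value

storage = FencedStorage()
storage.write(token=33, value="migration step 1")   # current lock holder
storage.write(token=34, value="migration step 2")   # next holder after TTL expiry
try:
    storage.write(token=33, value="late write")     # old holder wakes up
except RuntimeError:
    pass  # stale write rejected; shared state stays consistent
```

Without the token check, the paused holder's late write would silently overwrite the new holder's data, which is exactly the corruption the fencing token prevents.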

Service Discovery and Configuration: Workers register their address and health status in the coordination service. Clients watch these registrations and update their routing tables when workers join or leave. Global configuration values (feature flags, rate limits) are also stored here, enabling zero-downtime configuration changes that are pushed to all watching instances with low latency.
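The TTL-based registration pattern can be sketched with a toy registry. In a real coordination service the server expires the key itself and notifies watchers; this stdlib-only sketch checks expiry lazily, and all names here are illustrative.

```python
import time

# Toy registry illustrating TTL-based worker registration. A worker
# registers with a TTL and must refresh before it expires; an entry
# whose TTL lapses is treated as a dead worker.
class Registry:
    def __init__(self):
        self.entries = {}  # address -> expiry timestamp

    def register(self, address, ttl):
        self.entries[address] = time.monotonic() + ttl

    def refresh(self, address, ttl):
        # Same as register: pushes the expiry forward (a heartbeat).
        self.entries[address] = time.monotonic() + ttl

    def live_workers(self):
        now = time.monotonic()
        return [addr for addr, expiry in self.entries.items() if expiry > now]

reg = Registry()
reg.register("10.0.0.5:8080", ttl=30)
reg.register("10.0.0.6:8080", ttl=-1)   # already expired: simulates a crash
```

Here `reg.live_workers()` returns only the healthy worker, which is the view a client's routing table would converge to after the dead worker's key expires.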

Leader Election in Applications: Application-level leader election (distinct from the coordination service's internal election) is implemented by multiple instances competing to create the same key with a TTL. The instance that creates the key first wins; others watch the key and take over if it disappears. Compare with Leader Election Algorithm for the lower-level Raft-based mechanism.
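The compete-to-create election described above can be sketched against a toy store. The atomic create-if-absent operation mirrors what real systems provide (for example, a conditional transaction on key version in etcd), but the class and method names here are illustrative, not a real client API.

```python
# Sketch of compete-to-create leader election. Each instance tries to
# atomically create the same key; the first to succeed is the leader,
# and the others watch the key and retry when it disappears.

class CoordStore:
    def __init__(self):
        self.keys = {}

    def create(self, key, value):
        """Atomically create `key`; return False if it already exists."""
        if key in self.keys:
            return False
        self.keys[key] = value
        return True

    def delete(self, key):
        """Simulates TTL expiry after the holder stops refreshing."""
        self.keys.pop(key, None)

store = CoordStore()
won_a = store.create("election/leader", "instance-a")   # first creator wins
won_b = store.create("election/leader", "instance-b")   # loses, watches key

# Leader crashes and stops refreshing its TTL: the key expires,
# watchers are notified, and a standby retries the create.
store.delete("election/leader")
won_b_retry = store.create("election/leader", "instance-b")
```

The TTL attached to the key is what makes this safe: a crashed leader cannot hold the role forever, and failover time is bounded by the TTL.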


Frequently asked questions

What is cluster coordination architecture?
Cluster coordination architecture is the set of services and protocols that allow distributed nodes to agree on shared state — including cluster membership, distributed locks, configuration values, and leader identity. It provides a consistent, fault-tolerant foundation that higher-level application services rely on to make safe decisions without stepping on each other.
How does a coordination service work?
A coordination service such as etcd, ZooKeeper, or Consul runs as a small cluster (typically 3 or 5 nodes) backed by a consensus algorithm like Raft. Clients write and watch keys through a client library. The consensus layer guarantees linearizable reads and durable writes, while push notifications (watches) allow clients to react to membership and configuration changes with low latency.
When should you use a dedicated coordination service?
Use a dedicated coordination service whenever multiple services or instances need to agree on something — who holds a distributed lock, which instance is the leader, or what the current configuration value is. Trying to implement these primitives directly in application code without a consensus-backed service leads to subtle race conditions and split-brain scenarios under partition.
What are the most common mistakes in cluster coordination?
The most dangerous mistake is ignoring fencing tokens: a slow lock holder can continue acting after its lease expires and corrupt shared state. Other common issues include running the coordination service with an even number of nodes (reducing fault tolerance), watching too many keys per client (causing thundering-herd reconnects), and relying on TTL-based locks without handling clock skew.
How does a coordination service differ from two-phase commit?
Cluster coordination services (etcd, ZooKeeper) provide general-purpose primitives — key-value storage, watches, and leases — backed by a consensus algorithm for durability. Two-phase commit (2PC) is a specific protocol for atomically committing or aborting a transaction across multiple independent resource managers. Coordination services are used to implement distributed locks and leader election; 2PC is used to achieve atomic cross-service writes in distributed transactions.
mermaid
flowchart TD
    subgraph CoordCluster ["Coordination Cluster — etcd — 3 nodes — Raft consensus"]
        E1[etcd Leader]
        E2[etcd Follower 1]
        E3[etcd Follower 2]
        E1 <-->|Raft replication| E2
        E1 <-->|Raft replication| E3
    end

    subgraph AppLayer ["Application Services"]
        A1[Service A — Instance 1]
        A2[Service A — Instance 2]
        A3[Service B — Worker]
        A4[Service B — Worker]
    end

    subgraph UseCases ["Coordination Use Cases"]
        UC1["Distributed Lock\n- acquire lease with TTL\n- fencing token prevents stale writes"]
        UC2["Service Discovery\n- register address + health\n- watch prefix for changes"]
        UC3["Leader Election\n- compete to create key\n- winner is app leader"]
        UC4["Config Management\n- store feature flags\n- push updates to all watchers"]
    end

    A1 -->|gRPC watch| E1
    A2 -->|gRPC watch| E1
    A3 -->|gRPC watch| E2
    A4 -->|gRPC watch| E3

    E1 --> UC1
    E1 --> UC2
    E1 --> UC3
    E1 --> UC4

    subgraph HealthCheck ["Health and Membership"]
        H1[Worker registers health key\nwith TTL=30s] --> H2[Worker refreshes TTL\nevery 10s]
        H2 --> H3{TTL expires\nno refresh?}
        H3 -->|Yes — worker dead| H4[Key deleted — watchers\nnotified of departure]
        H3 -->|No| H2
    end

    style CoordCluster fill:#eff6ff,stroke:#3b82f6
    style UseCases fill:#f0fdf4,stroke:#22c55e
    style HealthCheck fill:#fef3c7,stroke:#f59e0b