Gossip Protocol
A gossip protocol (also called an epidemic protocol) is a decentralized peer-to-peer communication method in which each node periodically selects one or more random peers and exchanges state information, causing updates to spread through the cluster in a manner similar to the spread of a rumor or infection.
A gossip protocol (also called an epidemic protocol) is a decentralized peer-to-peer communication method in which each node periodically selects one or more random peers and exchanges state information, causing updates to spread through the cluster in a manner similar to the spread of a rumor or infection.
Gossip protocols achieve remarkable reliability without any central coordinator. Each node maintains a membership list — a set of known peers with associated state (alive, suspected dead, metadata). On every gossip interval (typically 200–500 ms), a node selects k random peers (fanout, usually 2–3) and pushes or exchanges its view of the cluster.
Information Spreading: New state (a node joining, failing, or updating metadata) reaches all nodes in O(log N) gossip rounds. Because every node independently propagates what it knows, the protocol tolerates message loss and node failures gracefully — information simply takes a bit longer to arrive via an alternate path.
Failure Detection: Gossip protocols often integrate a suspicion mechanism (SWIM — Scalable Weakly-consistent Infection-style Membership). When a node A fails to receive a direct ACK from node B, it asks k other nodes to ping B indirectly. If none succeed, B is marked Suspected. After a timeout with no refutation, B is declared Failed and the information gossiped to the cluster.
Convergence and Overhead: Gossip scales to thousands of nodes with O(N log N) total messages per round — linear in N nodes × logarithmic rounds to full convergence. Message size is bounded because nodes exchange only deltas or digests. Systems using gossip include Apache Cassandra (ring membership, anti-entropy repair), Redis Cluster (slot assignments), Consul (health status), and HashiCorp's Serf. See Cluster Coordination Architecture for how gossip fits into broader cluster management, and Distributed Hash Table for how membership data drives key routing.