Database Replication: Mermaid Flowchart Diagram

About Source

Database replication is the process of copying and maintaining the same data across multiple database servers so that every node reflects a consistent (or eventually consistent) view of the dataset.

This diagram shows the topology of a standard replication setup. A single Primary node accepts all write operations. Every INSERT, UPDATE, and DELETE is recorded in a binary log (or write-ahead log, depending on the engine). The replication process reads those log entries and forwards them to one or more Replica nodes, which apply the changes to their own storage in the same order.
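The log-shipping flow described above can be sketched in a few lines of Python. This is a toy in-memory model, not any engine's actual implementation: the names `LogEntry`, `Node`, `Primary`, and `ship_log` are hypothetical, and the log sequence number (`lsn`) is assumed to be a simple counter starting at 1.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LogEntry:
    lsn: int                     # assumed: position of the entry in the log, starting at 1
    op: str                      # "INSERT", "UPDATE", or "DELETE"
    key: str
    value: Optional[str] = None


@dataclass
class Node:
    data: dict = field(default_factory=dict)
    applied_lsn: int = 0         # highest log position applied so far

    def apply(self, entry: LogEntry) -> None:
        # Entries must be applied strictly in log order.
        if entry.lsn != self.applied_lsn + 1:
            raise ValueError(f"out-of-order entry {entry.lsn}")
        if entry.op == "DELETE":
            self.data.pop(entry.key, None)
        else:
            # INSERT and UPDATE both set the key in this toy model.
            self.data[entry.key] = entry.value
        self.applied_lsn = entry.lsn


class Primary(Node):
    def __init__(self) -> None:
        super().__init__()
        self.log: list = []      # stands in for the binary log / write-ahead log

    def write(self, op: str, key: str, value: Optional[str] = None) -> LogEntry:
        # Record the change in the log, then apply it to local storage.
        entry = LogEntry(len(self.log) + 1, op, key, value)
        self.log.append(entry)
        self.apply(entry)
        return entry


def ship_log(primary: Primary, replica: Node) -> int:
    """Forward every entry the replica has not applied yet; return the count."""
    pending = primary.log[replica.applied_lsn:]
    for entry in pending:
        replica.apply(entry)
    return len(pending)
```

Calling `ship_log` repeatedly is safe in this model because the replica's `applied_lsn` records where it left off, so only unseen entries are forwarded.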

The diagram distinguishes between synchronous and asynchronous replication paths. In synchronous mode the primary waits for at least one replica to confirm the write before acknowledging success to the client. This ensures that no acknowledged write is lost if a confirmed replica is promoted on failover, at the cost of higher write latency. In asynchronous mode the primary acknowledges immediately and replicas catch up in the background. Asynchronous mode is the default for both MySQL replication and PostgreSQL streaming replication. The trade-off is a replication lag window during which the replica's data is stale, and writes that have not yet been shipped can be lost if the primary fails.
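To make the difference between the two paths concrete, here is a minimal simulation. It is a sketch under stated assumptions: the `Primary` and `Replica` classes, the per-replica backlog, and the `drain` method are all hypothetical, and network delivery is modelled as a direct method call.

```python
from collections import deque


class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.applied = []

    def receive(self, entry: str) -> bool:
        """Persist one log entry and return an acknowledgement."""
        self.applied.append(entry)
        return True


class Primary:
    def __init__(self, sync_replicas, async_replicas) -> None:
        self.log = []
        self.sync_replicas = list(sync_replicas)
        self.async_replicas = list(async_replicas)
        # Entries not yet delivered to each asynchronous replica.
        self.backlog = {r.name: deque() for r in self.async_replicas}

    def write(self, entry: str) -> str:
        self.log.append(entry)
        # Asynchronous path: queue the entry and move on without waiting.
        for replica in self.async_replicas:
            self.backlog[replica.name].append(entry)
        # Synchronous path: do not acknowledge until at least one
        # synchronous replica has confirmed the entry.
        acks = sum(replica.receive(entry) for replica in self.sync_replicas)
        if self.sync_replicas and acks < 1:
            raise RuntimeError("no synchronous replica confirmed the write")
        return "ack"

    def drain(self, replica: Replica) -> int:
        """Background catch-up for one asynchronous replica."""
        queue = self.backlog[replica.name]
        shipped = 0
        while queue:
            replica.receive(queue.popleft())
            shipped += 1
        return shipped

    def lag(self, replica: Replica) -> int:
        """Number of entries the asynchronous replica is behind."""
        return len(self.backlog[replica.name])
```

In this model the synchronous replica holds every entry by the time `write` returns, while an asynchronous replica stays behind by `lag(replica)` entries until `drain` runs. That gap is the replication lag window.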

Replicas typically serve read traffic only. Routing reads to replicas reduces load on the primary and allows horizontal read scaling. The Read Write Splitting diagram shows how application-layer proxies or drivers implement this routing. When the primary fails, one replica must be promoted — the Database Failover diagram traces that promotion sequence in detail.
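A rough illustration of the routing idea, assuming a hypothetical `Router` class that inspects the first keyword of each statement: only plain `SELECT` statements go to replicas, in round-robin order, and everything else goes to the primary. Real proxies and drivers also handle transactions, `SELECT ... FOR UPDATE`, and health checks, which this sketch ignores.

```python
import itertools


class Router:
    """Send writes to the primary and spread reads across replicas."""

    def __init__(self, primary: str, replicas) -> None:
        self.primary = primary
        self.replicas = list(replicas)
        self._reset_cycle()

    def _reset_cycle(self) -> None:
        # Round-robin iterator over the current replica set.
        self._cycle = itertools.cycle(self.replicas) if self.replicas else None

    def route(self, sql: str) -> str:
        parts = sql.split()
        verb = parts[0].upper() if parts else ""
        if verb == "SELECT" and self._cycle is not None:
            return next(self._cycle)
        # Writes, unknown statements, and reads with no replica available.
        return self.primary

    def promote(self, replica: str) -> None:
        """On primary failure, make one replica the new primary."""
        self.replicas.remove(replica)
        self.primary = replica
        self._reset_cycle()
```

For example, with `Router("primary", ["replica1", "replica2"])`, successive `SELECT` statements alternate between the two replicas, and after `promote("replica2")` all writes are routed to `replica2`.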

For developers, replication is the starting point for any high-availability database architecture. However, reading from replicas means accepting that data may be milliseconds to seconds behind the primary. Applications that require read-your-writes consistency must either route those reads to the primary or implement session-level stickiness. The Primary Replica Sync diagram shows the precise message exchange that drives log shipping between nodes.
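One way to picture session-level stickiness is a session that pins its reads to the primary for a short window after each of its own writes. The sketch below assumes a hypothetical `StickySession` class and a two-second default window chosen only for illustration. The approach gives read-your-writes behaviour only while actual replication lag stays below that window.

```python
import time


class StickySession:
    """Pin a session's reads to the primary shortly after it writes."""

    def __init__(self, stick_seconds: float = 2.0, clock=time.monotonic) -> None:
        self.stick_seconds = stick_seconds   # assumed upper bound on replica lag
        self._clock = clock                  # injectable so the logic can be tested
        self._last_write = None

    def record_write(self) -> None:
        # Call after every write the session sends to the primary.
        self._last_write = self._clock()

    def read_target(self) -> str:
        # Recent writer: a replica may not have the data yet, so use the primary.
        if (
            self._last_write is not None
            and self._clock() - self._last_write < self.stick_seconds
        ):
            return "primary"
        return "replica"
```

A session that has never written, or whose last write is older than the window, reads from a replica. Any session inside the window reads from the primary.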

Frequently asked questions

What is database replication?

Database replication is the process of copying and maintaining the same data across multiple database servers. A primary node accepts all writes and ships its log to one or more replicas, which apply changes in order so that every node reflects a consistent (or eventually consistent) view of the dataset.

What is the difference between synchronous and asynchronous replication?

In synchronous replication the primary waits for at least one replica to confirm it has written the log entry before acknowledging the write to the client. This ensures that no acknowledged write is lost if a confirmed replica is promoted on failover, but it adds write latency. In asynchronous replication the primary acknowledges immediately and replicas catch up in the background, introducing a replication lag window during which replicas hold stale data.

When should I use database replication?

Use replication whenever you need high availability (replica promotion on primary failure), read scaling (serving SELECT queries from replicas), or geographic distribution (replicas in other regions reduce read latency for remote users). Replication is the foundational layer for almost all production database architectures and should be considered the default, not an advanced option.

What causes replication lag, and how can I reduce it?

Replication lag grows when the primary generates writes faster than the replica can apply them. Typical causes are high write volume, slow replica hardware, network congestion, and long-running replica queries that block log application. Reduce lag by giving replicas hardware comparable to the primary, monitoring `pg_stat_replication` (PostgreSQL) or `SHOW REPLICA STATUS` (MySQL 8.0.22 and later; `SHOW SLAVE STATUS` on older versions), and avoiding long-running queries on replicas. For critical standbys, consider synchronous replication: it does not make a replica apply changes faster, but it ensures that every acknowledged write has already reached the standby.
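As an illustration of lag monitoring on PostgreSQL, the helper below converts two log sequence numbers (LSNs) into a byte difference. PostgreSQL prints an LSN as two hexadecimal numbers separated by a slash, representing the high and low 32 bits of a 64-bit position. The function names `parse_lsn` and `lag_bytes` are hypothetical, and the sketch assumes the two values have already been fetched, for example the primary's current WAL position and a replica's `replay_lsn` from `pg_stat_replication`.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(primary_lsn: str, replica_lsn: str) -> int:
    """Bytes of log the replica still has to replay (0 when caught up)."""
    return parse_lsn(primary_lsn) - parse_lsn(replica_lsn)
```

The same difference can be computed inside the database with `pg_wal_lsn_diff`. Doing it client-side is mainly useful when the two positions come from different connections or from logged output.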

mermaid

flowchart LR
    Client[Client Application] --> Primary[(Primary DB)]
    Primary --> BinLog[Binary Log]
    BinLog --> RepProcess[Replication Process]
    RepProcess -->|Async| Replica1[(Replica 1)]
    RepProcess -->|Async| Replica2[(Replica 2)]
    RepProcess -->|Sync| Replica3[(Replica 3 Sync)]
    Replica3 -->|Ack| Primary
    Replica1 --> ReadTraffic[Read Queries]
    Replica2 --> ReadTraffic
    Primary --> WriteAck[Write Acknowledged]
    WriteAck --> Client