Message Deduplication: Mermaid Flowchart

About Source

Message deduplication is the process of detecting and discarding duplicate message deliveries at the consumer side, ensuring that a business operation is executed exactly once even when the messaging infrastructure delivers the same message more than once.

In any at-least-once delivery system — which includes most production message queues and event streaming platforms — a message may be delivered multiple times. This happens after consumer restarts (replaying from the last committed offset), after network timeouts (producer retries that succeed on the second attempt), or during rebalances in systems like Kafka. Without deduplication, these duplicates translate into double charges, duplicate notifications, or corrupt aggregate counts.

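The standard solution is an idempotency key: a unique identifier embedded in every message at the time of creation (a UUID, or a domain-specific key like payment-{order_id}). When the consumer receives a message, it checks the idempotency key against a deduplication store, typically Redis, where each idempotency key is stored as its own entry with a TTL matching the maximum expected redelivery window (minutes to hours). Per-key expiry requires individual Redis keys, because members of a Redis set cannot expire independently. If the key is already present, the message is a duplicate and is acknowledged without reprocessing. If absent, the key is written to the store and processing proceeds. The check and the write should be a single atomic operation (in Redis, SET with the NX and EX options) so that two consumers handling the same message concurrently cannot both see the key as absent.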
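The check-then-process flow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production implementation: `InMemoryDedupStore`, `claim`, and `handle` are hypothetical names, and the in-memory dictionary stands in for a real store such as Redis. The sketch also never purges expired keys, which a real store would do for you.

```python
import time
import uuid
from typing import Callable, Dict


class InMemoryDedupStore:
    """Hypothetical stand-in for a TTL key-value store (not a Redis client)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._expires_at: Dict[str, float] = {}
        self._clock = clock

    def claim(self, key: str, ttl_seconds: float) -> bool:
        """Record the key if it is absent or expired.

        Returns True when the key was newly claimed, False when a live
        entry already exists (a duplicate delivery).
        """
        now = self._clock()
        expiry = self._expires_at.get(key)
        if expiry is not None and expiry > now:
            return False
        self._expires_at[key] = now + ttl_seconds
        return True


def handle(message: dict, store: InMemoryDedupStore,
           process: Callable[[dict], None], ttl_seconds: float = 3600.0) -> str:
    """Process a message at most once per idempotency key within the TTL window."""
    if not store.claim(message["idempotency_key"], ttl_seconds):
        return "skipped"  # duplicate: acknowledge without reprocessing
    process(message)
    return "processed"


# Simulate the same message being delivered three times.
processed = []
store = InMemoryDedupStore()
msg = {"idempotency_key": f"payment-{uuid.uuid4()}", "amount": 42}
results = [handle(msg, store, processed.append) for _ in range(3)]
print(results)         # ['processed', 'skipped', 'skipped']
print(len(processed))  # 1
```

Combining the check and the write in one `claim` call mirrors what a single atomic store command provides. One trade-off of claiming the key before processing: if `process` fails after the claim, a redelivery of that message is skipped unless the key is removed or the failure is handled some other way.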

The deduplication window TTL is a critical tuning parameter. Too short and late redeliveries slip through because their keys have already expired; too long and the store holds far more keys than it needs, since the number of live keys grows in proportion to the TTL. For stream processors, the Stream Processing Pipeline often embeds deduplication as an explicit stage. In Kafka, idempotent producers (enable.idempotence=true) let the broker discard duplicates caused by producer retries, but consumer-side deduplication remains the application's responsibility. See Idempotent Consumer for the broader design pattern.
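The TTL trade-off can be made concrete with some back-of-the-envelope arithmetic. The sketch below assumes a constant message rate with a unique key per message, and the figure of 100 bytes per stored key is purely illustrative; the function names, the 500 messages per second rate, and the 90-second redelivery delay are all hypothetical.

```python
def live_keys(messages_per_second: float, ttl_seconds: float) -> float:
    """Steady-state key count: every key lives for exactly one TTL."""
    return messages_per_second * ttl_seconds


def store_bytes(messages_per_second: float, ttl_seconds: float,
                bytes_per_key: float = 100.0) -> float:
    """Rough memory estimate; bytes_per_key is an assumed figure."""
    return live_keys(messages_per_second, ttl_seconds) * bytes_per_key


def is_caught(redelivery_delay_seconds: float, ttl_seconds: float) -> bool:
    """A redelivery is detected only if its key has not yet expired."""
    return redelivery_delay_seconds < ttl_seconds


RATE = 500.0            # assumed messages per second
LATE_REDELIVERY = 90.0  # assumed worst-case redelivery delay in seconds

for ttl in (60.0, 3600.0, 86400.0):
    print(
        f"ttl={ttl:>7.0f}s "
        f"keys={live_keys(RATE, ttl):>12,.0f} "
        f"approx_mb={store_bytes(RATE, ttl) / 1e6:>8,.0f} "
        f"catches_90s_redelivery={is_caught(LATE_REDELIVERY, ttl)}"
    )
```

Under these assumptions, the shortest TTL misses the late redelivery while the longest holds many times more keys than the redelivery window requires, which is why the TTL is usually set slightly above the longest redelivery delay actually observed.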

Frequently asked questions

Message deduplication is the process of detecting and discarding duplicate message deliveries at the consumer side. In at-least-once delivery systems, the same message may arrive multiple times due to consumer restarts, producer retries, or broker redelivery after acknowledgment loss. Deduplication ensures the business operation executes exactly once despite these duplicates.

Each message carries a unique idempotency key: a UUID or domain-specific key such as `payment-{order_id}`. When the consumer receives a message, it checks the key against a deduplication store (typically Redis, with one entry per idempotency key). If the key exists, the message is a duplicate: acknowledge and skip. If absent, write the key and process. Each stored key is given a TTL matching the maximum expected redelivery window.

Use message deduplication in any at-least-once delivery system where duplicate processing causes incorrect outcomes: duplicate charges, duplicate notifications, or corrupted counters. It is essential whenever the consumer is not inherently idempotent, that is, whenever applying the same message twice would change the result, unlike a write that simply sets a value.
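The difference between an inherently idempotent operation and one that needs deduplication can be illustrated with a small sketch. The function names and data below are hypothetical and serve only to show how each kind of operation behaves when the same message is delivered twice.

```python
def set_status(orders: dict, order_id: str, status: str) -> None:
    """Idempotent: replaying the write leaves the same final state."""
    orders[order_id] = status


def add_charge(balances: dict, account: str, amount: int) -> None:
    """Not idempotent: every replay changes the state again."""
    balances[account] = balances.get(account, 0) + amount


orders: dict = {}
balances: dict = {}

for _ in range(2):  # simulate the same message delivered twice
    set_status(orders, "order-1", "paid")
    add_charge(balances, "acct-1", 50)

print(orders)    # {'order-1': 'paid'}
print(balances)  # {'acct-1': 100}
```

The status write tolerates the duplicate without help, while the charge is applied twice, so only the second operation needs a deduplication check in front of it.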

```mermaid
flowchart TD
    MQ[Message Queue] -->|deliver message\nwith idempotency key| C[Consumer]

    C -->|check key| DS{Deduplication\nStore}

    DS -->|key NOT found\nnew message| PROC[Process Message]
    DS -->|key FOUND\nduplicate| SKIP[Skip Processing]

    PROC -->|store key with TTL| DS
    PROC -->|execute business logic| BL[Write to Database\nSend Email, etc.]
    PROC -->|acknowledge| MQ

    SKIP -->|acknowledge\nno-op| MQ

    BL --> DONE[Operation Complete]

    subgraph DedupStore[Deduplication Store - Redis]
        DS
        TTL[TTL: 24 hours\nper key]
    end

    style SKIP fill:#fa0,color:#000
    style DONE fill:#4a4,color:#fff
```