diagram.mmd — flowchart
Webhook Retry Strategy flowchart diagram

A webhook retry strategy defines the rules for re-attempting failed webhook deliveries — including which HTTP response codes trigger a retry, how long to wait between attempts using exponential backoff, and what happens when all retry attempts are exhausted.

What the diagram shows

This flowchart details the decision tree that a webhook dispatcher executes each time a delivery attempt returns a response or times out:

1. Classify response: not all failures should be retried. A 2xx status (200-299) means success. 4xx responses other than 408 Request Timeout and 429 Too Many Requests indicate a permanent error, such as a misconfigured endpoint or an unauthorized payload, where retrying won't help. 5xx responses, 408, 429, and timeouts are treated as transient and should be retried.
2. Increment attempt counter: the dispatcher increments the delivery attempt count for this event.
3. Check max attempts: if the count has reached the configured maximum, the event goes to the dead-letter step; otherwise a retry is scheduled. Retry budgets vary widely by platform, commonly a handful of attempts spread over hours or days. Stripe, for example, documents retrying live-mode deliveries for up to 3 days with exponential backoff, whereas GitHub does not automatically redeliver failed deliveries and leaves redelivery to the consumer.
4. Compute backoff: the wait time follows exponential backoff, min(base * 2^attempt, max_delay), with optional jitter to spread load.
5. Schedule retry: the retry is scheduled as a delayed job in the queue.
6. Dead letter: exhausted retries move the event to a dead-letter store, where it can be inspected or manually replayed, and the dispatcher sends an alert to the consumer's account.
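The six steps above can be sketched as a single loop. This is a minimal illustration, not a production dispatcher: `run_retry_cycle`, the `send` callable, and the default values for `max_attempts`, `base`, and `max_delay` are all hypothetical names and numbers chosen for the example. Instead of enqueuing a delayed job, the sketch records the delay it would schedule, and it leaves jitter out so the result is deterministic.

```python
from typing import Callable, List, Optional, Tuple

# 4xx codes that the diagram treats as transient rather than permanent.
RETRIABLE_4XX = {408, 429}


def run_retry_cycle(
    send: Callable[[], Optional[int]],  # returns an HTTP status, or None on timeout
    max_attempts: int = 5,              # assumed value, not taken from any platform
    base: float = 30.0,                 # assumed base delay in seconds
    max_delay: float = 3600.0,          # assumed cap in seconds
) -> Tuple[str, List[float]]:
    """Walk the flowchart once per delivery attempt.

    Returns the final state and the list of delays that would have been scheduled.
    """
    attempts = 0
    delays: List[float] = []
    while True:
        status = send()

        # Step 1: classify the response.
        if status is not None and 200 <= status <= 299:
            return "delivered", delays
        if status is not None and 400 <= status <= 499 and status not in RETRIABLE_4XX:
            return "permanently_failed", delays
        # Everything else (408, 429, 5xx, timeouts) is treated as retriable.
        # Statuses the diagram does not cover, such as 3xx, also land here;
        # that is an assumption of this sketch.

        # Step 2: increment the attempt counter.
        attempts += 1

        # Step 3: check the maximum. Step 6: dead-letter when exhausted.
        if attempts >= max_attempts:
            return "dead_lettered", delays

        # Step 4: exponential backoff, capped at max_delay (no jitter here).
        delay = min(base * 2 ** attempts, max_delay)

        # Step 5: a real dispatcher would enqueue a delayed job; we only record it.
        delays.append(delay)
```

For example, an endpoint that fails twice and then recovers:

```python
responses = iter([503, None, 200])
state, delays = run_retry_cycle(lambda: next(responses))
# state == "delivered", delays == [60.0, 120.0]
```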

Why this matters

Consumers go down for maintenance, deployments, or unexpected failures. A robust retry strategy means events aren't permanently lost during those windows — they queue up and drain once the endpoint recovers. The exponential backoff prevents the dispatcher from hammering an already-overwhelmed consumer endpoint.

For the full delivery flow that precedes retries, see Webhook Delivery Flow. The retry algorithm mirrors Request Retry Logic used by HTTP clients. For dead-letter queue handling, see Messaging Dead Letter Queue.


Frequently asked questions

What is a webhook retry strategy?
A webhook retry strategy defines the rules for re-attempting failed webhook deliveries: which HTTP response codes trigger a retry, how long to wait between attempts using exponential backoff, and what happens when all attempts are exhausted. It ensures events are not permanently lost when a consumer endpoint is temporarily unavailable.
How does a webhook retry strategy work?
After a failed delivery (a 5xx response, a timeout, or a retriable 4xx such as 408 or 429), the dispatcher increments the attempt counter and computes the next retry delay, typically `min(base * 2^attempt, max_delay)` with optional jitter. The retry is scheduled as a delayed job in the queue. This continues until the delivery succeeds or the maximum attempt count is reached, at which point the event moves to a dead-letter store for manual inspection or replay.
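The delay formula can be made concrete with a short sketch. The function name `backoff_delay` and the defaults (`base` of 30 seconds, `max_delay` of one hour) are assumptions for illustration only. The source says jitter is optional but does not say which kind; the sketch uses the "full jitter" variant, which picks a uniformly random delay between zero and the capped exponential value. Other variants exist and may suit a given dispatcher better.

```python
import random
from typing import Optional


def backoff_delay(
    attempt: int,
    base: float = 30.0,         # assumed base delay in seconds
    max_delay: float = 3600.0,  # assumed cap in seconds
    jitter: bool = True,
    rng: Optional[random.Random] = None,
) -> float:
    """Return the wait in seconds before the next delivery attempt."""
    # Capped exponential backoff: min(base * 2^attempt, max_delay).
    capped = min(base * 2 ** attempt, max_delay)
    if not jitter:
        return capped
    # Full jitter (an assumed choice): uniform over [0, capped], so that many
    # events that failed together do not all retry at the same instant.
    source = rng if rng is not None else random
    return source.uniform(0, capped)
```

With jitter disabled and these defaults, the schedule doubles until it reaches the cap:

```python
schedule = [backoff_delay(a, jitter=False) for a in range(1, 9)]
# [60.0, 120.0, 240.0, 480.0, 960.0, 1920.0, 3600.0, 3600.0]
```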
Which failures should not be retried?
Permanent failures, meaning 4xx responses other than 408 (Request Timeout) and 429 (Too Many Requests), indicate the consumer rejected the request for a reason that retrying won't fix. A 400 Bad Request means the payload was malformed, a 401 means authentication failed, and a 404 means the endpoint no longer exists. Retrying these wastes resources and delays other deliveries; the dispatcher should mark them permanently failed right away instead of scheduling a retry.
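The classification rule can be written as a small lookup. This is a sketch under stated assumptions: the function name `classify` and the three string labels are hypothetical, and statuses the text does not mention (1xx and 3xx) are labelled retriable here only as a placeholder choice.

```python
from typing import Optional

# The two 4xx codes singled out as transient: 408 Request Timeout
# and 429 Too Many Requests.
RETRIABLE_4XX = frozenset({408, 429})


def classify(status: Optional[int]) -> str:
    """Map a delivery result to 'success', 'permanent', or 'retriable'.

    A status of None stands for a timeout or no response at all.
    """
    if status is None:
        return "retriable"
    if 200 <= status <= 299:
        return "success"
    if 400 <= status <= 499:
        return "retriable" if status in RETRIABLE_4XX else "permanent"
    # 5xx is transient per the text; 1xx and 3xx are not covered by the
    # source, so treating them as retriable is an assumption of this sketch.
    return "retriable"
```

Applied to the codes named above, `classify(400)`, `classify(401)`, and `classify(404)` return `"permanent"`, while `classify(408)`, `classify(429)`, and `classify(503)` return `"retriable"`.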
How does a webhook retry strategy differ from request retry logic?
A webhook retry strategy is server-side: the platform delivering the webhook controls retry scheduling and persistence across hours or days, because it owns the event and is responsible for at-least-once delivery to the consumer. Request retry logic is client-side: the application making an HTTP call decides whether to retry within a short window (typically seconds to minutes). Webhook retries are longer-horizon, externally visible, and typically configurable by consumers; request retries are internal and ephemeral.
mermaid
flowchart TD
    A([Webhook delivery response received]) --> B{HTTP status code}
    B -- 2xx Success --> C([Mark event delivered])
    B -- 4xx except 408 or 429 --> D([Mark delivery permanently failed\nno retry])
    B -- 408 or 429 or 5xx --> E[Classify as retriable failure]
    B -- Timeout or no response --> E
    E --> F[Increment attempt counter]
    F --> G{Attempts below maximum?}
    G -- Max attempts reached --> H[Move event to dead-letter store]
    H --> I[Send alert to consumer account]
    I --> J([End retry cycle])
    G -- Retries remaining --> K[Compute backoff delay]
    K --> L[Apply jitter to backoff]
    L --> M[Schedule retry job with delay]
    M --> N([Wait for backoff period])
    N --> O[Re-attempt webhook delivery]
    O --> A