Saga Pattern
The saga pattern is a distributed transaction strategy that maintains data consistency across multiple microservices. It breaks a multi-step operation into a sequence of local transactions, each of which publishes an event or message that triggers the next step. If a later step fails, compensating transactions undo the steps that have already completed.
Traditional ACID transactions work within a single database. In microservice architectures, an operation like "place an order" spans multiple services — each owning its own database. Two-phase commit (2PC) would work in theory, but it introduces tight coupling, blocking locks, and brittleness under partial failure. Sagas replace 2PC with an orchestrated or choreographed sequence of local transactions.
In the orchestrator-based saga shown here, a dedicated Order Orchestrator drives the workflow. It issues commands to each participating service sequentially and waits for each response before proceeding. This keeps the workflow logic centralized and observable: you can query the orchestrator at any point to find out where a saga stands. The trade-off is that the orchestrator becomes a critical-path dependency, because no saga can make progress while it is unavailable.
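The sequential, queryable behavior described above can be sketched in a few lines. This is a minimal in-memory illustration, not a production design: the class layout, the step names (reserve_payment, reserve_inventory), and the state strings are all assumptions made for the example, and plain function calls stand in for commands sent to remote services.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class OrderOrchestrator:
    # Ordered (step name, command) pairs; each command stands in for a
    # request to a participating service (hypothetical names).
    steps: list[tuple[str, Callable[[dict], None]]]
    # saga_id -> latest state, so a caller can ask where a saga stands.
    status: dict[str, str] = field(default_factory=dict)
    # Every state change in order, kept only to make progress visible here.
    transitions: list[tuple[str, str]] = field(default_factory=list)

    def _set_state(self, saga_id: str, state: str) -> None:
        self.status[saga_id] = state
        self.transitions.append((saga_id, state))

    def run(self, saga_id: str, order: dict) -> None:
        self._set_state(saga_id, "STARTED")
        for name, command in self.steps:
            self._set_state(saga_id, f"RUNNING:{name}")
            command(order)  # returns only once the service has responded
        self._set_state(saga_id, "COMPLETED")


def reserve_payment(order: dict) -> None:
    order["payment"] = "reserved"


def reserve_inventory(order: dict) -> None:
    order["inventory"] = "reserved"


orchestrator = OrderOrchestrator(
    steps=[
        ("reserve_payment", reserve_payment),
        ("reserve_inventory", reserve_inventory),
    ]
)
order = {"id": 42}
orchestrator.run("saga-1", order)
print(orchestrator.status["saga-1"])
```

In a real deployment the status map would presumably live in durable storage so that a restarted orchestrator can resume in-flight sagas; the sketch keeps it in memory for brevity.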
Failure handling is what distinguishes sagas from simple chained calls. When a step fails — say, inventory is unavailable — the orchestrator issues compensating transactions in reverse order: it asks the Payment Service to release the reserved funds, then marks the order as failed. Compensating transactions are not rollbacks in the database sense; they are explicit business operations ("cancel reservation") that undo the effect of the forward transaction.
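The reverse-order compensation described above might look like the following sketch. It assumes each step is registered together with its compensating action; the function names, the StepFailed exception, and the order fields are hypothetical, chosen to mirror the payment and inventory example in the text.

```python
from typing import Callable

# (step name, forward action, compensating action); names are illustrative.
Step = tuple[str, Callable[[dict], None], Callable[[dict], None]]


class StepFailed(Exception):
    """Raised by a forward action that cannot complete."""


def run_saga(steps: list[Step], order: dict) -> list[str]:
    completed: list[tuple[str, Callable[[dict], None]]] = []
    log: list[str] = []
    for name, action, compensate in steps:
        try:
            action(order)
        except StepFailed:
            log.append(f"{name}: failed")
            # Undo already-completed steps, most recent first.
            for done_name, undo in reversed(completed):
                undo(order)
                log.append(f"{done_name}: compensated")
            order["status"] = "FAILED"
            return log
        completed.append((name, compensate))
        log.append(f"{name}: ok")
    order["status"] = "COMPLETED"
    return log


def reserve_funds(order: dict) -> None:
    order["funds_reserved"] = True


def release_funds(order: dict) -> None:
    # An explicit business operation, not a database rollback.
    order["funds_reserved"] = False


def reserve_inventory(order: dict) -> None:
    if not order["in_stock"]:
        raise StepFailed("inventory unavailable")
    order["inventory_reserved"] = True


def release_inventory(order: dict) -> None:
    order["inventory_reserved"] = False


steps: list[Step] = [
    ("reserve_funds", reserve_funds, release_funds),
    ("reserve_inventory", reserve_inventory, release_inventory),
]
order = {"in_stock": False}
log = run_saga(steps, order)
print(log)
```

Note that the failed step itself is not compensated, only the steps that completed before it. The sketch also assumes compensations always succeed; in practice they usually need to be idempotent and retried.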
The choreography variant eliminates the orchestrator: each service emits domain events, and other services react to them. This is more loosely coupled but harder to observe and debug. Both approaches are compatible with the Event Sourcing pattern and with the Dead Letter Queue pattern, which handles saga step failures that exhaust their retry budget.
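To make the choreography variant concrete, here is a small sketch in which services react to one another's events with no central coordinator. The in-process EventBus, the event names (OrderPlaced, PaymentReserved, and so on), and the handler functions are all assumptions for illustration; a real system would use a message broker rather than synchronous calls.

```python
from collections import defaultdict
from typing import Callable


class EventBus:
    """Toy synchronous bus standing in for a message broker."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)
        self.history: list[str] = []  # event types in publication order

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        self.history.append(event_type)
        for handler in self._handlers[event_type]:
            handler(payload)


bus = EventBus()


def payment_on_order_placed(order: dict) -> None:
    order["funds_reserved"] = True
    bus.publish("PaymentReserved", order)


def inventory_on_payment_reserved(order: dict) -> None:
    if order["in_stock"]:
        bus.publish("InventoryReserved", order)
    else:
        bus.publish("InventoryUnavailable", order)


def payment_on_inventory_unavailable(order: dict) -> None:
    # Compensating action, triggered by an event rather than an orchestrator.
    order["funds_reserved"] = False
    bus.publish("PaymentReleased", order)


def order_on_inventory_reserved(order: dict) -> None:
    order["status"] = "COMPLETED"


def order_on_payment_released(order: dict) -> None:
    order["status"] = "FAILED"


bus.subscribe("OrderPlaced", payment_on_order_placed)
bus.subscribe("PaymentReserved", inventory_on_payment_reserved)
bus.subscribe("InventoryUnavailable", payment_on_inventory_unavailable)
bus.subscribe("InventoryReserved", order_on_inventory_reserved)
bus.subscribe("PaymentReleased", order_on_payment_released)

order = {"in_stock": False}
bus.publish("OrderPlaced", order)
print(bus.history, order["status"])
```

No single component in this sketch knows the whole workflow: the sequence exists only in the subscriptions, which illustrates why choreographed sagas are harder to observe and debug.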