
Mastering Spring & MSA Transactions – Part 17: Understanding the SAGA Pattern: Theory & Key Concepts

When microservices each own their own database, a single global transaction across multiple services is usually impractical. Rather than forcing an all-at-once commit or rollback as two-phase commit (2PC) does, the SAGA pattern chains each service’s local transaction so that either every step completes or the changes already committed are undone by compensating transactions. This part focuses on the theory behind SAGA: the motivation, the structure (local commits plus compensation), and the core advantages and challenges.

1) What Exactly Is a SAGA?

1.1) Local Transactions

In a monolithic environment, a single @Transactional block can wrap multiple steps (e.g., “create order,” “deduct inventory,” “charge payment”). With microservices, each step runs in a separate service’s local DB:

  1. Service A updates its own DB, commits locally.
  2. Service B does the same, and so on.

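If all steps succeed, the business operation is logically complete even though no single database transaction spanned it. But if a later step (say, step 3) fails, steps 1 and 2 have already committed and the database can no longer roll them back, so we need another way to revert them. That’s where “compensation” steps come in.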

1.2) Compensation Transactions

A SAGA is basically:

  1. A series of local commits in each microservice,
  2. If any step fails, run an undo or reverse operation on each previously successful service, restoring them to a consistent state.

Thus, no global lock is needed. Each service commits or rolls back only its own data, and the SAGA flow chains these local commits with compensation steps so that the workflow ends in either complete success or a logical rollback across all services. This rollback is semantic: the intermediate states were committed and may have been briefly visible, unlike in a single ACID transaction.
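The "local commits plus compensation" idea can be sketched in a few lines of plain Java. This is a minimal illustration, not a production implementation: the `SagaStep` interface and `SagaRunner` class are hypothetical names introduced here, and each `execute()` stands in for a service's local transaction.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** One saga step: a local transaction plus the action that undoes it (hypothetical interface). */
interface SagaStep {
    void execute() throws Exception; // forward local transaction
    void compensate();               // undo for a step that already committed
}

/** Runs steps in order; on failure, compensates the completed steps in reverse order. */
final class SagaRunner {
    /** Returns true if every step committed, false if the saga was compensated. */
    static boolean run(List<SagaStep> steps) {
        Deque<SagaStep> completed = new ArrayDeque<>();
        for (SagaStep step : steps) {
            try {
                step.execute();
                completed.push(step); // remember it in case we must undo later
            } catch (Exception e) {
                // The failed step never committed, so only earlier steps are undone.
                while (!completed.isEmpty()) {
                    completed.pop().compensate();
                }
                return false;
            }
        }
        return true;
    }
}
```

The stack gives last-in, first-out compensation, so the most recent commit is undone first. A real runner would also need to persist its progress and retry compensations that fail, which this sketch leaves out.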

2) Why SAGA Instead of 2PC?

2.1) 2PC (Two-Phase Commit) in Distributed Systems

  • Historically: 2PC tries to unify multiple resources under a single transaction manager.
  • Issues in Microservices:
    1. Performance: Each participant must hold resources in “prepared” state, blocking concurrency.
    2. Coordinator Bottleneck: If the coordinator fails, all participants may remain stuck.
    3. Tight Coupling: Microservices aim for loosely coupled, independent deployments—2PC reintroduces central dependencies.

2.2) SAGA’s Lightweight, Scalable Approach

  • Local: Each microservice commits or rolls back in isolation—no global coordinator forcing a synchronous lock.
  • Recovery: If a later step fails, earlier steps are compensated.
  • Trade-off: Designing effective compensation can be tricky—especially if “undo” logic is more than a simple revert (e.g., an item shipped can’t always be “unshipped”).

Result: SAGA embraces the reality of partial failures in distributed systems rather than trying to force a single synchronous commit.

3) Core Structure: Steps & Compensation

3.1) Forward Steps

A SAGA often breaks down a business flow (like “place order, pay, ship”) into discrete steps:

  1. Order microservice commits (creates order record).
  2. Payment microservice commits (charges card).
  3. Inventory or Shipping microservice commits (reserves or ships items).

If each step completes, we have an overall success. If step 2 fails, step 1 must be reversed (cancel the order). If step 3 fails, steps 1 and 2 must revert (refund payment, revert order status).

3.2) Compensation Steps

Each forward step has an inverse:

  • OrderCreated → OrderCanceled
  • PaymentCharged → PaymentRefunded
  • InventoryReserved → InventoryReleased

You define these “undo” transactions so that, logically, you restore the system’s prior state. This might be direct (adding stock back) or more complex when the domain cannot literally reverse the action, such as “un-shipping” a product that has physically left the warehouse.
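One way to make the forward-to-inverse pairing explicit is to encode it as data. The sketch below assumes a hypothetical `SagaEvent` enum and `CompensationPlanner` helper (neither comes from a library). Given the forward steps that have completed, it returns the compensations to run, newest first.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Forward events and their inverses, mirroring the pairs listed above. */
enum SagaEvent {
    ORDER_CREATED, PAYMENT_CHARGED, INVENTORY_RESERVED,
    ORDER_CANCELED, PAYMENT_REFUNDED, INVENTORY_RELEASED;

    /** Returns the compensating event for a forward event. */
    SagaEvent compensation() {
        switch (this) {
            case ORDER_CREATED:      return ORDER_CANCELED;
            case PAYMENT_CHARGED:    return PAYMENT_REFUNDED;
            case INVENTORY_RESERVED: return INVENTORY_RELEASED;
            default:
                throw new IllegalStateException(this + " is not a forward step");
        }
    }
}

final class CompensationPlanner {
    /** Maps completed forward steps to their compensations, in reverse order. */
    static List<SagaEvent> plan(List<SagaEvent> completedForwardSteps) {
        List<SagaEvent> undo = new ArrayList<>();
        for (SagaEvent e : completedForwardSteps) {
            undo.add(e.compensation());
        }
        Collections.reverse(undo); // undo the most recent step first
        return undo;
    }
}
```

For example, if `ORDER_CREATED` and `PAYMENT_CHARGED` completed before a failure, `plan` yields `PAYMENT_REFUNDED` followed by `ORDER_CANCELED`.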

4) Pros & Cons of the SAGA Pattern

4.1) Pros

  1. Independent Commits
    • Each microservice commits quickly, avoiding the overhead of waiting for all participants to sync.
  2. Loosely Coupled
    • No single transaction coordinator locks everything. Each service is responsible for its local DB.
  3. Better Scalability & Resilience
    • If one service is slow or partially offline, the entire system isn’t blocked. A failure triggers compensation rather than halting the entire flow.

4.2) Cons

  1. Compensation Complexity
    • Writing correct “undo” logic can get complicated, particularly if external side effects are involved (e.g., a shipment that has already been delivered).
  2. No Immediate Consistency
    • At any time before the final step completes, data might be partially updated across services. Some domain logic must handle these incomplete states.
  3. Eventual Consistency Requires Diligence
    • Monitoring, logging, and ensuring every service eventually reaches the correct final state can be non-trivial.

5) Typical SAGA Flow Example

Scenario: “Order → Payment → Inventory”

  1. OrderService: createOrder() → local DB commit.
  2. PaymentService: chargeCard() → local DB commit.
  3. InventoryService: reserveStock() → local DB commit.
  4. All success => SAGA success.
  5. If step 2 fails => call cancelOrder() in OrderService. No refund is needed because the charge never succeeded.
  6. If step 3 fails => call refundCard() in PaymentService, then cancelOrder() in OrderService.

Hence, each microservice’s local transaction stands on its own. SAGA glues them together with a “commit or undo” approach.
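The scenario above can be written out as a hedged sketch. The method names `createOrder`, `chargeCard`, `reserveStock`, `cancelOrder`, and `refundCard` follow the list above. The three interfaces and the `OrderSaga` class are assumptions made for illustration, standing in for remote calls to each service.

```java
/** Hypothetical client interfaces; each call stands for a local commit in that service. */
interface OrderService {
    void createOrder(String orderId);
    void cancelOrder(String orderId);
}

interface PaymentService {
    void chargeCard(String orderId);
    void refundCard(String orderId);
}

interface InventoryService {
    void reserveStock(String orderId);
}

final class OrderSaga {
    private final OrderService orders;
    private final PaymentService payments;
    private final InventoryService inventory;

    OrderSaga(OrderService orders, PaymentService payments, InventoryService inventory) {
        this.orders = orders;
        this.payments = payments;
        this.inventory = inventory;
    }

    /** Returns true on overall success, false if the saga was compensated. */
    boolean place(String orderId) {
        orders.createOrder(orderId);          // step 1: nothing to undo if this throws
        try {
            payments.chargeCard(orderId);     // step 2
        } catch (RuntimeException e) {
            orders.cancelOrder(orderId);      // undo step 1; no refund needed
            return false;
        }
        try {
            inventory.reserveStock(orderId);  // step 3
        } catch (RuntimeException e) {
            payments.refundCard(orderId);     // undo step 2 first
            orders.cancelOrder(orderId);      // then undo step 1
            return false;
        }
        return true;
    }
}
```

This version assumes a failed call means the remote step did not commit. With real network calls a timeout is ambiguous, so the compensations themselves must be safe to run even when the forward step never took effect.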

6) Choreography vs. Orchestration

Though we won’t dive deep into either mechanism here, be aware:

  1. Choreography uses events. Services publish success/failure events; other services subscribe and decide on the next step or on compensation.
  2. Orchestration has a central “Saga Orchestrator” that calls each service in turn and triggers compensation if something fails.

Both achieve similar outcomes but with different trade-offs in complexity, coupling, and traceability.
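To make the choreography style concrete, here is a toy sketch in which services react to each other’s events with no central coordinator. The in-memory `EventBus` and the `ChoreographyDemo` wiring are hypothetical and synchronous for simplicity. A real system would typically use an asynchronous message broker. The `PaymentFailed` event name is also an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Minimal synchronous in-memory event bus (illustration only). */
final class EventBus {
    private final Map<String, List<Consumer<String>>> handlers = new HashMap<>();
    final List<String> published = new ArrayList<>(); // event trace, in publish order

    void subscribe(String eventType, Consumer<String> handler) {
        handlers.computeIfAbsent(eventType, k -> new ArrayList<>()).add(handler);
    }

    void publish(String eventType, String orderId) {
        published.add(eventType);
        List<Consumer<String>> subscribers =
                handlers.getOrDefault(eventType, Collections.emptyList());
        for (Consumer<String> handler : subscribers) {
            handler.accept(orderId);
        }
    }
}

final class ChoreographyDemo {
    /** Wires three simulated services to the bus; paymentSucceeds fakes the charge outcome. */
    static EventBus wire(boolean paymentSucceeds) {
        EventBus bus = new EventBus();
        // Payment service reacts to OrderCreated and reports success or failure.
        bus.subscribe("OrderCreated", id ->
                bus.publish(paymentSucceeds ? "PaymentCharged" : "PaymentFailed", id));
        // Inventory service reacts to a successful charge.
        bus.subscribe("PaymentCharged", id -> bus.publish("InventoryReserved", id));
        // Order service compensates when the payment fails.
        bus.subscribe("PaymentFailed", id -> bus.publish("OrderCanceled", id));
        return bus;
    }
}
```

Publishing `OrderCreated` on a bus wired with `paymentSucceeds = false` leaves the trace `OrderCreated`, `PaymentFailed`, `OrderCanceled` in `published`. In the orchestration style, the same decisions would live in one coordinating class instead of being spread across subscribers.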

7) Practical Considerations

  1. Idempotent Compensation
    • Retries can cause the orchestrator or event bus to deliver the same compensation request more than once. Make the “undo” logic idempotent so that repeated requests revert the change only once.
  2. Rollback “Impossible” Cases
    • Some real-world actions are not fully revertible (e.g., a shipment that has already left the warehouse). You might define partial refunds or alternative flows.
  3. Communication
    • If a service is unreachable, you might queue compensation requests until it returns. Ensuring eventual consistency across partial downtime is vital.
  4. Monitoring
    • Observability is key. In a big SAGA, you need logs or distributed tracing to see if the flow ended up fully committed or canceled and which service triggered compensation.
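As an illustration of the idempotency point, the sketch below records which payments have already been refunded and ignores repeats. The `RefundHandler` class is hypothetical, and the in-memory set is an assumption made to keep the example self-contained. In a real service the "already processed" marker would normally be stored in the service’s own database, in the same local transaction as the refund.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Idempotent compensation: a repeated refund request for the same payment is a no-op. */
final class RefundHandler {
    private final Set<String> processed = ConcurrentHashMap.newKeySet();
    private final AtomicInteger refundsIssued = new AtomicInteger();

    /** Returns true if the refund was performed now, false if it was a duplicate request. */
    boolean refund(String paymentId) {
        // Set.add returns false when the id is already present, so only the first call proceeds.
        if (!processed.add(paymentId)) {
            return false;
        }
        refundsIssued.incrementAndGet(); // stand-in for the real refund logic
        return true;
    }

    int refundsIssued() {
        return refundsIssued.get();
    }
}
```

Calling `refund("pay-1")` twice issues one refund: the first call returns `true`, the second returns `false`, and `refundsIssued()` stays at 1.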

8) Conclusion

The SAGA pattern stands as a practical alternative to 2PC for distributed transactions in microservices. Each service commits or rolls back purely on its own local DB, and end-to-end consistency is reached eventually through compensation. It requires more domain logic, especially for “undo” steps, but it scales better and is more resilient to partial failures than a single, globally locked commit.

By understanding the theory behind SAGA—local commits, compensation transactions, eventual rather than synchronous consistency—you can build microservices that each run @Transactional logic for their own data while collectively guaranteeing the bigger workflow either fully succeeds or is undone.
