Distributed Transactions Cheat Sheet

Distributed transactions guide — two-phase commit, Saga pattern, compensating transactions, eventual consistency patterns, and how to maintain data consistency ac.

Last Updated: May 1, 2025

Consistency Models

ModelGuaranteePerformanceComplexityWhen to Use
ACID (2PC/XA)Strong — all nodes commit or all rollback. Linearizable.Low — locks held for entire transaction. Blocks on coordinator failure.High — need transaction manager, XA drivers.Payment systems, financial ledgers, inventory count.
BASE (Saga)Eventually consistent. Each step commits locally.High — no distributed locks. Steps process independently.Moderate — need compensating transactions for rollback.Order management, reservations, multi-service workflows.
TCC (Try-Confirm-Cancel)Resource-level reservation (Try), then Confirm or Cancel.High — resources reserved, not locked.High — two-phase at resource level.Seat booking, warehouse allocation, hotel reservations.
Outbox PatternAtomic write of state + event in local DB. Guaranteed publish.High — no distributed coordination.Moderate — needs outbox processor (polling or CDC).Domain event publication, event-driven microservices.

Two-Phase Commit (2PC / XA)

ItemDescription
Phase 1: PrepareCoordinator asks ALL participants: 'Can you commit?' Each participant writes undo+redo logs to disk, votes YES/NO. Locks held on resources until phase 2.
Phase 2: CommitIf ALL vote YES: coordinator writes commit decision to log, sends COMMIT to all. If ANY vote NO: coordinator sends ROLLBACK to all. Participants release locks.
Blocking ProblemCoordinator failure after sending PREPARE but before COMMIT → participants block forever holding locks. The single-point-of-failure that makes 2PC impractical at scale.
XA ProtocolStandard 2PC implementation: XA interface between transaction manager and resource managers (databases). javax.transaction.xa in Java. Most RDBMS support XA.
3PC (Three-Phase Commit)Adds pre-commit phase and timeouts to avoid indefinite blocking. In theory, non-blocking. In practice, rarely used — added complexity and message overhead.
When 2PC Makes SenseSingle-digit number of participants. Short-lived transactions (milliseconds). High cost of inconsistency (financial transfers). Within a single cloud region.

Saga Pattern

ItemDescription
Choreographed SagaEach service publishes events after local transaction. Next service subscribes and reacts. No central coordinator. Low coupling, harder to reason about end-to-end flow.
Orchestrated SagaCentral saga orchestrator sends commands and listens for outcomes. Explicit workflow: step 1 → step 2 → step 3. Easier to test, debug, and version control.
Compensating TransactionsUndo already-committed steps when saga fails. Semantic undo — NOT a database rollback. Example: 'cancel order' email to warehouse, not 'DELETE FROM orders'.
Saga Implementation StatesPending (in progress), Completed (all steps succeeded), Compensating (failure detected, undoing steps), Compensated (all undo steps done), Failed (compensation also failed = manual intervention).
Idempotency in SagasEvery saga step must be idempotent: retry on timeout must not double-charge. Each step stores saga_id + step_id for dedup.
Failure ModesLost response (step succeeded but orchestrator didn't get ack), duplicate execution, out-of-order delivery. Handle all three or your saga will fail in production.

Practical Patterns

Outbox Pattern
UPDATE orders SET status='paid', outbox(event_type='OrderPaid', payload=json) — one DB transaction. Outbox poller publishes to Kafka.
Transactional Outbox with CDC
Debezium tails MySQL binlog / Postgres WAL → emits changes to Kafka. Zero application code for outbox polling. Handles exactly-once with log position.
Idempotent Receiver
On incoming message: check `processed_messages` table by message_id. If exists → ACK and skip. If not → process + insert processed row + ACK in one tx.
Saga Orchestrator State Machine
AWS Step Functions, Camunda, Temporal.io, or custom: state machine with retry policies, timeouts, and compensation steps per state.
Pro Tip: ACID is for monoliths, BASE is for microservices. In distributed systems, strong consistency is expensive and slow. Choose eventual consistency with compensating transactions unless you absolutely need 2PC.