Last Updated: May 1, 2025
Circuit Breaker States
| State | Behavior | Transitions To | Typical Action |
|---|---|---|---|
| CLOSED | Requests flow normally. Failure count tracked. | OPEN (when failures exceed threshold) | Normal operation — call downstream. |
| OPEN | Requests fail immediately. No calls to downstream. Timer counts down. | HALF-OPEN (after timeout expires) | Fast-fail with exception or fallback response. |
| HALF-OPEN | Limited probe requests allowed through. | CLOSED (if probes succeed) or OPEN (if probes fail) | Test if downstream recovered; gate remaining requests. |
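
The transitions in the table above can be sketched as a small state machine. This is a minimal, single-threaded illustration, not any library's implementation: the names `CircuitBreaker`, `CircuitOpenError`, `wait_seconds`, and `probes` are hypothetical, it opens once consecutive failures reach the threshold, and it does not gate concurrent requests in HALF-OPEN.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "CLOSED"
    OPEN = "OPEN"
    HALF_OPEN = "HALF-OPEN"


class CircuitOpenError(Exception):
    """Raised when a call is rejected without reaching downstream."""


class CircuitBreaker:
    # Hypothetical sketch: parameter names are illustrative assumptions.
    def __init__(self, failure_threshold=5, wait_seconds=30.0, probes=3,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.wait_seconds = wait_seconds
        self.probes = probes
        self.clock = clock  # injectable so tests do not need to sleep
        self.state = State.CLOSED
        self.failures = 0
        self.probe_successes = 0
        self.opened_at = 0.0

    def _open(self):
        self.state = State.OPEN
        self.opened_at = self.clock()

    def call(self, fn, *args, **kwargs):
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.wait_seconds:
                raise CircuitOpenError("circuit is OPEN")  # fast-fail
            # Timeout expired: let probe requests through.
            self.state = State.HALF_OPEN
            self.probe_successes = 0
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        if self.state is State.HALF_OPEN:
            self._open()  # a failed probe reopens the circuit
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self._open()

    def _on_success(self):
        if self.state is State.HALF_OPEN:
            self.probe_successes += 1
            if self.probe_successes >= self.probes:
                self.state = State.CLOSED
                self.failures = 0
        else:
            self.failures = 0  # only consecutive failures are counted
```

Passing a fake `clock` makes the OPEN to HALF-OPEN timeout testable without real waiting. A production breaker would also need locking around the state changes.
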
Configuration Parameters
| Parameter | Description |
|---|---|
| failureThreshold | Number of consecutive failures before opening the circuit (e.g., 5). Sliding-window implementations use a failure-rate percentage instead (e.g., Hystrix's `errorThresholdPercentage`). |
| slowCallThreshold | Duration above which a call is considered slow (e.g., 1000 ms). Slow calls count toward opening the circuit when a slow-call rate threshold is configured. |
| waitDurationInOpen | Time the circuit stays OPEN before transitioning to HALF-OPEN (e.g., 30s). Must be long enough for downstream to recover. |
| permittedCallsInHalfOpen | Max probe calls allowed through in HALF-OPEN state (e.g., 3). If they succeed → CLOSED; if any fail → back to OPEN. |
| slidingWindowSize | Size of the window used to compute the failure rate. Count-based (last N calls) or time-based (last N seconds). |
| recordExceptions | List of exceptions that count as failures (e.g., TimeoutException, ConnectException). Business exceptions should not trip the breaker. |
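
To make the sliding-window and exception-filter parameters concrete, here is a rough sketch of a count-based window that computes a failure rate over the last N calls. The names `CountBasedWindow`, `minimum_calls`, `RECORD_EXCEPTIONS`, and `counts_as_failure` are assumptions made for illustration, and Python's built-in `TimeoutError` and `ConnectionError` stand in for the Java exception types named in the table.

```python
from collections import deque


class CountBasedWindow:
    """Tracks the outcome of the last `size` calls (hypothetical helper)."""

    def __init__(self, size=10, failure_rate_threshold=50.0, minimum_calls=None):
        self.outcomes = deque(maxlen=size)  # True means the call failed
        self.failure_rate_threshold = failure_rate_threshold
        # Assumption: wait for a full window before judging the rate.
        self.minimum_calls = size if minimum_calls is None else minimum_calls

    def record(self, failed):
        # The deque drops the oldest outcome once `size` is reached.
        self.outcomes.append(bool(failed))

    def failure_rate(self):
        if not self.outcomes:
            return 0.0
        return 100.0 * sum(self.outcomes) / len(self.outcomes)

    def should_open(self):
        if len(self.outcomes) < self.minimum_calls:
            return False
        return self.failure_rate() >= self.failure_rate_threshold


# Only integration failures are recorded; business errors are ignored.
RECORD_EXCEPTIONS = (TimeoutError, ConnectionError)


def counts_as_failure(exc):
    return isinstance(exc, RECORD_EXCEPTIONS)
```

A time-based window would keep timestamped outcomes and discard those older than N seconds instead of relying on a fixed-length deque.
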
Implementation Patterns
| Library | Example |
|---|---|
| Resilience4j CircuitBreaker (Java) | `@CircuitBreaker(name = "paymentService", fallbackMethod = "fallback")`; defaultConfig: slidingWindowSize=10, failureRateThreshold=50% |
| Polly Circuit Breaker (C# .NET) | `Policy.Handle<TException>().CircuitBreakerAsync(exceptionsAllowedBeforeBreaking: 3, durationOfBreak: TimeSpan.FromSeconds(30))`, where `TException` is the exception type to handle |
| Go Circuit Breaker | Use sony/gobreaker: `Settings` with MaxRequests=3, Interval=30s, Timeout=60s |
| Envoy Proxy Circuit Breaker | Connection and request limits per upstream cluster: max_connections, max_pending_requests, max_requests |
| Istio DestinationRule | trafficPolicy.connectionPool (tcp and http limits) plus trafficPolicy.outlierDetection to eject failing hosts: consecutive5xxErrors=5 (consecutiveErrors in older releases), baseEjectionTime=30s |
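
The annotation and policy styles above share one idea: wrap a call, count consecutive failures, and route to a fallback while the circuit is open. The decorator below is a library-agnostic sketch of that idea. It does not reproduce the Resilience4j or Polly APIs, and `circuit_breaker`, `cached_balance`, and `fetch_balance` are hypothetical names.

```python
import functools
import time


def circuit_breaker(failure_threshold=3, break_seconds=30.0, fallback=None,
                    clock=time.monotonic):
    """Hypothetical decorator: opens after N consecutive failures."""
    def decorate(fn):
        state = {"failures": 0, "opened_at": None}

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            opened_at = state["opened_at"]
            if opened_at is not None and clock() - opened_at < break_seconds:
                # Circuit is open: skip the downstream call entirely.
                if fallback is None:
                    raise RuntimeError("circuit open for " + fn.__name__)
                return fallback(*args, **kwargs)
            try:
                # Either closed, or the break has elapsed and this is a probe.
                result = fn(*args, **kwargs)
            except Exception:
                state["failures"] += 1
                if state["failures"] >= failure_threshold:
                    state["opened_at"] = clock()  # open, or reopen after a failed probe
                if fallback is None:
                    raise
                return fallback(*args, **kwargs)
            state["failures"] = 0
            state["opened_at"] = None
            return result

        return wrapper
    return decorate


def cached_balance(account_id):
    # Degraded response served while the payment service is unavailable.
    return {"account": account_id, "balance": None, "stale": True}


@circuit_breaker(failure_threshold=3, break_seconds=30.0, fallback=cached_balance)
def fetch_balance(account_id):
    # Placeholder for a real downstream call.
    raise ConnectionError("payment service unreachable")
```

The fallback receives the same arguments as the protected function, which mirrors how a fallback method is typically matched to the original call.
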
Best Practices & Anti-Patterns
| Practice | Description |
|---|---|
| Always provide fallback | Degrade gracefully: cached data, default response, or queued retry. Never return raw errors to users without a plan. |
| Don't circuit-break business errors | Only trip on integration failures (timeouts, connection refused). HTTP 404 or 422 are valid responses, not failures. |
| Log every state transition | Audit every CLOSED→OPEN, OPEN→HALF-OPEN, HALF-OPEN→CLOSED, and HALF-OPEN→OPEN. Essential for debugging cascading failures across services. |
| Tune thresholds per-dependency | Database calls need tighter thresholds than cache calls. A fast cache such as Redis can absorb far more retries than a slow downstream API. |
| Combine with retries | Retry with exponential backoff while the circuit is CLOSED. The circuit breaker catches what retries can't handle: persistent failures. |
| Monitor breaker metrics | Expose via /actuator/health or Prometheus metrics: state, failure rate, not-permitted count. Alert when a breaker stays OPEN for more than 5 min. |
| Don't share breakers across unrelated dependencies | One slow endpoint shouldn't break the circuit for all calls to a service. Use per-endpoint or per-operation granularity. |
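
As a sketch of the retry guidance above, the helper below retries only integration failures with exponential backoff and lets business errors propagate immediately. `retry_with_backoff`, `RETRYABLE`, and the delay values are illustrative assumptions, and jitter is omitted for brevity.

```python
import time

# Assumption: only these integration failures are worth retrying.
RETRYABLE = (TimeoutError, ConnectionError)


def retry_with_backoff(fn, attempts=3, base_delay=0.1, max_delay=2.0,
                       sleep=time.sleep):
    """Call fn(); on a retryable error wait base_delay * 2**n, then retry."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except RETRYABLE:
            if attempt == attempts - 1:
                raise  # retries exhausted: the caller sees the failure
            sleep(min(max_delay, base_delay * (2 ** attempt)))
    # Any other exception (e.g. a validation error) is never retried.
```

One reasonable composition is to place this helper inside the call protected by the breaker, so the breaker records a single failure only after the retries are exhausted and rejects calls outright once it is OPEN.
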
Pro Tip: Circuit breakers protect a struggling downstream service from further load and keep callers from tying up threads and connections while waiting on it. The key insight: failing fast is better than making users wait. Always pair with fallback behavior; a degraded experience beats no experience.