tower-resilience

A comprehensive resilience and fault-tolerance toolkit for Tower services, inspired by Resilience4j.

About

Tower-resilience provides composable middleware for building robust distributed systems in Rust. Built on Tower, it extends the ecosystem with production-ready resilience patterns inspired by Resilience4j.

What sets tower-resilience apart:

  • Circuit Breaker - Not available in Tower's built-in middleware
  • Advanced patterns - Bulkhead isolation, reconnect strategies, and more
  • Enhanced retry - Multiple backoff strategies with better control than Tower's basic retry
  • Unified observability - Consistent event system across all patterns
  • Ergonomic APIs - Builder pattern with sensible defaults
  • Battle-tested design - Patterns adapted from production-proven Resilience4j

Resilience Patterns

  • Circuit Breaker - Prevents cascading failures by stopping calls to failing services
  • Bulkhead - Isolates resources to prevent system-wide failures
  • Time Limiter - Advanced timeout handling with cancellation support
  • Retry - Intelligent retry with exponential backoff and jitter
  • Rate Limiter - Controls request rate to protect services
  • Cache - Response memoization to reduce load
  • Reconnect - Automatic reconnection with configurable backoff strategies
  • Health Check - Proactive health monitoring with intelligent resource selection
  • Chaos - Inject failures and latency for testing resilience (development/testing only)

Features

  • Composable - Stack multiple resilience patterns using Tower's ServiceBuilder
  • Observable - Event system for monitoring pattern behavior (retries, state changes, etc.)
  • Configurable - Builder APIs with sensible defaults
  • Async-first - Built on tokio for async Rust applications
  • Zero-cost abstractions - Minimal overhead when patterns aren't triggered

Quick Start

[dependencies]
tower-resilience = "0.1"
tower = "0.5"
use tower::ServiceBuilder;
use tower_resilience::prelude::*;

let circuit_breaker = CircuitBreakerLayer::builder()
    .failure_rate_threshold(0.5)
    .build();

let service = ServiceBuilder::new()
    .layer(circuit_breaker.for_request::<()>())
    .layer(BulkheadLayer::builder()
        .max_concurrent_calls(10)
        .build())
    .service(my_service);

Call `for_request::<T>()` with the request type `T` your service handles so the circuit
breaker layer can plug into `ServiceBuilder`; the `layer.layer(service)` helper still
returns a configured `CircuitBreaker` when you want to wrap the service directly, as
sketched below.
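
A minimal sketch of both wiring styles, assuming a request type MyRequest and a my_service that handles it (both names are placeholders, not part of the crate):

// Plugging into ServiceBuilder: for_request pins the request type the breaker will see.
let service = ServiceBuilder::new()
    .layer(circuit_breaker.for_request::<MyRequest>())
    .service(my_service);

// Wrapping directly: layer.layer(service) hands back the concrete CircuitBreaker service.
// let service = circuit_breaker.layer(my_service);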

Examples

Circuit Breaker

Prevent cascading failures by opening the circuit when error rate exceeds threshold:

use tower_resilience_circuitbreaker::CircuitBreakerLayer;
use std::time::Duration;

let layer = CircuitBreakerLayer::<String, ()>::builder()
    .name("api-circuit")
    .failure_rate_threshold(0.5)          // Open at 50% failure rate
    .sliding_window_size(100)              // Track last 100 calls
    .wait_duration_in_open(Duration::from_secs(60))  // Stay open 60s
    .on_state_transition(|from, to| {
        println!("Circuit breaker: {:?} -> {:?}", from, to);
    })
    .build();

let service = layer.layer(my_service);

Full examples: circuitbreaker.rs | circuitbreaker_fallback.rs | circuitbreaker_health_check.rs

Bulkhead

Limit concurrent requests to prevent resource exhaustion:

use tower_resilience_bulkhead::BulkheadLayer;
use std::time::Duration;

let layer = BulkheadLayer::builder()
    .name("worker-pool")
    .max_concurrent_calls(10)                    // Max 10 concurrent
    .max_wait_duration(Some(Duration::from_secs(5)))  // Wait up to 5s
    .on_call_permitted(|concurrent| {
        println!("Request permitted (concurrent: {})", concurrent);
    })
    .on_call_rejected(|max| {
        println!("Request rejected (max: {})", max);
    })
    .build();

let service = layer.layer(my_service);

Full examples: bulkhead.rs | bulkhead_demo.rs

Time Limiter

Enforce timeouts on operations with configurable cancellation:

use tower_resilience_timelimiter::TimeLimiterLayer;
use std::time::Duration;

let layer = TimeLimiterLayer::builder()
    .timeout_duration(Duration::from_secs(30))
    .cancel_running_future(true)  // Cancel on timeout
    .on_timeout(|| {
        println!("Operation timed out!");
    })
    .build();

let service = layer.layer(my_service);

Full examples: timelimiter.rs | timelimiter_example.rs

Retry

Retry failed requests with exponential backoff and jitter:

use tower_resilience_retry::RetryLayer;
use std::time::Duration;

let layer = RetryLayer::<MyError>::builder()
    .max_attempts(5)
    .exponential_backoff(Duration::from_millis(100))
    .on_retry(|attempt, delay| {
        println!("Retrying (attempt {}, delay {:?})", attempt, delay);
    })
    .on_success(|attempts| {
        println!("Success after {} attempts", attempts);
    })
    .build();

let service = layer.layer(my_service);

Full examples: retry.rs | retry_example.rs

Rate Limiter

Control request rate to protect downstream services:

use tower_resilience_ratelimiter::RateLimiterLayer;
use std::time::Duration;

let layer = RateLimiterLayer::builder()
    .limit_for_period(100)                      // 100 requests
    .refresh_period(Duration::from_secs(1))     // per second
    .timeout_duration(Duration::from_millis(500))  // Wait up to 500ms
    .on_permit_acquired(|wait| {
        println!("Request permitted (waited {:?})", wait);
    })
    .build();

let service = layer.layer(my_service);

Full examples: ratelimiter.rs | ratelimiter_example.rs

Cache

Cache responses to reduce load on expensive operations:

use tower_resilience_cache::{CacheLayer, EvictionPolicy};
use std::time::Duration;

let layer = CacheLayer::builder()
    .max_size(1000)
    .ttl(Duration::from_secs(300))                 // 5 minute TTL
    .eviction_policy(EvictionPolicy::Lru)          // LRU, LFU, or FIFO
    .key_extractor(|req: &Request| req.id.clone())
    .on_hit(|| println!("Cache hit!"))
    .on_miss(|| println!("Cache miss"))
    .build();

let service = layer.layer(my_service);

Full examples: cache.rs | cache_example.rs

Reconnect

Automatically reconnect on connection failures with configurable backoff:

use tower_resilience_reconnect::{ReconnectLayer, ReconnectConfig, ReconnectPolicy};
use std::time::Duration;

let layer = ReconnectLayer::new(
    ReconnectConfig::builder()
        .policy(ReconnectPolicy::exponential(
            Duration::from_millis(100),  // Start at 100ms
            Duration::from_secs(5),       // Max 5 seconds
        ))
        .max_attempts(10)
        .retry_on_reconnect(true)         // Retry request after reconnecting
        .connection_errors_only()          // Only reconnect on connection errors
        .on_state_change(|from, to| {
            println!("Connection: {:?} -> {:?}", from, to);
        })
        .build()
);

let service = layer.layer(my_service);

Full examples: reconnect.rs | basic.rs | custom_policy.rs

Health Check

Proactive health monitoring with intelligent resource selection:

use tower_resilience_healthcheck::{HealthCheckWrapper, HealthStatus, SelectionStrategy};
use std::time::Duration;

// Create wrapper with multiple resources
let wrapper = HealthCheckWrapper::builder()
    .with_context(primary_db, "primary")
    .with_context(secondary_db, "secondary")
    .with_checker(|db| async move {
        match db.ping().await {
            Ok(_) => HealthStatus::Healthy,
            Err(_) => HealthStatus::Unhealthy,
        }
    })
    .with_interval(Duration::from_secs(5))
    .with_selection_strategy(SelectionStrategy::RoundRobin)
    .build();

// Start background health checking
wrapper.start().await;

// Get a healthy resource
if let Some(db) = wrapper.get_healthy().await {
    // Use healthy database
}

Note: Health Check is not a Tower layer - it's a wrapper pattern for managing multiple resources with automatic failover.

Full examples: basic.rs

Chaos (Testing Only)

Inject failures and latency to test your resilience patterns:

use tower_resilience_chaos::ChaosLayer;
use std::time::Duration;

let chaos = ChaosLayer::<String, std::io::Error>::builder()
    .name("test-chaos")
    .error_rate(0.1)                               // 10% of requests fail
    .error_fn(|_req| std::io::Error::new(
        std::io::ErrorKind::Other, "chaos!"
    ))
    .latency_rate(0.2)                             // 20% delayed
    .min_latency(Duration::from_millis(50))
    .max_latency(Duration::from_millis(200))
    .seed(42)                                      // Deterministic chaos
    .build();

let service = chaos.layer(my_service);

WARNING: Only use in development/testing environments. Never in production.

Full examples: chaos.rs | chaos_example.rs

Error Handling

Zero-Boilerplate with ResilienceError

When composing multiple resilience layers, use ResilienceError<E> to eliminate manual error conversion code:

use tower_resilience_core::ResilienceError;

// Your application error
#[derive(Debug)]
enum AppError {
    DatabaseDown,
    InvalidRequest,
}

// That's it! No From implementations needed
type ServiceError = ResilienceError<AppError>;

// All resilience layer errors automatically convert
let service = ServiceBuilder::new()
    .layer(timeout_layer)
    .layer(circuit_breaker.for_request::<()>())
    .layer(bulkhead)
    .service(my_service);

Benefits:

  • Zero boilerplate - no From trait implementations
  • Rich error context (layer names, counts, durations)
  • Convenient helpers: is_timeout(), is_rate_limited(), etc.
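
A rough sketch of the helper methods named above, assuming they take &self and return bool; the classify function and response strings are illustrative, and AppError is the type from the snippet above:

use tower_resilience_core::ResilienceError;

fn classify(err: &ResilienceError<AppError>) -> &'static str {
    if err.is_timeout() {
        "504 Gateway Timeout"       // time limiter fired
    } else if err.is_rate_limited() {
        "429 Too Many Requests"     // rate limiter rejected the call
    } else {
        "500 Internal Server Error" // fall through to application errors
    }
}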

See the Layer Composition Guide for details.

Manual Error Handling

For specific use cases, you can still implement custom error types with manual From conversions. See examples for both approaches.
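
A standalone sketch of the manual approach, using a hypothetical layer error type (BulkheadFull is a stand-in; substitute the concrete error type exported by the crate you use):

#[derive(Debug)]
enum AppError {
    DatabaseDown,
    TooBusy,
}

// Hypothetical error surfaced by a resilience layer, used here only for illustration.
#[derive(Debug)]
struct BulkheadFull;

// The manual conversion that ResilienceError<E> otherwise writes for you.
impl From<BulkheadFull> for AppError {
    fn from(_: BulkheadFull) -> AppError {
        AppError::TooBusy
    }
}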

Pattern Composition

Stack multiple patterns for comprehensive resilience:

use tower::ServiceBuilder;

// Client-side: timeout -> circuit breaker -> retry
let client = ServiceBuilder::new()
    .layer(timeout_layer)
    .layer(circuit_breaker_layer.for_request::<()>())
    .layer(retry_layer)
    .service(http_client);

// Server-side: rate limit -> bulkhead -> timeout
let server = ServiceBuilder::new()
    .layer(rate_limiter_layer)
    .layer(bulkhead_layer)
    .layer(timeout_layer)
    .service(handler);

Performance

Benchmarks measure the overhead of each pattern in the happy path (no failures, circuit closed, permits available):

Pattern                    | Overhead (ns) | vs Baseline
---------------------------|---------------|------------
Baseline (no middleware)   | ~10 ns        | 1.0x
Retry (no retries)         | ~80-100 ns    | ~8-10x
Time Limiter               | ~107 ns       | ~10x
Rate Limiter               | ~124 ns       | ~12x
Bulkhead                   | ~162 ns       | ~16x
Cache (hit)                | ~250 ns       | ~25x
Circuit Breaker (closed)   | ~298 ns       | ~29x
Circuit Breaker + Bulkhead | ~413 ns       | ~40x

Key Takeaways:

  • All patterns add < 300 ns of overhead individually
  • Overhead is roughly additive when composing patterns (circuit breaker + bulkhead measures ~413 ns, close to the ~460 ns sum of the individual overheads)
  • Even the heaviest pattern (the circuit breaker) is negligible for most use cases
  • Retry and the time limiter are the lightest-weight options

Run benchmarks yourself:

cargo bench --bench happy_path_overhead

Documentation

Examples

Two sets of examples are provided:

  • Top-level examples - Simple, getting-started examples matching this README (one per pattern)
  • Module examples - Detailed examples in each crate's examples/ directory showing advanced features

Run top-level examples with:

cargo run --example circuitbreaker
cargo run --example bulkhead
cargo run --example retry
# etc.

Stress Tests

Stress tests validate pattern behavior under extreme conditions (high volume, high concurrency, memory stability). They are opt-in and marked with #[ignore]:

# Run all stress tests
cargo test --test stress -- --ignored

# Run specific pattern stress tests
cargo test --test stress circuitbreaker -- --ignored
cargo test --test stress bulkhead -- --ignored
cargo test --test stress cache -- --ignored

# Run with output to see performance metrics
cargo test --test stress -- --ignored --nocapture

Example results:

  • 1M calls through circuit breaker: ~2.8s (357k calls/sec)
  • 10k fast operations through bulkhead: ~56ms (176k ops/sec)
  • 100k cache entries: Fill + hit test validates performance

Stress tests cover:

  • High volume (millions of operations)
  • High concurrency (thousands of concurrent requests)
  • Memory stability (leak detection, bounded growth)
  • State consistency (correctness under load)
  • Pattern composition (layered middleware)

Minimum Supported Rust Version (MSRV)

This crate's MSRV is 1.64.0, matching Tower's MSRV policy.

We follow Tower's approach:

  • MSRV bumps are not considered breaking changes
  • When increasing MSRV, the new version must have been released at least 6 months ago
  • MSRV is tested in CI to prevent unintentional increases

License

Licensed under either of:

  • Apache License, Version 2.0 (LICENSE-APACHE)
  • MIT license (LICENSE-MIT)

at your option.

Contributing

Contributions are welcome! Please see the contributing guidelines for more information.
