This guide explains how to configure service-defaults config entries in Consul, specifically focusing on outlier detection (passive health checking).
service-defaults is a Consul config entry that defines default settings for a service, including:
- Protocol (http, http2, grpc, tcp)
- Upstream configuration (connection limits, health checks, etc.)
- Mesh gateway mode
- Transparent proxy settings
- And more...
Create a file web-defaults.hcl:
Kind = "service-defaults"
Name = "web"
Protocol = "http"
UpstreamConfig {
  Defaults {
    # Apply to all upstreams of this service
    PassiveHealthCheck {
      Interval = "30s"
      MaxFailures = 5
      EnforcingConsecutive5xx = 100
      MaxEjectionPercent = 10
      BaseEjectionTime = "30s"
    }
  }
  Overrides = [
    {
      # Override for specific upstream
      Name = "database"
      PassiveHealthCheck {
        Interval = "10s"
        MaxFailures = 3
        EnforcingConsecutive5xx = 100
        MaxEjectionPercent = 50
        BaseEjectionTime = "60s"
      }
    }
  ]
}

Apply it:
consul config write web-defaults.hcl

Or write the entry through the HTTP API:

curl -X PUT http://localhost:8500/v1/config \
  -H "Content-Type: application/json" \
  -d '{
    "Kind": "service-defaults",
    "Name": "web",
    "Protocol": "http",
    "UpstreamConfig": {
      "Defaults": {
        "PassiveHealthCheck": {
          "Interval": "30s",
          "MaxFailures": 5,
          "EnforcingConsecutive5xx": 100,
          "MaxEjectionPercent": 10,
          "BaseEjectionTime": "30s"
        }
      }
    }
  }'

In your service registration file:
services {
  name = "web"
  port = 8080
  connect {
    sidecar_service {
      proxy {
        upstreams = [
          {
            destination_name = "database"
            local_bind_port = 5432
            config {
              passive_health_check {
                interval = "22s"
                max_failures = 4
                enforcing_consecutive_5xx = 99
                max_ejection_percent = 50
                base_ejection_time = "60s"
              }
            }
          }
        ]
      }
    }
  }
}

Register it:
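consul services register web-service.hcl

Or create the entry with the Go API client:

package main

import (
	"log"
	"time"

	"github.com/hashicorp/consul/api"
)

func main() {
	// Connect to the local agent with the default client configuration.
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatalf("creating Consul client: %v", err)
	}

	// Several fields are pointers, so keep the values in variables.
	interval := 30 * time.Second
	maxFailures := uint32(5)
	enforcing := uint32(100)
	maxEjection := uint32(10)
	baseEjection := 30 * time.Second

	entry := &api.ServiceConfigEntry{
		Kind:     api.ServiceDefaults,
		Name:     "web",
		Protocol: "http",
		UpstreamConfig: &api.UpstreamConfiguration{
			Defaults: &api.UpstreamConfig{
				PassiveHealthCheck: &api.PassiveHealthCheck{
					Interval:                interval,
					MaxFailures:             maxFailures,
					EnforcingConsecutive5xx: &enforcing,
					MaxEjectionPercent:      &maxEjection,
					BaseEjectionTime:        &baseEjection,
				},
			},
		},
	}

	// Write the config entry; the first two return values are not needed here.
	if _, _, err := client.ConfigEntries().Set(entry, nil); err != nil {
		log.Fatalf("writing service-defaults entry: %v", err)
	}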
}

| Parameter | Type | Description | Default |
|---|---|---|---|
| `Interval` | duration | Time between health check analysis sweeps | - |
| `MaxFailures` | uint32 | Consecutive failures before ejection | - |
| `EnforcingConsecutive5xx` | uint32 | % chance of ejection (0-100) | 100 |
| `MaxEjectionPercent` | uint32 | Max % of cluster that can be ejected | 10 |
| `BaseEjectionTime` | duration | Base ejection duration (multiplied by ejection count) | 30s |
Consul applies outlier detection configuration in this order (highest to lowest priority):
- Per-upstream inline config (in service registration)
- Service-defaults overrides (per-upstream in UpstreamConfig.Overrides)
- Service-defaults defaults (UpstreamConfig.Defaults)
- Wildcard defaults (service-defaults with Name = "*")
- Envoy defaults (if no config specified)
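
The precedence order above can be pictured as a walk over configuration layers from highest to lowest priority. The Go sketch below is a simplified model, not Consul's implementation: the `passiveHealthCheck` and `layer` types and the `resolve` function are hypothetical, and it assumes the first layer that defines a passive health check wins as a whole block. The list above does not say whether Consul merges individual fields across layers.

```go
package main

import "fmt"

// passiveHealthCheck is a hypothetical, trimmed-down stand-in for the
// settings discussed in this guide; it is not the Consul API type.
type passiveHealthCheck struct {
	Interval    string
	MaxFailures uint32
}

// layer pairs a configuration source with its optional settings.
type layer struct {
	Source string
	Check  *passiveHealthCheck // nil means this source sets nothing
}

// resolve walks the layers from highest to lowest priority and returns
// the first one that defines a passive health check. If none does, the
// proxy's built-in defaults apply and the returned check is nil.
func resolve(layers []layer) (string, *passiveHealthCheck) {
	for _, l := range layers {
		if l.Check != nil {
			return l.Source, l.Check
		}
	}
	return "envoy defaults", nil
}

func main() {
	layers := []layer{
		{Source: "inline upstream config"},
		{Source: "service-defaults override", Check: &passiveHealthCheck{Interval: "10s", MaxFailures: 3}},
		{Source: "service-defaults defaults", Check: &passiveHealthCheck{Interval: "30s", MaxFailures: 5}},
		{Source: "wildcard defaults"},
	}
	source, check := resolve(layers)
	fmt.Printf("%s: interval=%s maxFailures=%d\n", source, check.Interval, check.MaxFailures)
}
```

In this model the override is chosen because the inline upstream config sets nothing, even though lower-priority defaults are also present.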
Kind = "service-defaults"
Name = "api"
Protocol = "http"
UpstreamConfig {
  Defaults {
    PassiveHealthCheck {
      Interval = "10s"
      MaxFailures = 3
    }
  }
}

This enables outlier detection with:
- Check every 10 seconds
- Eject after 3 consecutive failures
- Use Envoy defaults for other parameters
Kind = "service-defaults"
Name = "payment-service"
Protocol = "http"
UpstreamConfig {
  Defaults {
    PassiveHealthCheck {
      Interval = "5s"
      MaxFailures = 2
      EnforcingConsecutive5xx = 100
      MaxEjectionPercent = 50
      BaseEjectionTime = "120s"
    }
  }
}

This configuration:
- Checks every 5 seconds
- Ejects after only 2 failures
- Always enforces ejection (100%)
- Can eject up to 50% of instances
- Keeps instances ejected for at least 2 minutes
Kind = "service-defaults"
Name = "frontend"
Protocol = "http"
UpstreamConfig {
  # Default for all upstreams
  Defaults {
    PassiveHealthCheck {
      Interval = "30s"
      MaxFailures = 5
    }
  }
  # Stricter settings for critical database
  Overrides = [
    {
      Name = "postgres"
      PassiveHealthCheck {
        Interval = "10s"
        MaxFailures = 2
        MaxEjectionPercent = 30
      }
    },
    {
      Name = "redis"
      PassiveHealthCheck {
        Interval = "5s"
        MaxFailures = 3
      }
    }
  ]
}

Read back the stored entry:

consul config read -kind service-defaults -name web

# Get cluster configuration
curl http://localhost:19000/config_dump | jq '.configs[1].dynamic_active_clusters[] | select(.cluster.name=="db") | .cluster.outlier_detection'

Expected output:
{
  "interval": "22s",
  "consecutive_5xx": 4,
  "enforcing_consecutive_5xx": 99,
  "max_ejection_percent": 50,
  "base_ejection_time": "60s"
}

# Check Envoy stats for outlier detection
curl http://localhost:19000/stats | grep outlier_detection
# Example output:
# cluster.db.outlier_detection.ejections_active: 0
# cluster.db.outlier_detection.ejections_consecutive_5xx: 2
# cluster.db.outlier_detection.ejections_total: 2

Apply to all services:
Kind = "service-defaults"
Name = "*"
UpstreamConfig {
  Defaults {
    PassiveHealthCheck {
      Interval = "30s"
      MaxFailures = 5
    }
  }
}

To effectively disable ejections, set EnforcingConsecutive5xx to 0 and MaxFailures very high:
Kind = "service-defaults"
Name = "legacy-service"
UpstreamConfig {
  Defaults {
    PassiveHealthCheck {
      MaxFailures = 999999
      EnforcingConsecutive5xx = 0
    }
  }
}

Start with low enforcement, increase gradually:
# Week 1: 25% enforcement
PassiveHealthCheck {
  EnforcingConsecutive5xx = 25
}
# Week 2: 50% enforcement
PassiveHealthCheck {
  EnforcingConsecutive5xx = 50
}
# Week 3: 100% enforcement
PassiveHealthCheck {
  EnforcingConsecutive5xx = 100
}

Check:
- Is EDS being used? (Hostname-based services don't support outlier detection)
- Is the config entry applied? (`consul config read`)
- Is Envoy receiving the config? (Check `/config_dump`)
- Are there enough instances? (Need multiple endpoints to eject)
Solution: Increase MaxEjectionPercent or MaxFailures:
PassiveHealthCheck {
  MaxFailures = 10
  MaxEjectionPercent = 30
}

Solution: Decrease BaseEjectionTime:
PassiveHealthCheck {
  BaseEjectionTime = "10s"
}

- Start conservative: Begin with high `MaxFailures` and low `MaxEjectionPercent`
- Monitor metrics: Watch ejection stats before tightening thresholds
- Use overrides: Apply stricter settings only to critical upstreams
- Test in staging: Validate configuration before production
- Document decisions: Record why specific thresholds were chosen
- Consider traffic patterns: Adjust `Interval` based on request volume
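
To act on the advice to monitor metrics, the Envoy stats output shown earlier in this guide can be turned into numbers that a script can compare against a threshold. The following is a minimal Go sketch, assuming the plain-text `name: value` format from that example. The function name `parseOutlierStats` and the extra `upstream_rq_total` line in the sample are illustrative, not part of this guide.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseOutlierStats extracts outlier_detection counters from text in the
// "name: value" form shown in the Envoy stats example in this guide.
func parseOutlierStats(stats string) map[string]uint64 {
	out := make(map[string]uint64)
	scanner := bufio.NewScanner(strings.NewReader(stats))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if !strings.Contains(line, "outlier_detection") {
			continue // not an outlier detection stat
		}
		name, value, ok := strings.Cut(line, ": ")
		if !ok {
			continue // line is not in "name: value" form
		}
		n, err := strconv.ParseUint(strings.TrimSpace(value), 10, 64)
		if err != nil {
			continue // skip non-numeric values
		}
		out[name] = n
	}
	return out
}

func main() {
	// Sample input; in practice this would be the body returned by the
	// Envoy admin /stats endpoint.
	sample := `cluster.db.outlier_detection.ejections_active: 0
cluster.db.outlier_detection.ejections_consecutive_5xx: 2
cluster.db.outlier_detection.ejections_total: 2
cluster.db.upstream_rq_total: 120`

	stats := parseOutlierStats(sample)
	if stats["cluster.db.outlier_detection.ejections_active"] > 0 {
		fmt.Println("hosts are currently ejected")
	}
	fmt.Println("total ejections:", stats["cluster.db.outlier_detection.ejections_total"])
}
```

Tracking `ejections_total` over time before tightening thresholds gives a baseline for how often the current settings already eject hosts.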