
feat: Add rack-aware verification command #419

@joshrotenberg


Overview

Add a command that verifies a cluster's rack-aware configuration, confirming that each master shard and its slave(s) are placed on different racks for high availability.

Background

Redis Enterprise supports rack-aware deployments where nodes are assigned to logical racks (typically corresponding to availability zones, data centers, or physical racks). For HA, masters and their corresponding slaves should be on different racks.

Use Case

After configuring rack awareness, operators need to verify:

  • Masters and their slaves are on different racks
  • No single rack failure can take down a database
  • Rack configuration meets HA requirements

Desired Behavior

# Verify rack-aware configuration
redisctl enterprise cluster verify-rack-aware

# Example output (table format)
DATABASE      SHARD     MASTER RACK  SLAVE RACK   STATUS
default-db    redis:1   rack-1       rack-2       ✓ OK
default-db    redis:2   rack-1       rack-2       ✓ OK
cache-db      redis:3   rack-2       rack-3       ✓ OK
prod-db       redis:4   rack-1       rack-1       ✗ VIOLATION
prod-db       redis:5   rack-2       rack-3       ✓ OK

Rack-Aware Status: VIOLATED (1 issue found)
Issue: Database 'prod-db' shard redis:4 has master and slave on same rack (rack-1)

Recommendation: Migrate slave shard redis:4:slave to a different rack

# JSON output
redisctl enterprise cluster verify-rack-aware -o json
{
  "status": "violated",
  "violations": [
    {
      "database": "prod-db",
      "shard_id": "redis:4",
      "master_rack": "rack-1",
      "slave_rack": "rack-1",
      "issue": "master and slave on same rack"
    }
  ],
  "compliant_shards": 4,
  "total_shards": 5,
  "compliance_rate": 0.80
}
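The summary fields in the JSON example can be derived from two counts. The following is a minimal sketch, not the final implementation: `Summary` and `summarize` are hypothetical names, the `"compliant"` status string is an assumption (only `"violated"` appears above), and treating an empty cluster as fully compliant is a choice made here for illustration.

```rust
/// Hypothetical summary mirroring the top-level fields of the JSON example.
#[derive(Debug, PartialEq)]
pub struct Summary {
    pub status: &'static str,
    pub compliant_shards: usize,
    pub total_shards: usize,
    pub compliance_rate: f64,
}

/// Builds the summary from the number of checked shards and the number
/// of shards whose master and slave share a rack.
pub fn summarize(total_shards: usize, violating_shards: usize) -> Summary {
    let compliant_shards = total_shards.saturating_sub(violating_shards);
    // Assumption: a cluster with no shards is reported as fully compliant.
    let compliance_rate = if total_shards == 0 {
        1.0
    } else {
        compliant_shards as f64 / total_shards as f64
    };
    Summary {
        // "compliant" is an assumed value; the example only shows "violated".
        status: if violating_shards == 0 { "compliant" } else { "violated" },
        compliant_shards,
        total_shards,
        compliance_rate,
    }
}
```

With the five shards and one violation from the example, `summarize(5, 1)` yields 4 compliant shards and a compliance rate of 0.8.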

Implementation Approach

Data Collection

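use std::collections::HashMap;

// 1. Get all nodes with rack info.
//    Assumes `rack_id` is a `String`; adjust if the API returns `Option<String>`.
let nodes = client.nodes().list().await?;
let node_to_rack: HashMap<i32, String> = nodes
    .iter()
    .map(|n| (n.node_id, n.rack_id.clone()))
    .collect();

// 2. Get all shards
let shards = client.shards().list().await?;

// 3. Group shards by database and check rack distribution.
//    `grouped_by_database`, `masters`, and `slaves_for` are helpers still to be written.
let mut violations = Vec::new();
for (db_id, db_shards) in shards.grouped_by_database() {
    for master_shard in db_shards.masters() {
        // Look up by reference; skip nodes with no known rack instead of panicking.
        let Some(master_rack) = node_to_rack.get(&master_shard.node_id) else {
            continue;
        };

        // Find corresponding slave(s)
        for slave in db_shards.slaves_for(&master_shard.shard_id) {
            let Some(slave_rack) = node_to_rack.get(&slave.node_id) else {
                continue;
            };

            if master_rack == slave_rack {
                violations.push(RackViolation {
                    database: db_id.clone(),
                    shard: master_shard.shard_id.clone(),
                    master_rack: master_rack.clone(),
                    slave_rack: slave_rack.clone(),
                });
            }
        }
    }
}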
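The loop above leaves the grouping helpers unspecified. As a rough, self-contained sketch of the same check over in-memory data: the `Shard`, `Role`, and `RackViolation` types and the `find_violations` function below are hypothetical and simplified, and pairing a master with its slave(s) through a shared `shard_id` is an assumption that mirrors the example output; the real API may link them differently.

```rust
use std::collections::HashMap;

/// Role of a shard within its replication group.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    Master,
    Slave,
}

/// Hypothetical, simplified shard record; the real API type may differ.
#[derive(Debug, Clone)]
pub struct Shard {
    pub database: String,
    /// Assumed to be shared by a master and its slave(s), e.g. "redis:4".
    pub shard_id: String,
    pub node_id: i32,
    pub role: Role,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RackViolation {
    pub database: String,
    pub shard_id: String,
    /// The rack hosting both the master and the offending slave.
    pub rack: String,
}

/// Returns one violation for every slave that shares a rack with its master.
/// Shards on nodes with no known rack are skipped rather than reported.
pub fn find_violations(
    node_to_rack: &HashMap<i32, String>,
    shards: &[Shard],
) -> Vec<RackViolation> {
    // Index each master's rack by (database, shard_id).
    let mut master_racks: HashMap<(&str, &str), &str> = HashMap::new();
    for s in shards.iter().filter(|s| s.role == Role::Master) {
        if let Some(rack) = node_to_rack.get(&s.node_id) {
            master_racks.insert((s.database.as_str(), s.shard_id.as_str()), rack.as_str());
        }
    }

    // Compare every slave's rack with the rack of its master.
    let mut violations = Vec::new();
    for s in shards.iter().filter(|s| s.role == Role::Slave) {
        let key = (s.database.as_str(), s.shard_id.as_str());
        let slave_rack = node_to_rack.get(&s.node_id).map(String::as_str);
        if let (Some(master_rack), Some(slave_rack)) = (master_racks.get(&key), slave_rack) {
            if *master_rack == slave_rack {
                violations.push(RackViolation {
                    database: s.database.clone(),
                    shard_id: s.shard_id.clone(),
                    rack: slave_rack.to_string(),
                });
            }
        }
    }
    violations
}
```

Building the master index first keeps the check to two passes over the shard list, and using `get` instead of indexing avoids a panic when a shard references a node that is missing from the node list.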

Validation Rules

  1. Basic Rule: Master and slave(s) of the same shard must be on different racks
  2. Replication Rule: If database has replication enabled, verify slaves exist
  3. Rack Count: Warn if fewer than 3 racks are configured (limited HA)
  4. Even Distribution: Check if racks have roughly equal node counts
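
Rules 3 and 4 only need the node-to-rack map. A possible sketch follows; `rack_warnings` is a hypothetical name, and interpreting "roughly equal" as "largest and smallest racks differ by at most one node" is an assumption that would need to be agreed on.

```rust
use std::collections::HashMap;

/// Returns human-readable warnings for the rack count and node distribution rules.
pub fn rack_warnings(node_to_rack: &HashMap<i32, String>) -> Vec<String> {
    // Count nodes per rack.
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for rack in node_to_rack.values() {
        *counts.entry(rack.as_str()).or_insert(0) += 1;
    }

    let mut warnings = Vec::new();

    // Rule 3: fewer than 3 racks limits HA.
    if counts.len() < 3 {
        warnings.push(format!(
            "only {} rack(s) configured; fewer than 3 limits HA",
            counts.len()
        ));
    }

    // Rule 4 (assumed threshold): racks should differ by at most one node.
    if let (Some(min), Some(max)) = (counts.values().min(), counts.values().max()) {
        if max - min > 1 {
            warnings.push(format!(
                "uneven node distribution: smallest rack has {} node(s), largest has {}",
                min, max
            ));
        }
    }

    warnings
}
```

Under this sketch these checks produce warnings only, so they would never turn an otherwise compliant result into a violation.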

Status Levels

  • Compliant: All shards follow rack-aware rules
  • ⚠️ Warning: Minor issues (e.g., uneven distribution)
  • Violated: Critical issues (master/slave on same rack)
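
One way the three levels could be modeled is an ordered enum where the most severe finding wins. This is only a sketch; `RackStatus` and `overall_status` are hypothetical names, not existing redisctl types.

```rust
/// Status levels, ordered from least to most severe.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum RackStatus {
    Compliant,
    Warning,
    Violated,
}

/// Derives the overall status: any violation is critical, otherwise
/// any warning downgrades the result from Compliant to Warning.
pub fn overall_status(violations: usize, warnings: usize) -> RackStatus {
    if violations > 0 {
        RackStatus::Violated
    } else if warnings > 0 {
        RackStatus::Warning
    } else {
        RackStatus::Compliant
    }
}
```

Deriving `Ord` lets callers combine several per-database results with `max()` to get the cluster-wide status.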

rladmin Equivalent

# rladmin command
rladmin verify rack_aware

# Example output (text)
Checking rack aware configuration...

Database: db:1
  Shard 1: Master on rack-1, Slave on rack-2 [OK]
  Shard 2: Master on rack-2, Slave on rack-1 [OK]

Database: db:2
  Shard 3: Master on rack-1, Slave on rack-1 [VIOLATION]

Rack aware status: VIOLATED

Benefits

  1. HA validation - Ensure high availability configuration is correct
  2. Proactive monitoring - Catch misconfigurations before failures
  3. Automation - JSON output for CI/CD checks
  4. Remote access - No SSH required (unlike rladmin)
  5. Actionable recommendations - Tells you what to fix

Future Enhancements

  • Add --fix flag to auto-migrate violating shards
  • Support for Active-Active (CRDB) rack awareness
  • Integration with cluster verify-balance for combined health check
  • Alert webhooks for violations

Related

Priority

Medium - Important for HA deployments, but not all clusters use rack awareness.


Labels

enhancement
