feat: Add cluster balance verification command #418

@joshrotenberg

Overview

Add a command to verify cluster shard balance distribution across nodes, similar to rladmin's verify balance command.

Use Case

Operators need to quickly check if shards are evenly distributed across cluster nodes, especially after:

  • Adding/removing nodes
  • Database creation/deletion
  • Shard migration operations
  • Cluster scaling events

Desired Behavior

# Show balance report for entire cluster
redisctl enterprise cluster verify-balance

# Example output (table format)
NODE     TOTAL SHARDS  MASTER SHARDS  SLAVE SHARDS  RAM USED  CORES USED
node:1   45            23             22            124GB     12/16
node:2   46            22             24            126GB     11/16
node:3   44            24             20            121GB     13/16

Balance Score: 0.95/1.00 (Good)
Recommendation: Cluster is well-balanced

# JSON output for automation
redisctl enterprise cluster verify-balance -o json
{
  "balance_score": 0.95,
  "status": "good",
  "nodes": [
    {"node_id": 1, "total_shards": 45, "master_shards": 23, "slave_shards": 22},
    {"node_id": 2, "total_shards": 46, "master_shards": 22, "slave_shards": 24},
    {"node_id": 3, "total_shards": 44, "master_shards": 24, "slave_shards": 20}
  ],
  "recommendations": []
}
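The table layout above could be produced with plain `std` formatting. The following is a minimal sketch, not the project's actual renderer: `NodeRow` and `render_table` are hypothetical names, and only the shard-count columns are included because the RAM and core columns do not appear in the JSON example.

```rust
/// Hypothetical row type mirroring the per-node fields in the JSON example.
pub struct NodeRow {
    pub node_id: u32,
    pub total_shards: u32,
    pub master_shards: u32,
    pub slave_shards: u32,
}

/// Render the shard-count columns as a left-aligned, fixed-width table.
/// Column widths (9, 14, 15) are chosen to match the example header spacing.
pub fn render_table(rows: &[NodeRow]) -> String {
    let mut out = format!(
        "{:<9}{:<14}{:<15}{}\n",
        "NODE", "TOTAL SHARDS", "MASTER SHARDS", "SLAVE SHARDS"
    );
    for row in rows {
        let label = format!("node:{}", row.node_id);
        out.push_str(&format!(
            "{:<9}{:<14}{:<15}{}\n",
            label, row.total_shards, row.master_shards, row.slave_shards
        ));
    }
    out
}
```

A real implementation would probably reuse whatever table helper the CLI already has for its other commands.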

Implementation Approach

Data Collection (via existing API)

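use std::collections::HashMap;

// 1. Get all nodes (needed so that nodes hosting zero shards are counted too)
let nodes = client.nodes().list().await?;

// 2. Get all shards
let shards = client.shards().list().await?;

// 3. Calculate distribution per node
// Note: a node with no shards never appears in `shards`; seed `node_stats`
// from `nodes` before this loop if such nodes should show up in the report.
let mut node_stats: HashMap<_, NodeBalance> = HashMap::new();
for shard in shards {
    let entry = node_stats.entry(shard.node_id).or_default();

    entry.total_shards += 1;
    if shard.role == "master" {
        entry.master_shards += 1;
    } else {
        // Any non-master role is counted as a replica ("slave") shard
        entry.slave_shards += 1;
    }
}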
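The tally step can be tried without a cluster by separating it from the API calls. This is a sketch under stated assumptions: `Shard` is a hypothetical stand-in for the real API type (only `node_id` and `role` come from the snippet above), and `tally_shards` is an illustrative name. It seeds the map from the node list so that a node with zero shards still gets a row.

```rust
use std::collections::BTreeMap;

/// Hypothetical minimal shard record; the real API type will carry more fields.
#[derive(Debug, Clone)]
pub struct Shard {
    pub node_id: u32,
    pub role: String,
}

/// Per-node counters, matching the fields used in the issue's snippet.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct NodeBalance {
    pub total_shards: u32,
    pub master_shards: u32,
    pub slave_shards: u32,
}

/// Tally shards per node. `node_ids` seeds the map so that nodes hosting
/// zero shards are still present in the result.
pub fn tally_shards(node_ids: &[u32], shards: &[Shard]) -> BTreeMap<u32, NodeBalance> {
    let mut stats: BTreeMap<u32, NodeBalance> = node_ids
        .iter()
        .map(|&id| (id, NodeBalance::default()))
        .collect();

    for shard in shards {
        let entry = stats.entry(shard.node_id).or_default();
        entry.total_shards += 1;
        if shard.role == "master" {
            entry.master_shards += 1;
        } else {
            // Any non-master role is counted as a replica ("slave") shard.
            entry.slave_shards += 1;
        }
    }
    stats
}
```

A `BTreeMap` is used here instead of a `HashMap` only so that rows come out ordered by node ID; either works for the counting itself.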

Balance Calculation

// Calculate balance score (0.0-1.0)
// Perfect balance = 1.0, completely unbalanced = 0.0

let avg_shards = total_shards as f64 / num_nodes as f64;
let variance = calculate_variance(&node_shard_counts, avg_shards);
// Guard the division: with no shards (or no nodes) avg_shards is 0 or NaN;
// that case is treated here as balanced.
let balance_score = if avg_shards > 0.0 {
    1.0 - (variance / avg_shards).min(1.0)
} else {
    1.0
};

// Thresholds
// >= 0.90: Good
// 0.75 to < 0.90: Fair
// < 0.75: Poor
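The issue leaves `calculate_variance` undefined. A minimal sketch of the whole calculation follows, assuming population variance (dividing by the number of nodes) and assuming that a cluster with no shards counts as balanced. Both are choices made for illustration rather than anything the issue specifies, and `balance_score` and `classify` are hypothetical helper names.

```rust
/// Population variance of per-node shard counts around `mean`.
/// (Assumption: population rather than sample variance.)
pub fn calculate_variance(counts: &[u32], mean: f64) -> f64 {
    if counts.is_empty() {
        return 0.0;
    }
    let sum_sq: f64 = counts
        .iter()
        .map(|&c| {
            let d = c as f64 - mean;
            d * d
        })
        .sum();
    sum_sq / counts.len() as f64
}

/// Balance score in [0.0, 1.0], using the formula from the issue:
/// 1 - min(variance / mean, 1).
pub fn balance_score(counts: &[u32]) -> f64 {
    let total: u32 = counts.iter().sum();
    if counts.is_empty() || total == 0 {
        // Assumption: nothing to balance is reported as perfectly balanced.
        return 1.0;
    }
    let avg = total as f64 / counts.len() as f64;
    let variance = calculate_variance(counts, avg);
    1.0 - (variance / avg).min(1.0)
}

/// Map a score to the status labels used in the thresholds.
pub fn classify(score: f64) -> &'static str {
    if score >= 0.90 {
        "good"
    } else if score >= 0.75 {
        "fair"
    } else {
        "poor"
    }
}
```

For the per-node counts 45, 46, and 44, the mean is 45 and the population variance is 2/3, so this formula evaluates to about 0.985. Because variance divided by the mean is not scale-free, the same relative imbalance scores differently on small and large clusters; the coefficient of variation (standard deviation divided by the mean) is one alternative worth considering.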

Recommendations

Based on the balance score, provide an actionable recommendation:

  • Good (0.90 or higher): "Cluster is well-balanced"
  • Fair (0.75 up to, but not including, 0.90): "Consider rebalancing - X shards could be migrated"
  • Poor (below 0.75): "Cluster needs rebalancing - run migration workflow"
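The issue does not say how X would be computed. One possible sketch, assuming X means the minimum number of shard moves needed to even out per-node totals (ignoring master/replica roles, memory, and placement constraints): `shards_to_migrate` and `recommendation` are hypothetical names, and the message strings are taken from the list above.

```rust
/// Minimum number of shard moves needed to even out per-node totals.
/// Assumption: only counts matter; roles, memory, and rack rules are ignored.
pub fn shards_to_migrate(counts: &[u32]) -> u32 {
    let n = counts.len() as u32;
    if n == 0 {
        return 0;
    }
    let total: u32 = counts.iter().sum();
    // In an even layout, `extra` nodes hold `base + 1` shards and the rest `base`.
    let (base, extra) = (total / n, total % n);

    // Give the larger targets to the currently fullest nodes, then sum the excess.
    let mut sorted = counts.to_vec();
    sorted.sort_unstable_by(|a, b| b.cmp(a));
    sorted
        .iter()
        .enumerate()
        .map(|(i, &c)| {
            let target = if (i as u32) < extra { base + 1 } else { base };
            c.saturating_sub(target)
        })
        .sum()
}

/// Pick the recommendation text for a given score and shard distribution.
pub fn recommendation(score: f64, counts: &[u32]) -> String {
    if score >= 0.90 {
        "Cluster is well-balanced".to_string()
    } else if score >= 0.75 {
        format!(
            "Consider rebalancing - {} shards could be migrated",
            shards_to_migrate(counts)
        )
    } else {
        "Cluster needs rebalancing - run migration workflow".to_string()
    }
}
```

In the JSON output these strings would presumably populate the `recommendations` array, which stays empty when the status is good, as in the example.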

rladmin Equivalent

# rladmin command
rladmin verify balance

# Example output (text)
Cluster balance report:
  Node 1: 45 shards (23 masters, 22 slaves)
  Node 2: 46 shards (22 masters, 24 slaves)
  Node 3: 44 shards (24 masters, 20 slaves)
  
Balance: OK

Benefits

  1. Quick health check - Single command to verify cluster health
  2. Automation-friendly - JSON output for monitoring systems
  3. Actionable insights - Recommendations for next steps
  4. Remote access - No SSH required (unlike rladmin)

Future Enhancements

  • Add --fix flag to automatically rebalance
  • Include memory balance, not just shard count
  • Show rack-aware balance (if rack configuration exists)
  • Historical balance tracking

Related

Priority

Medium-High - Useful for cluster health monitoring and operations.

Labels

enhancement
