Describe the bug
I had a pod stuck in the Terminating state because rabbitmq-diagnostics check_if_node_is_quorum_critical
reported 4 queues as quorum-critical, even though each of them appears to have 1 leader + 2 followers.
check_if_node_is_quorum_critical can also get stuck when a queue simply does not have enough followers,
since the preStop hook only waits and never grows the queue's follower set.
On top of that, no alerts are sent when the check takes a long time or times out.
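For context, the check in question can be run by hand against the affected node and compared with the actual Raft membership of one of the reported queues. This is a sketch using placeholder pod, namespace, and queue names; both CLI commands are standard RabbitMQ tooling:

```shell
# Placeholder names; adjust to your cluster.
POD=my-rabbitmq-server-0
NS=rabbitmq

# The same check the preStop hook relies on; exits non-zero when this
# node is quorum-critical (losing it would cost some queue its quorum).
kubectl exec -n "$NS" "$POD" -- \
  rabbitmq-diagnostics check_if_node_is_quorum_critical

# Inspect the actual Raft membership of one of the reported queues
# to verify it really has one leader plus two followers.
kubectl exec -n "$NS" "$POD" -- \
  rabbitmq-queues quorum_status my-queue --vhost /
```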
To Reproduce
This only happens on my large cluster (>300 queues, a small amount of queue churn).
Expected behavior
- No false positives
- The preStop hook is not passive (it grows the queues' follower sets)
- Alerts are sent
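As a manual workaround for the second point (this is not something the operator does today), followers can be added with the standard rabbitmq-queues grow command before draining a node; the node name below is a placeholder:

```shell
# Placeholder node name; "all" adds a replica on the target node for every
# matching quorum queue, "even" only for queues with an even replica count.
kubectl exec -n rabbitmq my-rabbitmq-server-0 -- \
  rabbitmq-queues grow "rabbit@my-rabbitmq-server-2.my-rabbitmq-nodes.rabbitmq" all \
    --vhost-pattern ".*" --queue-pattern ".*"
```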
Version and environment information
- RabbitMQ: 4.1.1
- RabbitMQ Cluster Operator: 2.15.0
- Kubernetes: 1.32.5
- Cloud provider or hardware configuration: AWS EKS