generated from amazon-archives/__template_Custom
Open
Labels
bug (Something isn't working)
Description
Describe the bug
OpenSearch executes 99% of monitor runs on only 2 nodes: the nodes that hold the primary and replica shards of the .opendistro-alerting-config index.
On 2.18 this behaviour doesn't seem to cause too many issues. On 3.0+ it leads to regular crashes of the 2 nodes holding these shards, which often cascade into larger cluster issues.
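To check which nodes hold the shards, the output of `GET _cat/shards/.opendistro-alerting-config?format=json` can be inspected. The sketch below only parses such a response. The helper name `shard_nodes`, the sample payload, and the node names are made up for illustration and are not taken from an affected cluster.

```python
import json


def shard_nodes(cat_shards_json: str) -> dict[str, str]:
    """Map each started shard copy (e.g. "0p", "0r") to the node holding it."""
    rows = json.loads(cat_shards_json)
    result = {}
    for row in rows:
        # Skip copies that are unassigned or still initializing.
        if row.get("state") == "STARTED":
            result[f'{row["shard"]}{row["prirep"]}'] = row["node"]
    return result


# Hypothetical response body; a real one carries more fields per row.
SAMPLE = """[
  {"index": ".opendistro-alerting-config", "shard": "0", "prirep": "p",
   "state": "STARTED", "node": "node-03"},
  {"index": ".opendistro-alerting-config", "shard": "0", "prirep": "r",
   "state": "STARTED", "node": "node-11"}
]"""

print(shard_nodes(SAMPLE))
```

The two node names returned here are the ones to compare against the nodes seen executing monitors in the logs.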
Related component
Search:Performance
To Reproduce
- Deploy an OpenSearch cluster with a significant number of nodes (16+)
- Run a high number of regularly scheduled monitors (100+, ideally 1k+) with ~10 min frequency
- Parse OpenSearch logs and observe which nodes are running these monitors (can query for "Executing scheduled monitor")
- Observe that over 100k+ monitor runs, 99% of them will only execute on 2 nodes in the cluster, the 2 nodes that have the .opendistro-alerting-config primary and replica shards
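The log-parsing step above can be sketched as a per-node count of lines containing "Executing scheduled monitor". This is a minimal sketch, assuming log lines shaped like `[timestamp][LEVEL][logger] [node-name] message`. If the cluster uses a custom log layout, the regular expression needs adjusting. The function names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed line shape: [timestamp][LEVEL][logger] [node-name] message
NODE_RE = re.compile(r"^\[[^\]]*\]\[[^\]]*\]\[[^\]]*\]\s*\[([^\]]+)\]")
MARKER = "Executing scheduled monitor"


def count_runs_per_node(lines: Iterable[str]) -> Counter:
    """Count monitor executions per node name found in the log lines."""
    counts = Counter()
    for line in lines:
        if MARKER not in line:
            continue
        match = NODE_RE.match(line)
        if match:
            counts[match.group(1).strip()] += 1
    return counts


def top_share(counts: Counter, n: int = 2) -> float:
    """Fraction of all runs executed by the n busiest nodes."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c for _, c in counts.most_common(n)) / total
```

Feeding the lines of the collected log files into `count_runs_per_node` and calling `top_share(counts, 2)` gives the share of runs handled by the two busiest nodes, which is the figure reported above as 99%.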
This activity has been observed on 2.18, 3.1, 3.3
Expected behavior
Monitors execute on all nodes in the cluster OR it is easy to increase the shard count on .opendistro-alerting-config
Additional Details
Plugins
Standard RPM install of OpenSearch exhibits this issue
Screenshots
Cannot provide
Host/Environment (please complete the following information):
- OS: RHEL
- Version: 2.18, 3.1, 3.3, untested on others
Additional context
N/A