docs: Document ERS split-brain detection and partial relay-log tolerance #2128
promptless[bot] wants to merge 1 commit into
Add documentation for v25 EmergencyReparentShard changes:
- Document new `--allow-split-brain-promotion` flag
- Add split-brain detection section explaining fail-fast behavior
- Add partial relay-log-apply tolerance section for GTID-based shards
- Document three new metrics for ERS observability
| - On the primary-elect tablet, insert a row in the `reparent_journal` table and then update the `PrimaryAlias` property of the global shard object. | ||
| - In parallel on each replica, excluding the old primary, set the new primary as the replication source and wait for the inserted row to replicate to the replica tablets. | ||
| #### Split-brain detection |
Added split-brain detection and --allow-split-brain-promotion flag documentation based on PR #18707 which introduces upfront split-brain detection in filterAndCheckUniform() and the operator escape hatch flag.
Source: vitessio/vitess#18707
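To make the split-brain check concrete, here is a minimal Go sketch of the idea: two candidates are treated as diverged when neither one's position contains the other's. This is an illustration only, not the code from `filterAndCheckUniform()`. The `gtidSet` type, the `contains` and `findSplitBrain` helpers, and the tablet aliases are all hypothetical, and the GTID model is deliberately simplified to one counter per server UUID.

```go
package main

import (
	"fmt"
	"sort"
)

// gtidSet is a simplified, hypothetical stand-in for a MySQL GTID set.
// It maps a server UUID to the highest transaction number seen from it.
// Real GTID sets are lists of intervals; this is an assumption for brevity.
type gtidSet map[string]int64

// contains reports whether a includes every transaction recorded in b.
func contains(a, b gtidSet) bool {
	for uuid, n := range b {
		if a[uuid] < n {
			return false
		}
	}
	return true
}

// findSplitBrain returns the first pair of candidates (in alias order)
// whose positions have diverged: neither position contains the other.
func findSplitBrain(positions map[string]gtidSet) (string, string, bool) {
	aliases := make([]string, 0, len(positions))
	for alias := range positions {
		aliases = append(aliases, alias)
	}
	sort.Strings(aliases) // deterministic iteration order

	for i := 0; i < len(aliases); i++ {
		for j := i + 1; j < len(aliases); j++ {
			a, b := positions[aliases[i]], positions[aliases[j]]
			if !contains(a, b) && !contains(b, a) {
				return aliases[i], aliases[j], true
			}
		}
	}
	return "", "", false
}

func main() {
	// Hypothetical candidates: each has a transaction the other lacks.
	positions := map[string]gtidSet{
		"zone1-100": {"uuid-a": 10, "uuid-b": 3},
		"zone1-101": {"uuid-a": 10, "uuid-c": 2},
	}
	if x, y, diverged := findSplitBrain(positions); diverged {
		fmt.Printf("split brain between %s and %s: fail fast\n", x, y)
	}
}
```

A candidate that is merely behind (its position is a subset of another's) is not flagged by this check; only mutually exclusive histories are.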
| - Consider using `--ignore-replicas` to exclude tablets on the side you want to discard from the candidate pool. | ||
| #### Partial relay-log-apply tolerance | ||
Added partial relay-log-apply tolerance documentation based on PR #18707 which implements the waitForAllRelayLogsToApply() short-circuit behavior for GTID-based shards.
Source: vitessio/vitess#18707
| | `planned_reparent_counts` | Number of times PlannedReparentShard has been run. Available dimensions are keyspace, shard and the result of the operation. | | ||
| | `emergency_reparent_counts` | Number of times EmergencyReparentShard has been run. Available dimensions are keyspace, shard and the result of the operation. | | ||
| | `reparent_shard_operation_timings` | Timings of reparent shard operations indexed by the type of operation. | | ||
| | `EmergencyReparentFilteredCandidates` | Number of candidates excluded from the relay-log wait during ERS because their `Combined` position was behind the leading group. Keyed by keyspace and shard. | |
Added three new metrics (EmergencyReparentFilteredCandidates, EmergencyReparentRelayLogFailedCandidates, EmergencyReparentSplitBrainOverrides) based on PR #18707 where they are defined in go/vt/vtctl/reparentutil/emergency_reparenter.go.
Source: vitessio/vitess#18707
Documents v25 EmergencyReparentShard changes: the new --allow-split-brain-promotion flag, split-brain detection fail-fast behavior, partial relay-log-apply tolerance for GTID-based shards, and three new observability metrics.