Overview of the Issue
When EmergencyReparentShard (ERS) runs on a GTID-based shard with two or more leading candidates whose Combined (received relay log + executed) GTID positions are incomparable — two tablets each holding GTIDs under their own server UUIDs from independent writes, where neither set is a superset of the other — there is no upfront check that detects the split-brain.
The existing secondary check in findMostAdvanced() (go/vt/vtctl/reparentutil/emergency_reparenter.go) runs after errant-GTID detection has already had a chance to filter tablets, and in this specific case it doesn't fire before promotion. ERS simply picks one of the diverged candidates as the new primary and proceeds 😱
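To make "incomparable" concrete, here is a minimal Go sketch. It does not use Vitess's own position types: `GTIDSet`, `contains`, and `incomparable` are hypothetical names, and each server UUID is assumed to hold a single contiguous interval 1-N, which real GTID sets do not guarantee.

```go
package main

import "fmt"

// GTIDSet is a deliberately simplified, hypothetical model: for each server
// UUID it stores the highest sequence number N, standing for the contiguous
// interval 1-N. Real GTID sets can contain gaps; this sketch ignores them.
type GTIDSet map[string]int64

// contains reports whether a holds every GTID that b holds.
func contains(a, b GTIDSet) bool {
	for uuid, n := range b {
		if a[uuid] < n {
			return false
		}
	}
	return true
}

// incomparable reports whether neither set is a superset of the other.
func incomparable(a, b GTIDSet) bool {
	return !contains(a, b) && !contains(b, a)
}

func main() {
	// Both replicas share the old primary's history (uuid-p:1-100), then each
	// accepted one independent write under its own server UUID.
	replicaA := GTIDSet{"uuid-p": 100, "uuid-a": 1}
	replicaB := GTIDSet{"uuid-p": 100, "uuid-b": 1}
	// A replica that is merely lagging is still comparable: it is a subset.
	lagging := GTIDSet{"uuid-p": 90}

	fmt.Println(incomparable(replicaA, replicaB)) // true
	fmt.Println(incomparable(replicaA, lagging))  // false
}
```

The distinction that matters is between a lagging replica, which is behind but comparable, and a diverged one, for which no ordering exists at all.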
From outside this looks like a successful ERS — vtctldclient returns success, the new primary takes traffic, no warning is logged. But:
- The losing side's unique GTIDs are now errant relative to the new primary
- Any transactions only on the losing side are silently lost
- The now-errant tablets can't replicate from the new primary without manual operator intervention (`vtctldclient ChangeTabletType ... drained` AND a re-clone, OR a manual `RESET MASTER` after careful inspection)
This is a direct conflict with the safety contracts encoded in CLAUDE.md / AGENTS.md (ERS section):
- "ERS must prioritize certainty that we picked the most-advanced candidate"
- "ERS must error when the most-advanced candidate is not clear, and/or a split-brain is suspected"
- "ERS must avoid introducing errant GTIDs on replicas"
The "outage" appears to be a recovery on the surface, but it's a data-integrity incident that surfaces later via lag alerts, downstream consistency checks, or (worst case) end-user-visible inconsistency
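One way to honor the "ERS must error" contract is an upfront pairwise check that aborts before any promotion. The following Go sketch is an assumption about what such a check could look like, not the actual Vitess code: `checkNoSplitBrain`, the simplified `GTIDSet` model (one contiguous interval 1-N per server UUID), and the tablet aliases are all hypothetical. It also checks every candidate pair rather than only a leading group, which is a simplification.

```go
package main

import (
	"fmt"
	"sort"
)

// GTIDSet is a simplified, hypothetical model: server UUID -> highest
// sequence number N, standing for the contiguous interval 1-N.
type GTIDSet map[string]int64

// contains reports whether a holds every GTID that b holds.
func contains(a, b GTIDSet) bool {
	for uuid, n := range b {
		if a[uuid] < n {
			return false
		}
	}
	return true
}

// checkNoSplitBrain returns an error naming the first pair of candidates
// whose positions are incomparable, and nil if all candidates are ordered.
func checkNoSplitBrain(candidates map[string]GTIDSet) error {
	aliases := make([]string, 0, len(candidates))
	for alias := range candidates {
		aliases = append(aliases, alias)
	}
	sort.Strings(aliases) // deterministic order so the error is reproducible

	for i := 0; i < len(aliases); i++ {
		for j := i + 1; j < len(aliases); j++ {
			a, b := candidates[aliases[i]], candidates[aliases[j]]
			if !contains(a, b) && !contains(b, a) {
				return fmt.Errorf("split-brain suspected: positions of %s and %s are incomparable",
					aliases[i], aliases[j])
			}
		}
	}
	return nil
}

func main() {
	// Tablet aliases and UUIDs are made up for illustration.
	candidates := map[string]GTIDSet{
		"zone1-101": {"uuid-p": 100, "uuid-a": 1},
		"zone1-102": {"uuid-p": 100, "uuid-b": 1},
		"zone1-103": {"uuid-p": 90},
	}
	if err := checkNoSplitBrain(candidates); err != nil {
		fmt.Println("aborting ERS:", err)
	}
}
```

Naming both tablets in the error also gives post-incident triage a concrete string to grep for.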
What this looks like from outside
While the bug bites:
- `vtctldclient EmergencyReparentShard` returns 0 (looks healthy)
- The new primary serves writes immediately (looks healthy)
- Slave_IO_Running / Slave_SQL_Running on the now-errant tablets stay Yes at first, until replication tries to apply something the new primary can't ship. The problem eventually surfaces as Errant GTIDs in SHOW REPLICA STATUS output or VTOrc analysis, hours or days later
- There is no log line at ERS time naming the diverged tablets, so post-incident triage has nothing to grep for
Reproduction Steps
- Set up a 3-tablet shard (1 primary + 2 replicas) with --durability_policy=semi_sync and GTID-based replication (default on MySQL 5.6+ / 8.0 / 8.4)
- Detach both replicas from the primary and make them writable (simulates a partition that lets each side accept writes independently):

      STOP REPLICA;
      RESET REPLICA ALL;
      SET GLOBAL read_only = OFF;

- Write to each detached tablet independently. Each INSERT generates a GTID under that tablet's own server UUID, producing two-sided GTID divergence:

      -- on replica A:
      INSERT INTO vt_insert_test(id, msg) VALUES (90002, 'side A');
      -- on replica B:
      INSERT INTO vt_insert_test(id, msg) VALUES (90003, 'side B');

- Kill the original primary tablet
- Run ERS:

      vtctldclient EmergencyReparentShard ks/0 --wait-replicas-timeout=30s

- Observe: ERS exits with 0 and picks one of the divergent replicas as the new primary. The losing side's INSERT survives only on its own tablet, where the GTID is now errant relative to the new primary. There is no error, no warning, and no log line naming the diverged sides
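The errant GTIDs left behind by these steps are the set difference "losing replica minus new primary", the same idea as MySQL's `GTID_SUBTRACT()` function. The Go sketch below illustrates it under a simplifying assumption: each server UUID holds one contiguous interval 1-N. `GTIDSet`, `Interval`, and `errantGTIDs` are hypothetical names, not Vitess APIs.

```go
package main

import "fmt"

// GTIDSet is a simplified, hypothetical model: server UUID -> highest
// sequence number N, standing for the contiguous interval 1-N.
type GTIDSet map[string]int64

// Interval is an inclusive range of sequence numbers for one server UUID.
type Interval struct{ First, Last int64 }

// errantGTIDs returns, per server UUID, the GTIDs that the replica holds
// and the primary lacks.
func errantGTIDs(replica, primary GTIDSet) map[string]Interval {
	out := map[string]Interval{}
	for uuid, n := range replica {
		if p := primary[uuid]; n > p {
			out[uuid] = Interval{First: p + 1, Last: n}
		}
	}
	return out
}

func main() {
	// Suppose replica A won the promotion after the reproduction steps.
	newPrimary := GTIDSet{"uuid-p": 100, "uuid-a": 1}
	loser := GTIDSet{"uuid-p": 100, "uuid-b": 1}

	fmt.Println(errantGTIDs(loser, newPrimary)) // map[uuid-b:{1 1}]
}
```

A non-empty result for the losing side is the signal that would otherwise surface only later through replication status.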
Binary Version
Affects all Vitess versions on `main` as of 2026-05-27.
The risk has existed for as long as `findMostAdvanced()` has run its `AtLeast`
check only after errant-GTID filtering, with no upfront uniformity check on
the leading-Combined group.
Operating System and Environment details
Not environment-specific.
GTID-based MySQL replication (MySQL 5.6+ / 8.0 / 8.4 / Percona 8.0+).
Log Fragments
The defining symptom is the absence of any log line at ERS time naming the divergence. A reproduction in a 4-tablet local cluster produces vtctld output that is indistinguishable from a healthy ERS — Validate, ShardReplicationPositions, StopReplicationAndGetStatus, PromoteReplica, PopulateReparentJournal, SetReplicationSource — all succeed. The errant-GTID signal only surfaces later, via the now-poisoned tablets' replication status
N/A — the bug is the absence of an upfront log/error when split-brain is present.
Related
- … `uniformCombined` check on the filtered leading group, an explicit `FAILED_PRECONDITION` abort, and an opt-out `--allow-split-brain-promotion` flag for operators who deliberately need to force ERS through)
- `EmergencyReparentShard` fails when `mysqld` is down on any tablet in a shard #18528 (related: broader ERS hardening discussion)
- … `EmergencyReparentShard` to fail #18529 (related: lagging-minority intolerance, addressed in the same PR)

Your thoughts are appreciated 🙏