Commit 1b620e3

add split brain flag
Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
1 parent f95806f commit 1b620e3

12 files changed

Lines changed: 183 additions & 20 deletions

CLAUDE.md

Lines changed: 1 addition & 1 deletion
@@ -238,7 +238,7 @@ return user.NeedsMigration() && migrate(user) || user
 ### EmergencyReparentShard (ERS)
 - ERS must prioritize **certainty** that we picked the most-advanced candidate
 - ERS must error when the most-advanced candidate is not clear, and/or a split-brain is suspected
-- ERS must avoid introducing errant GTIDs on replicas. This includes writes that are considered unacknowledged to the client — at the errant-GTID level, whether or not the transaction was acknowledged to the client is inconsequential, as MySQL cannot rewind GTIDs of any kind
+- ERS must avoid introducing errant GTIDs on replicas. This includes writes that are considered unacknowledged to the client as MySQL cannot rewind GTIDs of any kind
 - Changes should prioritize reducing points of failure - avoid new RPCs or work that may delay or make ERS more brittle
 - ERS must error if a shard contains a mix of GTID-based and non-GTID-based replication. Their position semantics differ (`Combined` = retrieved+executed for GTID vs. executed-only for non-GTID), so a unified split-brain / most-advanced check across both is unsafe
 - For non-GTID flavors, ERS must wait on every candidate and fail on any error. The "filter to leading group + short-circuit on first success" optimization is only safe for GTID-based flavors, where `Combined` is distinct from the executed position
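
The mixed-flavor rule in the CLAUDE.md hunk above can be illustrated with a minimal sketch. It is not the Vitess implementation: the `candidate` type, the `checkUniformFlavor` function, and the tablet aliases are hypothetical names introduced only for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// candidate is a hypothetical stand-in for one tablet's replication status.
// It is not a Vitess type.
type candidate struct {
	Alias     string
	GTIDBased bool
}

// errMixedFlavors is a hypothetical sentinel error for this sketch.
var errMixedFlavors = errors.New("shard mixes GTID-based and non-GTID-based replication")

// checkUniformFlavor returns whether the shard is GTID-based, or an error
// when the candidates disagree, because their positions cannot be compared
// under a single rule.
func checkUniformFlavor(cands []candidate) (bool, error) {
	if len(cands) == 0 {
		return false, errors.New("no candidates")
	}
	gtid := cands[0].GTIDBased
	for _, c := range cands[1:] {
		if c.GTIDBased != gtid {
			return false, fmt.Errorf("%w: %s differs from %s", errMixedFlavors, c.Alias, cands[0].Alias)
		}
	}
	return gtid, nil
}

func main() {
	uniform := []candidate{{"zone1-100", true}, {"zone1-101", true}}
	mixed := []candidate{{"zone1-100", true}, {"zone1-101", false}}

	gtid, err := checkUniformFlavor(uniform)
	fmt.Println("uniform:", gtid, err)

	_, err = checkUniformFlavor(mixed)
	fmt.Println("mixed:", err)
}
```

In this sketch the boolean result would decide whether the "leading group + short-circuit" path is allowed (GTID-based) or every candidate must be waited on (non-GTID).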

changelog/25.0/25.0.0/summary.md

Lines changed: 5 additions & 2 deletions
@@ -100,11 +100,14 @@ See [#19978](https://github.com/vitessio/vitess/issues/19978) for details.

 `EmergencyReparentShard` (ERS) on GTID-based shards no longer fails when only some replicas can apply their relay logs. As long as at least one tablet at the leading `Combined` GTID position applies successfully, ERS proceeds; lagging or stuck-SQL-thread replicas are no longer blockers. Pre-existing pre-PR behavior is preserved for non-GTID flavors (FilePos, MariaDB), where ERS still requires every candidate to apply.

-When the leading GTID-based candidates have incomparable `Combined` positions (suspected split-brain), ERS now aborts upfront with a clear `FAILED_PRECONDITION` error rather than risk silently picking one side.
+When the leading GTID-based candidates have incomparable `Combined` positions (suspected split-brain), ERS now aborts upfront with a clear `FAILED_PRECONDITION` error naming the diverged tablets, rather than silently picking one side. Pre-PR ERS would pick blindly and let the losing side's unique GTIDs become errant on those tablets — a silent data-integrity incident that surfaced later via lag alerts or downstream consistency checks. See [#20199](https://github.com/vitessio/vitess/issues/20199) for the bug this addresses.

-Two new stats are exported for observability:
+A new `--allow-split-brain-promotion` flag is added to `vtctldclient EmergencyReparentShard` (and `--allow_split_brain_promotion` on the legacy `vtctl`). It is **off by default**. Operators who deliberately need to force ERS through a detected split-brain — typically because they already know which side to keep and plan to re-clone the losing side — can set it to convert the abort into a `WARN` log and proceed. The losing side's unique GTIDs will become errant after promotion, so this is an explicit operator override, not a default-on safety knob.
+
+Three new stats are exported for observability:

 - `EmergencyReparentFilteredCandidates` — counts replicas excluded from the relay-log wait because their `Combined` position is strictly behind the leading group.
 - `EmergencyReparentRelayLogFailedCandidates` — counts replicas that genuinely failed to apply relay logs (cancellations after a peer succeeded are not counted).
+- `EmergencyReparentSplitBrainOverrides` — counts ERS runs that proceeded despite detected split-brain because `--allow-split-brain-promotion` was set. Stays at zero unless an operator has deliberately invoked the escape hatch.

 See [#18707](https://github.com/vitessio/vitess/pull/18707) for details.
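
To make "incomparable `Combined` positions" concrete, here is a minimal sketch under strong simplifying assumptions. A GTID set is modeled as a map from source UUID to the highest contiguous transaction number, which is much simpler than a real MySQL GTID set. The names `gtidSet`, `atLeast`, `firstIncomparable`, and `checkSplitBrain` are hypothetical and do not correspond to Vitess functions.

```go
package main

import "fmt"

// gtidSet is a simplified model of a GTID position:
// source UUID -> highest contiguous transaction number applied or retrieved.
type gtidSet map[string]int64

// atLeast reports whether a contains every transaction in b.
func (a gtidSet) atLeast(b gtidSet) bool {
	for src, n := range b {
		if a[src] < n {
			return false
		}
	}
	return true
}

// firstIncomparable returns the first pair of tablets where neither position
// contains the other, which is the suspected split-brain condition.
func firstIncomparable(aliases []string, pos map[string]gtidSet) (string, string, bool) {
	for i := 0; i < len(aliases); i++ {
		for j := i + 1; j < len(aliases); j++ {
			a, b := pos[aliases[i]], pos[aliases[j]]
			if !a.atLeast(b) && !b.atLeast(a) {
				return aliases[i], aliases[j], true
			}
		}
	}
	return "", "", false
}

// checkSplitBrain aborts on divergence unless allow is set. The first result
// reports whether the operator override was actually used.
func checkSplitBrain(aliases []string, pos map[string]gtidSet, allow bool) (bool, error) {
	x, y, diverged := firstIncomparable(aliases, pos)
	if !diverged {
		return false, nil
	}
	if !allow {
		return false, fmt.Errorf("suspected split-brain between %s and %s", x, y)
	}
	return true, nil
}

func main() {
	pos := map[string]gtidSet{
		"zone1-100": {"uuid-a": 10, "uuid-b": 5},
		"zone1-102": {"uuid-a": 8, "uuid-c": 2},
	}
	aliases := []string{"zone1-100", "zone1-102"}

	_, err := checkSplitBrain(aliases, pos, false)
	fmt.Println("default:", err)

	overridden, err := checkSplitBrain(aliases, pos, true)
	fmt.Println("with override:", overridden, err)
}
```

Each tablet in the example holds transactions the other lacks, so neither is a superset. Promoting either one leaves the other's unique transactions errant, which is the outcome the flag asks the operator to accept explicitly.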

go/cmd/vtctldclient/command/reparents.go

Lines changed: 3 additions & 0 deletions
@@ -96,6 +96,7 @@ var emergencyReparentShardOptions = struct {
 	IgnoreReplicaAliasStrList []string
 	PreventCrossCellPromotion bool
 	WaitForAllTablets bool
+	AllowSplitBrainPromotion bool
 }{}

 func commandEmergencyReparentShard(cmd *cobra.Command, args []string) error {
@@ -144,6 +145,7 @@ func commandEmergencyReparentShard(cmd *cobra.Command, args []string) error {
 		WaitReplicasTimeout: protoutil.DurationToProto(emergencyReparentShardOptions.WaitReplicasTimeout),
 		PreventCrossCellPromotion: emergencyReparentShardOptions.PreventCrossCellPromotion,
 		WaitForAllTablets: emergencyReparentShardOptions.WaitForAllTablets,
+		AllowSplitBrainPromotion: emergencyReparentShardOptions.AllowSplitBrainPromotion,
 	})
 	if err != nil {
 		return err
@@ -309,6 +311,7 @@ func init() {
 	EmergencyReparentShard.Flags().StringVar(&emergencyReparentShardOptions.ExpectedPrimaryAliasStr, "expected-primary", "", "Alias of a tablet that must be the current primary in order for the reparent to be processed.")
 	EmergencyReparentShard.Flags().BoolVar(&emergencyReparentShardOptions.PreventCrossCellPromotion, "prevent-cross-cell-promotion", false, "Only promotes a new primary from the same cell as the previous primary.")
 	EmergencyReparentShard.Flags().BoolVar(&emergencyReparentShardOptions.WaitForAllTablets, "wait-for-all-tablets", false, "Should ERS wait for all the tablets to respond. Useful when all the tablets are reachable.")
+	EmergencyReparentShard.Flags().BoolVar(&emergencyReparentShardOptions.AllowSplitBrainPromotion, "allow-split-brain-promotion", false, "Allow ERS to proceed when two leading candidates have incomparable Combined GTID positions (suspected split-brain). Off by default. Operator escape hatch — accepts that the losing side's unique GTIDs will become errant.")
 	EmergencyReparentShard.Flags().StringSliceVarP(&emergencyReparentShardOptions.IgnoreReplicaAliasStrList, "ignore-replicas", "i", nil, "Comma-separated, repeated list of replica tablet aliases to ignore during the emergency reparent.")
 	Root.AddCommand(EmergencyReparentShard)

go/vt/proto/vtctldata/vtctldata.pb.go

Lines changed: 17 additions & 4 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

go/vt/proto/vtctldata/vtctldata_vtproto.pb.go

Lines changed: 34 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

go/vt/vtctl/grpcvtctldserver/server.go

Lines changed: 8 additions & 3 deletions
@@ -1304,6 +1304,7 @@ func (s *VtctldServer) EmergencyReparentShard(ctx context.Context, req *vtctldat
 	span.Annotate("wait_replicas_timeout_sec", waitReplicasTimeout.Seconds())
 	span.Annotate("prevent_cross_cell_promotion", req.PreventCrossCellPromotion)
 	span.Annotate("wait_for_all_tablets", req.WaitForAllTablets)
+	span.Annotate("allow_split_brain_promotion", req.AllowSplitBrainPromotion)

 	m := sync.RWMutex{}
 	logstream := []*logutilpb.Event{}
@@ -1314,7 +1315,8 @@ func (s *VtctldServer) EmergencyReparentShard(ctx context.Context, req *vtctldat
 		logstream = append(logstream, e)
 	})

-	ev, err := reparentutil.NewEmergencyReparenter(s.ts, s.tmc, logger).ReparentShard(ctx,
+	ev, err := reparentutil.NewEmergencyReparenter(s.ts, s.tmc, logger).ReparentShard(
+		ctx,
 		req.Keyspace,
 		req.Shard,
 		reparentutil.EmergencyReparentOptions{
@@ -1324,6 +1326,7 @@ func (s *VtctldServer) EmergencyReparentShard(ctx context.Context, req *vtctldat
 			WaitAllTablets: req.WaitForAllTablets,
 			PreventCrossCellPromotion: req.PreventCrossCellPromotion,
 			ExpectedPrimaryAlias: req.ExpectedPrimary,
+			AllowSplitBrainPromotion: req.AllowSplitBrainPromotion,
 		},
 	)

@@ -3315,7 +3318,8 @@ func (s *VtctldServer) PlannedReparentShard(ctx context.Context, req *vtctldatap
 		logstream = append(logstream, e)
 	})

-	ev, err := reparentutil.NewPlannedReparenter(s.ts, s.tmc, logger).ReparentShard(ctx,
+	ev, err := reparentutil.NewPlannedReparenter(s.ts, s.tmc, logger).ReparentShard(
+		ctx,
 		req.Keyspace,
 		req.Shard,
 		reparentutil.PlannedReparentOptions{
@@ -5496,7 +5500,8 @@ func (s *VtctldServer) ValidateVSchema(ctx context.Context, req *vtctldatapb.Val
 			r := &tabletmanagerdatapb.GetSchemaRequest{ExcludeTables: req.ExcludeTables, IncludeViews: req.IncludeViews}
 			primarySchema, err := schematools.GetSchema(ctx, s.ts, s.tmc, si.PrimaryAlias, r)
 			if err != nil {
-				errorMessage := fmt.Sprintf("GetSchema(%s, nil, %v, %v) (%v/%v) failed: %v", si.PrimaryAlias.String(),
+				errorMessage := fmt.Sprintf(
+					"GetSchema(%s, nil, %v, %v) (%v/%v) failed: %v", si.PrimaryAlias.String(),
 					excludeTables, includeViews, keyspace, shard, err,
 				)
 				shardResult.Results = append(shardResult.Results, errorMessage)

go/vt/vtctl/reparent.go

Lines changed: 2 additions & 0 deletions
@@ -173,6 +173,7 @@ func commandEmergencyReparentShard(ctx context.Context, wr *wrangler.Wrangler, s
 	preventCrossCellPromotion := subFlags.Bool("prevent_cross_cell_promotion", false, "only promotes a new primary from the same cell as the previous primary")
 	ignoreReplicasList := subFlags.String("ignore_replicas", "", "comma-separated list of replica tablet aliases to ignore during emergency reparent")
 	waitForAllTablets := subFlags.Bool("wait_for_all_tablets", false, "should ERS wait for all the tablets to respond. Useful when all the tablets are reachable")
+	allowSplitBrainPromotion := subFlags.Bool("allow_split_brain_promotion", false, "allow ERS to proceed when two leading candidates have incomparable Combined GTID positions (suspected split-brain); off by default — operator escape hatch")

 	if err := subFlags.Parse(args); err != nil {
 		return err
@@ -206,6 +207,7 @@ func commandEmergencyReparentShard(ctx context.Context, wr *wrangler.Wrangler, s
 		WaitReplicasTimeout: *waitReplicasTimeout,
 		IgnoreReplicas: topoproto.ParseTabletSet(*ignoreReplicasList),
 		PreventCrossCellPromotion: *preventCrossCellPromotion,
+		AllowSplitBrainPromotion: *allowSplitBrainPromotion,
 	})
 }
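
The off-by-default flag pattern in the two hunks above can be sketched with Go's standard `flag` package. This is an illustration only: the real command uses its own flag set, and `parseERSFlags` is a hypothetical helper that exists only in this example.

```go
package main

import (
	"flag"
	"fmt"
)

// parseERSFlags is a hypothetical helper that parses only the new option,
// using the standard library flag package as a stand-in for the flag set
// the real command uses.
func parseERSFlags(args []string) (bool, error) {
	fs := flag.NewFlagSet("EmergencyReparentShard", flag.ContinueOnError)
	// The default is false: a detected split-brain aborts ERS unless the
	// operator opts in explicitly.
	allow := fs.Bool("allow_split_brain_promotion", false,
		"allow ERS to proceed on suspected split-brain (operator override)")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *allow, nil
}

func main() {
	for _, args := range [][]string{nil, {"--allow_split_brain_promotion"}} {
		allow, err := parseERSFlags(args)
		fmt.Println(args, allow, err)
	}
}
```

Keeping the default at `false` means an operator who omits the flag gets the safe abort, and the override has to be typed out on the command line.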

go/vt/vtctl/reparentutil/emergency_reparenter.go

Lines changed: 34 additions & 10 deletions
@@ -63,6 +63,13 @@ type EmergencyReparentOptions struct {
 	WaitReplicasTimeout time.Duration
 	PreventCrossCellPromotion bool
 	ExpectedPrimaryAlias *topodatapb.TabletAlias
+	// AllowSplitBrainPromotion lets ERS proceed when two leading candidates have
+	// incomparable Combined GTID positions (suspected split-brain). Off by default —
+	// operator escape hatch. When set, the upfront uniformCombined check and the
+	// secondary AtLeast check in findMostAdvanced log a warning and continue instead
+	// of aborting with FAILED_PRECONDITION. The losing side's unique GTIDs will
+	// become errant on those tablets after promotion.
+	AllowSplitBrainPromotion bool

 	// Private options managed internally. We use value passing to avoid leaking
 	// these details back out.
@@ -84,6 +91,10 @@ var (
 		"EmergencyReparentRelayLogFailedCandidates", "Number of candidates that failed to apply relay logs during EmergencyReparentShard",
 		[]string{"Keyspace", "Shard"},
 	)
+	ersSplitBrainOverrides = stats.NewCountersWithMultiLabels(
+		"EmergencyReparentSplitBrainOverrides", "Number of times EmergencyReparentShard proceeded despite detecting incomparable Combined GTID positions, because the operator set AllowSplitBrainPromotion=true. The losing side's unique GTIDs will become errant on those tablets after promotion.",
+		[]string{"Keyspace", "Shard"},
+	)
 )

 // NewEmergencyReparenter returns a new EmergencyReparenter object, ready to
@@ -291,7 +302,7 @@ func (erp *EmergencyReparenter) reparentShardLocked(ctx context.Context, ev *eve
 	relayLogWaitCandidates := validCandidates
 	requireAll := true
 	if isGTIDBased {
-		relayLogWaitCandidates, err = erp.filterAndCheckUniform(validCandidates, keyspace, shard, "candidates")
+		relayLogWaitCandidates, err = erp.filterAndCheckUniform(validCandidates, keyspace, shard, "candidates", opts.AllowSplitBrainPromotion)
 		if err != nil {
 			return err
 		}
@@ -331,7 +342,7 @@ func (erp *EmergencyReparenter) reparentShardLocked(ctx context.Context, ev *eve
 		}
 		if !appliedSurvived {
 			erp.logger.Warningf("all originally-applied candidates were removed by errant-GTID detection; running second relay-log-apply wait on surviving candidates before promotion")
-			rewaitCandidates, err := erp.filterAndCheckUniform(validCandidates, keyspace, shard, "surviving unwaited candidates")
+			rewaitCandidates, err := erp.filterAndCheckUniform(validCandidates, keyspace, shard, "surviving unwaited candidates", opts.AllowSplitBrainPromotion)
 			if err != nil {
 				return err
 			}
@@ -485,18 +496,27 @@ type relayLogResult struct {
 // has incomparable Combined positions (suspected split-brain). The descriptor parameter is
 // interpolated into the error message to identify which ERS pipeline stage detected the
 // split-brain (first wait vs. errant-GTID re-wait).
+//
+// If allowSplitBrain is true, an incomparable result logs a warning and returns the filtered
+// set without erroring — the operator-escape-hatch path that accepts errant GTIDs on the
+// losing side in exchange for forcing ERS through.
 func (erp *EmergencyReparenter) filterAndCheckUniform(
 	validCandidates map[string]*RelayLogPositions,
 	keyspace, shard, descriptor string,
+	allowSplitBrain bool,
 ) (map[string]*RelayLogPositions, error) {
 	filtered := filterToMostAdvancedCombined(validCandidates, erp.logger)
 	if !uniformCombined(filtered) {
-		return nil, vterrors.Errorf(
-			vtrpc.Code_FAILED_PRECONDITION,
-			"emergency reparent aborted: %s have incomparable Combined GTID positions (suspected split-brain): %s",
-			descriptor,
-			describeCombinedPositions(filtered),
-		)
+		if !allowSplitBrain {
+			return nil, vterrors.Errorf(
+				vtrpc.Code_FAILED_PRECONDITION,
+				"emergency reparent aborted: %s have incomparable Combined GTID positions (suspected split-brain): %s",
+				descriptor,
+				describeCombinedPositions(filtered),
+			)
+		}
+		erp.logger.Warningf("AllowSplitBrainPromotion=true: %s have incomparable Combined GTID positions (suspected split-brain): %s — proceeding under operator override; losing side's unique GTIDs will become errant", descriptor, describeCombinedPositions(filtered))
+		ersSplitBrainOverrides.Add([]string{keyspace, shard}, 1)
 	}
 	if excluded := int64(len(validCandidates) - len(filtered)); excluded > 0 {
 		ersFilteredCandidates.Add([]string{keyspace, shard}, excluded)
@@ -715,10 +735,14 @@ func (erp *EmergencyReparenter) findMostAdvanced(
 	winningPosition := tabletPositions[0]

 	// We have already removed the tablets with errant GTIDs before calling this function. At this point our winning position must be a
-	// superset of all the other valid positions. If that is not the case, then we have a split brain scenario, and we should cancel the ERS
+	// superset of all the other valid positions. If that is not the case, then we have a split brain scenario, and we should cancel the ERS —
+	// unless AllowSplitBrainPromotion is set, in which case the operator has accepted that the losing side's unique GTIDs will become errant.
 	for i, position := range tabletPositions {
 		if !winningPosition.AtLeast(position) {
-			return nil, nil, vterrors.Errorf(vtrpc.Code_FAILED_PRECONDITION, "split brain detected between servers - %v and %v", winningPrimaryTablet.Alias, validTablets[i].Alias)
+			if !opts.AllowSplitBrainPromotion {
+				return nil, nil, vterrors.Errorf(vtrpc.Code_FAILED_PRECONDITION, "split brain detected between servers - %v and %v", winningPrimaryTablet.Alias, validTablets[i].Alias)
+			}
+			erp.logger.Warningf("AllowSplitBrainPromotion=true: split brain detected between %v and %v — proceeding under operator override; losing side's unique GTIDs will become errant", winningPrimaryTablet.Alias, validTablets[i].Alias)
 		}
 	}
