Backport: MySQL version-aware reparent for PRS & ERS #866

Draft
ejortegau wants to merge 11 commits into slack-22.0 from eduardo.ortega/mysql-version-aware-reparent

@ejortegau

What's this?

Backport of the MySQL version-aware primary selection feature for PRS and ERS from upstream PR vitessio#20211.

When selecting a new primary during reparenting, this now prefers tablets running a lower MySQL release (major.minor). This maintains replication compatibility, because replicas must run the same or a higher version than the primary.

How it works

  • Adds server_version field to the replication status proto, populated during StopReplicationAndGetStatus
  • Extends sortTabletsForReparent to consider MySQL version after promotion rules
  • Compares only major.minor (patch differences are ignored)
  • In identifyPrimaryCandidate, among same-tier candidates prefers the lowest MySQL release
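
The major.minor comparison and the lowest-release preference described above can be sketched roughly as follows. This is an illustrative, self-contained sketch, not the code in this PR: `release`, `parseRelease`, and `pickLowestRelease` are hypothetical names, and skipping candidates whose version fails to parse is an assumption made for the example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// release keeps only the major.minor part of a MySQL version string.
type release struct {
	major, minor int
}

// parseRelease extracts major.minor from strings such as "8.0.36" or
// "8.4.3-log". The patch component and any suffix are ignored.
func parseRelease(version string) (release, bool) {
	parts := strings.SplitN(version, ".", 3)
	if len(parts) < 2 {
		return release{}, false
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return release{}, false
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return release{}, false
	}
	return release{major: major, minor: minor}, true
}

// less reports whether r is an older release than other.
func (r release) less(other release) bool {
	if r.major != other.major {
		return r.major < other.major
	}
	return r.minor < other.minor
}

// pickLowestRelease returns the alias whose version has the lowest
// major.minor. Candidates with unparseable versions are skipped (an
// assumption of this sketch); ties keep the earlier candidate, so the
// incoming order acts as the tiebreaker.
func pickLowestRelease(aliases []string, versions map[string]string) (string, bool) {
	best := ""
	var bestRelease release
	found := false
	for _, alias := range aliases {
		rel, ok := parseRelease(versions[alias])
		if !ok {
			continue
		}
		if !found || rel.less(bestRelease) {
			best, bestRelease, found = alias, rel, true
		}
	}
	return best, found
}

func main() {
	versions := map[string]string{
		"zone1-0000000101": "8.4.3",
		"zone1-0000000102": "8.0.36",
		"zone1-0000000103": "8.0.40-log",
	}
	aliases := []string{"zone1-0000000101", "zone1-0000000102", "zone1-0000000103"}
	alias, ok := pickLowestRelease(aliases, versions)
	fmt.Println(alias, ok)
}
```

In this sketch the two 8.0.x tablets compare as equal because their patch levels are ignored, so the one listed first wins over the 8.4 tablet.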

Differences from upstream

  • Uses log.Warningf instead of structured log.Warn (slack-22.0 doesn't have the new logging)
  • Omits semi-sync state collection (unrelated to version-aware reparenting)
  • Keeps existing promotion-rule fallback instead of the deterministic alias tiebreaker

Cherry-picked and adapted by Claude Code from upstream commits.

ejortegau and others added 2 commits May 29, 2026 12:04
Signed-off-by: Eduardo Ortega <5791035+ejortegau@users.noreply.github.com>
Patch version differences within the same MySQL release do not affect
replication compatibility, so the version-aware candidate election now
ignores them. Only major.minor boundaries trigger the preference for a
lower-version primary.

Co-Authored-By: Claude <svc-devxp-claude@slack-corp.com>
Signed-off-by: Eduardo Ortega <5791035+ejortegau@users.noreply.github.com>
@github-actions github-actions Bot added this to the v22.0.4 milestone May 29, 2026
ejortegau and others added 9 commits June 1, 2026 16:00
PRS always catches the elected tablet up to the old primary's exact
demotion position, so replication position head-start is irrelevant for
data safety. This change introduces SortMode (SortForPRS/SortForERS) to
give PRS a distinct sort order: promotion rules > MySQL version >
position > buffer pool > alias.

ERS retains position-first ordering to minimize data loss.
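
As a rough illustration of the two orderings, the sketch below sorts the same candidates under each mode. Only the names `SortMode`, `SortForPRS`, and `SortForERS` come from this commit message; the `candidate` struct, its integer fields, and `sortCandidates` are hypothetical simplifications. The ERS order after position (promotion rules, then version) is assumed from the PR description, not confirmed here.

```go
package main

import (
	"fmt"
	"sort"
)

// SortMode selects which reparent operation the ordering is for.
type SortMode int

const (
	SortForPRS SortMode = iota
	SortForERS
)

// candidate is a hypothetical, flattened view of a tablet.
type candidate struct {
	alias         string
	promotionRank int // lower is preferred
	release       int // illustrative encoding: major*100 + minor
	position      int // stand-in for replication position; higher is further ahead
	bufferPool    int // stand-in for buffer pool warmth; higher is better
}

// sortCandidates orders candidates best-first.
// PRS: promotion rules > MySQL version > position > buffer pool > alias.
// ERS: position first, then the same keys (assumed for this sketch).
func sortCandidates(cs []candidate, mode SortMode) {
	sort.SliceStable(cs, func(i, j int) bool {
		a, b := cs[i], cs[j]
		if mode == SortForERS && a.position != b.position {
			return a.position > b.position
		}
		if a.promotionRank != b.promotionRank {
			return a.promotionRank < b.promotionRank
		}
		if a.release != b.release {
			return a.release < b.release
		}
		if a.position != b.position {
			return a.position > b.position
		}
		if a.bufferPool != b.bufferPool {
			return a.bufferPool > b.bufferPool
		}
		return a.alias < b.alias
	})
}

func main() {
	build := func() []candidate {
		return []candidate{
			{alias: "zone1-0000000101", release: 804, position: 10, bufferPool: 5},
			{alias: "zone1-0000000102", release: 800, position: 8, bufferPool: 5},
		}
	}

	prs := build()
	sortCandidates(prs, SortForPRS)
	fmt.Println("PRS picks:", prs[0].alias)

	ers := build()
	sortCandidates(ers, SortForERS)
	fmt.Println("ERS picks:", ers[0].alias)
}
```

With these inputs PRS picks the 8.0 tablet even though it is behind, while ERS picks the tablet with the most advanced position.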

Co-Authored-By: Claude <svc-devxp-claude@slack-corp.com>
Signed-off-by: Eduardo Ortega <5791035+ejortegau@users.noreply.github.com>
- Remove unused `after.ServerVersion` assignment in StopReplicationAndGetStatus
- Log warning when MySQL version string fails to parse
- Clarify findCandidate post-loop comment explaining two-phase logic
- Add v25 changelog entry with deployment note and cross-cell limitation

Co-Authored-By: Claude <svc-devxp-claude@slack-corp.com>
Signed-off-by: Eduardo Ortega <5791035+ejortegau@users.noreply.github.com>
- Add `server_version` field to `PrimaryStatus` proto so that
  DemotePrimary and PrimaryStatus RPCs report the MySQL version
- Populate version in PrimaryStatus, DemotePrimary, and
  StopReplicationAndGetStatus RPCs
- Move GetVersionString call after replication stop in
  StopReplicationAndGetStatus to avoid delaying the critical path
- Read PrimaryStatus.ServerVersion in ERS ERNotReplica path so
  demoted primary-status candidates no longer get unknownVersion
- Extract getMySQLVersion helper to deduplicate the fetch-and-warn
  pattern across 4 call sites
- Add test coverage for version propagation through both paths

Co-Authored-By: Claude <svc-devxp-claude@slack-corp.com>
Signed-off-by: Eduardo Ortega <5791035+ejortegau@users.noreply.github.com>
ReleaseAtLeast already covers the same-release case (Minor >=),
so the redundant IsSameRelease check can be removed.
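
A small sketch of why the extra check is redundant, assuming `ReleaseAtLeast` compares major first and then uses `>=` on minor. The lowercase helpers below are hypothetical stand-ins and may not match the real signatures.

```go
package main

import "fmt"

// release holds a MySQL major.minor pair.
type release struct{ major, minor int }

// isSameRelease reports whether both releases are identical.
func isSameRelease(a, b release) bool { return a == b }

// releaseAtLeast reports whether a is the same release as b or newer.
func releaseAtLeast(a, b release) bool {
	if a.major != b.major {
		return a.major > b.major
	}
	return a.minor >= b.minor
}

func main() {
	pairs := [][2]release{
		{{8, 0}, {8, 0}},
		{{8, 4}, {8, 0}},
		{{8, 0}, {8, 4}},
		{{9, 0}, {8, 4}},
	}
	for _, p := range pairs {
		combined := isSameRelease(p[0], p[1]) || releaseAtLeast(p[0], p[1])
		// combined never differs from releaseAtLeast alone.
		fmt.Println(p[0], p[1], combined == releaseAtLeast(p[0], p[1]))
	}
}
```

Because equal releases already satisfy the `>=` branch, OR-ing in the same-release check cannot change the result.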

Co-Authored-By: Claude <svc-devxp-claude@slack-corp.com>
Signed-off-by: Eduardo Ortega <5791035+ejortegau@users.noreply.github.com>
StopReplicationAndGetStatus had several early-return paths (IO thread
already stopped, replication not healthy, stop failures, after-status
failure) that returned Before without populating ServerVersion. ERS
builds its version map from Before.ServerVersion, so tablets hitting
these paths became unknownVersion and could lose version-aware election
to newer tablets.

Fix: call getMySQLVersion(ctx) before every return that includes Before.

Add a table-driven test covering all return paths (success and error)
for both IOTHREADONLY and IOANDSQLTHREAD modes.

Ref: vitessio#20211 (review)

Co-Authored-By: Claude <svc-devxp-claude@slack-corp.com>
Signed-off-by: Eduardo Ortega <5791035+ejortegau@users.noreply.github.com>
The cherry-picked test case references a primaryAlias struct field
that doesn't exist on the slack-22.0 branch.

Co-Authored-By: Claude <svc-devxp-claude@slack-corp.com>
Trace version detection and candidate selection so we can verify
the lowest-version preference is working as intended in production.

Co-Authored-By: Claude <svc-devxp-claude@slack-corp.com>
The successful stop path set Before.ServerVersion but left After.ServerVersion
empty, while the no-op paths return After: before and thus include it. Set
after.ServerVersion = before.ServerVersion so the common success path is
consistent, and extend the test to assert After.ServerVersion when present.

Co-Authored-By: Claude <svc-devxp-claude@slack-corp.com>
Signed-off-by: Eduardo Ortega <5791035+ejortegau@users.noreply.github.com>