vtctld: fix GetTablets stale primary report under --tablet-alias filter #20125
Taeknology wants to merge 4 commits into
The alias-filtered `GetTablets` path has no regression coverage for stale primaries, so it can report a stale tablet as `PRIMARY` when queried alone or when grouped with stale tablets from other shards. This adds focused red tests for the single-alias, cross-shard, and mixed-alias cases.
Signed-off-by: Mohamed Hamza <mhamza@fastmail.com>
(cherry picked from commit c4c308a)
Signed-off-by: Taeknology <20297177+Taeknology@users.noreply.github.com>
Fixes vitessio#19898
Signed-off-by: Taeknology <20297177+Taeknology@users.noreply.github.com>
Pull request overview
Fixes a bug in vtctld `GetTablets` where filtering by `--tablet-alias` could misreport stale former primaries as primary. The previous code computed the "true primary" only as the max `PrimaryTermStartTime` within the result set, which excludes the real current primary when only stale aliases are queried. The fix consults `GetShard` per unique shard to obtain the authoritative term start time, falls back to the existing seed when the shard record's term start time is zero, and parallelizes those lookups with strict/non-strict failure modes.
Changes:
- Replace the single `truePrimaryTimestamp` with a per-shard map seeded from result-set primaries, then overwrite each entry with `Shard.PrimaryTermStartTime` fetched via parallel `GetShard` calls.
- Skip stale-primary adjustment for shards whose `GetShard` failed; aggregate errors with `errors.Join`, returning them in strict mode and logging a warning otherwise.
- Add table-driven tests covering single-alias, multi-shard alias, all-current-primaries, and strict/non-strict `GetShard` failure scenarios, plumbing `addTabletOptions` and a `factorySetup` hook through the test harness.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| go/vt/vtctl/grpcvtctldserver/server.go | Per-shard primary term resolution via concurrent GetShard with strict/non-strict error handling. |
| go/vt/vtctl/grpcvtctldserver/server_test.go | New test cases for stale-primary alias filtering and topo-failure paths; harness now supports AddTabletOptions and topo factory hooks. |
The prior name 'tablet alias filtering does not stale primaries across shards' used 'stale' as a verb and was grammatically incorrect. Rename to mirror the sibling case 'stale primaries across shards stay unknown' for a consistent 'X stay Y' pattern.
Signed-off-by: Taeknology <20297177+Taeknology@users.noreply.github.com>
…alias
Adds a new VTCtld minor-changes section with an entry describing the user-visible behavior change introduced by the fix: stale former primaries returned by 'vtctldclient GetTablets --tablet-alias' now report as 'unknown' instead of 'primary'. Includes operator-impact framing, mechanism summary, and notes on the GetShard fallback and --strict failure modes.
Signed-off-by: Taeknology <20297177+Taeknology@users.noreply.github.com>
Status of
|
Backport feasibility (verified locally)
I cherry-picked the fix onto both candidate branches and ran the
If you agree, please add:
I'll handle the |
Description
`GetTablets --tablet-alias` was reporting stale former primaries as `primary` when the real current primary wasn't in the request. The fix consults the shard record via `GetShard` instead of trusting the result-set max, hybridised with the existing seed so transient cases (no recorded primary yet) stay best-effort.
The shard-filter path (`--keyspace ... --shard ...`) was correct already because the result set always includes the real primary. The alias-filter path doesn't, hence the bug.
Related Issue(s)
Fixes #19898
Reproducer tests by @mhamza15 (first commit, cherry-picked verbatim).
Checklist
Backport candidates: `release-24.0` and `release-23.0`. Wrong primary reports from `--tablet-alias` can confuse operator automation. Tag if you agree.
Deployment Notes
Stale primaries queried by `--tablet-alias` now report as `unknown` instead of `primary`. Review any monitoring that relied on the old behavior. Adds one parallel `GetShard` per unique shard in the result.
AI Disclosure
Most of this was written by Claude Code - I just provided direction.