Bug Report: ReplicationTracker can errantly report lag on a primary #18805

@mattlord

Overview of the Issue

This can happen in the following scenario with the ReplicationTracker:

  1. You start a PRS while the replicas are all lagging
  2. We demote the current primary, which stops the heartbeat writer on the old primary and blocks any new writes
  3. We select the best new primary candidate, which was a replica, and proceed to wait for all replicas to catch up
  4. They catch up within the timeout window and the PRS moves forward (eventually succeeding)
  5. The new primary goes into primary mode and the tracker's heartbeat reader is stopped
  6. At step 2 the old primary stopped sending heartbeats and blocked any new writes. From that point on no new heartbeats come through, so the tracker on the new primary cannot update its replication lag to reflect that it got down to 0 in step 3. The new primary therefore continues to export a metric which errantly reports that it currently has 10s of replica lag: the metric variable is no longer updated, because the tracker does not make that poller call when it is running on a primary

This can then cause confusing graphs and errant alerts based on the stale replication lag value for the tablet.

Reproduction Steps

You can follow the steps noted above. No repeatable test case is included here, as you can see how the issue happens by walking the code path.

Binary Version

vtgate version Version: 24.0.0-SNAPSHOT (Git revision a8edcbc4542c8f9826b0b8fe3d573c057f0b7ecb branch 'vtadmin_use_vrepl_trx_lag') built on Fri Oct 24 21:05:26 UTC 2025 by matt@pslord.local using go1.25.2 darwin/arm64

Operating System and Environment details

N/A

Log Fragments
