Overview of the Issue
MoveTables ... Complete --rename-tables has a race window where the reverse
workflow's apply path can hit a renamed source table and permanently error
the reverse stream. The reverse workflow is not stopped or drained before
the source tables are renamed; it is only deleted afterward.
Observed behavior
The failure occurs when running MoveTables ... Complete --rename-tables=true
on a Tables-type workflow (one source keyspace, one target keyspace, table
moves) with a healthy forward workflow (Frozen) and an active reverse workflow.
Real-world timeline from a production cutover (single source shard, all
times UTC, all on the same source primary tablet):
| Δt | Event |
|---|---|
| T+0 ms | vtctld logs Renaming table <src_db>.<tbl1> to <src_db>._<tbl1>_old (traffic_switcher.go, removeSourceTables) |
| T+292 ms | source vttablet schema engine confirms: created [_<tbl1>_old], dropped [<tbl1>] |
| T+792 ms | reverse workflow stream errors: error applying event: Table '<src_db>.<tbl1>' doesn't exist (errno 1146) (sqlstate 42S02) |
| T+792 ms | controller.go:317 classifies as unrecoverable and parks the stream in permanent error state |

dropSourceReverseVReplicationStreams deletes the reverse stream row from
_vt.vreplication after the rename completes — but the in-flight apply on
the now-renamed table has already failed and the controller has already
marked the stream errored, so deleting the row doesn't recover anything;
it just leaves an orphaned reverse-workflow entry that operators have to
clean up manually.
Expected behavior
Complete --rename-tables is documented/intended to atomically finalize
the cutover: tear down the reverse workflow AND rename source tables. From
an operator's perspective there should be no window in which the reverse
workflow can apply to a table that Complete has already renamed.
Root cause
In dropSources (go/vt/vtctl/workflow/server.go, the path that
MoveTablesComplete takes):
1. validateWorkflowHasCompleted: only reads the forward workflow
   on the targets and checks that its streams are Frozen
   (go/vt/vtctl/workflow/utils.go; the ReadVReplicationWorkflow call
   uses ts.WorkflowName(), which is the forward name). The reverse
   workflow's state is never inspected.
2. removeSourceTables(ctx, removalType): issues
   RENAME TABLE <src_db>.<tbl> TO <src_db>._<tbl>_old on each source primary
   (go/vt/vtctl/workflow/traffic_switcher.go, in removeSourceTables).
   The reverse workflow is still running on the source primary at this
   point.
3. dropArtifacts → dropSourceReverseVReplicationStreams: only now
   does it DELETE FROM _vt.vreplication for the reverse streams.
Between steps 2 and 3 the reverse vreplicator is still:
- subscribed to the target keyspace's binlog stream,
- holding events in its in-memory apply pipeline,
- writing applied events back to the source DB.
Any DML for a just-renamed table — whether it arrived during the window or
was already buffered at the moment of rename — fails with 1146. The
controller (go/vt/vttablet/tabletmanager/vreplication/controller.go,
around line 317) treats 1146 as unrecoverable and the stream stays in
error state forever, even though the row gets deleted milliseconds later.
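To make the ordering problem concrete, here is a deterministic toy model of the two relevant steps. It is not Vitess code: sourceDB, reverseStream, and complete are hypothetical names invented for illustration, and the reverse stream is simply given one chance to apply a buffered event after each step. Under those assumptions, renaming first makes the buffered DML fail with the stand-in for errno 1146, while deleting the stream first does not.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoSuchTable stands in for MySQL errno 1146 in this toy model.
var errNoSuchTable = errors.New("table doesn't exist (errno 1146)")

// sourceDB is a toy source database: just a set of table names.
type sourceDB map[string]bool

// reverseStream is a toy reverse stream with events already buffered for apply.
type reverseStream struct {
	buffered []string // table targeted by each buffered DML
	deleted  bool     // true once its _vt.vreplication row is gone
	err      error    // first apply error, if any
}

// applyOne applies the next buffered event unless the stream has been
// deleted, has already errored, or has nothing buffered.
func (s *reverseStream) applyOne(db sourceDB) {
	if s.deleted || s.err != nil || len(s.buffered) == 0 {
		return
	}
	tbl := s.buffered[0]
	s.buffered = s.buffered[1:]
	if !db[tbl] {
		s.err = fmt.Errorf("error applying event to %s: %w", tbl, errNoSuchTable)
	}
}

// complete runs the rename and the stream deletion in the requested order,
// letting the reverse stream attempt one apply after each step.
func complete(db sourceDB, s *reverseStream, tbl string, renameFirst bool) error {
	rename := func() { delete(db, tbl); db["_"+tbl+"_old"] = true }
	drop := func() { s.deleted = true }
	steps := []func(){drop, rename}
	if renameFirst {
		steps = []func(){rename, drop}
	}
	for _, step := range steps {
		step()
		s.applyOne(db)
	}
	return s.err
}

func main() {
	for _, renameFirst := range []bool{true, false} {
		db := sourceDB{"tbl1": true}
		s := &reverseStream{buffered: []string{"tbl1"}}
		fmt.Printf("renameFirst=%v err=%v\n", renameFirst, complete(db, s, "tbl1", renameFirst))
	}
}
```

Note that in the delete-first run the buffered event is silently dropped rather than applied, which is the trade-off discussed for the reorder approach.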
Suggested fix
Close the window by either:
Option A — reorder (smaller change): in dropSources, swap the order
so reverse streams are deleted before source tables are renamed. The
controller will stop trying to apply once the row is gone, eliminating
the rename-vs-apply race. Forward streams are already Frozen so they
won't observe the rename either.
Option B — explicit drain (more robust): before removeSourceTables,
explicitly stop the reverse workflow and wait for its applied position
to catch up to the latest source-side binlog position (or simply wait
for its apply queue to drain and confirm streams are in Stopped
state). Then proceed with the rename, then delete the streams.
Option B is safer if there's any concern about events that haven't yet
been read from the binlog at all (Option A doesn't drain those), though
those are arguably fine to discard once Complete has been called.
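A rough sketch of what the drain step in Option B could look like, written as a generic polling helper rather than against the real Vitess APIs. Everything here is an assumption for illustration: streamStatus, its State and Pending fields, the "Stopped" state string, and the status callback are hypothetical, and a real implementation would read stream state from the source primary instead.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// streamStatus is a hypothetical snapshot of one reverse stream.
type streamStatus struct {
	State   string // assumed state names, e.g. "Running" or "Stopped"
	Pending int    // events read but not yet applied
}

// drained reports whether every stream is Stopped with nothing pending.
func drained(streams []streamStatus) bool {
	for _, s := range streams {
		if s.State != "Stopped" || s.Pending > 0 {
			return false
		}
	}
	return true
}

// waitForDrain polls status until all streams are drained or ctx expires.
// interval must be positive (time.NewTicker panics otherwise).
func waitForDrain(ctx context.Context, status func() ([]streamStatus, error), interval time.Duration) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		streams, err := status()
		if err != nil {
			return err
		}
		if drained(streams) {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("reverse workflow not drained: %w", ctx.Err())
		case <-ticker.C:
		}
	}
}

func main() {
	// Fake status source: Running for two polls, then Stopped and empty.
	calls := 0
	status := func() ([]streamStatus, error) {
		calls++
		if calls < 3 {
			return []streamStatus{{State: "Running", Pending: 3 - calls}}, nil
		}
		return []streamStatus{{State: "Stopped"}}, nil
	}
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	fmt.Println(waitForDrain(ctx, status, 10*time.Millisecond), calls)
}
```

Bounding the wait with a context keeps Complete from hanging indefinitely if the reverse stream never reaches a stopped state; the caller can then decide whether to abort or fall back to the reorder approach.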
Either way validateWorkflowHasCompleted should probably grow a check
on the reverse workflow's state as well, not just the forward.
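As an illustration of that extra check, the sketch below validates both directions before any rename is attempted. It is not the existing validateWorkflowHasCompleted: the stream type, the state strings, and the rule that reverse streams must be "Stopped" are assumptions made for this example, and the right precondition for the reverse workflow is a design decision for the maintainers.

```go
package main

import "fmt"

// stream is a hypothetical, simplified view of a vreplication stream.
type stream struct {
	ID    int
	State string // assumed state names: "Frozen", "Running", "Stopped"
}

// validateWorkflowsCompleted returns an error naming the first stream that
// is not in the state this sketch assumes is safe for Complete.
func validateWorkflowsCompleted(forward, reverse []stream) error {
	for _, s := range forward {
		if s.State != "Frozen" {
			return fmt.Errorf("forward stream %d is %s, want Frozen", s.ID, s.State)
		}
	}
	for _, s := range reverse {
		if s.State != "Stopped" {
			return fmt.Errorf("reverse stream %d is %s, want Stopped", s.ID, s.State)
		}
	}
	return nil
}

func main() {
	forward := []stream{{ID: 1, State: "Frozen"}}
	reverse := []stream{{ID: 2, State: "Running"}}
	fmt.Println(validateWorkflowsCompleted(forward, reverse))
}
```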
Reproduction Steps
- Set up a MoveTables workflow between two keyspaces with at least one
moderately busy table on the target. (Continuous writes on the target
side after SwitchTraffic increase the odds of a buffered reverse event
landing on the rename.)
- SwitchTraffic so the forward workflow goes Frozen and the reverse
workflow takes over.
- While the reverse workflow has activity (in-flight DMLs), run
MoveTables ... Complete --rename-tables=true.
- Observe errno 1146 apply errors on the reverse workflow streams and
the controller parking them in error state, even though Complete
reports success.
Probability scales with reverse-workflow throughput at the moment of
Complete and with the number of tables in the move. We hit it on a
production cutover at observable but non-deterministic frequency.
Binary Version
Operating System and Environment details
Log Fragments