Overview of the Issue
When an UPDATE statement's SET clause contains an IN/NOT IN subquery whose routing is compatible with the outer route's routing, the planner merges them but produces a broken plan: the emitted SQL contains a stray ::__sqN ListArg reference with no UncorrelatedSubquery primitive to supply the bind variable. Executing such a plan would fail with missing bind var __sqN from vttablet.
The bug is purely planner-level and reproduces with the existing select_cases.json plantest fixtures — no cluster needed.
Cases that are not affected
DELETE ... WHERE ... IN (subquery) ... — the IN subquery in the WHERE clause goes through the route-predicates path and keeps the inline subquery. Works correctly.
SELECT ... 1 IN (subquery) ... — non-DML projections push down through rewriteSubqueryArgsForPullout, which converts the placeholder shape correctly per merged-vs-pulled-out outcome.
The bug is specific to DML SET/VALUES expressions containing IN/NOT IN subqueries that the planner chooses to merge. It almost certainly hasn't been hit in practice because most DML+IN tests use cross-keyspace or cross-routing patterns that can't merge anyway (e.g. the existing dml_cases.json entry "UPDATE: same subquery as scalar SET and IN SET" uses user_extra Scatter inside a user EqualUnique update — they can't merge, so the UncorrelatedSubquery wrapper stays). The bug only surfaces when both the outer DML and the inner IN subquery target the same shard set.
Root cause
Two pieces interact:
-
SubQueryBuilder.replaceSubqueryNode (go/vt/vtgate/planbuilder/operators/subquery_builder.go) eagerly mints *sqlparser.ListArg for DML IN/NOT IN placeholders, because DML SET expressions don't run through the projection-pushdown pass that would otherwise convert a non-list *Argument to *ListArg later.
-
rewriteMergedSubqueryExpr (go/vt/vtgate/planbuilder/operators/subquery_planning.go) walks expressions looking for *sqlparser.Argument placeholders to substitute back with the original *sqlparser.Subquery when a subquery merges:
expr = sqlparser.Rewrite(expr, nil, func(cursor *sqlparser.Cursor) bool {
arg, ok := cursor.Node().(*sqlparser.Argument)
if !ok || arg.Name != sq.ArgName {
return true
}
...
cursor.Replace(sq.originalSubquery)
...
})
sqlparser.ListArg is a distinct AST node type, so it never matches and never gets substituted. The placeholder leaks into the emitted SQL.
Expected behaviour
Either:
- The merge-time substitution path replaces the
*ListArg placeholder with the original *Subquery (so the emitted SQL contains the inline subquery, just like DELETE ... WHERE col = (1 in (select id from user where id = 5)) does today — that case works correctly), or
- The planner refuses to merge
IN/NOT IN subqueries into a DML outer route until option 1 is implemented, falling back to the UncorrelatedSubquery pullout shape.
Suggested fix direction
Extend rewriteMergedSubqueryExpr to also match sqlparser.ListArg (by name) and replace it with sq.originalSubquery. The other call sites that look for placeholder Arguments could be audited for the same gap (e.g. Ordering.settleOrderingExpressions, the pe.Original rewrite path).
I noticed this while investigating an unrelated CI failure on #19963 — happy to put up a fix in a follow-up if it's helpful.
Reproduction Steps
-
Check out main (verified at commit 507cb822cc).
-
Add this entry to go/vt/vtgate/planbuilder/testdata/select_cases.json anywhere — I used the slot just before "Reference with a subquery which cannot be merged":
{
"comment": "REPRO update with IN subquery merging into outer route",
"query": "update user set col = (1 in (select id from user where id = 5)) where id = 5",
"plan": "REGENERATE_ME"
}
-
Run:
go test ./go/vt/vtgate/planbuilder/ -run "TestPlanTestSuite/TestPlan/select_cases.json/REPRO" -v
-
The generated plan is:
{
"Type": "Passthrough",
"QueryType": "UPDATE",
"Original": "update user set col = (1 in (select id from user where id = 5)) where id = 5",
"Instructions": {
"OperatorType": "Update",
"Variant": "EqualUnique",
"Keyspace": { "Name": "user", "Sharded": true },
"Query": "update `user` set col = 1 in ::__sq1 where id = 5",
"Values": ["5"],
"Vindex": "user_index"
},
"TablesUsed": ["user.user"]
}
Note the emitted Query contains ::__sq1, but there is no UncorrelatedSubquery primitive above the Update to populate __sq1. Sending this to vttablet would fail with missing bind var __sq1.
The same shape repros for NOT IN:
update user set col = (1 not in (select id from user where id = 5)) where id = 5
→ Query: "update \user` set col = 1 not in ::__sq1 where id = 5"` (same broken shape).
Binary Version
vitessio/vitess `main` at 507cb822cc (most recent commit on main as of 2026-04-27)
Operating System and Environment details
n/a — pure planner-level reproduction. No cluster, vttablet, or specific OS/arch needed; the bug is observable directly from the plantest output.
Log Fragments
n/a — see the plantest plan output under "Reproduction Steps" above. The bug is the planner emitting `::__sq1` with no `UncorrelatedSubquery` primitive to back it; there is no runtime log because no one has hit this in production (the bug requires DML+IN with compatible routing, which existing tests don't exercise).
Overview of the Issue
When an
UPDATEstatement'sSETclause contains anIN/NOT INsubquery whose routing is compatible with the outer route's routing, the planner merges them but produces a broken plan: the emitted SQL contains a stray::__sqNListArg reference with noUncorrelatedSubqueryprimitive to supply the bind variable. Executing such a plan would fail withmissing bind var __sqNfrom vttablet.The bug is purely planner-level and reproduces with the existing
select_cases.jsonplantest fixtures — no cluster needed.Cases that are not affected
DELETE ... WHERE ... IN (subquery) ...— the IN subquery in the WHERE clause goes through the route-predicates path and keeps the inline subquery. Works correctly.SELECT ... 1 IN (subquery) ...— non-DML projections push down throughrewriteSubqueryArgsForPullout, which converts the placeholder shape correctly per merged-vs-pulled-out outcome.The bug is specific to DML
SET/VALUESexpressions containingIN/NOT INsubqueries that the planner chooses to merge. It almost certainly hasn't been hit in practice because most DML+IN tests use cross-keyspace or cross-routing patterns that can't merge anyway (e.g. the existingdml_cases.jsonentry "UPDATE: same subquery as scalar SET and IN SET" usesuser_extraScatter inside auserEqualUniqueupdate — they can't merge, so theUncorrelatedSubquerywrapper stays). The bug only surfaces when both the outer DML and the inner IN subquery target the same shard set.Root cause
Two pieces interact:
SubQueryBuilder.replaceSubqueryNode(go/vt/vtgate/planbuilder/operators/subquery_builder.go) eagerly mints*sqlparser.ListArgfor DMLIN/NOT INplaceholders, because DMLSETexpressions don't run through the projection-pushdown pass that would otherwise convert a non-list*Argumentto*ListArglater.rewriteMergedSubqueryExpr(go/vt/vtgate/planbuilder/operators/subquery_planning.go) walks expressions looking for*sqlparser.Argumentplaceholders to substitute back with the original*sqlparser.Subquerywhen a subquery merges:sqlparser.ListArgis a distinct AST node type, so it never matches and never gets substituted. The placeholder leaks into the emitted SQL.Expected behaviour
Either:
*ListArgplaceholder with the original*Subquery(so the emitted SQL contains the inline subquery, just likeDELETE ... WHERE col = (1 in (select id from user where id = 5))does today — that case works correctly), orIN/NOT INsubqueries into a DML outer route until option 1 is implemented, falling back to theUncorrelatedSubquerypullout shape.Suggested fix direction
Extend
rewriteMergedSubqueryExprto also matchsqlparser.ListArg(by name) and replace it withsq.originalSubquery. The other call sites that look for placeholder Arguments could be audited for the same gap (e.g.Ordering.settleOrderingExpressions, thepe.Originalrewrite path).I noticed this while investigating an unrelated CI failure on #19963 — happy to put up a fix in a follow-up if it's helpful.
Reproduction Steps
Check out
main(verified at commit507cb822cc).Add this entry to
go/vt/vtgate/planbuilder/testdata/select_cases.jsonanywhere — I used the slot just before"Reference with a subquery which cannot be merged":{ "comment": "REPRO update with IN subquery merging into outer route", "query": "update user set col = (1 in (select id from user where id = 5)) where id = 5", "plan": "REGENERATE_ME" }Run:
The generated plan is:
{ "Type": "Passthrough", "QueryType": "UPDATE", "Original": "update user set col = (1 in (select id from user where id = 5)) where id = 5", "Instructions": { "OperatorType": "Update", "Variant": "EqualUnique", "Keyspace": { "Name": "user", "Sharded": true }, "Query": "update `user` set col = 1 in ::__sq1 where id = 5", "Values": ["5"], "Vindex": "user_index" }, "TablesUsed": ["user.user"] }Note the emitted
Querycontains::__sq1, but there is noUncorrelatedSubqueryprimitive above theUpdateto populate__sq1. Sending this to vttablet would fail withmissing bind var __sq1.The same shape repros for
NOT IN:→
Query: "update \user` set col = 1 not in ::__sq1 where id = 5"` (same broken shape).Binary Version
vitessio/vitess `main` at 507cb822cc (most recent commit on main as of 2026-04-27)Operating System and Environment details
n/a — pure planner-level reproduction. No cluster, vttablet, or specific OS/arch needed; the bug is observable directly from the plantest output.Log Fragments