Skip to content

Bug Report: planner: merged IN/NOT IN subquery in UPDATE SET leaks ::__sqN ListArg into emitted SQL #19965

@arthurschreiber

Description

@arthurschreiber

Overview of the Issue

When an UPDATE statement's SET clause contains an IN/NOT IN subquery whose routing is compatible with the outer route's routing, the planner merges them but produces a broken plan: the emitted SQL contains a stray ::__sqN ListArg reference with no UncorrelatedSubquery primitive to supply the bind variable. Executing such a plan would fail with missing bind var __sqN from vttablet.

The bug is purely planner-level and reproduces with the existing select_cases.json plantest fixtures — no cluster needed.

Cases that are not affected

  • DELETE ... WHERE ... IN (subquery) ... — the IN subquery in the WHERE clause goes through the route-predicates path and keeps the inline subquery. Works correctly.
  • SELECT ... 1 IN (subquery) ... — non-DML projections push down through rewriteSubqueryArgsForPullout, which converts the placeholder shape correctly per merged-vs-pulled-out outcome.

The bug is specific to DML SET/VALUES expressions containing IN/NOT IN subqueries that the planner chooses to merge. It almost certainly hasn't been hit in practice because most DML+IN tests use cross-keyspace or cross-routing patterns that can't merge anyway (e.g. the existing dml_cases.json entry "UPDATE: same subquery as scalar SET and IN SET" uses user_extra Scatter inside a user EqualUnique update — they can't merge, so the UncorrelatedSubquery wrapper stays). The bug only surfaces when both the outer DML and the inner IN subquery target the same shard set.

Root cause

Two pieces interact:

  1. SubQueryBuilder.replaceSubqueryNode (go/vt/vtgate/planbuilder/operators/subquery_builder.go) eagerly mints *sqlparser.ListArg for DML IN/NOT IN placeholders, because DML SET expressions don't run through the projection-pushdown pass that would otherwise convert a non-list *Argument to *ListArg later.

  2. rewriteMergedSubqueryExpr (go/vt/vtgate/planbuilder/operators/subquery_planning.go) walks expressions looking for *sqlparser.Argument placeholders to substitute back with the original *sqlparser.Subquery when a subquery merges:

    expr = sqlparser.Rewrite(expr, nil, func(cursor *sqlparser.Cursor) bool {
        arg, ok := cursor.Node().(*sqlparser.Argument)
        if !ok || arg.Name != sq.ArgName {
            return true
        }
        ...
        cursor.Replace(sq.originalSubquery)
        ...
    })

    sqlparser.ListArg is a distinct AST node type, so it never matches and never gets substituted. The placeholder leaks into the emitted SQL.

Expected behaviour

Either:

  1. The merge-time substitution path replaces the *ListArg placeholder with the original *Subquery (so the emitted SQL contains the inline subquery, just like DELETE ... WHERE col = (1 in (select id from user where id = 5)) does today — that case works correctly), or
  2. The planner refuses to merge IN/NOT IN subqueries into a DML outer route until option 1 is implemented, falling back to the UncorrelatedSubquery pullout shape.

Suggested fix direction

Extend rewriteMergedSubqueryExpr to also match sqlparser.ListArg (by name) and replace it with sq.originalSubquery. The other call sites that look for placeholder Arguments could be audited for the same gap (e.g. Ordering.settleOrderingExpressions, the pe.Original rewrite path).

I noticed this while investigating an unrelated CI failure on #19963 — happy to put up a fix in a follow-up if it's helpful.

Reproduction Steps

  1. Check out main (verified at commit 507cb822cc).

  2. Add this entry to go/vt/vtgate/planbuilder/testdata/select_cases.json anywhere — I used the slot just before "Reference with a subquery which cannot be merged":

    {
      "comment": "REPRO update with IN subquery merging into outer route",
      "query": "update user set col = (1 in (select id from user where id = 5)) where id = 5",
      "plan": "REGENERATE_ME"
    }
  3. Run:

    go test ./go/vt/vtgate/planbuilder/ -run "TestPlanTestSuite/TestPlan/select_cases.json/REPRO" -v
    
  4. The generated plan is:

    {
      "Type": "Passthrough",
      "QueryType": "UPDATE",
      "Original": "update user set col = (1 in (select id from user where id = 5)) where id = 5",
      "Instructions": {
        "OperatorType": "Update",
        "Variant": "EqualUnique",
        "Keyspace": { "Name": "user", "Sharded": true },
        "Query": "update `user` set col = 1 in ::__sq1 where id = 5",
        "Values": ["5"],
        "Vindex": "user_index"
      },
      "TablesUsed": ["user.user"]
    }

    Note the emitted Query contains ::__sq1, but there is no UncorrelatedSubquery primitive above the Update to populate __sq1. Sending this to vttablet would fail with missing bind var __sq1.

The same shape repros for NOT IN:

update user set col = (1 not in (select id from user where id = 5)) where id = 5

Query: "update \user` set col = 1 not in ::__sq1 where id = 5"` (same broken shape).

Binary Version

vitessio/vitess `main` at 507cb822cc (most recent commit on main as of 2026-04-27)

Operating System and Environment details

n/a — pure planner-level reproduction. No cluster, vttablet, or specific OS/arch needed; the bug is observable directly from the plantest output.

Log Fragments

n/a — see the plantest plan output under "Reproduction Steps" above. The bug is the planner emitting `::__sq1` with no `UncorrelatedSubquery` primitive to back it; there is no runtime log because no one has hit this in production (the bug requires DML+IN with compatible routing, which existing tests don't exercise).

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions