[SPARK-55043][SQL] Fix time travel with subquery containing table references #53811
cloud-fan wants to merge 2 commits into apache:master
…erences

### What changes were proposed in this pull request?

This PR fixes an issue where `TIMESTAMP AS OF (subquery)` fails when the subquery references a table.

Before this fix, queries like:

```sql
SELECT * FROM t TIMESTAMP AS OF (SELECT MIN(ts) FROM t)
```

would fail with:

```
assertion failed: No plan for SubqueryAlias testcat.t
```

### Why are the changes needed?

The `EvalSubqueriesForTimeTravel` analyzer rule was directly calling `QueryExecution.prepareExecutedPlan` on the subquery's inner plan, which failed to properly plan V2 table relations.

### Does this PR introduce _any_ user-facing change?

Yes. Users can now use subqueries with table references in `TIMESTAMP AS OF` expressions.

### How was this patch tested?

Added a new test case in `DataSourceV2SQLSuite` that verifies time travel with a subquery containing a table reference.

### Was this patch authored or co-authored using generative AI tooling?

Yes.
JIRA Issue Information: Bug SPARK-55043 (this comment was automatically generated by GitHub Actions)
    val spark = SparkSession.active
    val qe = spark.sessionState.executePlan(wrappedPlan)
    val result = qe.executedPlan.executeCollect().head.get(0, s.dataType)
    Literal(result, s.dataType)

In case of NULL, what behavior are we expecting for time travel?
A null value will error out in TimeTravelSpec.create, before we call v2 catalog APIs.
    sql(s"INSERT INTO $t3 VALUES (6)")
    sql(s"INSERT INTO $t4 VALUES (7)")
    sql(s"INSERT INTO $t4 VALUES (8)")
    sql(s"INSERT INTO t VALUES ('2019-01-29 00:37:58')")

How about we add a test here for the NULL case, if we don't have one yet?
      val t4 = s"testcat.t$ts2"

    - withTable(t3, t4) {
    + withTable(t3, t4, "t") {

the "t" really stands out here, shall we predef it?
    val spark = SparkSession.active
    val qe = spark.sessionState.executePlan(wrappedPlan)
    val result = qe.executedPlan.executeCollect().head.get(0, s.dataType)
    Literal(result, s.dataType)

Is implicit casting allowed here? E.g.
SELECT * FROM t TIMESTAMP AS OF (SELECT MIN(date_type_col) from t)
The cast is handled explicitly in TimeTravelSpec.create; it does not rely on the type coercion framework.
We also have a test for it: the time travel test in DataSourceV2SQLSuite.scala uses a string as the timestamp.
thanks for review, merging to master!
### What changes were proposed in this pull request?

This PR fixes an issue where `TIMESTAMP AS OF (subquery)` fails when the subquery references a table.

Before this fix, queries like `SELECT * FROM t TIMESTAMP AS OF (SELECT MIN(ts) FROM t)` would fail with `assertion failed: No plan for SubqueryAlias testcat.t`.

The fix changes `EvalSubqueriesForTimeTravel` to wrap the scalar subquery in a `Project` over `OneRowRelation` and execute it through the normal query execution path (`sessionState.executePlan`), which properly handles table references, including V2 tables.

### Why are the changes needed?

The `EvalSubqueriesForTimeTravel` analyzer rule was directly calling `QueryExecution.prepareExecutedPlan` on the subquery's inner plan, which failed to properly plan V2 table relations.

### Does this PR introduce _any_ user-facing change?

Yes. Users can now use subqueries with table references in `TIMESTAMP AS OF` expressions.

### How was this patch tested?

Added a new test case in `DataSourceV2SQLSuite` that verifies time travel with a subquery containing a table reference.

### Was this patch authored or co-authored using generative AI tooling?

Yes.
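
For readers following the review, the approach in the description (wrap the scalar subquery in a `Project` over `OneRowRelation`, then run it through `sessionState.executePlan`) can be sketched roughly as below. This is an illustration, not the PR's actual diff: the object `TimeTravelSubquerySketch`, the helper `evalScalarSubquery`, and the alias name `"result"` are hypothetical, and the way `wrappedPlan` is built is an assumption inferred from the description. The last three lines of the helper mirror the snippet quoted in the review. It assumes a Spark version where `org.apache.spark.sql.SparkSession` exposes `sessionState`; on Spark 4.x the import may need to be `org.apache.spark.sql.classic.SparkSession` instead.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.expressions.{Alias, Literal, ScalarSubquery}
import org.apache.spark.sql.catalyst.plans.logical.{OneRowRelation, Project}

object TimeTravelSubquerySketch {
  // Hypothetical helper (not a Spark API): eagerly evaluates an uncorrelated
  // scalar subquery and returns its single value as a Literal.
  def evalScalarSubquery(spark: SparkSession, s: ScalarSubquery): Literal = {
    // Assumption: the wrapping is equivalent to `SELECT (subquery) AS result`
    // over a relation that produces exactly one row.
    val wrappedPlan = Project(Seq(Alias(s, "result")()), OneRowRelation())

    // Run the wrapped plan through the regular analyze/optimize/plan pipeline,
    // so table references inside the subquery are planned like any other query.
    val qe = spark.sessionState.executePlan(wrappedPlan)

    // The projection yields one row with one column: the subquery's value.
    val result = qe.executedPlan.executeCollect().head.get(0, s.dataType)
    Literal(result, s.dataType)
  }
}
```

Per the review discussion, a NULL result is not special-cased at this point; it is rejected later in `TimeTravelSpec.create`, which also handles the cast to a timestamp.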