[SPARK-36280][SQL] Remove redundant aliases after RewritePredicateSubquery
### What changes were proposed in this pull request?
Remove redundant aliases after `RewritePredicateSubquery`. For example:
```scala
sql("CREATE TABLE t1 USING parquet AS SELECT id AS a, id AS b, id AS c FROM range(10)")
sql("CREATE TABLE t2 USING parquet AS SELECT id AS x, id AS y FROM range(8)")
sql(
  """
    |SELECT *
    |FROM t1
    |WHERE a IN (SELECT x
    |            FROM (SELECT x AS x,
    |                         Rank() OVER (partition BY x ORDER BY Sum(y) DESC) AS ranking
    |                  FROM t2
    |                  GROUP BY x) tmp1
    |            WHERE ranking <= 5)
    |""".stripMargin).explain
```
Before this PR:
```
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- BroadcastHashJoin [a#10L], [x#7L], LeftSemi, BuildRight, false
   :- FileScan parquet default.t1[a#10L,b#11L,c#12L]
   +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, true]),false), [id=#68]
      +- Project [x#7L]
         +- Filter (ranking#8 <= 5)
            +- Window [rank(_w2#25L) windowspecdefinition(x#15L, _w2#25L DESC NULLS LAST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS ranking#8], [x#15L], [_w2#25L DESC NULLS LAST]
               +- Sort [x#15L ASC NULLS FIRST, _w2#25L DESC NULLS LAST], false, 0
                  +- Exchange hashpartitioning(x#15L, 5), ENSURE_REQUIREMENTS, [id=#62]
                     +- HashAggregate(keys=[x#15L], functions=[sum(y#16L)])
                        +- Exchange hashpartitioning(x#15L, 5), ENSURE_REQUIREMENTS, [id=#59]
                           +- HashAggregate(keys=[x#15L], functions=[partial_sum(y#16L)])
                              +- FileScan parquet default.t2[x#15L,y#16L]
```
After this PR:
```
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- BroadcastHashJoin [a#10L], [x#15L], LeftSemi, BuildRight, false
   :- FileScan parquet default.t1[a#10L,b#11L,c#12L]
   +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, true]),false), [id=#67]
      +- Project [x#15L]
         +- Filter (ranking#8 <= 5)
            +- Window [rank(_w2#25L) windowspecdefinition(x#15L, _w2#25L DESC NULLS LAST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS ranking#8], [x#15L], [_w2#25L DESC NULLS LAST]
               +- Sort [x#15L ASC NULLS FIRST, _w2#25L DESC NULLS LAST], false, 0
                  +- HashAggregate(keys=[x#15L], functions=[sum(y#16L)])
                     +- Exchange hashpartitioning(x#15L, 5), ENSURE_REQUIREMENTS, [id=#59]
                        +- HashAggregate(keys=[x#15L], functions=[partial_sum(y#16L)])
                           +- FileScan parquet default.t2[x#15L,y#16L]
```
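As a rough illustration of what "redundant alias" means here, the sketch below models `x AS x` as an alias that gives an attribute a fresh expression id without changing its name. The types `Expr`, `Attr`, `Alias` and the helper `stripRedundant` are hypothetical and are not Spark's Catalyst classes; the ids 15 and 7 mirror `x#15L` and `x#7L` in the plans above. It is a toy model under those assumptions, not the rule this PR applies.

```scala
// Toy model only: these types are hypothetical, not Spark's Catalyst classes.
sealed trait Expr
// An attribute is identified by its expression id, like x#15L in the plans above.
final case class Attr(name: String, id: Long) extends Expr
// An alias exposes its child under a new expression id, like x#7L.
final case class Alias(child: Expr, name: String, id: Long) extends Expr

object AliasSketch {
  // In this sketch an alias counts as redundant when it renames an
  // attribute to the name it already has; stripping it restores the child's id.
  def stripRedundant(e: Expr): Expr = e match {
    case Alias(a: Attr, name, _) if a.name == name => a
    case other => other
  }

  def main(args: Array[String]): Unit = {
    val x = Attr("x", 15L)
    // `x AS x` is dropped, so downstream operators see the original attribute.
    println(stripRedundant(Alias(x, "x", 7L)))
    // `x AS z` actually renames the column, so it is kept.
    println(stripRedundant(Alias(x, "z", 7L)))
  }
}
```

In the toy model, stripping the alias means every operator refers to the same attribute id, which matches the "After" plan where the join key and the `Project` both use `x#15L`.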
### Why are the changes needed?
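Reduce shuffle to improve query performance: in the example above, the `Exchange hashpartitioning(x#15L, 5)` between the `Sort` and the final `HashAggregate` is no longer planned. This change can benefit TPC-DS q70.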
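One simple way to check the shuffle reduction is to count the shuffle exchanges in the two plan dumps. The sketch below does this on abbreviated copies of the relevant plan lines from the example; the object `ShuffleCount` and the helper `countShuffles` are hypothetical names introduced for illustration, and plain string matching is only a stand-in for inspecting a real plan.

```scala
object ShuffleCount {
  // Counts plan lines that describe a hash-partitioned shuffle exchange.
  // BroadcastExchange lines do not match this pattern.
  def countShuffles(plan: String): Int =
    plan.split("\n").count(_.contains("Exchange hashpartitioning"))

  // Abbreviated from the "Before this PR" plan.
  val before: String =
    """+- Sort [x#15L ASC NULLS FIRST, _w2#25L DESC NULLS LAST], false, 0
      |   +- Exchange hashpartitioning(x#15L, 5), ENSURE_REQUIREMENTS, [id=#62]
      |      +- HashAggregate(keys=[x#15L], functions=[sum(y#16L)])
      |         +- Exchange hashpartitioning(x#15L, 5), ENSURE_REQUIREMENTS, [id=#59]
      |            +- HashAggregate(keys=[x#15L], functions=[partial_sum(y#16L)])
      |""".stripMargin

  // Abbreviated from the "After this PR" plan.
  val after: String =
    """+- Sort [x#15L ASC NULLS FIRST, _w2#25L DESC NULLS LAST], false, 0
      |   +- HashAggregate(keys=[x#15L], functions=[sum(y#16L)])
      |      +- Exchange hashpartitioning(x#15L, 5), ENSURE_REQUIREMENTS, [id=#59]
      |         +- HashAggregate(keys=[x#15L], functions=[partial_sum(y#16L)])
      |""".stripMargin

  def main(args: Array[String]): Unit = {
    println(s"before: ${countShuffles(before)} shuffle exchange(s)")
    println(s"after:  ${countShuffles(after)} shuffle exchange(s)")
  }
}
```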
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Unit test.
Closes #33509 from wangyum/SPARK-36280.
Authored-by: Yuming Wang <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>