Skip to content

Commit 4a6afb4

Browse files
wangyumdongjoon-hyun
authored andcommitted
[SPARK-36280][SQL] Remove redundant aliases after RewritePredicateSubquery
### What changes were proposed in this pull request? Remove redundant aliases after `RewritePredicateSubquery`. For example: ```scala sql("CREATE TABLE t1 USING parquet AS SELECT id AS a, id AS b, id AS c FROM range(10)") sql("CREATE TABLE t2 USING parquet AS SELECT id AS x, id AS y FROM range(8)") sql( """ |SELECT * |FROM t1 |WHERE a IN (SELECT x | FROM (SELECT x AS x, | Rank() OVER (partition BY x ORDER BY Sum(y) DESC) AS ranking | FROM t2 | GROUP BY x) tmp1 | WHERE ranking <= 5) |""".stripMargin).explain ``` Before this PR: ``` == Physical Plan == AdaptiveSparkPlan isFinalPlan=false +- BroadcastHashJoin [a#10L], [x#7L], LeftSemi, BuildRight, false :- FileScan parquet default.t1[a#10L,b#11L,c#12L] +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, true]),false), [id=#68] +- Project [x#7L] +- Filter (ranking#8 <= 5) +- Window [rank(_w2#25L) windowspecdefinition(x#15L, _w2#25L DESC NULLS LAST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS ranking#8], [x#15L], [_w2#25L DESC NULLS LAST] +- Sort [x#15L ASC NULLS FIRST, _w2#25L DESC NULLS LAST], false, 0 +- Exchange hashpartitioning(x#15L, 5), ENSURE_REQUIREMENTS, [id=#62] +- HashAggregate(keys=[x#15L], functions=[sum(y#16L)]) +- Exchange hashpartitioning(x#15L, 5), ENSURE_REQUIREMENTS, [id=#59] +- HashAggregate(keys=[x#15L], functions=[partial_sum(y#16L)]) +- FileScan parquet default.t2[x#15L,y#16L] ``` After this PR: ``` == Physical Plan == AdaptiveSparkPlan isFinalPlan=false +- BroadcastHashJoin [a#10L], [x#15L], LeftSemi, BuildRight, false :- FileScan parquet default.t1[a#10L,b#11L,c#12L] +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, true]),false), [id=#67] +- Project [x#15L] +- Filter (ranking#8 <= 5) +- Window [rank(_w2#25L) windowspecdefinition(x#15L, _w2#25L DESC NULLS LAST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS ranking#8], [x#15L], [_w2#25L DESC NULLS LAST] +- Sort [x#15L ASC NULLS FIRST, _w2#25L DESC NULLS LAST], false, 0 +- HashAggregate(keys=[x#15L], functions=[sum(y#16L)]) +- Exchange hashpartitioning(x#15L, 5), ENSURE_REQUIREMENTS, [id=#59] +- HashAggregate(keys=[x#15L], functions=[partial_sum(y#16L)]) +- FileScan parquet default.t2[x#15L,y#16L] ``` ### Why are the changes needed? Reduce shuffle to improve query performance. This change can benefit TPC-DS q70. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Unit test. Closes #33509 from wangyum/SPARK-36280. Authored-by: Yuming Wang <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
1 parent 67cbc93 commit 4a6afb4

File tree

26 files changed

+2394
-2460
lines changed

26 files changed

+2394
-2460
lines changed

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -234,6 +234,7 @@ abstract class Optimizer(catalogManager: CatalogManager)
234234
RewritePredicateSubquery,
235235
ColumnPruning,
236236
CollapseProject,
237+
RemoveRedundantAliases,
237238
RemoveNoopOperators) :+
238239
// This batch must be executed after the `RewriteSubquery` batch, which creates joins.
239240
Batch("NormalizeFloatingNumbers", Once, NormalizeFloatingNumbers) :+

sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q11.sf100/explain.txt

Lines changed: 138 additions & 143 deletions
Large diffs are not rendered by default.

sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q11.sf100/simplified.txt

Lines changed: 28 additions & 29 deletions
Original file line numberDiff line numberDiff line change
@@ -100,35 +100,34 @@ TakeOrderedAndProject [customer_preferred_cust_flag]
100100
InputAdapter
101101
Exchange [customer_id] #10
102102
WholeStageCodegen (24)
103-
Project [customer_id,year_total]
104-
Filter [year_total]
105-
HashAggregate [c_customer_id,c_first_name,c_last_name,c_preferred_cust_flag,c_birth_country,c_login,c_email_address,d_year,sum] [sum(UnscaledValue(CheckOverflow((promote_precision(cast(ws_ext_list_price as decimal(8,2))) - promote_precision(cast(ws_ext_discount_amt as decimal(8,2)))), DecimalType(8,2), true))),customer_id,year_total,sum]
106-
InputAdapter
107-
Exchange [c_customer_id,c_first_name,c_last_name,c_preferred_cust_flag,c_birth_country,c_login,c_email_address,d_year] #11
108-
WholeStageCodegen (23)
109-
HashAggregate [c_customer_id,c_first_name,c_last_name,c_preferred_cust_flag,c_birth_country,c_login,c_email_address,d_year,ws_ext_list_price,ws_ext_discount_amt] [sum,sum]
110-
Project [c_customer_id,c_first_name,c_last_name,c_preferred_cust_flag,c_birth_country,c_login,c_email_address,ws_ext_discount_amt,ws_ext_list_price,d_year]
111-
SortMergeJoin [ws_bill_customer_sk,c_customer_sk]
112-
InputAdapter
113-
WholeStageCodegen (20)
114-
Sort [ws_bill_customer_sk]
115-
InputAdapter
116-
Exchange [ws_bill_customer_sk] #12
117-
WholeStageCodegen (19)
118-
Project [ws_bill_customer_sk,ws_ext_discount_amt,ws_ext_list_price,d_year]
119-
BroadcastHashJoin [ws_sold_date_sk,d_date_sk]
120-
Filter [ws_bill_customer_sk]
121-
ColumnarToRow
122-
InputAdapter
123-
Scan parquet default.web_sales [ws_bill_customer_sk,ws_ext_discount_amt,ws_ext_list_price,ws_sold_date_sk]
124-
ReusedSubquery [d_date_sk] #1
125-
InputAdapter
126-
ReusedExchange [d_date_sk,d_year] #4
127-
InputAdapter
128-
WholeStageCodegen (22)
129-
Sort [c_customer_sk]
130-
InputAdapter
131-
ReusedExchange [c_customer_sk,c_customer_id,c_first_name,c_last_name,c_preferred_cust_flag,c_birth_country,c_login,c_email_address] #5
103+
Filter [year_total]
104+
HashAggregate [c_customer_id,c_first_name,c_last_name,c_preferred_cust_flag,c_birth_country,c_login,c_email_address,d_year,sum] [sum(UnscaledValue(CheckOverflow((promote_precision(cast(ws_ext_list_price as decimal(8,2))) - promote_precision(cast(ws_ext_discount_amt as decimal(8,2)))), DecimalType(8,2), true))),customer_id,year_total,sum]
105+
InputAdapter
106+
Exchange [c_customer_id,c_first_name,c_last_name,c_preferred_cust_flag,c_birth_country,c_login,c_email_address,d_year] #11
107+
WholeStageCodegen (23)
108+
HashAggregate [c_customer_id,c_first_name,c_last_name,c_preferred_cust_flag,c_birth_country,c_login,c_email_address,d_year,ws_ext_list_price,ws_ext_discount_amt] [sum,sum]
109+
Project [c_customer_id,c_first_name,c_last_name,c_preferred_cust_flag,c_birth_country,c_login,c_email_address,ws_ext_discount_amt,ws_ext_list_price,d_year]
110+
SortMergeJoin [ws_bill_customer_sk,c_customer_sk]
111+
InputAdapter
112+
WholeStageCodegen (20)
113+
Sort [ws_bill_customer_sk]
114+
InputAdapter
115+
Exchange [ws_bill_customer_sk] #12
116+
WholeStageCodegen (19)
117+
Project [ws_bill_customer_sk,ws_ext_discount_amt,ws_ext_list_price,d_year]
118+
BroadcastHashJoin [ws_sold_date_sk,d_date_sk]
119+
Filter [ws_bill_customer_sk]
120+
ColumnarToRow
121+
InputAdapter
122+
Scan parquet default.web_sales [ws_bill_customer_sk,ws_ext_discount_amt,ws_ext_list_price,ws_sold_date_sk]
123+
ReusedSubquery [d_date_sk] #1
124+
InputAdapter
125+
ReusedExchange [d_date_sk,d_year] #4
126+
InputAdapter
127+
WholeStageCodegen (22)
128+
Sort [c_customer_sk]
129+
InputAdapter
130+
ReusedExchange [c_customer_sk,c_customer_id,c_first_name,c_last_name,c_preferred_cust_flag,c_birth_country,c_login,c_email_address] #5
132131
InputAdapter
133132
WholeStageCodegen (34)
134133
Sort [customer_id]

0 commit comments

Comments
 (0)