Run empty values node simplify optimizer after connector optimizer #25155
A newbie question: do you have any non-default configuration for the optimizer or the Hive connector? Locally, the plan for the SQL in your example looked like this when it first reached SimplifyPlanWithEmptyInput, after the logical phase of the connector optimizer:
- Output[PlanNodeId 15][col1] => [col1:integer]
    - RightJoin[PlanNodeId 10][(VARCHAR'xxx') = (col2)] => [col1:integer]
        - InnerJoin[PlanNodeId 4][("col1" = "col1_0")] => [col1:integer]
            - Project[PlanNodeId 421][projectLocality = LOCAL] => [col1:integer]
                - Values[PlanNodeId 454] => [col1:integer]
            - Project[PlanNodeId 422][projectLocality = LOCAL] => [col1_0:integer]
                - Values[PlanNodeId 455] => [col1_0:integer]
        - ScanProject[PlanNodeId 6,423][table = TableHandle {connectorId='hive', connectorHandle='HiveTableHandle{schemaName=default, tableName=t3, analyzePartitionValues=Optional.empty}', layout='Optional[default.t3{}]'}, projectLocality = LOCAL] => [col2:varchar(10)]
                LAYOUT: default.t3{}
                col2 := col2:varchar(10):-13:PARTITION_KEY (1:95)
                    :: [["2024-01-05"]]
And after SimplifyPlanWithEmptyInput's optimization, the plan was as follows:
- Output[PlanNodeId 15][col1] => [col1:integer]
    - Project[PlanNodeId 475][projectLocality = LOCAL] => [col1:integer]
            col1 := null (1:20)
        - ScanProject[PlanNodeId 6,423][table = TableHandle {connectorId='hive', connectorHandle='HiveTableHandle{schemaName=default, tableName=t3, analyzePartitionValues=Optional.empty}', layout='Optional[default.t3{}]'}, projectLocality = LOCAL] => [col2:varchar(10)]
                LAYOUT: default.t3{}
                col2 := col2:varchar(10):-13:PARTITION_KEY (1:95)
                    :: [["2024-01-05"]]
After this, the physical phase of the connector optimizer did not change the plan any further. Am I missing anything?
Oh, hive.pushdown_filter_enabled needs to be set to true. Thanks for the catch. I have also updated the description.
        ImmutableSet.of(new RemoveRedundantIdentityProjections(), new PruneRedundantProjectionAssignments())));

// Pass after connector optimizer, as it relies on connector optimizer to identify empty input tables and convert them to empty ValuesNode
builder.add(new SimplifyPlanWithEmptyInput());
Does it also need PruneUnreferencedOutputs, like the one that runs after the logical connector PlanOptimizers?
PruneUnreferencedOutputs matters only for outer joins that are converted to a projection when the inner side is empty; in some of those cases the join keys are not in the output and need to be pruned. Skipping it does not affect the validity of the plan; running it only simplifies the plan so that later optimizers can simplify it further. Since this instance of SimplifyPlanWithEmptyInput runs at the very end, and PruneUnreferencedOutputs does not handle some nodes that may be in the plan by this stage (for example, the merge join node), not adding it should be fine here. I can draft another PR to fix PruneUnreferencedOutputs if needed.
Thanks for the release note entry! Here is a suggestion to help it follow the recommended phrasing under Order of changes in the Release Notes Guidelines.
@feilong-liu Got it, thanks for the explanation.
Description
The connector optimizer can convert a table scan that returns no data into an empty values node. The SimplifyPlanWithEmptyInput optimizer then simplifies plans that contain empty values nodes.
There are two runs of the connector optimizer, logical and physical:
https://github.com/prestodb/presto/blob/master/presto-main-base/src/main/java/com/facebook/presto/sql/planner/PlanOptimizers.java#L727-L730
https://github.com/prestodb/presto/blob/master/presto-main-base/src/main/java/com/facebook/presto/sql/planner/PlanOptimizers.java#L951-L958
Previously, SimplifyPlanWithEmptyInput ran only after the logical run of the connector optimizer. However, it turns out that the empty values node conversion can also happen in the physical run.
One example:
The reason is that the query relies on the first run of the connector optimizer to push filters down into the table scan. Later, during predicate pushdown, it finds that no col2 in table t3 equals 'xxx', which leads to the empty table conversion in the later run of the connector optimizer.
So this PR runs SimplifyPlanWithEmptyInput once more, after the physical run.
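To illustrate why the pass order matters, here is a minimal toy sketch in plain Java. It is not Presto code: `PassOrderSketch`, `Node`, `connectorPass`, `simplifyEmptyInput`, `oldOrder`, and `newOrder` are hypothetical names invented for this illustration, and the plan model is reduced to scans, empty values nodes, and inner joins. It assumes only what the description states: the conversion to an empty values node can happen in a connector run that comes after the simplification pass.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Toy model only: none of these types are Presto classes.
final class PassOrderSketch {
    enum Kind { SCAN, EMPTY_VALUES, INNER_JOIN }

    static final class Node {
        final Kind kind;
        final boolean matchesNoRows; // meaningful only for SCAN
        final Node left;             // set only for INNER_JOIN
        final Node right;            // set only for INNER_JOIN

        private Node(Kind kind, boolean matchesNoRows, Node left, Node right) {
            this.kind = kind;
            this.matchesNoRows = matchesNoRows;
            this.left = left;
            this.right = right;
        }

        static Node scan(boolean matchesNoRows) { return new Node(Kind.SCAN, matchesNoRows, null, null); }
        static Node emptyValues() { return new Node(Kind.EMPTY_VALUES, false, null, null); }
        static Node innerJoin(Node left, Node right) { return new Node(Kind.INNER_JOIN, false, left, right); }
    }

    // Hypothetical stand-in for a connector optimizer run:
    // a scan known to return no rows becomes an empty values node.
    static Node connectorPass(Node n) {
        if (n.kind == Kind.SCAN) {
            return n.matchesNoRows ? Node.emptyValues() : n;
        }
        if (n.kind == Kind.INNER_JOIN) {
            return Node.innerJoin(connectorPass(n.left), connectorPass(n.right));
        }
        return n;
    }

    // Hypothetical stand-in for the empty-input simplification:
    // an inner join with an empty side produces no rows.
    static Node simplifyEmptyInput(Node n) {
        if (n.kind != Kind.INNER_JOIN) {
            return n;
        }
        Node l = simplifyEmptyInput(n.left);
        Node r = simplifyEmptyInput(n.right);
        if (l.kind == Kind.EMPTY_VALUES || r.kind == Kind.EMPTY_VALUES) {
            return Node.emptyValues();
        }
        return Node.innerJoin(l, r);
    }

    static Node run(Node plan, List<UnaryOperator<Node>> passes) {
        for (UnaryOperator<Node> pass : passes) {
            plan = pass.apply(plan);
        }
        return plan;
    }

    // Simplification runs only before the connector run that creates the empty node.
    static List<UnaryOperator<Node>> oldOrder() {
        List<UnaryOperator<Node>> passes = new ArrayList<>();
        passes.add(PassOrderSketch::simplifyEmptyInput);
        passes.add(PassOrderSketch::connectorPass);
        return passes;
    }

    // Same as oldOrder(), plus one more simplification at the end.
    static List<UnaryOperator<Node>> newOrder() {
        List<UnaryOperator<Node>> passes = oldOrder();
        passes.add(PassOrderSketch::simplifyEmptyInput);
        return passes;
    }

    public static void main(String[] args) {
        Node plan = Node.innerJoin(Node.scan(false), Node.scan(true));
        System.out.println("old order root: " + run(plan, oldOrder()).kind);
        System.out.println("new order root: " + run(plan, newOrder()).kind);
    }
}
```

In this toy model, the old order leaves the join in place with an empty values node underneath it, while the new order collapses the whole plan to an empty values node. The real optimizer handles many more node types, so this only captures the ordering argument.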
Motivation and Context
Optimize query plans with empty input
Impact
Optimize query plans with empty input
Test Plan
Existing unit tests