feat(analyzer): Allow HAVING in materialized view query rewrite#27677
feat(analyzer): Allow HAVING in materialized view query rewrite#27677tdcmeehan wants to merge 2 commits intoprestodb:masterfrom
Conversation
The rewriter already remaps columns and aggregates inside HAVING the same way it does for WHERE/SELECT, and plan-level PredicatePushDown handles HAVING-to-WHERE pushdown post-planning. The MV-definition rejection is unaffected.
Reviewer's GuideAllows HAVING clauses in user queries participating in materialized view rewrites by removing previous HAVING prohibitions in the optimizer and shape validator, and adds tests to verify correct rewrite behavior, column remapping, and runtime equivalence against base-table execution. Sequence diagram for query planning with HAVING and materialized view rewritesequenceDiagram
actor User
participant PrestoCoordinator
participant Analyzer as MaterializedViewRewriteQueryShapeValidator
participant MVOptimizer as MaterializedViewQueryOptimizer
participant Planner as LogicalPlanner_PredicatePushDown
participant Engine as ExecutionEngine
User->>PrestoCoordinator: submit SELECT ... GROUP BY ... HAVING ...
PrestoCoordinator->>Analyzer: validateMaterializedViewOptimizationQueryShape(querySpecification)
Analyzer-->>PrestoCoordinator: validationResult (HAVING allowed)
PrestoCoordinator->>MVOptimizer: visitQuerySpecification(querySpecification)
MVOptimizer->>MVOptimizer: match materialized view
MVOptimizer->>MVOptimizer: remap SELECT, WHERE, GROUP BY, HAVING onto MV
MVOptimizer-->>PrestoCoordinator: rewrittenQueryUsingMaterializedView
PrestoCoordinator->>Planner: build logical plan(rewrittenQueryUsingMaterializedView)
Planner->>Planner: apply PredicatePushDown (may push HAVING to WHERE)
Planner-->>PrestoCoordinator: optimizedPlan
PrestoCoordinator->>Engine: executePlan(optimizedPlan)
Engine-->>PrestoCoordinator: queryResult
PrestoCoordinator-->>User: queryResult
Class diagram for updated materialized view rewrite componentsclassDiagram
class MaterializedViewRewriteQueryShapeValidator {
- Optional~String~ errorMessage
+ Optional~String~ validateMaterializedViewOptimizationQueryShape(QuerySpecification querySpecification)
}
class MaterializedViewQueryOptimizer {
+ Node visitQuerySpecification(QuerySpecification node, Void context)
- Optional~List~ expressionsInGroupBy
- MaterializedViewInfo materializedViewInfo
}
MaterializedViewRewriteQueryShapeValidator ..> QuerySpecification : validates
MaterializedViewQueryOptimizer ..> QuerySpecification : visits
MaterializedViewQueryOptimizer ..> MaterializedViewInfo : uses
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- The four new
testBaseToViewConversionWithHaving*methods inTestHiveMaterializedViewLogicalPlannerrepeat the same table/view creation and refresh logic; consider extracting a small helper to set up the partitioned table, materialized view, and refresh calls to reduce duplication and make future HAVING-related cases easier to add.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The four new `testBaseToViewConversionWithHaving*` methods in `TestHiveMaterializedViewLogicalPlanner` repeat the same table/view creation and refresh logic; consider extracting a small helper to set up the partitioned table, materialized view, and refresh calls to reduce duplication and make future HAVING-related cases easier to add.
## Individual Comments
### Comment 1
<location path="presto-hive/src/test/java/com/facebook/presto/hive/TestHiveMaterializedViewLogicalPlanner.java" line_range="1379-1368" />
<code_context>
+ public void testBaseToViewConversionWithHavingOnAggregate()
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test that asserts the rewrite actually uses the materialized view, not just that results match
The new tests only compare `computeActual` with and without `QUERY_OPTIMIZATION_WITH_MATERIALIZED_VIEW_ENABLED`, which checks correctness but not that the optimizer actually uses the materialized view. Please add at least one test (for example, for this HAVING-on-aggregate case) that inspects the plan or uses a helper to assert that the materialized view appears in the plan, so we can detect regressions where the optimizer silently falls back to the base table.
Suggested implementation:
```java
MaterializedResult baseQueryResult = computeActual(baseQuery);
assertEquals(optimizedQueryResult, baseQueryResult);
// Assert that the optimizer actually rewrites the base query to use the materialized view
assertPlan(
queryOptimizationWithMaterializedView,
baseQuery,
anyTree(tableScan(view)));
}
finally {
```
1. Ensure that this block is inside `testBaseToViewConversionWithHavingOnAggregate` (or another appropriate test that defines `queryOptimizationWithMaterializedView`, `baseQuery`, and `view` in scope). If this `assertEquals` block is shared across multiple tests, you may want to move the new `assertPlan` into the specific HAVING-on-aggregate test instead.
2. Confirm that `assertPlan`, `anyTree`, and `tableScan` are already statically imported in this test class. If not, add:
- `import static com.facebook.presto.sql.planner.assertions.PlanMatchPattern.anyTree;`
- `import static com.facebook.presto.sql.planner.assertions.PlanMatchPattern.tableScan;`
- and ensure you can call `assertPlan` on this test class (it should be inherited from `AbstractTestQueryFramework` or a similar base).
3. If the materialized view scan in the plan does not appear as a simple `tableScan(view)`, you may need to adjust the pattern (e.g., wrapping in `project(...)` or `aggregation(...)`) to match the actual plan shape for the HAVING-on-aggregate query while still ensuring the view name appears in the scan node.
</issue_to_address>
### Comment 2
<location path="presto-main-base/src/test/java/com/facebook/presto/sql/analyzer/TestMaterializedViewQueryOptimizer.java" line_range="213-209" />
<code_context>
}
+ @Test
+ public void testHavingPreservedThroughRewrite()
+ {
+ String originalViewSql = format("SELECT a, b, c FROM %s", BASE_TABLE_1);
+ String baseQuerySql = format("SELECT SUM(a), c FROM %s GROUP BY c HAVING c > 'X'", BASE_TABLE_1);
+ String expectedRewrittenSql = format("SELECT SUM(a), c FROM %s GROUP BY c HAVING c > 'X'", VIEW_1);
+
+ assertOptimizedQuery(baseQuerySql, expectedRewrittenSql, originalViewSql, BASE_TABLE_1, VIEW_1);
+ }
+
</code_context>
<issue_to_address>
**suggestion (testing):** Add optimizer coverage for HAVING predicates on aggregates being correctly remapped
These tests cover HAVING on a grouping key and column renaming, but not the case where HAVING references an aggregate that needs remapping to the view’s columns. To fully cover this change, please add a test like:
```java
String originalViewSql = format("SELECT a as mv_a, c FROM %s", BASE_TABLE_1);
String baseQuerySql = format("SELECT SUM(a) FROM %s GROUP BY c HAVING SUM(a) > 10", BASE_TABLE_1);
String expectedRewrittenSql = format("SELECT SUM(mv_a) FROM %s GROUP BY c HAVING SUM(mv_a) > 10", VIEW_1);
assertOptimizedQuery(baseQuerySql, expectedRewrittenSql, originalViewSql, BASE_TABLE_1, VIEW_1);
```
This would confirm aggregate expressions inside HAVING are rewritten consistently with SELECT/GROUP BY.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| "GROUP BY ds, shipmode HAVING shipmode = 'AIR' ORDER BY ds, shipmode", | ||
| table); | ||
|
|
||
| MaterializedResult optimizedQueryResult = computeActual(queryOptimizationWithMaterializedView, baseQuery); |
There was a problem hiding this comment.
suggestion (testing): Add a test that asserts the rewrite actually uses the materialized view, not just that results match
The new tests only compare computeActual with and without QUERY_OPTIMIZATION_WITH_MATERIALIZED_VIEW_ENABLED, which checks correctness but not that the optimizer actually uses the materialized view. Please add at least one test (for example, for this HAVING-on-aggregate case) that inspects the plan or uses a helper to assert that the materialized view appears in the plan, so we can detect regressions where the optimizer silently falls back to the base table.
Suggested implementation:
MaterializedResult baseQueryResult = computeActual(baseQuery);
assertEquals(optimizedQueryResult, baseQueryResult);
// Assert that the optimizer actually rewrites the base query to use the materialized view
assertPlan(
queryOptimizationWithMaterializedView,
baseQuery,
anyTree(tableScan(view)));
}
finally {- Ensure that this block is inside
testBaseToViewConversionWithHavingOnAggregate(or another appropriate test that definesqueryOptimizationWithMaterializedView,baseQuery, andviewin scope). If thisassertEqualsblock is shared across multiple tests, you may want to move the newassertPlaninto the specific HAVING-on-aggregate test instead. - Confirm that
assertPlan,anyTree, andtableScanare already statically imported in this test class. If not, add:import static com.facebook.presto.sql.planner.assertions.PlanMatchPattern.anyTree;import static com.facebook.presto.sql.planner.assertions.PlanMatchPattern.tableScan;- and ensure you can call
assertPlanon this test class (it should be inherited fromAbstractTestQueryFrameworkor a similar base).
- If the materialized view scan in the plan does not appear as a simple
tableScan(view), you may need to adjust the pattern (e.g., wrapping inproject(...)oraggregation(...)) to match the actual plan shape for the HAVING-on-aggregate query while still ensuring the view name appears in the scan node.
| @@ -209,6 +209,26 @@ public void testWithWhereCondition() | |||
| assertOptimizedQuery(baseQuerySql, expectedRewrittenSql, originalViewSql, BASE_TABLE_1, VIEW_1); | |||
There was a problem hiding this comment.
suggestion (testing): Add optimizer coverage for HAVING predicates on aggregates being correctly remapped
These tests cover HAVING on a grouping key and column renaming, but not the case where HAVING references an aggregate that needs remapping to the view’s columns. To fully cover this change, please add a test like:
String originalViewSql = format("SELECT a as mv_a, c FROM %s", BASE_TABLE_1);
String baseQuerySql = format("SELECT SUM(a) FROM %s GROUP BY c HAVING SUM(a) > 10", BASE_TABLE_1);
String expectedRewrittenSql = format("SELECT SUM(mv_a) FROM %s GROUP BY c HAVING SUM(mv_a) > 10", VIEW_1);
assertOptimizedQuery(baseQuerySql, expectedRewrittenSql, originalViewSql, BASE_TABLE_1, VIEW_1);This would confirm aggregate expressions inside HAVING are rewritten consistently with SELECT/GROUP BY.
hantangwangd
left a comment
There was a problem hiding this comment.
Thanks @tdcmeehan, the change looks good to me! Just some little nits about the tests.
| "GROUP BY ds, shipmode HAVING shipmode = 'AIR' ORDER BY ds, shipmode", | ||
| table); | ||
|
|
||
| assertEquals(computeActual(session, baseQuery), computeActual(baseQuery)); |
There was a problem hiding this comment.
nit: it would be better to validate the rewritten plan here:
assertPlan(session, baseQuery, anyTree(constrainedTableScan(view, ImmutableMap.of())));
| "HAVING SUM(discount * extendedprice) > 100 ORDER BY ds, shipmode", | ||
| table); | ||
|
|
||
| assertEquals(computeActual(session, baseQuery), computeActual(baseQuery)); |
There was a problem hiding this comment.
nit: the same as above:
assertPlan(session, baseQuery, anyTree(constrainedTableScan(view, ImmutableMap.of())));
| "GROUP BY ds, shipmode HAVING COUNT(extendedprice) > 5 ORDER BY ds, shipmode", | ||
| table); | ||
|
|
||
| assertEquals(computeActual(session, baseQuery), computeActual(baseQuery)); |
There was a problem hiding this comment.
nit: the same as above:
assertPlan(session, baseQuery, anyTree(constrainedTableScan(view, ImmutableMap.of())));
Description
Allow HAVING in user queries when transparently rewriting onto a materialized view. Drop the rejection in
MaterializedViewRewriteQueryShapeValidatorandMaterializedViewQueryOptimizer.visitQuerySpecification.Motivation and Context
The rewriter's visit methods already remap columns and aggregates inside HAVING the same way they do for WHERE/SELECT, and plan-level
PredicatePushDownhandles HAVING-to-WHERE pushdown post-planning. The original rejection conflated two distinct problems: HAVING in user queries (always safe) and HAVING in MV definitions (needs filter containment, separately tracked in #16406).Impact
Queries with HAVING that previously fell back to scanning the base table can now be rewritten onto a compatible materialized view. No SPI or syntax changes. MV-definition HAVING continues to be rejected.
Test Plan
Unit and integration tests have been added.
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.
Summary by Sourcery
Allow queries with HAVING clauses to be eligible for materialized view-based optimization and ensure they are correctly rewritten and validated.
New Features:
Enhancements:
Tests: