feat(analyzer): Allow HAVING in materialized view query rewrite#27677

Open
tdcmeehan wants to merge 2 commits into prestodb:master from tdcmeehan:having

Conversation

Contributor

@tdcmeehan tdcmeehan commented Apr 28, 2026

Description

Allow HAVING in user queries when transparently rewriting onto a materialized view. Drop the rejection in MaterializedViewRewriteQueryShapeValidator and MaterializedViewQueryOptimizer.visitQuerySpecification.

Motivation and Context

The rewriter's visit methods already remap columns and aggregates inside HAVING the same way they do for WHERE/SELECT, and plan-level PredicatePushDown handles HAVING-to-WHERE pushdown post-planning. The original rejection conflated two distinct problems: HAVING in user queries (always safe) and HAVING in MV definitions (needs filter containment, separately tracked in #16406).
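
As a rough illustration of the remapping described above, the following self-contained sketch substitutes base-table column names with view column names inside a HAVING predicate. It is not the rewriter's implementation: `HavingRemapSketch`, `remap`, and the regex-based token walk are hypothetical stand-ins for the AST visitor, and the `a` to `mv_a` mapping is an assumed example.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy, NOT Presto code: the real rewriter walks a SQL AST.
public class HavingRemapSketch {
    // Matches either a single-quoted string literal or a bare identifier.
    private static final Pattern TOKEN =
            Pattern.compile("'[^']*'|[A-Za-z_][A-Za-z0-9_]*");

    /**
     * Replaces every identifier that is a key in baseToView with its mapped
     * view column name. String literals and unmapped identifiers are kept.
     */
    public static String remap(String clause, Map<String, String> baseToView) {
        Matcher matcher = TOKEN.matcher(clause);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String token = matcher.group();
            String replacement = token.startsWith("'")
                    ? token
                    : baseToView.getOrDefault(token, token);
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed mapping: base column a is exposed by the view as mv_a.
        Map<String, String> baseToView = Map.of("a", "mv_a");
        System.out.println(remap("SUM(a) > 10", baseToView)); // SUM(mv_a) > 10
        System.out.println(remap("c > 'X'", baseToView));     // c > 'X'
    }
}
```

The same substitution applies whether the predicate sits in WHERE, SELECT, or HAVING, which is the point the paragraph above makes: HAVING needs no special handling beyond what the other clauses already get.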

Impact

Queries with HAVING that previously fell back to scanning the base table can now be rewritten onto a compatible materialized view. No SPI or syntax changes. MV-definition HAVING continues to be rejected.
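
To make the reason for keeping the MV-definition rejection concrete, here is a small in-memory sketch. It assumes a hypothetical MV whose definition filters groups with `HAVING SUM(a) > 10` and a user query that asks for `HAVING SUM(a) > 5`. The class, the rows, and both thresholds are invented for illustration and do not come from the Presto code base.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model, NOT Presto code.
public class HavingContainmentSketch {
    /** Toy base-table row: grouping key c and measure a. */
    public static final class Row {
        final String c;
        final long a;

        public Row(String c, long a) {
            this.c = c;
            this.a = a;
        }
    }

    /** Models SELECT c, SUM(a) FROM rows GROUP BY c HAVING SUM(a) > threshold. */
    public static Map<String, Long> sumByKeyHaving(List<Row> rows, long threshold) {
        Map<String, Long> sums = new TreeMap<>();
        for (Row row : rows) {
            sums.merge(row.c, row.a, Long::sum);
        }
        sums.values().removeIf(sum -> sum <= threshold);
        return sums;
    }

    /** Applies a HAVING-style filter to groups that are already aggregated. */
    public static Map<String, Long> filterGroups(Map<String, Long> groups, long threshold) {
        Map<String, Long> kept = new TreeMap<>(groups);
        kept.values().removeIf(sum -> sum <= threshold);
        return kept;
    }

    public static void main(String[] args) {
        List<Row> base = List.of(
                new Row("x", 4), new Row("x", 4), new Row("y", 20), new Row("z", 1));

        // User query evaluated against the base rows: HAVING SUM(a) > 5.
        Map<String, Long> fromBase = sumByKeyHaving(base, 5);
        // Assumed MV definition with HAVING SUM(a) > 10: group x is never stored.
        Map<String, Long> mvWithHaving = sumByKeyHaving(base, 10);
        // The same user filter applied to that MV cannot recover group x.
        Map<String, Long> fromMv = filterGroups(mvWithHaving, 5);

        System.out.println(fromBase); // {x=8, y=20}
        System.out.println(fromMv);   // {y=20}
    }
}
```

In this toy model the MV's filter is narrower than the query's, so answering the query from the MV drops group `x`. A HAVING clause in the user query alone has no such effect, because it only filters what the query itself aggregates.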

Test Plan

Unit and integration tests have been added.

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with their default values), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.
  • If adding new dependencies, verified they have an OpenSSF Scorecard score of 5.0 or higher (or obtained explicit TSC approval for lower scores).

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Allow HAVING in queries that are transparently rewritten onto a materialized view. 

Summary by Sourcery

Allow queries with HAVING clauses to be eligible for materialized view-based optimization and ensure they are correctly rewritten and validated.

New Features:

  • Support materialized view query rewrites for base queries that include HAVING clauses on grouping keys and aggregates.

Enhancements:

  • Remove restrictions in the materialized view query optimizer and shape validator that previously rejected HAVING clauses in eligible queries.

Tests:

  • Add planner and analyzer tests covering HAVING on grouping keys, aggregates, combinations with WHERE, and COUNT-based predicates to verify correctness of materialized view rewrites.

The rewriter already remaps columns and aggregates inside HAVING
the same way it does for WHERE/SELECT, and plan-level
PredicatePushDown handles HAVING-to-WHERE pushdown post-planning.
The MV-definition rejection is unaffected.
@tdcmeehan tdcmeehan requested review from a team, feilong-liu and jaystarshot as code owners April 28, 2026 17:06
@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Apr 28, 2026
@prestodb-ci prestodb-ci requested review from a team, imsayari404 and xin-zhang2 and removed request for a team April 28, 2026 17:06
Contributor

sourcery-ai Bot commented Apr 28, 2026

Reviewer's Guide

Allows HAVING clauses in user queries participating in materialized view rewrites by removing previous HAVING prohibitions in the optimizer and shape validator, and adds tests to verify correct rewrite behavior, column remapping, and runtime equivalence against base-table execution.

Sequence diagram for query planning with HAVING and materialized view rewrite

sequenceDiagram
    actor User
    participant PrestoCoordinator
    participant Analyzer as MaterializedViewRewriteQueryShapeValidator
    participant MVOptimizer as MaterializedViewQueryOptimizer
    participant Planner as LogicalPlanner_PredicatePushDown
    participant Engine as ExecutionEngine

    User->>PrestoCoordinator: submit SELECT ... GROUP BY ... HAVING ...
    PrestoCoordinator->>Analyzer: validateMaterializedViewOptimizationQueryShape(querySpecification)
    Analyzer-->>PrestoCoordinator: validationResult (HAVING allowed)

    PrestoCoordinator->>MVOptimizer: visitQuerySpecification(querySpecification)
    MVOptimizer->>MVOptimizer: match materialized view
    MVOptimizer->>MVOptimizer: remap SELECT, WHERE, GROUP BY, HAVING onto MV
    MVOptimizer-->>PrestoCoordinator: rewrittenQueryUsingMaterializedView

    PrestoCoordinator->>Planner: build logical plan(rewrittenQueryUsingMaterializedView)
    Planner->>Planner: apply PredicatePushDown (may push HAVING to WHERE)
    Planner-->>PrestoCoordinator: optimizedPlan

    PrestoCoordinator->>Engine: executePlan(optimizedPlan)
    Engine-->>PrestoCoordinator: queryResult
    PrestoCoordinator-->>User: queryResult

Class diagram for updated materialized view rewrite components

classDiagram
    class MaterializedViewRewriteQueryShapeValidator {
        - Optional~String~ errorMessage
        + Optional~String~ validateMaterializedViewOptimizationQueryShape(QuerySpecification querySpecification)
    }

    class MaterializedViewQueryOptimizer {
        + Node visitQuerySpecification(QuerySpecification node, Void context)
        - Optional~List~ expressionsInGroupBy
        - MaterializedViewInfo materializedViewInfo
    }

    MaterializedViewRewriteQueryShapeValidator ..> QuerySpecification : validates
    MaterializedViewQueryOptimizer ..> QuerySpecification : visits
    MaterializedViewQueryOptimizer ..> MaterializedViewInfo : uses

File-Level Changes

Change: Permit HAVING clauses in materialized-view-optimized queries by dropping previous validation/optimizer rejections.
Details:
  • Remove SemanticException thrown when QuerySpecification contains a HAVING clause in the materialized view query optimizer visitor
  • Stop marking HAVING as an invalid query shape in the materialized view rewrite query shape validator, keeping only the from-clause requirement
Files:
  • presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/MaterializedViewQueryOptimizer.java
  • presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/MaterializedViewRewriteQueryShapeValidator.java

Change: Extend validator and optimizer tests to cover allowed HAVING usage and correct rewrite semantics, including column renames.
Details:
  • Change the validator test to assert that several HAVING patterns now succeed instead of failing
  • Add tests ensuring HAVING predicates are preserved through rewrite and that HAVING referencing renamed columns is correctly remapped from base table to view
Files:
  • presto-main-base/src/test/java/com/facebook/presto/sql/analyzer/TestMaterializedViewRewriteQueryShapeValidator.java
  • presto-main-base/src/test/java/com/facebook/presto/sql/analyzer/TestMaterializedViewQueryOptimizer.java

Change: Add integration tests for Hive materialized view rewrites involving HAVING clauses to ensure runtime equivalence with base-table queries.
Details:
  • Add four end-to-end Hive tests covering HAVING on grouping keys, HAVING on aggregates, HAVING combined with WHERE, and HAVING with COUNT aggregates
  • Each test creates a partitioned table and MV, refreshes partitions, runs the base query with and without MV optimization enabled, and asserts identical results
Files:
  • presto-hive/src/test/java/com/facebook/presto/hive/TestHiveMaterializedViewLogicalPlanner.java

Contributor

@sourcery-ai sourcery-ai Bot left a comment

Hey - I've found 2 issues, and left some high level feedback:

  • The four new testBaseToViewConversionWithHaving* methods in TestHiveMaterializedViewLogicalPlanner repeat the same table/view creation and refresh logic; consider extracting a small helper to set up the partitioned table, materialized view, and refresh calls to reduce duplication and make future HAVING-related cases easier to add.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The four new `testBaseToViewConversionWithHaving*` methods in `TestHiveMaterializedViewLogicalPlanner` repeat the same table/view creation and refresh logic; consider extracting a small helper to set up the partitioned table, materialized view, and refresh calls to reduce duplication and make future HAVING-related cases easier to add.

## Individual Comments

### Comment 1
<location path="presto-hive/src/test/java/com/facebook/presto/hive/TestHiveMaterializedViewLogicalPlanner.java" line_range="1379-1368" />
<code_context>
+    public void testBaseToViewConversionWithHavingOnAggregate()
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test that asserts the rewrite actually uses the materialized view, not just that results match

The new tests only compare `computeActual` with and without `QUERY_OPTIMIZATION_WITH_MATERIALIZED_VIEW_ENABLED`, which checks correctness but not that the optimizer actually uses the materialized view. Please add at least one test (for example, for this HAVING-on-aggregate case) that inspects the plan or uses a helper to assert that the materialized view appears in the plan, so we can detect regressions where the optimizer silently falls back to the base table.

Suggested implementation:

```java
            MaterializedResult baseQueryResult = computeActual(baseQuery);
            assertEquals(optimizedQueryResult, baseQueryResult);

            // Assert that the optimizer actually rewrites the base query to use the materialized view
            assertPlan(
                    queryOptimizationWithMaterializedView,
                    baseQuery,
                    anyTree(tableScan(view)));
        }
        finally {

```

1. Ensure that this block is inside `testBaseToViewConversionWithHavingOnAggregate` (or another appropriate test that defines `queryOptimizationWithMaterializedView`, `baseQuery`, and `view` in scope). If this `assertEquals` block is shared across multiple tests, you may want to move the new `assertPlan` into the specific HAVING-on-aggregate test instead.
2. Confirm that `assertPlan`, `anyTree`, and `tableScan` are already statically imported in this test class. If not, add:
   - `import static com.facebook.presto.sql.planner.assertions.PlanMatchPattern.anyTree;`
   - `import static com.facebook.presto.sql.planner.assertions.PlanMatchPattern.tableScan;`
   - and ensure you can call `assertPlan` on this test class (it should be inherited from `AbstractTestQueryFramework` or a similar base).
3. If the materialized view scan in the plan does not appear as a simple `tableScan(view)`, you may need to adjust the pattern (e.g., wrapping in `project(...)` or `aggregation(...)`) to match the actual plan shape for the HAVING-on-aggregate query while still ensuring the view name appears in the scan node.
</issue_to_address>

### Comment 2
<location path="presto-main-base/src/test/java/com/facebook/presto/sql/analyzer/TestMaterializedViewQueryOptimizer.java" line_range="213-209" />
<code_context>
     }

+    @Test
+    public void testHavingPreservedThroughRewrite()
+    {
+        String originalViewSql = format("SELECT a, b, c FROM %s", BASE_TABLE_1);
+        String baseQuerySql = format("SELECT SUM(a), c FROM %s GROUP BY c HAVING c > 'X'", BASE_TABLE_1);
+        String expectedRewrittenSql = format("SELECT SUM(a), c FROM %s GROUP BY c HAVING c > 'X'", VIEW_1);
+
+        assertOptimizedQuery(baseQuerySql, expectedRewrittenSql, originalViewSql, BASE_TABLE_1, VIEW_1);
+    }
+
</code_context>
<issue_to_address>
**suggestion (testing):** Add optimizer coverage for HAVING predicates on aggregates being correctly remapped

These tests cover HAVING on a grouping key and column renaming, but not the case where HAVING references an aggregate that needs remapping to the view’s columns. To fully cover this change, please add a test like:

```java
String originalViewSql = format("SELECT a as mv_a, c FROM %s", BASE_TABLE_1);
String baseQuerySql = format("SELECT SUM(a) FROM %s GROUP BY c HAVING SUM(a) > 10", BASE_TABLE_1);
String expectedRewrittenSql = format("SELECT SUM(mv_a) FROM %s GROUP BY c HAVING SUM(mv_a) > 10", VIEW_1);
assertOptimizedQuery(baseQuerySql, expectedRewrittenSql, originalViewSql, BASE_TABLE_1, VIEW_1);
```

This would confirm aggregate expressions inside HAVING are rewritten consistently with SELECT/GROUP BY.
</issue_to_address>

@tdcmeehan tdcmeehan changed the title from "Allow HAVING in materialized view query rewrite" to "feat(analyzer): Allow HAVING in materialized view query rewrite" Apr 28, 2026
Member

@hantangwangd hantangwangd left a comment

Thanks @tdcmeehan, the change looks good to me! Just some little nits about the tests.

"GROUP BY ds, shipmode HAVING shipmode = 'AIR' ORDER BY ds, shipmode",
table);

assertEquals(computeActual(session, baseQuery), computeActual(baseQuery));

nit: it would be better to validate the rewritten plan here:

assertPlan(session, baseQuery, anyTree(constrainedTableScan(view, ImmutableMap.of())));

"HAVING SUM(discount * extendedprice) > 100 ORDER BY ds, shipmode",
table);

assertEquals(computeActual(session, baseQuery), computeActual(baseQuery));

nit: the same as above:

assertPlan(session, baseQuery, anyTree(constrainedTableScan(view, ImmutableMap.of())));

"GROUP BY ds, shipmode HAVING COUNT(extendedprice) > 5 ORDER BY ds, shipmode",
table);

assertEquals(computeActual(session, baseQuery), computeActual(baseQuery));

nit: the same as above:

assertPlan(session, baseQuery, anyTree(constrainedTableScan(view, ImmutableMap.of())));

Labels

from:IBM PR from IBM

3 participants