refactor(planner): Extract shared expression rewriter for MV query optimizer #27732

Open
ceekay47 wants to merge 1 commit into prestodb:master from ceekay47:export-D103812394

Conversation


@ceekay47 ceekay47 commented May 6, 2026

Summary:
Extracts a new MaterializedViewExpressionRewriter from MaterializedViewQueryOptimizer that handles all expression-level MV rewriting (column resolution, function call handling, GROUP BY/ORDER BY rewriting). The rewriter is parameterized by Optional<Identifier> tablePrefix and mvPrefix, so the same code path serves both single-table rewriting (no prefix) and per-table rewriting inside a JOIN (with prefix).

This enables both the existing single-table QuerySpecificationRewriter and the upcoming JOIN-aware rewriter to share the same expression rewriting logic. New query shape support (e.g., new aggregate functions) only needs to be added once.
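
As a rough illustration of the prefix parameterization described above, here is a minimal sketch that uses plain strings instead of Presto's Identifier and Expression types. The class name PrefixedColumnResolverSketch, the string-based column map, and the choice to leave unmapped or other-table columns unchanged are all assumptions made for this example; the actual rewriter may behave differently.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the prefix-parameterized resolver; columns are plain strings here.
final class PrefixedColumnResolverSketch
{
    private final Optional<String> tablePrefix;
    private final Optional<String> mvPrefix;
    private final Map<String, String> baseToViewColumn;

    PrefixedColumnResolverSketch(
            Optional<String> tablePrefix,
            Optional<String> mvPrefix,
            Map<String, String> baseToViewColumn)
    {
        this.tablePrefix = tablePrefix;
        this.mvPrefix = mvPrefix;
        this.baseToViewColumn = baseToViewColumn;
    }

    String resolve(String column)
    {
        String lookup = column;
        if (tablePrefix.isPresent()) {
            // JOIN mode: only columns qualified with the rewritten table's prefix are touched.
            String qualifier = tablePrefix.get() + ".";
            if (!column.startsWith(qualifier)) {
                return column;
            }
            lookup = column.substring(qualifier.length());
        }
        String mapped = baseToViewColumn.get(lookup);
        if (mapped == null) {
            // Assumption: unmapped columns are returned unchanged in this sketch.
            return column;
        }
        // Single-table mode has no MV prefix, so the mapped name is returned bare.
        return mvPrefix.map(prefix -> prefix + "." + mapped).orElse(mapped);
    }
}
```

With both prefixes empty, resolve("price") returns the mapped view column; with tablePrefix "o" and mvPrefix "mv", resolve("o.price") returns the mapped column qualified with "mv.", while a column such as "c.name" from another table passes through untouched.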

Changes:

  • New MaterializedViewExpressionRewriter with rewriteExpression (via ExpressionTreeRewriter), rewriteSingleColumn, rewriteGroupBy (with ordinal resolution), rewriteOrderBy
  • Function call handling: associative aggregates (SUM/MIN/MAX, with COUNT rewritten to SUM), non-associative aggregates (AVG/APPROX_DISTINCT), table-less aggregates (COUNT(*)), and COUNT DISTINCT rejection
  • Table reference helpers: belongsToRewrittenTable, referencesRewrittenTable, referencesOtherTable, stripPrefix
  • MaterializedViewUtils: added rewriteAssociativeFunction, rewriteGroupingElement shared helpers
  • MaterializedViewQueryOptimizer: single-table path delegates to shared rewriter; behavior unchanged
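
To make the COUNT-to-SUM rule in the list above concrete, the following sketch checks the underlying arithmetic on in-memory data: counting base rows directly gives the same answer as summing the per-group counts that a materialized view would store. Everything here (the class CountAsSumDemo, the row layout, the "region/city" key) is hypothetical and exists only for illustration; it is not Presto code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of why COUNT over base rows can be answered with SUM over MV rows.
final class CountAsSumDemo
{
    private CountAsSumDemo() {}

    // SELECT COUNT(*) FROM base WHERE region = ?  (each row is {region, city})
    static long countBaseRows(List<String[]> rows, String region)
    {
        long count = 0;
        for (String[] row : rows) {
            if (row[0].equals(region)) {
                count++;
            }
        }
        return count;
    }

    // What an MV defined as SELECT region, city, COUNT(*) AS cnt ... GROUP BY region, city would hold.
    static Map<String, Long> materialize(List<String[]> rows)
    {
        Map<String, Long> view = new LinkedHashMap<>();
        for (String[] row : rows) {
            view.merge(row[0] + "/" + row[1], 1L, Long::sum);
        }
        return view;
    }

    // The rewritten form: SELECT SUM(cnt) FROM mv WHERE region = ?
    static long sumPrecomputedCounts(Map<String, Long> view, String region)
    {
        long sum = 0;
        for (Map.Entry<String, Long> entry : view.entrySet()) {
            if (entry.getKey().startsWith(region + "/")) {
                sum += entry.getValue();
            }
        }
        return sum;
    }
}
```

The same reasoning covers SUM, MIN, and MAX, which can be re-applied to their own partial results. It does not carry over to COUNT(DISTINCT), since distinct values may repeat across MV groups, which is consistent with the rejection listed above.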

Differential Revision: D103812394

Summary by Sourcery

Extract a shared materialized-view expression rewriter and adopt it in the query optimizer for expression, GROUP BY, and ORDER BY rewriting.

Enhancements:

  • Introduce MaterializedViewExpressionRewriter to centralize column resolution, aggregate function rewriting, and GROUP BY/ORDER BY handling for materialized view optimizations, supporting both single-table and join-based rewrites.
  • Factor associative aggregate rewriting logic into reusable helpers in MaterializedViewUtils, including COUNT-to-SUM translation.
== NO RELEASE NOTE ==

@ceekay47 ceekay47 requested review from a team, feilong-liu and jaystarshot as code owners May 6, 2026 23:51
@prestodb-ci prestodb-ci added the from:Meta PR from Meta label May 6, 2026

sourcery-ai Bot commented May 6, 2026

Reviewer's Guide

Refactors materialized view query optimization by extracting a reusable MaterializedViewExpressionRewriter that centralizes expression, function-call, GROUP BY, and ORDER BY rewriting for both single-table and future JOIN-based MV rewrites, and by moving generic associative/non-associative aggregate helpers into MaterializedViewUtils while keeping existing single-table behavior unchanged.

Sequence diagram for function call rewriting in JOIN-aware MV path

sequenceDiagram
    participant Optimizer as MaterializedViewQueryOptimizer
    participant Rewriter as MaterializedViewExpressionRewriter
    participant ExprRewriter as ExpressionTreeRewriter
    participant MVInfo as MaterializedViewInfo
    participant MVUtils as MaterializedViewUtils

    Optimizer->>Rewriter: rewriteExpression(FunctionCall f)
    Rewriter->>ExprRewriter: rewriteWith(ExpressionRewriter, f)
    ExprRewriter->>Rewriter: rewriteFunctionCall(f)
    alt tablePrefix present (JOIN mode)
        Rewriter->>Rewriter: rewriteFunctionCallJoinMode(f)
        Rewriter->>Rewriter: referencesRewrittenTable(f)
        Rewriter->>Rewriter: referencesOtherTable(f)
        alt aggregate only on rewritten table
            Rewriter->>Rewriter: stripPrefixFromFunctionCall(f)
            Rewriter->>MVInfo: getBaseToViewColumnMap()
            alt non associative function
                Rewriter->>MVUtils: rewriteNonAssociativeFunction(stripped, baseToViewColumnMap)
                MVUtils-->>Rewriter: rewrittenExpr
                Rewriter->>Rewriter: addMvPrefixToExpression(rewrittenExpr)
                Rewriter-->>ExprRewriter: rewrittenExprWithPrefix
            else associative function
                Rewriter->>MVInfo: getBaseToViewColumnMap()
                alt call in baseToViewColumnMap
                    Rewriter->>MVUtils: rewriteAssociativeFunction(f, mvColumn)
                    MVUtils-->>Rewriter: rewrittenCall
                    Rewriter-->>ExprRewriter: rewrittenCall
                else not precomputed in MV
                    Rewriter->>Rewriter: rewriteExpression(args)
                    Rewriter-->>ExprRewriter: new FunctionCall(argsRewritten)
                end
            end
        else mixed or other table references
            Rewriter-->>ExprRewriter: either original f or error
        end
    else single table mode
        Rewriter->>Rewriter: rewriteFunctionCallSingleTableMode(f)
        Rewriter-->>ExprRewriter: rewrittenCallSingleTable
    end

    ExprRewriter-->>Optimizer: rewritten FunctionCall

Class diagram for MaterializedViewExpressionRewriter and related utilities

classDiagram
    class MaterializedViewExpressionRewriter {
        - Optional~Identifier~ tablePrefix
        - Optional~Identifier~ mvPrefix
        - MaterializedViewInfo mvInfo
        + MaterializedViewExpressionRewriter(Optional~Identifier~ tablePrefix, Optional~Identifier~ mvPrefix, MaterializedViewInfo mvInfo)
        + Expression rewriteExpression(Expression expression)
        + SingleColumn rewriteSingleColumn(SingleColumn node)
        + GroupBy rewriteGroupBy(GroupBy groupBy, List~SelectItem~ selectItems)
        + static GroupingElement rewriteGroupingElement(GroupingElement element, Function~Expression,Expression~ rewriter)
        + OrderBy rewriteOrderBy(OrderBy orderBy)
        + boolean belongsToRewrittenTable(Expression expression)
        + boolean referencesRewrittenTable(Expression expression)
        + boolean referencesOtherTable(Expression expression)
        + Expression stripPrefix(Expression expression)
        -- internal helpers --
        - Expression resolveColumn(Expression lookup)
        - Expression wrapResult(Identifier mapped)
        - Expression addMvPrefixToExpression(Expression expression)
        - Expression rewriteFunctionCall(FunctionCall node)
        - Expression rewriteFunctionCallSingleTableMode(FunctionCall node)
        - Expression rewriteFunctionCallJoinMode(FunctionCall node)
        - Expression rewriteTablelessAggregate(FunctionCall node)
        - FunctionCall stripPrefixFromFunctionCall(FunctionCall functionCall)
        - Expression rewriteGroupByExpression(Expression expression, List~SelectItem~ selectItems)
    }

    class MaterializedViewQueryOptimizer {
        - MaterializedViewInfo materializedViewInfo
        + Node visitGroupBy(GroupBy node, Void context)
        + Node visitOrderBy(OrderBy node, Void context)
        + Node visitFunctionCall(FunctionCall node, Void context)
        -- uses shared expression rewriter --
        - MaterializedViewExpressionRewriter expressionRewriter
    }

    class MaterializedViewUtils {
        <<utility>>
        + static FunctionCall rewriteCountAsSum(FunctionCall countCall, Expression derivedColumnExpression)
        + static FunctionCall rewriteAssociativeFunction(FunctionCall node, Expression derivedColumnExpression)
        + static FunctionCall rewriteNonAssociativeFunction(FunctionCall node, Map~Expression,Identifier~ baseToViewColumnMap)
        + static Map~QualifiedName,Void~ ASSOCIATIVE_REWRITE_FUNCTIONS
        + static Map~QualifiedName,Void~ NON_ASSOCIATIVE_REWRITE_FUNCTIONS
    }

    class MaterializedViewInfo {
        + Map~Expression,Identifier~ getBaseToViewColumnMap()
        + Optional~GroupBy~ getGroupBy()
    }

    MaterializedViewQueryOptimizer --> MaterializedViewExpressionRewriter : creates_uses
    MaterializedViewQueryOptimizer --> MaterializedViewUtils : uses
    MaterializedViewExpressionRewriter --> MaterializedViewInfo : uses
    MaterializedViewExpressionRewriter --> MaterializedViewUtils : uses

File-Level Changes

Change: Introduce shared MaterializedViewExpressionRewriter to handle expression-level MV rewriting for both single-table and JOIN-aware optimizers.
Details:
  • Add MaterializedViewExpressionRewriter with rewriteExpression, rewriteSingleColumn, rewriteGroupBy (including GROUP BY ordinal resolution), and rewriteOrderBy entrypoints.
  • Implement column resolution and table-prefix handling via resolveColumn, wrapResult, belongsToRewrittenTable, referencesRewrittenTable, referencesOtherTable, stripPrefix, and addMvPrefixToExpression helpers.
  • Split function-call rewriting into single-table and join modes (rewriteFunctionCallSingleTableMode, rewriteFunctionCallJoinMode, rewriteTablelessAggregate, stripPrefixFromFunctionCall) with appropriate semantic checks for unsupported aggregates and COUNT(DISTINCT).
Files: presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/MaterializedViewExpressionRewriter.java

Change: Centralize aggregate function rewriting logic for MV optimization into MaterializedViewUtils.
Details:
  • Add rewriteCountAsSum helper and rewriteAssociativeFunction that dispatches COUNT to rewriteCountAsSum and otherwise rewrites associative aggregates to operate on the MV-derived column.
  • Expose shared helpers so both the existing optimizer and the new expression rewriter can use the same aggregate rewriting behavior.
Files: presto-main-base/src/main/java/com/facebook/presto/sql/MaterializedViewUtils.java

Change: Update MaterializedViewQueryOptimizer to delegate expression, function, GROUP BY, and ORDER BY rewriting to the shared utilities while preserving single-table semantics.
Details:
  • Simplify visitFunctionCall to use MaterializedViewUtils.rewriteAssociativeFunction for associative aggregates and enforce COUNT(DISTINCT) rejection and precomputed-aggregate checks.
  • Refactor GROUP BY and ORDER BY visitors to use MaterializedViewExpressionRewriter.rewriteGroupingElement and direct SortItem reconstruction instead of custom node visitors and helper methods.
  • Remove inlined COUNT-as-SUM implementation and per-node GROUP BY visitors, relying instead on the new shared rewriter and utility helpers.
Files: presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/MaterializedViewQueryOptimizer.java

Possibly linked issues

  • #RFC-0016-tracking: PR delivers the planner-oriented refactor of materialized view query optimization logic that RFC-0016 tracks.



@sourcery-ai sourcery-ai Bot left a comment


Hey - I've found 3 issues, and left some high level feedback:

  • In MaterializedViewExpressionRewriter.rewriteGroupByExpression, when resolving a GROUP BY ordinal you compute resolved from the corresponding SelectItem but then call rewriteExpression(expression) instead of rewriteExpression(resolved), which means the ordinal resolution result is ignored and likely produces incorrect GROUP BY behavior.
  • The new rewriteCountAsSum helper in MaterializedViewUtils blindly propagates countCall.isDistinct() into the SUM call; since COUNT(DISTINCT) is rejected at higher levels, consider asserting or validating non-distinct here to avoid accidental misuse from future call sites.
Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/MaterializedViewExpressionRewriter.java" line_range="342-351" />
<code_context>
+
+    // --- GROUP BY ordinal resolution ---
+
+    private Expression rewriteGroupByExpression(Expression expression, List<SelectItem> selectItems)
+    {
+        if (expression instanceof LongLiteral) {
+            int ordinal = toIntExact(((LongLiteral) expression).getValue());
+            if (ordinal >= 1 && ordinal <= selectItems.size()) {
+                SelectItem selectItem = selectItems.get(ordinal - 1);
+                if (selectItem instanceof SingleColumn) {
+                    Expression resolved = ((SingleColumn) selectItem).getExpression();
+                    if (tablePrefix.isPresent()) {
+                        resolved = stripPrefix(resolved);
+                    }
+                    return rewriteExpression(expression);
+                }
+            }
</code_context>
<issue_to_address>
**issue (bug_risk):** The GROUP BY ordinal rewrite ignores the resolved select-item expression and instead rewrites the original ordinal expression.

In `rewriteGroupByExpression`, when `expression` is a `LongLiteral` that resolves to a `SingleColumn`, you compute `resolved` (with optional `stripPrefix`) but then return `rewriteExpression(expression)` instead of using `resolved`. This discards the ordinal resolution and just rewrites the numeric literal. You likely want to return `rewriteExpression(resolved)` (or `resolved` directly, if no further rewrite is needed) so the GROUP BY uses the resolved select expression rather than the literal index.
</issue_to_address>
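
A minimal sketch of the ordinal fix suggested in this comment, using strings in place of Presto's Expression, LongLiteral, and SelectItem types. The class GroupByOrdinalSketch, the digit-string stand-in for a literal, and the fall-through for out-of-range ordinals are assumptions for illustration; the prefix stripping from the original method is omitted.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical, string-based stand-in for rewriteGroupByExpression.
final class GroupByOrdinalSketch
{
    private GroupByOrdinalSketch() {}

    static String rewriteGroupByExpression(
            String expression,
            List<String> selectExpressions,
            UnaryOperator<String> rewriteExpression)
    {
        // A short all-digit string stands in for a LongLiteral ordinal such as GROUP BY 2.
        if (expression.matches("\\d{1,9}")) {
            int ordinal = Integer.parseInt(expression);
            if (ordinal >= 1 && ordinal <= selectExpressions.size()) {
                String resolved = selectExpressions.get(ordinal - 1);
                // The point of the fix: rewrite the resolved select expression, not the literal.
                return rewriteExpression.apply(resolved);
            }
        }
        // Assumption: anything else is rewritten as an ordinary expression.
        return rewriteExpression.apply(expression);
    }
}
```

With a rewrite function that prepends "mv." and select expressions [a, b], the input "2" yields "mv.b". The code as quoted in the review would instead pass the literal "2" to the rewriter.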

### Comment 2
<location path="presto-main-base/src/main/java/com/facebook/presto/sql/MaterializedViewUtils.java" line_range="406-415" />
<code_context>
         }
     }

+    public static FunctionCall rewriteCountAsSum(FunctionCall countCall, Expression derivedColumnExpression)
+    {
+        return new FunctionCall(
+                SUM,
+                countCall.getWindow(),
+                countCall.getFilter(),
+                countCall.getOrderBy(),
+                countCall.isDistinct(),
+                countCall.isIgnoreNulls(),
+                ImmutableList.of(derivedColumnExpression));
+    }
+
</code_context>
<issue_to_address>
**issue (bug_risk):** rewriteCountAsSum no longer validates that the function is COUNT and non-DISTINCT, which makes misuse easier and may allow rewriting COUNT(DISTINCT).

Previously, the inlined `rewriteCountAsSum` logic validated the function name was `COUNT` and rejected `COUNT(DISTINCT)` before building the `SUM`. The new helper simply reuses `countCall` (including `isDistinct()`) without any checks. While some callers already gate on `isDistinct()`, `rewriteTablelessAggregate` calls `rewriteAssociativeFunction` directly and could end up rewriting `COUNT(DISTINCT)` for tableless aggregates. Please either reintroduce validation inside `rewriteCountAsSum` (ensure name is `COUNT` and `!isDistinct()`) or add equivalent guards at all call sites, including the tableless path, so the behavior remains safe and consistent with the original implementation.
</issue_to_address>
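
One way the requested validation could look, sketched with a simplified call type rather than Presto's FunctionCall. SimpleCall and GuardedCountAsSumSketch are hypothetical names, and IllegalArgumentException is only a placeholder for whatever exception or precondition check the codebase prefers. The real helper also carries over the window, filter, ORDER BY, and ignore-nulls settings, which this sketch leaves out.

```java
// Hypothetical, simplified stand-in for a function call node: name, DISTINCT flag, one argument.
final class SimpleCall
{
    final String name;
    final boolean distinct;
    final String argument;

    SimpleCall(String name, boolean distinct, String argument)
    {
        this.name = name;
        this.distinct = distinct;
        this.argument = argument;
    }
}

final class GuardedCountAsSumSketch
{
    private GuardedCountAsSumSketch() {}

    // Rewrites COUNT(x) to SUM(derivedColumn), rejecting inputs the rewrite is not valid for.
    static SimpleCall rewriteCountAsSum(SimpleCall countCall, String derivedColumn)
    {
        if (!countCall.name.equalsIgnoreCase("count")) {
            throw new IllegalArgumentException("Expected COUNT but got " + countCall.name);
        }
        if (countCall.distinct) {
            // Summing per-group distinct counts would double count values shared across groups.
            throw new IllegalArgumentException("COUNT(DISTINCT) cannot be rewritten as SUM");
        }
        return new SimpleCall("sum", false, derivedColumn);
    }
}
```

Placing the checks inside the helper means every caller, including a table-less aggregate path, gets the same protection without repeating the guard.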

### Comment 3
<location path="presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/MaterializedViewExpressionRewriter.java" line_range="211-216" />
<code_context>
+
+    // --- Function call rewriting ---
+
+    private Expression rewriteFunctionCall(FunctionCall node)
+    {
+        if (tablePrefix.isPresent()) {
+            return rewriteFunctionCallJoinMode(node);
+        }
+        return rewriteFunctionCallSingleTableMode(node);
+    }
+
</code_context>
<issue_to_address>
**issue (bug_risk):** All function calls are routed through aggregate-focused rewrite logic, which will throw for scalar functions not in the rewrite maps.

Previously, scalar functions (e.g., CONCAT, JSON_EXTRACT, ABS) followed a separate path: their arguments were recursively rewritten and the original function was preserved. With the new unified `rewriteFunctionCall` -> `rewriteFunctionCallSingleTableMode` flow, any function not in `NON_ASSOCIATIVE_REWRITE_FUNCTIONS` or `ASSOCIATIVE_REWRITE_FUNCTIONS` now causes a `SemanticException`, so valid scalar expressions start failing when MVs are involved. Please reintroduce the scalar-function behavior: for functions missing from both rewrite maps, rewrite their arguments and return a `FunctionCall` with the same name instead of throwing, so normal scalar projections/filters remain supported during MV rewrite.
</issue_to_address>
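
The scalar fall-through requested here can be sketched on a toy expression tree. All names below (ScalarFallthroughSketch, Expr, Column, Call, the ASSOCIATIVE set) are hypothetical, the map is keyed by the printed form of an expression purely for brevity, and the aggregate branch is deliberately simplified: it handles only associative aggregates and, as an assumption, throws when the aggregate is not precomputed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ScalarFallthroughSketch
{
    private ScalarFallthroughSketch() {}

    // Stand-in for the associative rewrite map named in the review; non-associative handling is omitted.
    static final Set<String> ASSOCIATIVE = new HashSet<>(Arrays.asList("sum", "min", "max", "count"));

    interface Expr {}

    static final class Column implements Expr
    {
        final String name;

        Column(String name)
        {
            this.name = name;
        }

        @Override
        public String toString()
        {
            return name;
        }
    }

    static final class Call implements Expr
    {
        final String name;
        final List<Expr> args;

        Call(String name, Expr... args)
        {
            this.name = name;
            this.args = Arrays.asList(args);
        }

        @Override
        public String toString()
        {
            List<String> parts = new ArrayList<>();
            for (Expr arg : args) {
                parts.add(arg.toString());
            }
            return name + "(" + String.join(", ", parts) + ")";
        }
    }

    static Expr rewrite(Expr expr, Map<String, String> baseToView)
    {
        if (expr instanceof Column) {
            String name = ((Column) expr).name;
            return new Column(baseToView.getOrDefault(name, name));
        }
        Call call = (Call) expr;
        if (ASSOCIATIVE.contains(call.name)) {
            String mvColumn = baseToView.get(call.toString());
            if (mvColumn == null) {
                throw new IllegalStateException("Aggregate is not precomputed in the MV: " + call);
            }
            // COUNT becomes SUM over the precomputed column; SUM/MIN/MAX keep their name.
            String name = call.name.equals("count") ? "sum" : call.name;
            return new Call(name, new Column(mvColumn));
        }
        // Scalar fall-through: keep the function name and rewrite only its arguments.
        Expr[] rewrittenArgs = new Expr[call.args.size()];
        for (int i = 0; i < rewrittenArgs.length; i++) {
            rewrittenArgs[i] = rewrite(call.args.get(i), baseToView);
        }
        return new Call(call.name, rewrittenArgs);
    }
}
```

In this sketch a call such as abs(price) is rebuilt as abs over the mapped column instead of being rejected, while sum(price) still resolves to the precomputed MV column.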




@jja725 jja725 left a comment


Refactor LGTM, but I have one concern:

    if (tablePrefix.isPresent()) {
        resolved = stripPrefix(resolved);
    }
    return rewriteExpression(expression);

Should we return rewriteExpression(resolved) here?
