refactor(planner): Extract shared expression rewriter for MV query optimizer#27732
refactor(planner): Extract shared expression rewriter for MV query optimizer#27732ceekay47 wants to merge 1 commit intoprestodb:masterfrom
Conversation
…timizer Summary: Extracts a new MaterializedViewExpressionRewriter from MaterializedViewQueryOptimizer that handles all expression-level MV rewriting (column resolution, function call handling, GROUP BY/ORDER BY rewriting). Parameterized by Optional<Identifier> tablePrefix and mvPrefix so the same code path serves both single-table rewriting (no prefix) and per-table rewriting inside a JOIN (with prefix). This enables both the existing single-table QuerySpecificationRewriter and the upcoming JOIN-aware rewriter to share the same expression rewriting logic. New query shape support (e.g., new aggregate functions) only needs to be added once. Changes: - New MaterializedViewExpressionRewriter with rewriteExpression (via ExpressionTreeRewriter), rewriteSingleColumn, rewriteGroupBy (with ordinal resolution), rewriteOrderBy - Function call handling: associative (SUM/MIN/MAX/COUNT to SUM), non-associative (AVG/APPROX_DISTINCT), table-less aggregates (COUNT(*)), COUNT DISTINCT rejection - Table reference helpers: belongsToRewrittenTable, referencesRewrittenTable, referencesOtherTable, stripPrefix - MaterializedViewUtils: added rewriteAssociativeFunction, rewriteGroupingElement shared helpers - MaterializedViewQueryOptimizer: single-table path delegates to shared rewriter; behavior unchanged Differential Revision: D103812394
Reviewer's GuideRefactors materialized view query optimization by extracting a reusable MaterializedViewExpressionRewriter that centralizes expression, function-call, GROUP BY, and ORDER BY rewriting for both single-table and future JOIN-based MV rewrites, and by moving generic associative/non-associative aggregate helpers into MaterializedViewUtils while keeping existing single-table behavior unchanged. Sequence diagram for function call rewriting in JOIN-aware MV pathsequenceDiagram
participant Optimizer as MaterializedViewQueryOptimizer
participant Rewriter as MaterializedViewExpressionRewriter
participant ExprRewriter as ExpressionTreeRewriter
participant MVInfo as MaterializedViewInfo
participant MVUtils as MaterializedViewUtils
Optimizer->>Rewriter: rewriteExpression(FunctionCall f)
Rewriter->>ExprRewriter: rewriteWith(ExpressionRewriter, f)
ExprRewriter->>Rewriter: rewriteFunctionCall(f)
alt tablePrefix present (JOIN mode)
Rewriter->>Rewriter: rewriteFunctionCallJoinMode(f)
Rewriter->>Rewriter: referencesRewrittenTable(f)
Rewriter->>Rewriter: referencesOtherTable(f)
alt aggregate only on rewritten table
Rewriter->>Rewriter: stripPrefixFromFunctionCall(f)
Rewriter->>MVInfo: getBaseToViewColumnMap()
alt non associative function
Rewriter->>MVUtils: rewriteNonAssociativeFunction(stripped, baseToViewColumnMap)
MVUtils-->>Rewriter: rewrittenExpr
Rewriter->>Rewriter: addMvPrefixToExpression(rewrittenExpr)
Rewriter-->>ExprRewriter: rewrittenExprWithPrefix
else associative function
Rewriter->>MVInfo: getBaseToViewColumnMap()
alt call in baseToViewColumnMap
Rewriter->>MVUtils: rewriteAssociativeFunction(f, mvColumn)
MVUtils-->>Rewriter: rewrittenCall
Rewriter-->>ExprRewriter: rewrittenCall
else not precomputed in MV
Rewriter->>Rewriter: rewriteExpression(args)
Rewriter-->>ExprRewriter: new FunctionCall(argsRewritten)
end
end
else mixed or other table references
Rewriter-->>ExprRewriter: either original f or error
end
else single table mode
Rewriter->>Rewriter: rewriteFunctionCallSingleTableMode(f)
Rewriter-->>ExprRewriter: rewrittenCallSingleTable
end
ExprRewriter-->>Optimizer: rewritten FunctionCall
Class diagram for MaterializedViewExpressionRewriter and related utilitiesclassDiagram
class MaterializedViewExpressionRewriter {
- Optional~Identifier~ tablePrefix
- Optional~Identifier~ mvPrefix
- MaterializedViewInfo mvInfo
+ MaterializedViewExpressionRewriter(Optional~Identifier~ tablePrefix, Optional~Identifier~ mvPrefix, MaterializedViewInfo mvInfo)
+ Expression rewriteExpression(Expression expression)
+ SingleColumn rewriteSingleColumn(SingleColumn node)
+ GroupBy rewriteGroupBy(GroupBy groupBy, List~SelectItem~ selectItems)
+ static GroupingElement rewriteGroupingElement(GroupingElement element, Function~Expression,Expression~ rewriter)
+ OrderBy rewriteOrderBy(OrderBy orderBy)
+ boolean belongsToRewrittenTable(Expression expression)
+ boolean referencesRewrittenTable(Expression expression)
+ boolean referencesOtherTable(Expression expression)
+ Expression stripPrefix(Expression expression)
-- internal helpers --
- Expression resolveColumn(Expression lookup)
- Expression wrapResult(Identifier mapped)
- Expression addMvPrefixToExpression(Expression expression)
- Expression rewriteFunctionCall(FunctionCall node)
- Expression rewriteFunctionCallSingleTableMode(FunctionCall node)
- Expression rewriteFunctionCallJoinMode(FunctionCall node)
- Expression rewriteTablelessAggregate(FunctionCall node)
- FunctionCall stripPrefixFromFunctionCall(FunctionCall functionCall)
- Expression rewriteGroupByExpression(Expression expression, List~SelectItem~ selectItems)
}
class MaterializedViewQueryOptimizer {
- MaterializedViewInfo materializedViewInfo
+ Node visitGroupBy(GroupBy node, Void context)
+ Node visitOrderBy(OrderBy node, Void context)
+ Node visitFunctionCall(FunctionCall node, Void context)
-- uses shared expression rewriter --
- MaterializedViewExpressionRewriter expressionRewriter
}
class MaterializedViewUtils {
<<utility>>
+ static FunctionCall rewriteCountAsSum(FunctionCall countCall, Expression derivedColumnExpression)
+ static FunctionCall rewriteAssociativeFunction(FunctionCall node, Expression derivedColumnExpression)
+ static FunctionCall rewriteNonAssociativeFunction(FunctionCall node, Map~Expression,Identifier~ baseToViewColumnMap)
+ static Map~QualifiedName,Void~ ASSOCIATIVE_REWRITE_FUNCTIONS
+ static Map~QualifiedName,Void~ NON_ASSOCIATIVE_REWRITE_FUNCTIONS
}
class MaterializedViewInfo {
+ Map~Expression,Identifier~ getBaseToViewColumnMap()
+ Optional~GroupBy~ getGroupBy()
}
MaterializedViewQueryOptimizer --> MaterializedViewExpressionRewriter : creates_uses
MaterializedViewQueryOptimizer --> MaterializedViewUtils : uses
MaterializedViewExpressionRewriter --> MaterializedViewInfo : uses
MaterializedViewExpressionRewriter --> MaterializedViewUtils : uses
File-Level Changes
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 3 issues, and left some high level feedback:
- In
MaterializedViewExpressionRewriter.rewriteGroupByExpression, when resolving a GROUP BY ordinal you computeresolvedfrom the correspondingSelectItembut then callrewriteExpression(expression)instead ofrewriteExpression(resolved), which means the ordinal resolution result is ignored and likely produces incorrect GROUP BY behavior. - The new
rewriteCountAsSumhelper inMaterializedViewUtilsblindly propagatescountCall.isDistinct()into the SUM call; since COUNT(DISTINCT) is rejected at higher levels, consider asserting or validating non-distinct here to avoid accidental misuse from future call sites.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `MaterializedViewExpressionRewriter.rewriteGroupByExpression`, when resolving a GROUP BY ordinal you compute `resolved` from the corresponding `SelectItem` but then call `rewriteExpression(expression)` instead of `rewriteExpression(resolved)`, which means the ordinal resolution result is ignored and likely produces incorrect GROUP BY behavior.
- The new `rewriteCountAsSum` helper in `MaterializedViewUtils` blindly propagates `countCall.isDistinct()` into the SUM call; since COUNT(DISTINCT) is rejected at higher levels, consider asserting or validating non-distinct here to avoid accidental misuse from future call sites.
## Individual Comments
### Comment 1
<location path="presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/MaterializedViewExpressionRewriter.java" line_range="342-351" />
<code_context>
+
+ // --- GROUP BY ordinal resolution ---
+
+ private Expression rewriteGroupByExpression(Expression expression, List<SelectItem> selectItems)
+ {
+ if (expression instanceof LongLiteral) {
+ int ordinal = toIntExact(((LongLiteral) expression).getValue());
+ if (ordinal >= 1 && ordinal <= selectItems.size()) {
+ SelectItem selectItem = selectItems.get(ordinal - 1);
+ if (selectItem instanceof SingleColumn) {
+ Expression resolved = ((SingleColumn) selectItem).getExpression();
+ if (tablePrefix.isPresent()) {
+ resolved = stripPrefix(resolved);
+ }
+ return rewriteExpression(expression);
+ }
+ }
</code_context>
<issue_to_address>
**issue (bug_risk):** The GROUP BY ordinal rewrite ignores the resolved select-item expression and instead rewrites the original ordinal expression.
In `rewriteGroupByExpression`, when `expression` is a `LongLiteral` that resolves to a `SingleColumn`, you compute `resolved` (with optional `stripPrefix`) but then return `rewriteExpression(expression)` instead of using `resolved`. This discards the ordinal resolution and just rewrites the numeric literal. You likely want to return `rewriteExpression(resolved)` (or `resolved` directly, if no further rewrite is needed) so the GROUP BY uses the resolved select expression rather than the literal index.
</issue_to_address>
### Comment 2
<location path="presto-main-base/src/main/java/com/facebook/presto/sql/MaterializedViewUtils.java" line_range="406-415" />
<code_context>
}
}
+ public static FunctionCall rewriteCountAsSum(FunctionCall countCall, Expression derivedColumnExpression)
+ {
+ return new FunctionCall(
+ SUM,
+ countCall.getWindow(),
+ countCall.getFilter(),
+ countCall.getOrderBy(),
+ countCall.isDistinct(),
+ countCall.isIgnoreNulls(),
+ ImmutableList.of(derivedColumnExpression));
+ }
+
</code_context>
<issue_to_address>
**issue (bug_risk):** rewriteCountAsSum no longer validates that the function is COUNT and non-DISTINCT, which makes misuse easier and may allow rewriting COUNT(DISTINCT).
Previously, the inlined `rewriteCountAsSum` logic validated the function name was `COUNT` and rejected `COUNT(DISTINCT)` before building the `SUM`. The new helper simply reuses `countCall` (including `isDistinct()`) without any checks. While some callers already gate on `isDistinct()`, `rewriteTablelessAggregate` calls `rewriteAssociativeFunction` directly and could end up rewriting `COUNT(DISTINCT)` for tableless aggregates. Please either reintroduce validation inside `rewriteCountAsSum` (ensure name is `COUNT` and `!isDistinct()`) or add equivalent guards at all call sites, including the tableless path, so the behavior remains safe and consistent with the original implementation.
</issue_to_address>
### Comment 3
<location path="presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/MaterializedViewExpressionRewriter.java" line_range="211-216" />
<code_context>
+
+ // --- Function call rewriting ---
+
+ private Expression rewriteFunctionCall(FunctionCall node)
+ {
+ if (tablePrefix.isPresent()) {
+ return rewriteFunctionCallJoinMode(node);
+ }
+ return rewriteFunctionCallSingleTableMode(node);
+ }
+
</code_context>
<issue_to_address>
**issue (bug_risk):** All function calls are routed through aggregate-focused rewrite logic, which will throw for scalar functions not in the rewrite maps.
Previously, scalar functions (e.g., CONCAT, JSON_EXTRACT, ABS) followed a separate path: their arguments were recursively rewritten and the original function was preserved. With the new unified `rewriteFunctionCall` → `rewriteFunctionCallSingleTableMode` flow, any function not in `NON_ASSOCIATIVE_REWRITE_FUNCTIONS` or `ASSOCIATIVE_REWRITE_FUNCTIONS` now causes a `SemanticException`, so valid scalar expressions start failing when MVs are involved. Please reintroduce the scalar-function behavior: for functions missing from both rewrite maps, rewrite their arguments and return a `FunctionCall` with the same name instead of throwing, so normal scalar projections/filters remain supported during MV rewrite.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| private Expression rewriteGroupByExpression(Expression expression, List<SelectItem> selectItems) | ||
| { | ||
| if (expression instanceof LongLiteral) { | ||
| int ordinal = toIntExact(((LongLiteral) expression).getValue()); | ||
| if (ordinal >= 1 && ordinal <= selectItems.size()) { | ||
| SelectItem selectItem = selectItems.get(ordinal - 1); | ||
| if (selectItem instanceof SingleColumn) { | ||
| Expression resolved = ((SingleColumn) selectItem).getExpression(); | ||
| if (tablePrefix.isPresent()) { | ||
| resolved = stripPrefix(resolved); |
There was a problem hiding this comment.
issue (bug_risk): The GROUP BY ordinal rewrite ignores the resolved select-item expression and instead rewrites the original ordinal expression.
In rewriteGroupByExpression, when expression is a LongLiteral that resolves to a SingleColumn, you compute resolved (with optional stripPrefix) but then return rewriteExpression(expression) instead of using resolved. This discards the ordinal resolution and just rewrites the numeric literal. You likely want to return rewriteExpression(resolved) (or resolved directly, if no further rewrite is needed) so the GROUP BY uses the resolved select expression rather than the literal index.
| public static FunctionCall rewriteCountAsSum(FunctionCall countCall, Expression derivedColumnExpression) | ||
| { | ||
| return new FunctionCall( | ||
| SUM, | ||
| countCall.getWindow(), | ||
| countCall.getFilter(), | ||
| countCall.getOrderBy(), | ||
| countCall.isDistinct(), | ||
| countCall.isIgnoreNulls(), | ||
| ImmutableList.of(derivedColumnExpression)); |
There was a problem hiding this comment.
issue (bug_risk): rewriteCountAsSum no longer validates that the function is COUNT and non-DISTINCT, which makes misuse easier and may allow rewriting COUNT(DISTINCT).
Previously, the inlined rewriteCountAsSum logic validated the function name was COUNT and rejected COUNT(DISTINCT) before building the SUM. The new helper simply reuses countCall (including isDistinct()) without any checks. While some callers already gate on isDistinct(), rewriteTablelessAggregate calls rewriteAssociativeFunction directly and could end up rewriting COUNT(DISTINCT) for tableless aggregates. Please either reintroduce validation inside rewriteCountAsSum (ensure name is COUNT and !isDistinct()) or add equivalent guards at all call sites, including the tableless path, so the behavior remains safe and consistent with the original implementation.
| private Expression rewriteFunctionCall(FunctionCall node) | ||
| { | ||
| if (tablePrefix.isPresent()) { | ||
| return rewriteFunctionCallJoinMode(node); | ||
| } | ||
| return rewriteFunctionCallSingleTableMode(node); |
There was a problem hiding this comment.
issue (bug_risk): All function calls are routed through aggregate-focused rewrite logic, which will throw for scalar functions not in the rewrite maps.
Previously, scalar functions (e.g., CONCAT, JSON_EXTRACT, ABS) followed a separate path: their arguments were recursively rewritten and the original function was preserved. With the new unified rewriteFunctionCall → rewriteFunctionCallSingleTableMode flow, any function not in NON_ASSOCIATIVE_REWRITE_FUNCTIONS or ASSOCIATIVE_REWRITE_FUNCTIONS now causes a SemanticException, so valid scalar expressions start failing when MVs are involved. Please reintroduce the scalar-function behavior: for functions missing from both rewrite maps, rewrite their arguments and return a FunctionCall with the same name instead of throwing, so normal scalar projections/filters remain supported during MV rewrite.
jja725
left a comment
There was a problem hiding this comment.
refactor LGTM but leave one concern
| if (tablePrefix.isPresent()) { | ||
| resolved = stripPrefix(resolved); | ||
| } | ||
| return rewriteExpression(expression); |
There was a problem hiding this comment.
should we return rewriteExpression(resolved)?
Summary:
Extracts a new MaterializedViewExpressionRewriter from MaterializedViewQueryOptimizer that handles all expression-level MV rewriting (column resolution, function call handling, GROUP BY/ORDER BY rewriting). Parameterized by Optional tablePrefix and mvPrefix so the same code path serves both single-table rewriting (no prefix) and per-table rewriting inside a JOIN (with prefix).
This enables both the existing single-table QuerySpecificationRewriter and the upcoming JOIN-aware rewriter to share the same expression rewriting logic. New query shape support (e.g., new aggregate functions) only needs to be added once.
Changes:
Differential Revision: D103812394
Summary by Sourcery
Extract a shared materialized-view expression rewriter and adopt it in the query optimizer for expression, GROUP BY, and ORDER BY rewriting.
Enhancements: