fix(native): Prevent decimal precision loss in sidecar function registry and expression optimizer #27735

Draft
pramodsatya wants to merge 1 commit into prestodb:master from pramodsatya:expr-opt-decimal-fix

pramodsatya commented May 7, 2026

Problem

Resolves #27749.

Two related bugs caused decimal precision loss and flaky test failures in testCoercions when the native sidecar is enabled:

1. Wrong concat overload bound at planning time

NativeSidecarFunctionRegistryTool exposes all native worker function signatures to the Presto coordinator, including concat overloads. The native concat contains a signature that binds (array(decimal), decimal) operands as array(double), which wins overload resolution over Presto's built-in decimal-aware concat. The query's output type is therefore resolved as array(double) at planning time — precision is lost before execution even begins.

2. Lossy decimal serialization during native constant folding

NativeExpressionOptimizer can elect to constant-fold sub-expressions by shipping them to the native sidecar. The sidecar's wire protocol serializes DECIMAL constants as floating-point, so any folded decimal value comes back as a DOUBLE. The precision loss is silent and only affects expressions that are constant-foldable, making it intermittent.
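To make the second failure mode concrete, here is a minimal standalone sketch of what happens when a decimal constant travels through a floating-point representation. It does not use the sidecar's actual wire protocol; `DecimalRoundTrip` and `throughDouble` are hypothetical names that only simulate a DECIMAL being serialized as a double and read back.

```java
import java.math.BigDecimal;

class DecimalRoundTrip {
    // Hypothetical stand-in for a lossy wire format: the decimal is sent as a
    // double and converted back to a decimal on the receiving side.
    static BigDecimal throughDouble(BigDecimal value) {
        double wire = value.doubleValue();
        return new BigDecimal(wire);
    }

    public static void main(String[] args) {
        // 29 significant digits, well beyond what a 64-bit double can hold.
        BigDecimal original = new BigDecimal("12345678901234567890.123456789");
        BigDecimal roundTripped = throughDouble(original);

        System.out.println("original:      " + original);
        System.out.println("round-tripped: " + roundTripped);
        // The comparison is false: the round-tripped value differs from the original.
        System.out.println("equal: " + (original.compareTo(roundTripped) == 0));
    }
}
```

Values that happen to be exactly representable as a double (for example 0.5) survive the round trip, which is one reason this kind of loss can go unnoticed for some inputs.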

Together these produced flaky failures: tests without a live sidecar always passed; tests with sidecar enabled failed on the decimal-array coercion queries.

Solution

NativeSidecarFunctionRegistryTool — filter out all native concat signatures before exposing them to the coordinator. Presto's built-in concat overloads remain the sole candidates for overload resolution, so decimal array/element concat always resolves to the type-preserving implementation.
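A rough sketch of the filtering idea, not the actual diff: `Signature` and `withoutConcat` are hypothetical stand-ins for the worker signature entries and the stream filter in NativeSidecarFunctionRegistryTool. Only the case-insensitive match on the function name "concat" reflects what the change does.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ConcatFilterSketch {
    // Hypothetical, simplified view of a worker function signature entry.
    static final class Signature {
        final String functionName;
        final String description;

        Signature(String functionName, String description) {
            this.functionName = functionName;
            this.description = description;
        }
    }

    // Drop every signature named "concat" (case-insensitive) so that only the
    // coordinator's built-in overloads remain candidates for resolution.
    static List<Signature> withoutConcat(List<Signature> workerSignatures) {
        return workerSignatures.stream()
                .filter(s -> !"concat".equalsIgnoreCase(s.functionName))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Signature> fromWorker = Arrays.asList(
                new Signature("concat", "(array(double), double) -> array(double)"),
                new Signature("CONCAT", "(varchar, varchar) -> varchar"),
                new Signature("abs", "(double) -> double"));

        for (Signature s : withoutConcat(fromWorker)) {
            System.out.println(s.functionName + " " + s.description);
        }
    }
}
```

Note that the filter is by name, so it hides every native concat overload, not only the decimal-affecting one.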

NativeExpressionOptimizer — add a containsDecimalType guard in visitCall: if the CallExpression or any child expression has a DECIMAL type anywhere in its type tree, immediately mark it non-constant-foldable (visitNode(node, false)). The expression stays in Presto's own evaluator, which handles decimal materialization correctly.
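The type-tree check can be sketched roughly as follows. This is an illustration under assumptions, not the code in the PR: `SimpleType` is a hypothetical stand-in for Presto's Type and TypeSignature, reduced to a base name and a list of type parameters.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class DecimalGuardSketch {
    // Hypothetical, simplified type: a base name plus nested type parameters.
    static final class SimpleType {
        final String base;
        final List<SimpleType> parameters;

        SimpleType(String base, SimpleType... parameters) {
            this.base = base;
            this.parameters = parameters.length == 0
                    ? Collections.<SimpleType>emptyList()
                    : Arrays.asList(parameters);
        }
    }

    // Returns true if the type itself, or any nested type parameter, is a decimal.
    static boolean containsDecimalType(SimpleType type) {
        if ("decimal".equalsIgnoreCase(type.base)) {
            return true;
        }
        for (SimpleType parameter : type.parameters) {
            if (containsDecimalType(parameter)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        SimpleType arrayOfDecimal = new SimpleType("array", new SimpleType("decimal"));
        SimpleType arrayOfVarchar = new SimpleType("array", new SimpleType("varchar"));

        System.out.println(containsDecimalType(arrayOfDecimal)); // true
        System.out.println(containsDecimalType(arrayOfVarchar)); // false
    }
}
```

In the optimizer, a positive result for the call or any of its children would lead to visitNode(node, false), so the expression is never queued for the sidecar.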

Testing

TestTpchDistributedQueries#testCoercions now passes consistently with sidecar enabled. No new tests added — the existing coercions test is the regression coverage.

Summary by Sourcery

Prevent decimal precision loss when using the native sidecar by avoiding lossy concat overloads and disabling native constant folding for decimal expressions.

Bug Fixes:

  • Ensure decimal-containing expressions are not constant-folded via the native sidecar to avoid floating-point serialization precision loss.
  • Exclude native concat function signatures from the sidecar function registry so Presto's decimal-preserving concat remains authoritative for overload resolution.

prestodb-ci added the from:IBM label (PR from IBM) on May 7, 2026
sourcery-ai Bot commented May 7, 2026

Reviewer's Guide

Prevents decimal precision loss when the native sidecar is used by (1) disabling native concat overload exposure so Presto’s built-in decimal-safe concat remains authoritative, and (2) disabling native constant folding for any expressions whose type tree contains DECIMAL so they are evaluated by Presto instead of being serialized as lossy doubles.

Sequence diagram for decimal-aware native expression optimization

sequenceDiagram
    participant Planner
    participant NativeExpressionOptimizer
    participant CollectingVisitor
    participant PrestoEvaluator
    participant NativeSidecar

    Planner->>NativeExpressionOptimizer: optimize(RowExpression root)
    NativeExpressionOptimizer->>CollectingVisitor: visitCall(CallExpression node, context)
    CollectingVisitor->>CollectingVisitor: canBeOptimized(node)
    CollectingVisitor->>CollectingVisitor: containsDecimalType(node)
    alt expression contains DECIMAL
        CollectingVisitor->>CollectingVisitor: visitNode(node, false)
        Note right of PrestoEvaluator: Expression remains in Presto
        NativeExpressionOptimizer->>PrestoEvaluator: evaluate(node) with full DECIMAL precision
    else expression does not contain DECIMAL
        CollectingVisitor->>CollectingVisitor: visitNode(node, true)
        NativeExpressionOptimizer->>NativeSidecar: send expression for constant folding
        NativeSidecar-->>NativeExpressionOptimizer: folded constants (non-decimal)
        NativeExpressionOptimizer->>Planner: return optimized RowExpression
    end

Class diagram for NativeExpressionOptimizer decimal guard changes

classDiagram
    class NativeExpressionOptimizer {
        - CollectingVisitor collectingVisitor
        + Void visitCall(CallExpression node, Object context)
        + List~RowExpression~ getExpressionsToOptimize()
        + boolean canBeOptimized(RowExpression expression)
    }

    class CollectingVisitor {
        - List~RowExpression~ expressionsToOptimize
        + Void visitCall(CallExpression node, Object context)
        + Void visitNode(RowExpression node, boolean canBeOptimized)
        + List~RowExpression~ getExpressionsToOptimize()
        + static boolean containsDecimalType(RowExpression expression)
        + static boolean containsDecimalType(Type type)
    }

    class RowExpression {
        + Type getType()
        + List~RowExpression~ getChildren()
    }

    class CallExpression {
        + List~RowExpression~ getArguments()
        + boolean isConstant()
    }

    class Type {
        + TypeSignature getTypeSignature()
        + List~Type~ getTypeParameters()
    }

    class TypeSignature {
        + String getBase()
    }

    NativeExpressionOptimizer *-- CollectingVisitor : uses
    CollectingVisitor ..> RowExpression : traverses
    CollectingVisitor ..> CallExpression : visits
    CollectingVisitor ..> Type : inspects
    Type ..> TypeSignature : owns

File-Level Changes

Change: Avoid native-side decimal serialization by skipping native constant folding for expressions whose type trees contain DECIMAL.
Details:
  • Add a containsDecimalType guard in visitCall to detect any DECIMAL types in the call expression or its children.
  • When a DECIMAL is present, mark the call as non-optimizable via visitNode(node, false) and return early instead of enqueueing it for sidecar optimization.
  • Introduce static helper methods containsDecimalType(RowExpression) and containsDecimalType(Type) that recursively inspect type signatures and parameters for DECIMAL.
Files: presto-native-sidecar-plugin/src/main/java/com/facebook/presto/sidecar/expressions/NativeExpressionOptimizer.java

Change: Ensure Presto’s built-in concat remains the only available concat for overload resolution by filtering out native concat signatures from the sidecar function registry.
Details:
  • Filter the worker function UDF signature stream to exclude entries whose function name equalsIgnoreCase("concat").
  • Document in comments that native concat signatures can bind decimal array/element concat as array(double), leading to type changes, and thus must be hidden from the coordinator.
Files: presto-built-in-worker-function-tools/src/main/java/com/facebook/presto/builtin/tools/NativeSidecarFunctionRegistryTool.java

Development

Successfully merging this pull request may close these issues.

[native-sidecar]: Native sidecar silently coerces decimal array concatenation to array(double)

2 participants