feat(native): Add native_min_shuffle_compression_page_size_bytes session property (#27683)

Merged
hdikeman merged 1 commit into prestodb:master from hdikeman:export-D100420473
May 8, 2026


hdikeman commented Apr 28, 2026

Summary:

Wires the new Velox `min_shuffle_compression_page_size_bytes` query config through to a Presto native session property and to the BroadcastWrite operator, so users can tune, per query, the page-size threshold below which shuffle compression is skipped.

Adds:

  • native_min_shuffle_compression_page_size_bytes Java session property in NativeWorkerSessionPropertyProvider, grouped with the other PartitionedOutput / output-buffer properties.
  • Matching constant and addSessionProperty() registration in the Prestissimo SessionProperties (placed near kPartitionedOutputEagerFlush, alongside the other shuffle/output-related session properties).
  • Mapping entry in SessionPropertiesTest::validateMapping to keep the Java↔Velox name correspondence asserted.
  • BroadcastWrite operator: passes the new threshold through to getVectorSerdeOptions so broadcast shuffle pages also honor the skip behavior.

Default value is 0 (disabled), preserving existing behavior unless the user opts in.
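The threshold's effect can be sketched as a single predicate. This is a minimal illustration under stated assumptions, not the Velox code: `shouldAttemptCompression` is a hypothetical name, and the sketch assumes a page qualifies for compression when its serialized size is at least the threshold, so the default of 0 leaves every page eligible.

```cpp
#include <cstdint>

// Hypothetical helper (not a Velox API): decides whether a serialized shuffle
// page is large enough to be worth compressing.
// Assumption: pages whose serialized size is >= minPageSizeBytes are compressed.
bool shouldAttemptCompression(uint64_t serializedPageBytes,
                              uint64_t minPageSizeBytes) {
  // A threshold of 0 disables the skip, so every page stays a candidate,
  // matching the property's default (existing behavior).
  if (minPageSizeBytes == 0) {
    return true;
  }
  return serializedPageBytes >= minPageSizeBytes;
}
```

For example, with a 1024-byte threshold a 512-byte page would be sent uncompressed, while a 4096-byte page would still be compressed.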

Differential Revision: D100420473

hdikeman requested review from a team as code owners April 28, 2026 23:11
prestodb-ci added the from:Meta label (PR from Meta) Apr 28, 2026

sourcery-ai (bot) commented Apr 28, 2026

Reviewer's Guide

Adds a new native session property to control the minimum shuffle compression page size, wires it through Java and C++ session/property plumbing, and ensures BroadcastWrite uses the configured threshold while keeping the default behavior disabled (0).

Sequence diagram for native_min_shuffle_compression_page_size_bytes propagation

sequenceDiagram
    actor User
    participant PrestoSession as PrestoSession
    participant NativeWorkerSessionPropertyProvider as NativeWorkerSessionPropertyProvider
    participant NativeEngine as NativeEngine
    participant QueryConfig as QueryConfig
    participant BroadcastWriteOperator as BroadcastWriteOperator
    participant VectorSerde as VectorSerdeOptions

    User->>PrestoSession: Set native_min_shuffle_compression_page_size_bytes
    PrestoSession->>NativeWorkerSessionPropertyProvider: Request native session properties
    NativeWorkerSessionPropertyProvider-->>PrestoSession: integerProperty NATIVE_MIN_SHUFFLE_COMPRESSION_PAGE_SIZE_BYTES

    PrestoSession->>NativeEngine: Start query with session properties
    NativeEngine->>QueryConfig: Initialize from session
    QueryConfig-->>NativeEngine: minShuffleCompressionPageSizeBytes

    NativeEngine->>BroadcastWriteOperator: Create operator with OperatorCtx
    BroadcastWriteOperator->>QueryConfig: minShuffleCompressionPageSizeBytes()
    QueryConfig-->>BroadcastWriteOperator: thresholdBytes

    BroadcastWriteOperator->>VectorSerde: getVectorSerdeOptions(compressionKind, Presto, nullopt, thresholdBytes)
    VectorSerde-->>BroadcastWriteOperator: SerdeOptions with minPageSizeBytes
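The sequence above boils down to reading a string-valued session setting into a typed config accessor with a default. The sketch below assumes a plain string map and uses hypothetical names (`QueryConfigSketch`, `kMinPageSizeKeySketch`); it is not the Velox QueryConfig class.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Assumed key string, taken from the Velox property name in the summary;
// real code should use QueryConfig::kMinShuffleCompressionPageSizeBytes.
const std::string kMinPageSizeKeySketch =
    "min_shuffle_compression_page_size_bytes";

// Hypothetical stand-in for a query config built from string-valued session
// properties; only the one accessor relevant here is modeled.
class QueryConfigSketch {
 public:
  explicit QueryConfigSketch(std::unordered_map<std::string, std::string> values)
      : values_(std::move(values)) {}

  // Typed accessor: returns the threshold, or 0 (disabled) when unset.
  uint64_t minShuffleCompressionPageSizeBytes() const {
    auto it = values_.find(kMinPageSizeKeySketch);
    if (it == values_.end()) {
      return 0;
    }
    // Note: std::stoull does not reject "-1"; it wraps to the maximum value.
    return std::stoull(it->second);
  }

 private:
  std::unordered_map<std::string, std::string> values_;
};
```

Because `std::stoull` accepts a leading minus sign and wraps the result, a value of `-1` would become the largest 64-bit threshold, which would skip compression for every page if pages below the threshold are left uncompressed.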

Class diagram for new shuffle compression page size session property

classDiagram
    class NativeWorkerSessionPropertyProvider {
        <<class>>
        +String NATIVE_MAX_PAGE_PARTITIONING_BUFFER_SIZE
        +String NATIVE_PARTITIONED_OUTPUT_EAGER_FLUSH
        +String NATIVE_MAX_OUTPUT_BUFFER_SIZE
        +String NATIVE_MIN_SHUFFLE_COMPRESSION_PAGE_SIZE_BYTES
        +String NATIVE_QUERY_TRACE_ENABLED
        +String NATIVE_QUERY_TRACE_DIR
        +String NATIVE_QUERY_TRACE_NODE_ID
        +NativeWorkerSessionPropertyProvider(FeaturesConfig featuresConfig)
        -SessionProperty integerProperty(String name, String description, int defaultValue, boolean hidden)
    }

    class SessionProperties {
        <<class>>
        +static const char* kShuffleCompressionCodec
        +static const char* kMinShuffleCompressionPageSizeBytes
        +SessionProperties()
        -void addSessionProperty(const char* name, const char* description, Type type, bool hidden, const char* queryConfigKey, std::string defaultValue)
    }

    class QueryConfig {
        <<class>>
        +std::string shuffleCompressionKind()
        +uint64_t minShuffleCompressionPageSizeBytes()
    }

    class BroadcastWriteOperator {
        <<class>>
        -OperatorCtx* operatorCtx_
        +BroadcastWriteOperator(OperatorCtx* operatorCtx, BroadcastFactory* broadcastFactory, RowTypePtr rowType)
        -void createSerdeOptions()
    }

    class VectorSerdeOptions {
        <<class>>
    }

    class getVectorSerdeOptions {
        <<function>>
        +VectorSerdeOptions getVectorSerdeOptions(CompressionKind kind, std::string driverType, std::optional~uint32_t~ level, uint64_t minPageSizeBytes)
    }

    SessionProperties --> QueryConfig : uses QueryConfig kMinShuffleCompressionPageSizeBytes
    NativeWorkerSessionPropertyProvider --> SessionProperties : maps NATIVE_MIN_SHUFFLE_COMPRESSION_PAGE_SIZE_BYTES
    BroadcastWriteOperator --> QueryConfig : calls minShuffleCompressionPageSizeBytes()
    BroadcastWriteOperator --> getVectorSerdeOptions : passes minShuffleCompressionPageSizeBytes
    getVectorSerdeOptions --> VectorSerdeOptions : returns
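The BroadcastWrite change described in the diagrams amounts to forwarding one extra argument. This sketch uses hypothetical stand-in types (`CompressionKindSketch`, `SerdeOptionsSketch`) and assumes the four-parameter shape shown in the class diagram (kind, serde name, optional level, minimum page size); the actual Velox signature may differ.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical stand-ins; the real types live in Velox/Prestissimo.
enum class CompressionKindSketch { kNone, kLz4, kZstd };

struct SerdeOptionsSketch {
  CompressionKindSketch compressionKind{CompressionKindSketch::kNone};
  std::optional<uint32_t> compressionLevel;  // unset: codec default level
  uint64_t minCompressionPageSizeBytes{0};   // 0: never skip compression
};

// Mirrors the four-argument shape shown in the diagrams:
// (kind, serde name, optional level, min page size). Names are assumptions.
SerdeOptionsSketch makeSerdeOptionsSketch(CompressionKindSketch kind,
                                          const std::string& /*serdeName*/,
                                          std::optional<uint32_t> level,
                                          uint64_t minPageSizeBytes) {
  SerdeOptionsSketch options;
  options.compressionKind = kind;
  options.compressionLevel = level;
  options.minCompressionPageSizeBytes = minPageSizeBytes;
  return options;
}

// What the BroadcastWrite change amounts to: keep the level unset (nullopt)
// and forward the per-query threshold read from the query config.
SerdeOptionsSketch broadcastSerdeOptionsSketch(CompressionKindSketch kind,
                                               uint64_t thresholdFromConfig) {
  return makeSerdeOptionsSketch(kind, "Presto", std::nullopt,
                                thresholdFromConfig);
}
```

Leaving the level as `std::nullopt` keeps whatever level the codec would otherwise use, so the threshold is the only behavioral change.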

File-Level Changes

Change: Introduce a native session property for the minimum shuffle compression page size and register it in the native C++ session properties.
  • Define the kMinShuffleCompressionPageSizeBytes constant with the Java-visible property name.
  • Register the property via addSessionProperty with integer type, mapping it to QueryConfig::kMinShuffleCompressionPageSizeBytes and taking its default from QueryConfig.
  • Document that it controls the minimum serialized page size, in bytes, at which shuffle compression is attempted.
Files:
  presto-native-execution/presto_cpp/main/properties/session/SessionProperties.h
  presto-native-execution/presto_cpp/main/properties/session/SessionProperties.cpp

Change: Expose the new property on the Java side for native workers, with a default of 0 (disabled).
  • Add the NATIVE_MIN_SHUFFLE_COMPRESSION_PAGE_SIZE_BYTES constant to NativeWorkerSessionPropertyProvider.
  • Register it as an integer session property with a description and a default value of 0, hidden when nativeExecution is false.
Files:
  presto-main-base/src/main/java/com/facebook/presto/sessionpropertyproviders/NativeWorkerSessionPropertyProvider.java

Change: Wire the new configuration into BroadcastWrite so shuffle compression respects the per-query minimum page size threshold.
  • Extend the getVectorSerdeOptions call to pass minShuffleCompressionPageSizeBytes from QueryConfig as the minimum compression page size.
  • Pass std::nullopt for the optional compression-level parameter, so behavior is unchanged apart from the new threshold.
Files:
  presto-native-execution/presto_cpp/main/operators/BroadcastWrite.cpp

Change: Keep the Java↔Velox session property name mapping validated in tests.
  • Add a mapping entry from kMinShuffleCompressionPageSizeBytes to core::QueryConfig::kMinShuffleCompressionPageSizeBytes in SessionPropertiesTest::validateMapping.
Files:
  presto-native-execution/presto_cpp/main/properties/session/tests/SessionPropertiesTest.cpp
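The mapping test pins a Java-visible session name to a Velox config key. A minimal sketch of that correspondence, assuming the two names given in the PR summary and using hypothetical helpers (`sessionToQueryConfigSketch`, `toQueryConfigKeySketch`) rather than the real test fixture:

```cpp
#include <string>
#include <unordered_map>

// Hypothetical mapping table in the spirit of validateMapping:
// Java-visible session property name -> Velox QueryConfig key.
const std::unordered_map<std::string, std::string>& sessionToQueryConfigSketch() {
  static const std::unordered_map<std::string, std::string> kMapping = {
      {"native_min_shuffle_compression_page_size_bytes",
       "min_shuffle_compression_page_size_bytes"},
  };
  return kMapping;
}

// Returns the Velox key for a session property, or an empty string if unmapped.
std::string toQueryConfigKeySketch(const std::string& sessionName) {
  const auto& mapping = sessionToQueryConfigSketch();
  auto it = mapping.find(sessionName);
  return it == mapping.end() ? std::string() : it->second;
}
```

Asserting on this table in a test catches a rename on either side before it silently breaks the pass-through.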

sourcery-ai (bot) left a comment


Hey, I've found 1 issue and left some high-level feedback:

  • For native_min_shuffle_compression_page_size_bytes, consider using a DataSize/longProperty (or equivalent wider type) instead of integerProperty, since this is a size-in-bytes threshold and may reasonably exceed 32-bit limits in some deployments.
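To put numbers on the range concern: a Presto INTEGER is a 32-bit signed value, so the largest threshold it can carry is 2,147,483,647 bytes, one byte short of 2 GiB. The helper name below is hypothetical; the `static_assert`s check the boundary at compile time.

```cpp
#include <cstdint>
#include <limits>

// Upper bound of a Presto INTEGER (32-bit signed) session property.
constexpr int64_t kMaxIntegerPropertyValue =
    std::numeric_limits<int32_t>::max();  // 2147483647, i.e. 2 GiB - 1

// Hypothetical helper: can a byte threshold be stored in a 32-bit INTEGER
// property without overflow?
constexpr bool fitsInIntegerProperty(int64_t thresholdBytes) {
  return thresholdBytes >= std::numeric_limits<int32_t>::min() &&
         thresholdBytes <= kMaxIntegerPropertyValue;
}

// Compile-time checks of the boundary: 2 GiB - 1 fits, 3 GiB does not.
static_assert(fitsInIntegerProperty(kMaxIntegerPropertyValue), "max fits");
static_assert(!fitsInIntegerProperty(int64_t{3} << 30), "3 GiB overflows");
```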
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- For `native_min_shuffle_compression_page_size_bytes`, consider using a `DataSize`/`longProperty` (or equivalent wider type) instead of `integerProperty`, since this is a size-in-bytes threshold and may reasonably exceed 32-bit limits in some deployments.

## Individual Comments

### Comment 1
<location path="presto-native-execution/presto_cpp/main/properties/session/SessionProperties.cpp" line_range="379-382" />
<code_context>
+      kMinShuffleCompressionPageSizeBytes,
+      "Native Execution only. Minimum serialized page size in bytes to attempt "
+      "shuffle compression.",
+      INTEGER(),
+      false,
+      QueryConfig::kMinShuffleCompressionPageSizeBytes,
+      std::to_string(c.minShuffleCompressionPageSizeBytes()));
+
   // If `legacy_timestamp` is true, the coordinator expects timestamp
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Consider enforcing a non-negative (or minimum) constraint for the page size session property

This is currently a plain `INTEGER()` with no lower bound. If a user sets a negative or unrealistically small value, it may break assumptions about "bytes" (e.g., requiring `>= 0` or `>= 1`) and cause confusing shuffle compression behavior. Please either switch to a constrained numeric type or add config-layer validation so invalid values are rejected during session parsing.

Suggested implementation:

```cpp
  addSessionProperty(
      kMinShuffleCompressionPageSizeBytes,
      "Native Execution only. Minimum serialized page size in bytes to attempt "
      "shuffle compression.",
      NON_NEGATIVE_INTEGER(),
      false,
      QueryConfig::kMinShuffleCompressionPageSizeBytes,
      std::to_string(c.minShuffleCompressionPageSizeBytes()));

```

1. Ensure that a `NON_NEGATIVE_INTEGER()` type helper exists and is wired to enforce `value >= 0` at session parsing time. This is typically defined alongside the other `INTEGER()`/numeric type helpers used for session properties. If such a helper does not exist:
   - Introduce it in the appropriate type helper header/source (where `INTEGER()` is defined).
   - Make it wrap the same base integer type as `INTEGER()` but with a minimum value of 0.
2. If your codebase uses a different naming convention (e.g., `NONNEGATIVE_INTEGER()` or `UNSIGNED_INTEGER(min, max)`), adjust the replacement accordingly to match the existing constrained numeric type used for other session properties.
3. Optionally, add a corresponding validation or unit test that attempts to set this session property to negative and very small values to confirm they are rejected or normalized as intended.
</issue_to_address>
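If no constrained type helper exists, the check can live in parsing instead. This is a hedged sketch of config-layer validation with a hypothetical function name (`parseNonNegativeThresholdSketch`), not an existing Prestissimo API: it parses as a signed integer so negative input is detected rather than wrapped, and it rejects trailing garbage.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical parse-time validator (not an existing Prestissimo helper):
// accepts a decimal string and rejects negative values instead of letting
// them wrap around when converted to an unsigned threshold.
uint64_t parseNonNegativeThresholdSketch(const std::string& value) {
  std::size_t consumed = 0;
  // Parse as signed first so "-1" is seen as negative rather than wrapping.
  // std::stoll itself throws on non-numeric or out-of-range input.
  long long parsed = std::stoll(value, &consumed);
  if (consumed != value.size()) {
    throw std::invalid_argument("trailing characters in: " + value);
  }
  if (parsed < 0) {
    throw std::invalid_argument("threshold must be >= 0, got: " + value);
  }
  return static_cast<uint64_t>(parsed);
}
```

Rejecting bad input at session-parse time surfaces the error to the user when the property is set, rather than as unexpected compression behavior during the shuffle.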

meta-codesync (bot) changed the title from "feat(native): Add native_min_shuffle_compression_page_size_bytes session property" to "feat(native): Add native_min_shuffle_compression_page_size_bytes session property (#27683)" May 7, 2026
hdikeman force-pushed the export-D100420473 branch from 72b5f9f to 09cf663 May 7, 2026 22:20
hdikeman added a commit to hdikeman/presto that referenced this pull request May 7, 2026
hdikeman added a commit to hdikeman/presto that referenced this pull request May 7, 2026
hdikeman force-pushed the export-D100420473 branch from 09cf663 to f24ca59 May 7, 2026 22:47
hdikeman force-pushed the export-D100420473 branch from f24ca59 to 60cecba May 7, 2026 22:59
amitkdutta left a comment


Thanks @hdikeman

hdikeman merged commit 89ceb16 into prestodb:master May 8, 2026
84 checks passed