Support sorted shuffle write in Sapphire-Velox stack #24739

Merged
facebook-github-bot merged 1 commit into prestodb:master from emilysun201309:export-D71236519 on Apr 22, 2025

Conversation

@emilysun201309

Contributor

Summary:
Support sorted shuffle write in Sapphire-Velox stack:

  • Copy the key-value serializer `fbcode/spark_cpp/src/main/serializer/CoscoKeyValueSerializer.h` from Spruce to `github/presto-trunk/presto-native-execution/presto_cpp/main/operators/KeyValueSerializer.h`.
    This avoids a dependency on the internal implementation.

  • Allow the PartitionAndSerialize node to take `sortingOrder` and `sortingKeys` as optional parameters.

  • If both `sortingOrder` and `sortingKeys` are provided, serialize the keys and invoke Cosco sorted shuffle by providing a non-empty key.

Differential Revision: D71236519
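
The optional-parameter behavior described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `SortSpec`, `SortOrder`, and `shuffleKey` are hypothetical names, not the actual PartitionAndSerialize or Cosco APIs, and the key built here is a placeholder concatenation rather than a binary-sortable encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins; the real node and shuffle writer types are not
// shown in this PR conversation.
enum class SortOrder { kAscending, kDescending };

struct SortSpec {
  std::optional<std::vector<SortOrder>> sortingOrder;
  std::optional<std::vector<std::size_t>> sortingKeys; // column indices

  // Sorted shuffle is requested only when both parameters are present.
  bool sorted() const {
    return sortingOrder.has_value() && sortingKeys.has_value();
  }
};

// Returns the key handed to the shuffle writer: a serialized sort key when
// sorting is requested, otherwise an empty key (unsorted shuffle).
std::string shuffleKey(
    const SortSpec& spec,
    const std::vector<std::string>& row) {
  if (!spec.sorted()) {
    return {};
  }
  // One sort order is expected per sort key.
  assert(spec.sortingOrder->size() == spec.sortingKeys->size());
  std::string key;
  for (std::size_t column : *spec.sortingKeys) {
    key += row.at(column);
    key.push_back('\0'); // placeholder separator, not a sortable encoding
  }
  return key;
}
```

The point of the sketch is the contract: an empty key means an unsorted write, and a non-empty key is produced only when both parameters are supplied.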

@emilysun201309 emilysun201309 requested a review from a team as a code owner March 17, 2025 16:54
@linux-foundation-easycla

linux-foundation-easycla Bot commented Mar 17, 2025


CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: emilysun201309 (694355d)

@facebook-github-bot

Collaborator

This pull request was exported from Phabricator. Differential Revision: D71236519

emilysun201309 added a commit to emilysun201309/presto that referenced this pull request Mar 17, 2025
emilysun201309 added a commit to emilysun201309/presto that referenced this pull request Mar 17, 2025
emilysun201309 added a commit to emilysun201309/presto that referenced this pull request Mar 20, 2025
emilysun201309 added a commit to emilysun201309/presto that referenced this pull request Mar 24, 2025
emilysun201309 added a commit to emilysun201309/presto that referenced this pull request Mar 24, 2025
@steveburnett

Contributor

Hi @emilysun201309, would you sign the Presto CLA? The information to do so is in this earlier comment.

emilysun201309 added a commit to emilysun201309/presto that referenced this pull request Apr 3, 2025
emilysun201309 added a commit to emilysun201309/presto that referenced this pull request Apr 3, 2025
emilysun201309 added a commit to emilysun201309/presto that referenced this pull request Apr 3, 2025
emilysun201309 added a commit to emilysun201309/presto that referenced this pull request Apr 3, 2025
emilysun201309 added a commit to emilysun201309/presto that referenced this pull request Apr 11, 2025
Summary:
Pull Request resolved: prestodb#24739

To later support sorted shuffle in Sapphire-Velox, we need a way to serialize sort keys so that they are binary sortable. This diff creates a BinarySortableSerializer.
- BinarySortableSerializer is based on Hive's BinarySortableSerDe.
- The implementation is borrowed from the CoscoKeyValueSerializer in Spruce.
- Create a copy of the above in presto-trunk so that we don't depend on the internal implementation.
- Add benchmarks in BinarySortableSerializerBenchmark.cpp.

Reviewed By: xiaoxmeng

Differential Revision: D71236519
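
As a rough illustration of what "binary sortable" means here, the sketch below encodes keys so that plain byte-wise comparison of the encoded strings matches the logical order of the values. It is not the BinarySortableSerializer from this PR: `appendInt64` and `appendString` are hypothetical helpers, and the layout (sign-bit flip plus big-endian bytes for integers, escape-and-terminate for strings, byte inversion for descending order) is an assumption modeled on the general approach of Hive's BinarySortableSerDe.

```cpp
#include <cstdint>
#include <string>
#include <string_view>

// Appends an 8-byte key for `value`. Flipping the sign bit makes negative
// values sort before positive ones; big-endian puts the most significant
// byte first. For descending order every byte is inverted.
void appendInt64(std::string& out, std::int64_t value, bool ascending) {
  const std::uint64_t bits =
      static_cast<std::uint64_t>(value) ^ (std::uint64_t{1} << 63);
  for (int shift = 56; shift >= 0; shift -= 8) {
    const auto byte = static_cast<unsigned char>(bits >> shift);
    out.push_back(static_cast<char>(
        ascending ? byte : static_cast<unsigned char>(~byte)));
  }
}

// Appends a variable-length key for `value`. Bytes 0 and 1 are escaped as
// (1, byte + 1) so that a single 0 byte can terminate the field, which keeps
// a prefix ("a") ordered before its extensions ("ab").
void appendString(std::string& out, std::string_view value, bool ascending) {
  auto put = [&](unsigned char byte) {
    out.push_back(static_cast<char>(
        ascending ? byte : static_cast<unsigned char>(~byte)));
  };
  for (unsigned char byte : value) {
    if (byte <= 1) {
      put(1);
      put(static_cast<unsigned char>(byte + 1));
    } else {
      put(byte);
    }
  }
  put(0); // terminator
}
```

Fields appended in sort-key order form a composite key, and `std::string`'s `operator<` compares characters as unsigned bytes, so two encoded keys can be compared directly without decoding them.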
emilysun201309 added a commit to emilysun201309/presto that referenced this pull request Apr 16, 2025
emilysun201309 added a commit to emilysun201309/presto that referenced this pull request Apr 21, 2025
emilysun201309 added a commit to emilysun201309/presto that referenced this pull request Apr 21, 2025
emilysun201309 added a commit to emilysun201309/presto that referenced this pull request Apr 21, 2025
@emilysun201309

Contributor Author

/easycla

emilysun201309 added a commit to emilysun201309/presto that referenced this pull request Apr 22, 2025
emilysun201309 added a commit to emilysun201309/presto that referenced this pull request Apr 22, 2025
4 participants