[native] Add DeleteNode translation to TableWriteNode in Velox query plan#24772

Merged
ghelmling merged 1 commit into prestodb:master from ghelmling:export-D71643790
Mar 30, 2025

@ghelmling

Contributor

Summary:
This adds hooks to:

  • PrestoToVeloxConnector to handle the DeleteHandle provided in TableWriteInfo for DELETE plans
  • PrestoToVeloxQueryPlan to generate a velox::core::TableWriteNode from a protocol::DeleteNode plan

TableWriteNode is used so that we can leverage the TableWriter operator to generate the delta files needed for deletes.

Differential Revision: D71643790

@ghelmling ghelmling requested a review from a team as a code owner March 21, 2025 19:59
@facebook-github-bot

Collaborator

This pull request was exported from Phabricator. Differential Revision: D71643790

ghelmling added a commit to ghelmling/presto that referenced this pull request Mar 23, 2025
…stodb#24772)


@aditi-pandit aditi-pandit left a comment

Contributor

std::string connectorId;
std::shared_ptr<connector::ConnectorInsertTableHandle> connectorInsertHandle;
if (
auto deleteHandle = std::dynamic_pointer_cast<protocol::DeleteHandle>(

Contributor

Nit: maybe abstract a toConnectorDeleteTableHandle method, like the other functions in this file (e.g. toConnectorTableHandle).
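
A minimal sketch of what such a helper could look like. Every type below is a simplified stand-in defined only for this example, not the real `protocol::` or `velox::connector::` class, and `DeleteTarget` and the field names are assumptions. The helper name comes from the suggestion above and is not confirmed to exist in the merged code.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-ins for the protocol-level writer target handles.
// The base class is polymorphic so that dynamic_pointer_cast can be used.
struct ExecutionWriterTargetHandle {
  virtual ~ExecutionWriterTargetHandle() = default;
};

struct DeleteHandle : ExecutionWriterTargetHandle {
  std::string connectorId;
  std::string tableName;
};

struct InsertHandle : ExecutionWriterTargetHandle {
  std::string connectorId;
};

// Hypothetical stand-in for the connector-side insert handle.
struct ConnectorInsertTableHandle {
  std::string targetTable;
};

// Result of the helper: the connector id plus the handle used for the write.
struct DeleteTarget {
  std::string connectorId;
  std::shared_ptr<ConnectorInsertTableHandle> insertHandle;
};

// Returns a DeleteTarget when `handle` is a DeleteHandle, std::nullopt otherwise,
// so the caller can fall through to its other handle types.
std::optional<DeleteTarget> toConnectorDeleteTableHandle(
    const std::shared_ptr<ExecutionWriterTargetHandle>& handle) {
  if (auto deleteHandle = std::dynamic_pointer_cast<DeleteHandle>(handle)) {
    auto insertHandle = std::make_shared<ConnectorInsertTableHandle>();
    insertHandle->targetTable = deleteHandle->tableName;
    return DeleteTarget{deleteHandle->connectorId, std::move(insertHandle)};
  }
  return std::nullopt;
}
```

Returning `std::optional` keeps the type check in one place; the call site only has to test whether a value came back.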

Contributor Author

Thanks for taking a look! I added a test for toVeloxQueryPlan(DeleteNode). It was a bit complex because no existing connector actually implements the delete portions, but I think I've successfully mocked up what was needed.

ghelmling added a commit to ghelmling/presto that referenced this pull request Mar 25, 2025
…stodb#24772)


ghelmling added a commit to ghelmling/presto that referenced this pull request Mar 25, 2025
…stodb#24772)


ghelmling added a commit to ghelmling/presto that referenced this pull request Mar 25, 2025
…stodb#24772)


@ghelmling

Contributor Author

The 2 failing signals here can be ignored. The Meta-internal changes failure is due to a change required for internal builds. The linter failure is due to unused parameters in PrestoToVeloxConnector::toVeloxInsertTableHandle, which should be ignored for style consistency with other function definitions in the header.

@ethanyzhang ethanyzhang added the from:Meta PR from Meta label Mar 26, 2025
@aditi-pandit aditi-pandit changed the title Add DeleteNode translation to TableWriteNode in Velox query plan [native] Add DeleteNode translation to TableWriteNode in Velox query plan Mar 26, 2025

@aditi-pandit aditi-pandit left a comment

Contributor

Thanks @ghelmling for these code changes. They look good, apart from my question about the CMakeLists file.

presto_types
velox_dwio_common
velox_exec_test_lib
velox_functions_prestosql

Contributor

Why do we need this dependency? It seems a bit odd.

Contributor Author

This might be a mistake. A similar dependency was necessary in an internal build for the function register* calls in the test, but those were pre-existing so probably not needed here. I'll try removing it.

ghelmling added a commit to ghelmling/presto that referenced this pull request Mar 26, 2025
…stodb#24772)


@ghelmling

Contributor Author

@aditi-pandit I've dropped the CMakeLists change. Any remaining concerns on the changes?

@aditi-pandit aditi-pandit left a comment

Contributor

@ghelmling: Your code changes look good now, though I have a high-level question about the end-to-end use of this code before approval.

So this code translates DeleteNode -> TableWrite with an InsertHandle for the delete files. Are these the only worker-side changes needed for your feature? Say this is the second deletion on a table: are the positional deletes appended to the existing ones, or are these always new files? How do we know the original deletes are not overwritten? Does the bucketing and partitioning segregation of regular inserts also apply to the delete files (it seems so)?

There likely should be more coordinator-side commit logic to handle Delete commit updates differently from Insert commit updates. And these too get wired through a TableWriteNode, I imagine. Can you share a bit more about how the TableCommit handles this? Will there be changes to any TableCommit Context messages for this feature?

Are you going to write e2e tests for these changes? Are there plans for the Hive or Iceberg connector to use this translation?

@ghelmling

Contributor Author

@aditi-pandit The overall flow we are working towards for DELETE handling is:

  1. DELETE query is parsed and we generate a DeleteNode for the logical plan
  2. A ConnectorOptimizer inspects the DeleteNode and projects in a couple of special columns that we need to write the delete files, and modifies the plan to allow writes to be partitioned to delta file, since we don't use a straight 1-1 base to delete file mapping. We don't use row_id directly for delete files since we only need a subset of the information there.
  3. ConnectorMetadata.beginDelete generates a ConnectorDeleteTableHandle with the metadata needed for delete file naming and the later commit of delete files to partitions
  4. Workers receiving the DeleteNode plan fragment translate the ConnectorDeleteTableHandle to a HiveInsertTableHandle, and the DeleteNode into TableWriteNode. We could have carried through delete-specific types here, but in the end a positional delete just outputs normal files but with special columns to tombstone the rows that have been deleted.
  5. The TableWriter operator outputs the files with a HiveDataSink (with some overriding for file naming and delete file partitioning), then serializes back normal PartitionUpdates to the coordinator
  6. The coordinator commits the delete files to each partition with ConnectorMetadata.commitPageSinkAsync and ConnectorMetadata.finishDelete. Any additional metadata we need for commit is already present in the ConnectorDeleteTableHandle instance passed to finishDelete, so we don't need workers to serialize back any additional context
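
Step 4 above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the structs are simplified stand-ins for `protocol::DeleteNode`, `ConnectorDeleteTableHandle`, `HiveInsertTableHandle` and `velox::core::TableWriteNode`, and every field and function name is hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified plan nodes.
struct PlanNode {
  virtual ~PlanNode() = default;
  std::string id;
  std::shared_ptr<PlanNode> source;
};

struct DeleteNode : PlanNode {
  // Special columns projected in by the connector optimizer (step 2).
  std::vector<std::string> inputColumns;
};

// Hypothetical handle produced on the coordinator in beginDelete (step 3).
struct DeleteTableHandle {
  std::string connectorId;
  std::string deleteFilePrefix;
};

// Hypothetical handle consumed by the table writer on the worker.
struct InsertTableHandle {
  std::string connectorId;
  std::string fileNamePrefix;
};

struct TableWriteNode : PlanNode {
  std::vector<std::string> columns;
  InsertTableHandle insertHandle;
};

// The delete handle is re-expressed as an ordinary insert handle: a delete
// file is written like a normal file, only with special columns.
InsertTableHandle toInsertHandle(const DeleteTableHandle& handle) {
  return InsertTableHandle{handle.connectorId, handle.deleteFilePrefix};
}

// The DeleteNode becomes a TableWriteNode over the same source and columns.
std::shared_ptr<TableWriteNode> toTableWriteNode(
    const DeleteNode& node,
    const DeleteTableHandle& handle) {
  auto write = std::make_shared<TableWriteNode>();
  write->id = node.id;
  write->source = node.source;
  write->columns = node.inputColumns;
  write->insertHandle = toInsertHandle(handle);
  return write;
}
```

Because the result is an ordinary write node, the existing writer operator and the PartitionUpdates it reports (step 5) need no delete-specific types in this model.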

Delete files are added to a partition via metadata that associates a set of deletes with the base write. Partitioning applies to delete files that are written, but in our internal implementation bucketing does not, since we don't use a 1-1 mapping of base to delete files.
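
To illustrate the partitioning-versus-bucketing point with a toy model: the `DeleteRow` layout and its field names below are assumptions made for this example and do not reflect the internal implementation. Rows are grouped by partition only, so deletes against different buckets of the same partition land in the same group.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical positional-delete row: which base file and row to tombstone,
// plus the partition and bucket of the base data.
struct DeleteRow {
  std::string partition;
  int bucket;
  std::string baseFile;
  int64_t position;
};

// Groups delete rows by partition and deliberately ignores the bucket,
// mirroring "partitioning applies, bucketing does not".
std::map<std::string, std::vector<DeleteRow>> groupByPartition(
    const std::vector<DeleteRow>& rows) {
  std::map<std::string, std::vector<DeleteRow>> groups;
  for (const auto& row : rows) {
    groups[row.partition].push_back(row);
  }
  return groups;
}
```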

We do have end-to-end tests for the entire flow, but the delete file metadata and merging are heavily dependent on internal-only systems, so these are not easily extractable into an OSS implementation. I would love to see Iceberg pick this up to support DELETE on native workers, but I don't know enough about how it handles delete file associations to be able to say exactly what would be required. It seems like the hooks here and the TableWriteNode translation should be directly applicable, and Iceberg would just need to pipe through the appropriate column projections in the query plan, but I don't know what changes might be needed in their current commit handling.

I hope this helps provide a clearer picture of how this all fits together. Let me know if you have other questions or concerns.

@aditi-pandit

Contributor

@ghelmling: Thanks for your explanation. Makes sense.

Forwarding to a few Iceberg folks to look at this when they implement DELETEs.
@imjalpreet @agrawalreetika @ZacBlanco

@aditi-pandit

Contributor

@ghelmling: Could you please rebase? I'll approve the PR.


@aditi-pandit aditi-pandit left a comment

Contributor

Thanks!

@ghelmling ghelmling merged commit d22d3c2 into prestodb:master Mar 30, 2025
pradeepvaka pushed a commit to pradeepvaka/presto that referenced this pull request Apr 11, 2025
…stodb#24772)

@ZacBlanco ZacBlanco mentioned this pull request May 29, 2025
21 tasks