[native] Add DeleteNode translation to TableWriteNode in Velox query plan#24772
Conversation
|
This pull request was exported from Phabricator. Differential Revision: D71643790 |
…stodb#24772) Summary: This adds hooks to: - PrestoToVeloxConnector to handled the DeleteHandle provided in TableWriteInfo for DELETE plans - PrestoToVeloxQueryPlan to generate a velox::core::TableWriteNode from a protocol::DeleteNode plan TableWriteNode is used to allow us to leverage the TableWriter operator to generate the delta files needed for deletes Differential Revision: D71643790
8a4cb61 to
5cb0a30
Compare
|
This pull request was exported from Phabricator. Differential Revision: D71643790 |
aditi-pandit
left a comment
There was a problem hiding this comment.
Thanks @ghelmling.
Would be great to have a test in PrestoToVeloxQueryPlanTest https://github.com/prestodb/presto/blob/master/presto-native-execution/presto_cpp/main/types/tests/PrestoToVeloxQueryPlanTest.cpp for this code.
| std::string connectorId; | ||
| std::shared_ptr<connector::ConnectorInsertTableHandle> connectorInsertHandle; | ||
| if ( | ||
| auto deleteHandle = std::dynamic_pointer_cast<protocol::DeleteHandle>( |
There was a problem hiding this comment.
Nit maybe abstract a toConnectorDeleteTableHandle method like other functions (e.g. toConnectorTableHandle) in this file.
There was a problem hiding this comment.
Thanks for taking a look! I added a for toVeloxQueryPlan(DeleteNode). It was a bit complex due to not having an existing connector actually implementing the delete portions, but I think I've successfully mocked up what was needed.
5cb0a30 to
6ee42db
Compare
…stodb#24772) Summary: This adds hooks to: - PrestoToVeloxConnector to handled the DeleteHandle provided in TableWriteInfo for DELETE plans - PrestoToVeloxQueryPlan to generate a velox::core::TableWriteNode from a protocol::DeleteNode plan TableWriteNode is used to allow us to leverage the TableWriter operator to generate the delta files needed for deletes Differential Revision: D71643790
|
This pull request was exported from Phabricator. Differential Revision: D71643790 |
…stodb#24772) Summary: This adds hooks to: - PrestoToVeloxConnector to handled the DeleteHandle provided in TableWriteInfo for DELETE plans - PrestoToVeloxQueryPlan to generate a velox::core::TableWriteNode from a protocol::DeleteNode plan TableWriteNode is used to allow us to leverage the TableWriter operator to generate the delta files needed for deletes Differential Revision: D71643790
6ee42db to
4086c74
Compare
|
This pull request was exported from Phabricator. Differential Revision: D71643790 |
…stodb#24772) Summary: This adds hooks to: - PrestoToVeloxConnector to handled the DeleteHandle provided in TableWriteInfo for DELETE plans - PrestoToVeloxQueryPlan to generate a velox::core::TableWriteNode from a protocol::DeleteNode plan TableWriteNode is used to allow us to leverage the TableWriter operator to generate the delta files needed for deletes Differential Revision: D71643790
4086c74 to
c3c8b85
Compare
|
This pull request was exported from Phabricator. Differential Revision: D71643790 |
|
The 2 failing signals here can be ignored. The Meta-internal changes failure is due to a change required for internal builds. The linter failure is due to unused parameters in PrestoToVeloxConnector::toVeloxInsertTableHandle, which should be ignored for style consistency with other function definitions in the header. |
aditi-pandit
left a comment
There was a problem hiding this comment.
Thanks @ghelmling for these code changes. They look good minus the question on CMakeLists file.
| presto_types | ||
| velox_dwio_common | ||
| velox_exec_test_lib | ||
| velox_functions_prestosql |
There was a problem hiding this comment.
Why do we need this dependency ? Seems a bit odd.
There was a problem hiding this comment.
This might be a mistake. A similar dependency was necessary in an internal build for the function register* calls in the test, but those were pre-existing so probably not needed here. I'll try removing it.
…stodb#24772) Summary: This adds hooks to: - PrestoToVeloxConnector to handled the DeleteHandle provided in TableWriteInfo for DELETE plans - PrestoToVeloxQueryPlan to generate a velox::core::TableWriteNode from a protocol::DeleteNode plan TableWriteNode is used to allow us to leverage the TableWriter operator to generate the delta files needed for deletes Differential Revision: D71643790
c3c8b85 to
eb61b17
Compare
|
This pull request was exported from Phabricator. Differential Revision: D71643790 |
|
@aditi-pandit I've dropped the CMakeLists change. Any remaining concerns on the changes? |
There was a problem hiding this comment.
@ghelmling : Your code changes look good now. Though I had a high level question about the e2e use of this code before approval.
So this code translates DeleteNode -> Table write with InsertHandle of delete files. Are these the only worker side changes needed for your feature ? Say this is the second deletion of a table, then are the positional deletes appended to the existing ones, or these are always new files ? How do we know the original deletes are not over-written ? Does the bucketing and partitioning segregation of usual inserts also apply to the delete files (seems like) ?
There likely should be more co-ordinator side commit logic to handle Delete commit updates differently from Insert commit updates. And these too get wired through a TableWriterNode I imagine. Can you share bit more about how the TableCommit handles this further ? Will there be changes to any TableCommit Context messages for this feature ?
Are you going to write e2e tests for these changes ? Are there plans for Hive or Iceberg connector to use this translation ?
|
@aditi-pandit The overall flow we are working towards for DELETE handling is:
Delete files are added to a partition via metadata that associates a set of deletes with the base write. Partitioning applies to delete files that are written, but in our internal implementation bucketing does not, since we don't use a 1-1 mapping of base to delete files. We do have end-to-end tests for the entire flow, but the delete file metadata and merging is heavily dependent on internal only systems, so these are not easily extractable into an OSS implementation. I would love to see Iceberg pick this up to support DELETE on native workers, but don't know enough about how it handles delete file associates to be able to say exactly what would be required. It seems like the hooks here and TableWriteNode translation should be directly applicable, and Iceberg would just need to pipe through the appropriate column projections in the query plan, but I don't know what changes might be needed in their current commit handling. I hope this helps provide a clearer picture of how this all fits together. Let me know if you have other questions or concerns. |
|
@ghelmling : Thanks for your explanation. Makes sense. Fwding few Iceberg folks to look at this when they implement the DELETES. |
|
@ghelmling : Please can you rebase. I'll approve the PR. |
…stodb#24772) Summary: This adds hooks to: - PrestoToVeloxConnector to handled the DeleteHandle provided in TableWriteInfo for DELETE plans - PrestoToVeloxQueryPlan to generate a velox::core::TableWriteNode from a protocol::DeleteNode plan TableWriteNode is used to allow us to leverage the TableWriter operator to generate the delta files needed for deletes Differential Revision: D71643790
eb61b17 to
7873090
Compare
…stodb#24772) Summary: This adds hooks to: - PrestoToVeloxConnector to handled the DeleteHandle provided in TableWriteInfo for DELETE plans - PrestoToVeloxQueryPlan to generate a velox::core::TableWriteNode from a protocol::DeleteNode plan TableWriteNode is used to allow us to leverage the TableWriter operator to generate the delta files needed for deletes Differential Revision: D71643790
Summary:
This adds hooks to:
TableWriteNode is used to allow us to leverage the TableWriter operator to generate the delta files needed for deletes
Differential Revision: D71643790