
feat(native): Insert into bucketed but unpartitioned Hive table #25139

Merged
anandamideShakyan merged 1 commit into prestodb:master from anandamideShakyan:insert-bucketed-unpar-hive on Jun 5, 2026

Conversation

@anandamideShakyan

@anandamideShakyan anandamideShakyan commented May 18, 2025

Contributor

Description

Addresses #25104
Currently, Presto does not support INSERT INTO operations on bucketed but unpartitioned Hive tables. This limitation originates from a hard check in HiveWriterFactory:

https://github.com/prestodb/presto/blob/master/presto-hive/src/main/java/com/facebook/presto/hive/HiveWriterFactory.java#L480

Motivation and Context

Supporting writes to bucketed unpartitioned Hive tables in Presto would improve compatibility and enhance Presto’s ability to handle modern Hive table layouts. It's a reasonable and useful feature for users who wish to leverage bucketing for performance optimizations even without partitioning.

Impact

This change would align Presto’s behavior with the broader SQL-on-Hadoop ecosystem and remove an artificial limitation that may block valid use cases — particularly in data warehousing environments where bucketing is used independently of partitioning.

Release Notes

== RELEASE NOTES ==

Hive Connector Changes

* Add support for INSERT into bucketed but unpartitioned Hive tables, including follow-up fixes for native validation and insert handling.

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label May 18, 2025
@anandamideShakyan anandamideShakyan marked this pull request as ready for review May 18, 2025 16:29
@anandamideShakyan anandamideShakyan requested a review from a team as a code owner May 18, 2025 16:29
@prestodb-ci prestodb-ci requested review from a team, namya28 and pramodsatya and removed request for a team May 18, 2025 16:29
@aditi-pandit

Contributor

@anandamideShakyan : Thanks for this PR.

Have you tried this functionality with Prestissimo ? You might need facebookincubator/velox#13283 as well for it.

@anandamideShakyan

Contributor Author

@aditi-pandit Sure, I will add support in Prestissimo after facebookincubator/velox#13283 is merged.

@aditi-pandit

Contributor

@anandamideShakyan : There are failures in product tests. PTAL.

2025-05-18 19:49:10 INFO: [78 of 435] com.facebook.presto.tests.hive.TestHiveBucketedTables.testInsertIntoBucketedTables (Groups: )
2025-05-18 19:49:11 INFO: FAILURE     /    com.facebook.presto.tests.hive.TestHiveBucketedTables.testInsertIntoBucketedTables (Groups: ) took 1.1 seconds
2025-05-18 19:49:11 SEVERE: Failure cause:
java.lang.IllegalArgumentException: No mutable table instance found for name TableHandle{name=bucket_nation}
	at io.prestodb.tempto.fulfillment.table.TablesState.get(TablesState.java:64)
	at io.prestodb.tempto.fulfillment.table.TablesState.get(TablesState.java:48)
	at com.facebook.presto.tests.hive.TestHiveBucketedTables.testInsertIntoBucketedTables(TestHiveBucketedTables.java:173)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.testng.internal.invokers.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:135)
	at org.testng.internal.invokers.TestInvoker.invokeMethod(TestInvoker.java:673)
	at org.testng.internal.invokers.TestInvoker.invokeTestMethod(TestInvoker.java:220)
	at org.testng.internal.invokers.MethodRunner.runInSequence(MethodRunner.java:50)
	at org.testng.internal.invokers.TestInvoker$MethodInvocationAgent.invoke(TestInvoker.java:945)
	at org.testng.internal.invokers.TestInvoker.invokeTestMethods(TestInvoker.java:193)
	at org.testng.internal.invokers.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:146)
	at org.testng.internal.invokers.TestMethodWorker.run(TestMethodWorker.java:128)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

@anandamideShakyan anandamideShakyan force-pushed the insert-bucketed-unpar-hive branch 4 times, most recently from 0e437dd to 38805a8 Compare May 26, 2025 07:21
@anandamideShakyan

anandamideShakyan commented Jun 18, 2025

Contributor Author

> @anandamideShakyan : Thanks for this PR.
>
> Have you tried this functionality with Prestissimo ? You might need facebookincubator/velox#13283 as well for it.

I tried it on Prestissimo, with one coordinator and one worker. I created a table in hive schema and tpcds catalog using:

CREATE TABLE cars (
    id BIGINT,
    name VARCHAR,
    brand VARCHAR
)
WITH (
    format = 'PARQUET',
    bucketed_by = ARRAY['id'],
    bucket_count = 4
);

Inserted values:

INSERT INTO cars (id, name, brand) VALUES
  (1, 'Model S', 'Tesla'),
  (2, 'Civic', 'Honda'),
  (3, 'Mustang', 'Ford'),
  (4, 'A4', 'Audi');

I was able to see the rows when running a SELECT query:

[Screenshot 2025-06-18 at 1:07:59 PM]

@anandamideShakyan anandamideShakyan force-pushed the insert-bucketed-unpar-hive branch from 38805a8 to 817f7df Compare June 25, 2025 07:33
@anandamideShakyan anandamideShakyan requested a review from a team as a code owner June 25, 2025 07:33
import static java.lang.Boolean.parseBoolean;
import static org.testng.Assert.assertEquals;

public class TestHivePartitionedInsertNative

Contributor

Could we move these test cases to presto-tests or presto-product-tests? Ideally, we don't want to add new test cases to presto-native-tests; instead, we should just extend the existing e2e tests (such as the ones added to presto-product-tests in this PR) to run with the native query runner.

@anandamideShakyan anandamideShakyan force-pushed the insert-bucketed-unpar-hive branch from 817f7df to ed26ecc Compare June 27, 2025 22:41
@steveburnett

Contributor

Consider adding an example of how to use this new ability, or at least a mention that this is now possible for users to do and why it's useful (as you wrote in the Description), to the documentation.

@anandamideShakyan anandamideShakyan force-pushed the insert-bucketed-unpar-hive branch from ed26ecc to e8591d3 Compare January 31, 2026 11:15
@anandamideShakyan anandamideShakyan changed the title Insert into bucketed but unpartitioned Hive table feat(native): Insert into bucketed but unpartitioned Hive table Jan 31, 2026
@anandamideShakyan anandamideShakyan force-pushed the insert-bucketed-unpar-hive branch from e8591d3 to f68fe3d Compare January 31, 2026 12:36
@aditi-pandit

Contributor

@anandamideShakyan : It will be good to complete this work as it has been a long pending item. Please can you take a look at the failures.

@anandamideShakyan

Contributor Author

Inserts into bucketed Hive tables using the C++ (Velox) worker were failing during finishInsert with:
[Screenshot 2026-02-06 at 8:12:48 PM]

VerifyException: computeFileNamesForMissingBuckets

This happens because Presto’s Hive metadata layer assumes exactly one file per bucket per partition.
If any bucket does not produce a file, Presto attempts to synthesize “missing bucket” files during commit.

The Java worker never hits this path because it always creates one file per bucket, even when a bucket receives zero rows.

The Velox (C++) HiveDataSink, however, only created writers for buckets that actually received rows. When a bucket was empty, no writer → no file, causing Presto to think the bucket was missing and fail verification.

This is why inserts succeeded when data happened to hit all buckets, and failed otherwise.

Fix

The fix ensures that Velox creates one writer (and therefore one output file) per bucket, matching Java worker behavior and Presto’s expectations.

Specifically:

During HiveDataSink::splitInputRowsAndEnsureWriters(), we now pre-create writers for all buckets (for each partition, if partitioned).

This guarantees that every bucket produces exactly one file, even if it contains zero rows.

As a result, computeFileNamesForMissingBuckets() is never triggered and finishInsert succeeds.

To Do

  • This is a Velox-side fix (C++ worker behavior).

  • The original PR is in Presto, but the correct fix belongs in Velox, so a separate Velox PR is required. I will create the Velox PR soon.

  • This change aligns C++ worker semantics with Java worker semantics and Hive’s bucketing contract.

With this fix applied locally, I am able to insert into bucketed Hive tables with and without the sidecar. I am now looking at resolving the unit test failure that appeared after these changes: #25115

@aditi-pandit

Contributor

@anandamideShakyan : Presto has a property hive.create-empty-bucket-files to control whether to create empty bucket files. Seems like this should always be false for native engine.

But in any case, doesn't the Presto server create the missing buckets on the co-ordinator in the TableFinish logic, and not in the worker? I feel it should be on the co-ordinator in TableFinish, as it's only after seeing all the worker files that we know which buckets are empty. The individual worker cannot make this decision.

This error seems like a local problem between Hive and Presto on the co-ordinator.

Please can you recheck if something else is missing.
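
To illustrate the point about the co-ordinator, here is a minimal sketch of why the set of empty buckets can only be computed after the files from all workers are known. It is not Presto's TableFinish or HiveMetadata code; the class `MissingBucketSketch` and the method `missingBuckets` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; not Presto's actual implementation.
final class MissingBucketSketch {
    private MissingBucketSketch() {}

    // perWorkerBuckets holds, for each worker, the bucket numbers it wrote files for.
    // A single worker sees only its own set, so only the merged view is complete.
    static List<Integer> missingBuckets(int bucketCount, List<Set<Integer>> perWorkerBuckets) {
        Set<Integer> written = new HashSet<>();
        for (Set<Integer> buckets : perWorkerBuckets) {
            written.addAll(buckets);
        }
        List<Integer> missing = new ArrayList<>();
        for (int bucket = 0; bucket < bucketCount; bucket++) {
            if (!written.contains(bucket)) {
                missing.add(bucket);
            }
        }
        return missing;
    }
}
```

With four buckets, one worker that wrote bucket 0, and another that wrote bucket 2, the merged view reports buckets 1 and 3 as empty. Neither worker could have determined that on its own.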

@anandamideShakyan

Contributor Author

@aditi-pandit You were right, I am able to run insert queries successfully when I set hive.create-empty-bucket-files=false. I am getting this error while running the tests in the query runner:

java.lang.RuntimeException: I/O error getting native plan checker response

	at com.facebook.presto.tests.AbstractTestingPrestoClient.execute(AbstractTestingPrestoClient.java:127)
	at com.facebook.presto.tests.AbstractTestingPrestoClient.execute(AbstractTestingPrestoClient.java:92)
	at com.facebook.presto.tests.DistributedQueryRunner.execute(DistributedQueryRunner.java:894)
	at com.facebook.presto.tests.DistributedQueryRunner.execute(DistributedQueryRunner.java:868)
	at com.facebook.presto.nativetests.TestHivePartitionedInsertNative.testInsertIntoBucketedTables(TestHivePartitionedInsertNative.java:92)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:569)
	at org.testng.internal.invokers.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:135)
	at org.testng.internal.invokers.TestInvoker.invokeMethod(TestInvoker.java:673)
	at org.testng.internal.invokers.TestInvoker.invokeTestMethod(TestInvoker.java:220)
	at org.testng.internal.invokers.MethodRunner.runInSequence(MethodRunner.java:50)
	at org.testng.internal.invokers.TestInvoker$MethodInvocationAgent.invoke(TestInvoker.java:945)
	at org.testng.internal.invokers.TestInvoker.invokeTestMethods(TestInvoker.java:193)
	at org.testng.internal.invokers.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:146)
	at org.testng.internal.invokers.TestMethodWorker.run(TestMethodWorker.java:128)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
	at org.testng.TestRunner.privateRun(TestRunner.java:808)
	at org.testng.TestRunner.run(TestRunner.java:603)
	at org.testng.SuiteRunner.runTest(SuiteRunner.java:429)
	at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:423)
	at org.testng.SuiteRunner.privateRun(SuiteRunner.java:383)
	at org.testng.SuiteRunner.run(SuiteRunner.java:326)
	at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
	at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:95)
	at org.testng.TestNG.runSuitesSequentially(TestNG.java:1249)
	at org.testng.TestNG.runSuitesLocally(TestNG.java:1169)
	at org.testng.TestNG.runSuites(TestNG.java:1092)
	at org.testng.TestNG.run(TestNG.java:1060)
	at com.intellij.rt.testng.IDEARemoteTestNG.run(IDEARemoteTestNG.java:65)
	at com.intellij.rt.testng.RemoteTestNGStarter.main(RemoteTestNGStarter.java:105)
Caused by: com.facebook.presto.spi.PrestoException: I/O error getting native plan checker response
	at com.facebook.presto.sidecar.nativechecker.NativePlanChecker.runValidation(NativePlanChecker.java:136)
	at com.facebook.presto.sidecar.nativechecker.NativePlanChecker.validateFragment(NativePlanChecker.java:92)

I checked the sidecar worker startup logs:

I20260219 13:07:33.486436 1918031 PrestoServer.cpp:690] [PRESTO_STARTUP] Server listening at :::52393 - https false
I20260219 13:07:33.486797 1918322 CoordinatorDiscoverer.cpp:44] Coordinator address changed to 127.0.0.1:52378
I20260219 13:07:33.486809 1918322 PeriodicServiceInventoryManager.cpp:80] Service Inventory changed to 127.0.0.1:52378
I20260219 13:07:33.489058 1918322 PeriodicServiceInventoryManager.cpp:126] Announcement succeeded: HTTP 202. State: active.
*** Aborted at 1771486681 (unix time) try "date -d @1771486681" if you are using GNU date ***
PC: @                0x0 __folly_leaf_frame_store
*** SIGSEGV (@0x0) received by PID 95477 (TID 0x16c18b000) stack trace: ***
    @        0x19371b744 _sigtramp
    @        0x10450d70c facebook::presto::VeloxQueryPlanConverterBase::toVeloxQueryPlan()
    @        0x10450d70c facebook::presto::VeloxQueryPlanConverterBase::toVeloxQueryPlan()
    @        0x1047e9bc4 facebook::presto::prestoToVeloxPlanConversion()
    @        0x104a95104 facebook::presto::PrestoServer::registerSidecarEndpoints()::$_4::operator()()
    @        0x104a95090 _ZNSt3__18__invokeB8ne190102IRZN8facebook6presto12PrestoServer24registerSidecarEndpointsEvE3$_4JPN8proxygen11HTTPMessageERNS_6vectorINS_10unique_ptrIN5folly5IOBufENS_14default_deleteISC_EEEENS_9allocatorISF_EEEEPNS6_15ResponseHandlerEEEEDTclclsr3stdE7declvalIT_EEspclsr3stdE7declvalIT0_EEEEOSM_DpOSN_
    @        0x104a95028 std::__1::__invoke_void_return_wrapper<>::__call<>()
    @        0x104a94fec _ZNSt3__110__function12__alloc_funcIZN8facebook6presto12PrestoServer24registerSidecarEndpointsEvE3$_4NS_9allocatorIS5_EEFvPN8proxygen11HTTPMessageERNS_6vectorINS_10unique_ptrIN5folly5IOBufENS_14default_deleteISE_EEEENS6_ISH_EEEEPNS8_15ResponseHandlerEEEclB8ne190102EOSA_SK_OSM_
    @        0x104a93e74 std::__1::__function::__func<>::operator()()
    @        0x104a1bd24 _ZNKSt3__110__function12__value_funcIFvPN8proxygen11HTTPMessageERNS_6vectorINS_10unique_ptrIN5folly5IOBufENS_14default_deleteIS8_EEEENS_9allocatorISB_EEEEPNS2_15ResponseHandlerEEEclB8ne190102EOS4_SF_OSH_
    @        0x104a1bcc4 std::__1::function<>::operator()()
    @        0x104a1bc20 _ZZN8facebook6presto4http22CallbackRequestHandler4wrapENSt3__18functionIFvPN8proxygen11HTTPMessageERNS3_6vectorINS3_10unique_ptrIN5folly5IOBufENS3_14default_deleteISB_EEEENS3_9allocatorISE_EEEEPNS5_15ResponseHandlerEEEEENKUlS7_SI_SK_NS3_10shared_ptrINS1_27CallbackRequestHandlerStateEEEE_clES7_SI_SK_SP_
    @        0x104a1bbb4 _ZNSt3__18__invokeB8ne190102IRZN8facebook6presto4http22CallbackRequestHandler4wrapENS_8functionIFvPN8proxygen11HTTPMessageERNS_6vectorINS_10unique_ptrIN5folly5IOBufENS_14default_deleteISC_EEEENS_9allocatorISF_EEEEPNS6_15ResponseHandlerEEEEEUlS8_SJ_SL_NS_10shared_ptrINS3_27CallbackRequestHandlerStateEEEE_JS8_SJ_SL_SQ_EEEDTclclsr3stdE7declvalIT_EEspclsr3stdE7declvalIT0_EEEEOST_DpOSU_
    @        0x104a1bb14 _ZNSt3__128__invoke_void_return_wrapperIvLb1EE6__callB8ne190102IJRZN8facebook6presto4http22CallbackRequestHandler4wrapENS_8functionIFvPN8proxygen11HTTPMessageERNS_6vectorINS_10unique_ptrIN5folly5IOBufENS_14default_deleteISE_EEEENS_9allocatorISH_EEEEPNS8_15ResponseHandlerEEEEEUlSA_SL_SN_NS_10shared_ptrINS5_27CallbackRequestHandlerStateEEEE_SA_SL_SN_SS_EEEvDpOT_
    @        0x104a1bad0 _ZNSt3__110__function12__alloc_funcIZN8facebook6presto4http22CallbackRequestHandler4wrapENS_8functionIFvPN8proxygen11HTTPMessageERNS_6vectorINS_10unique_ptrIN5folly5IOBufENS_14default_deleteISD_EEEENS_9allocatorISG_EEEEPNS7_15ResponseHandlerEEEEEUlS9_SK_SM_NS_10shared_ptrINS4_27CallbackRequestHandlerStateEEEE_NSH_ISS_EEFvS9_SK_SM_SR_EEclB8ne190102EOS9_SK_OSM_OSR_
    @        0x104a1a884 _ZNSt3__110__function6__funcIZN8facebook6presto4http22CallbackRequestHandler4wrapENS_8functionIFvPN8proxygen11HTTPMessageERNS_6vectorINS_10unique_ptrIN5folly5IOBufENS_14default_deleteISD_EEEENS_9allocatorISG_EEEEPNS7_15ResponseHandlerEEEEEUlS9_SK_SM_NS_10shared_ptrINS4_27CallbackRequestHandlerStateEEEE_NSH_ISS_EEFvS9_SK_SM_SR_EEclEOS9_SK_OSM_OSR_
    @        0x104a1c3c4 _ZNKSt3__110__function12__value_funcIFvPN8proxygen11HTTPMessageERNS_6vectorINS_10unique_ptrIN5folly5IOBufENS_14default_deleteIS8_EEEENS_9allocatorISB_EEEEPNS2_15ResponseHandlerENS_10shared_ptrIN8facebook6presto4http27CallbackRequestHandlerStateEEEEEclB8ne190102EOS4_SF_OSH_OSN_
    @        0x104a1c328 std::__1::function<>::operator()()
    @        0x104a19ba4 facebook::presto::http::CallbackRequestHandler::onEOM()
    @        0x104cd41e4 facebook::presto::http::filters::InternalAuthenticationFilter::onEOM()
    @        0x104cd522c proxygen::RequestHandlerAdaptor::onEOM()
    @        0x104e704cc proxygen::HTTPTransaction::processIngressEOM()
    @        0x104e701d4 proxygen::HTTPTransaction::onIngressEOM()
    @        0x104e395b0 proxygen::HTTPSession::onMessageComplete()
    @        0x104db39e8 proxygen::PassThroughHTTPCodecFilter::onMessageComplete()
    @        0x104d850dc proxygen::HTTP1xCodec::onMessageComplete()
    @        0x104d7d95c proxygen::HTTP1xCodec::onMessageCompleteCB()
    @        0x104d434f4 proxygen::http_parser_execute_options()
    @        0x104d7e504 proxygen::HTTP1xCodec::onIngressImpl()
    @        0x104d7e178 proxygen::HTTP1xCodec::onIngress()
    @        0x104db440c proxygen::PassThroughHTTPCodecFilter::onIngress()
    @        0x104e32bac proxygen::HTTPSession::processReadData()

It is failing at registerSidecarEndpoints(). Have I missed any configuration that is causing the sidecar registration to fail?

@anandamideShakyan

Contributor Author

I was getting a SIGSEGV crash ("Error getting native plan checker response" in UI) when inserting into Hive bucketed unpartitioned tables with native sidecar enabled. After investigating with @pdabre12, we found that the native sidecar crashes at line 2228 in PrestoToVeloxQueryPlan.cpp when it tries to dereference partitioningScheme.bucketToPartition, which is null.

The root cause is a design mismatch: the Java planner intentionally creates PartitioningScheme with bucketToPartition = Optional.empty() because it's runtime information that gets populated during the scheduling phase. However, sidecar validation happens BEFORE scheduling, so bucketToPartition is never populated when the plan reaches the sidecar. During normal execution (sidecar disabled), the scheduler populates this field before sending to workers, which is why the issue only occurs with sidecar enabled. The C++ code assumes this pointer is always non-null and crashes when it's not.

We traced the flow from Java planner → SimplePlanFragment creation → sidecar validation → native conversion and confirmed that bucketToPartition remains null throughout the validation path, while it gets populated in the execution path that sidecar bypasses.
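
The general shape of the guard can be sketched in Java, although the actual crash is in the C++ conversion code. This is a minimal sketch under stated assumptions: `Scheme` is a hypothetical stand-in that carries only the bucketToPartition field discussed above, and `maxPartition` is an invented helper, not a Presto or Velox API. It shows a consumer that treats the mapping as possibly absent instead of dereferencing it unconditionally.

```java
import java.util.Arrays;
import java.util.Optional;
import java.util.OptionalInt;

// Hypothetical illustration; not the PartitioningScheme class from Presto.
final class BucketToPartitionGuardSketch {
    private BucketToPartitionGuardSketch() {}

    // Stand-in for a partitioning scheme whose mapping is filled in only at scheduling time.
    static final class Scheme {
        final Optional<int[]> bucketToPartition;

        Scheme(Optional<int[]> bucketToPartition) {
            this.bucketToPartition = bucketToPartition;
        }
    }

    // Returns the largest partition id, or empty when the mapping has not been
    // populated yet (for example, on a validation path that runs before scheduling).
    static OptionalInt maxPartition(Scheme scheme) {
        if (!scheme.bucketToPartition.isPresent()) {
            return OptionalInt.empty();
        }
        return Arrays.stream(scheme.bucketToPartition.get()).max();
    }
}
```

A caller on the validation path can then skip any check that needs the mapping when the result is empty, rather than failing.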


assertEquals(computeActual("SELECT * from " + tableName).getRowCount(), 0);

// make sure that we will get one file per bucket regardless of writer count configured

Contributor

How are you validating there is only one file? You could use a query with the $path hidden column for it. https://prestodb.io/docs/0.272/connector/hive.html#extra-hidden-columns

Contributor Author

$path only exposes files that contain rows. Since this test inserts only two records, multiple records could hash to the same bucket and empty bucket files would not be visible through $path. To verify that all 11 bucket files were created, we'd need to inspect the table location directly (or insert data guaranteed to populate every bucket).

@anandamideShakyan anandamideShakyan force-pushed the insert-bucketed-unpar-hive branch from 99d12a5 to ce1f8d6 Compare April 26, 2026 21:04
@anandamideShakyan anandamideShakyan force-pushed the insert-bucketed-unpar-hive branch from ce1f8d6 to 61b78a5 Compare May 18, 2026 14:11
@aditi-pandit

Contributor

Velox based PR review.

@anandamideShakyan : Please fix the test issue found.

Summary
The PR moves core behavior in the right direction (enabling inserts into bucketed, unpartitioned Hive tables) and adds broad test coverage, but I found one new reliability issue in the new smoke tests.

Issues Found
🟡 Suggestion: presto-hive/src/test/java/com/facebook/presto/hive/TestHiveIntegrationSmokeTest.java (around testBucketedTable(...), added block near lines ~1000–1040 in this diff)
testBucketedTable uses a fixed table name ("test_bucketed_table") and does not use try/finally cleanup. Since this helper runs repeatedly across formats/settings, a mid-test failure can leak the table and cascade failures into subsequent invocations.
Suggested fix: use a unique table name per invocation (as done in testEmptyBucketedTable) and wrap the method body in try/finally with DROP TABLE IF EXISTS.
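
The suggested pattern could look roughly like the sketch below. It is not the code from TestHiveIntegrationSmokeTest; `TableCleanupSketch`, `uniqueTableName`, and `withTemporaryTable` are hypothetical names, and the `execute` hook stands in for whatever the test uses to run SQL.

```java
import java.util.UUID;
import java.util.function.Consumer;

// Hypothetical illustration of "unique table name + try/finally cleanup".
final class TableCleanupSketch {
    private TableCleanupSketch() {}

    // Appends a random suffix so repeated invocations never share a table.
    static String uniqueTableName(String prefix) {
        return prefix + "_" + UUID.randomUUID().toString().replace("-", "");
    }

    // execute: assumed hook that runs one SQL statement.
    // body: the test logic, which receives the generated table name.
    static void withTemporaryTable(String prefix, Consumer<String> execute, Consumer<String> body) {
        String tableName = uniqueTableName(prefix);
        try {
            execute.accept("CREATE TABLE " + tableName
                    + " (id BIGINT) WITH (bucketed_by = ARRAY['id'], bucket_count = 4)");
            body.accept(tableName);
        }
        finally {
            // Runs even when the body throws, so a failed run cannot leak the table.
            execute.accept("DROP TABLE IF EXISTS " + tableName);
        }
    }
}
```

Because the DROP runs in the finally block, a mid-test failure in one format or setting does not affect the next invocation.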

Positive Observations

  • Good functional change in HiveWriterFactory removing the hard blocker for bucketed-unpartitioned inserts.

  • Nice expansion of coverage across hive integration, product tests, and native tests.

  • Good cleanup/robustness improvement already applied in testEmptyBucketedTable (unique name + finally cleanup).

@anandamideShakyan

Contributor Author

@aditi-pandit I have fixed the above test issue.

Another thing is that the insertion fails in native execution because C++ workers add a .parquet extension to target file names (e.g., "000000_0_<'queryId'>.parquet"), while Java's getFileExtension() returns an empty string for the PARQUET format, so the coordinator looks for files without extensions (e.g., "000000_0_<'queryId'>"). The coordinator's string-based matching in HiveMetadata.computeFileNamesForMissingBuckets() fails when it receives file names with extensions from native workers, triggering a VerifyException.

2026-05-18T15:47:46.461-0600	ERROR	SplitRunner-5-503	com.facebook.presto.execution.executor.TaskExecutor	Error processing Split 20260518_214746_00004_gfxuf.0.0.0.0-0  (start = 1.52748596380708E8, wall = 4 ms, cpu = 1 ms, wait = 0 ms, calls = 2)
com.google.common.base.VerifyException
	at com.google.common.base.Verify.verify(Verify.java:102)
	at com.facebook.presto.hive.HiveMetadata.computeFileNamesForMissingBuckets(HiveMetadata.java:2064)
	at com.facebook.presto.hive.HiveMetadata.computePartitionUpdatesForMissingBuckets(HiveMetadata.java:2022)
	at com.facebook.presto.hive.HiveMetadata.finishInsertInternal(HiveMetadata.java:2211)
	at com.facebook.presto.hive.HiveMetadata.finishInsert(HiveMetadata.java:2190)
	at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorMetadata.finishInsert(ClassLoaderSafeConnectorMetadata.java:497)
	at com.facebook.presto.metadata.MetadataManager.finishInsert(MetadataManager.java:1008)
	at com.facebook.presto.metadata.StatsRecordingMetadataManager.finishInsert(StatsRecordingMetadataManager.java:324)
	at com.facebook.presto.sql.planner.LocalExecutionPlanner.lambda$createTableFinisher$3(LocalExecutionPlanner.java:3671)
	at com.facebook.presto.operator.TableFinishOperator.getOutput(TableFinishOperator.java:290)
	at com.facebook.presto.operator.Driver.processInternal(Driver.java:448)
	at com.facebook.presto.operator.Driver.lambda$processFor$11(Driver.java:331)
	at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:757)
	at com.facebook.presto.operator.Driver.processFor(Driver.java:324)
	at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1078)
	at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:165)
	at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:619)
	at com.facebook.presto.$gen.Presto_null__testversion____20260518_214546_460.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:840)


2026-05-18T15:47:46.463-0600	ERROR	task-event-loop-6	com.facebook.presto.execution.StageExecutionStateMachine	Stage execution 20260518_214746_00004_gfxuf.0.0 failed
com.google.common.base.VerifyException
	at com.google.common.base.Verify.verify(Verify.java:102)
	at com.facebook.presto.hive.HiveMetadata.computeFileNamesForMissingBuckets(HiveMetadata.java:2064)
	at com.facebook.presto.hive.HiveMetadata.computePartitionUpdatesForMissingBuckets(HiveMetadata.java:2022)
	at com.facebook.presto.hive.HiveMetadata.finishInsertInternal(HiveMetadata.java:2211)
	at com.facebook.presto.hive.HiveMetadata.finishInsert(HiveMetadata.java:2190)
	at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorMetadata.finishInsert(ClassLoaderSafeConnectorMetadata.java:497)
	at com.facebook.presto.metadata.MetadataManager.finishInsert(MetadataManager.java:1008)
	at com.facebook.presto.metadata.StatsRecordingMetadataManager.finishInsert(StatsRecordingMetadataManager.java:324)
	at com.facebook.presto.sql.planner.LocalExecutionPlanner.lambda$createTableFinisher$3(LocalExecutionPlanner.java:3671)
	at com.facebook.presto.operator.TableFinishOperator.getOutput(TableFinishOperator.java:290)
	at com.facebook.presto.operator.Driver.processInternal(Driver.java:448)
	at com.facebook.presto.operator.Driver.lambda$processFor$11(Driver.java:331)
	at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:757)
	at com.facebook.presto.operator.Driver.processFor(Driver.java:324)
	at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1078)
	at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:165)
	at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:619)
	at com.facebook.presto.$gen.Presto_null__testversion____20260518_214546_460.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:840)

I've fixed this by changing the coordinator logic to extract and compare bucket numbers instead of exact file names, which works regardless of extension presence, but I'd like guidance on whether we should also align C++ to match Java's approach, or if the bucket-based matching solution is the preferred fix.
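
As a rough illustration of the bucket-based matching idea, the sketch below extracts the bucket number from the leading digits of a file name, so names with and without an extension map to the same bucket. It is not the code changed in this PR; `BucketNumberSketch` and `bucketNumber` are hypothetical names, and the regular expression assumes only the "000000_0_<queryId>" shape quoted above.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not HiveMetadata's actual matching logic.
final class BucketNumberSketch {
    private BucketNumberSketch() {}

    // Assumed name shape: <zero-padded bucket>_<attempt>_<anything, optional extension>.
    private static final Pattern BUCKET_PREFIX = Pattern.compile("^(\\d{1,9})_\\d+_");

    // Returns the bucket number encoded at the start of the file name,
    // or empty when the name does not follow the assumed shape.
    static OptionalInt bucketNumber(String fileName) {
        Matcher matcher = BUCKET_PREFIX.matcher(fileName);
        if (!matcher.find()) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(Integer.parseInt(matcher.group(1)));
    }
}
```

Comparing the extracted numbers rather than whole strings means a trailing ".parquet" no longer affects whether a bucket is treated as present.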

@anandamideShakyan anandamideShakyan force-pushed the insert-bucketed-unpar-hive branch from c2a68e7 to efa89ca Compare May 19, 2026 00:09
@aditi-pandit

Contributor

@tdcmeehan : Please can you help review this PR.

@anandamideShakyan anandamideShakyan force-pushed the insert-bucketed-unpar-hive branch from efa89ca to d78bbe5 Compare May 30, 2026 23:08
@anandamideShakyan

Contributor Author

I have added some checks that disable validation for connector-specific partitioning whose bucketToPartition value is null and would otherwise fail validation. @tdcmeehan Please let me know if my changes are okay or if we can do it in a better way.

tdcmeehan previously approved these changes Jun 1, 2026
@aditi-pandit

Contributor

@anandamideShakyan : Please can you add a release note and documentation for this issue.

@steveburnett

@steveburnett steveburnett left a comment

Contributor

Please add documentation for this to the Presto documentation. Perhaps here:

https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/connector/hive.rst

As I wrote in this comment:

"Consider adding an example of how to use this new ability, or at least a mention that this is now possible for users to do and why it's useful (as you wrote in the Description), to the documentation."

@steveburnett

steveburnett commented Jun 2, 2026

Contributor

@anandamideShakyan : Please can you add a release note and documentation for this issue.

@steveburnett

Thank you for adding the release note @anandamideShakyan!

@anandamideShakyan

Contributor Author

@steveburnett I have added the documentation. PTAL and let me know if anything else is needed. Thanks.

@anandamideShakyan anandamideShakyan force-pushed the insert-bucketed-unpar-hive branch from 0b1c11d to e36e425 Compare June 2, 2026 22:05
steveburnett previously approved these changes Jun 3, 2026

@steveburnett steveburnett left a comment

Contributor

LGTM! (docs)

Pull branch, local doc build, looks good. Thanks!

Comment thread presto-docs/src/main/sphinx/connector/hive.rst Outdated
steveburnett previously approved these changes Jun 3, 2026

@steveburnett steveburnett left a comment

Contributor

LGTM! (docs)

Documentation removed per discussion.

tdcmeehan previously approved these changes Jun 3, 2026
@anandamideShakyan anandamideShakyan force-pushed the insert-bucketed-unpar-hive branch from cefdf12 to b1183b2 Compare June 3, 2026 21:29
@anandamideShakyan anandamideShakyan dismissed stale reviews from steveburnett and tdcmeehan via 5a63db2 June 4, 2026 02:33
@anandamideShakyan anandamideShakyan force-pushed the insert-bucketed-unpar-hive branch from 5a63db2 to b1183b2 Compare June 4, 2026 02:34
@anandamideShakyan anandamideShakyan merged commit fde4ee0 into prestodb:master Jun 5, 2026
300 of 315 checks passed

Labels

from:IBM PR from IBM

7 participants