feat(native): Insert into bucketed but unpartitioned Hive table by anandamideShakyan · Pull Request #25139 · prestodb/presto

anandamideShakyan · 2025-05-18T11:30:21Z

Description

Addresses #25104
Currently, Presto does not support INSERT INTO operations on bucketed but unpartitioned Hive tables. This limitation originates from a hard check in HiveWriterFactory:

https://github.com/prestodb/presto/blob/master/presto-hive/src/main/java/com/facebook/presto/hive/HiveWriterFactory.java#L480

Motivation and Context

Supporting writes to bucketed unpartitioned Hive tables in Presto would improve compatibility and enhance Presto’s ability to handle modern Hive table layouts. It's a reasonable and useful feature for users who wish to leverage bucketing for performance optimizations even without partitioning.

Impact

This change would align Presto’s behavior with the broader SQL-on-Hadoop ecosystem and remove an artificial limitation that may block valid use cases — particularly in data warehousing environments where bucketing is used independently of partitioning.

Release Notes

== RELEASE NOTES ==

Hive Connector Changes

* Add support for INSERT into bucketed but unpartitioned Hive tables, including follow-up fixes for native validation and insert handling.

aditi-pandit · 2025-05-20T21:55:06Z

@anandamideShakyan : Thanks for this PR.

Have you tried this functionality with Prestissimo ? You might need facebookincubator/velox#13283 as well for it.

anandamideShakyan · 2025-05-22T08:44:11Z

@aditi-pandit Sure I will add the support in Prestissimo after facebookincubator/velox#13283 is merged.

aditi-pandit · 2025-05-22T19:04:13Z

@anandamideShakyan : There are failures in product tests. PTAL.

2025-05-18 19:49:10 INFO: [78 of 435] com.facebook.presto.tests.hive.TestHiveBucketedTables.testInsertIntoBucketedTables (Groups: )
2025-05-18 19:49:11 INFO: FAILURE     /    com.facebook.presto.tests.hive.TestHiveBucketedTables.testInsertIntoBucketedTables (Groups: ) took 1.1 seconds
2025-05-18 19:49:11 SEVERE: Failure cause:
java.lang.IllegalArgumentException: No mutable table instance found for name TableHandle{name=bucket_nation}
	at io.prestodb.tempto.fulfillment.table.TablesState.get(TablesState.java:64)
	at io.prestodb.tempto.fulfillment.table.TablesState.get(TablesState.java:48)
	at com.facebook.presto.tests.hive.TestHiveBucketedTables.testInsertIntoBucketedTables(TestHiveBucketedTables.java:173)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.testng.internal.invokers.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:135)
	at org.testng.internal.invokers.TestInvoker.invokeMethod(TestInvoker.java:673)
	at org.testng.internal.invokers.TestInvoker.invokeTestMethod(TestInvoker.java:220)
	at org.testng.internal.invokers.MethodRunner.runInSequence(MethodRunner.java:50)
	at org.testng.internal.invokers.TestInvoker$MethodInvocationAgent.invoke(TestInvoker.java:945)
	at org.testng.internal.invokers.TestInvoker.invokeTestMethods(TestInvoker.java:193)
	at org.testng.internal.invokers.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:146)
	at org.testng.internal.invokers.TestMethodWorker.run(TestMethodWorker.java:128)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

anandamideShakyan · 2025-06-18T07:40:55Z

@anandamideShakyan : Thanks for this PR.

Have you tried this functionality with Prestissimo ? You might need facebookincubator/velox#13283 as well for it.

I tried it on Prestissimo, with one coordinator and one worker. I created a table in hive schema and tpcds catalog using:

CREATE TABLE cars (
    id BIGINT,
    name VARCHAR,
    brand VARCHAR
)
WITH (
    format = 'PARQUET',
    bucketed_by = ARRAY['id'],
    bucket_count = 4
);

Inserted values:

INSERT INTO cars (id, name, brand) VALUES
  (1, 'Model S', 'Tesla'),
  (2, 'Civic', 'Honda'),
  (3, 'Mustang', 'Ford'),
  (4, 'A4', 'Audi');

I was able to see the entries when running the SELECT query.

pramodsatya · 2025-06-25T16:49:20Z

+import static java.lang.Boolean.parseBoolean;
+import static org.testng.Assert.assertEquals;
+
+public class TestHivePartitionedInsertNative


Could we move these test cases to presto-tests or presto-product-tests? Ideally, we don't want to add new test cases to presto-native-tests; instead, we should just extend the existing e2e tests (such as the ones added to presto-product-tests in this PR) to run with the native query runner.

steveburnett · 2025-06-30T13:55:08Z

Consider adding an example of how to use this new ability, or at least a mention that this is now possible for users to do and why it's useful (as you wrote in the Description), to the documentation.

aditi-pandit · 2026-02-02T16:29:49Z

@anandamideShakyan : It will be good to complete this work as it has been a long pending item. Please can you take a look at the failures.

anandamideShakyan · 2026-02-06T14:45:35Z

Inserts into bucketed Hive tables using the C++ (Velox) worker were failing during finishInsert with:

VerifyException: computeFileNamesForMissingBuckets

This happens because Presto’s Hive metadata layer assumes exactly one file per bucket per partition.
If any bucket does not produce a file, Presto attempts to synthesize “missing bucket” files during commit.

The Java worker never hits this path because it always creates one file per bucket, even when a bucket receives zero rows.

The Velox (C++) HiveDataSink, however, only created writers for buckets that actually received rows. When a bucket was empty, no writer → no file, causing Presto to think the bucket was missing and fail verification.

This is why inserts succeeded when data happened to hit all buckets, and failed otherwise.
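
To illustrate how a small insert can or cannot touch every bucket, here is a minimal sketch. It assumes the classic Hive bucketing rule for a single BIGINT column (fold the high and low 32 bits into an int hash, clear the sign bit, take the modulus); that rule and the class name HiveBucketSketch are my assumptions for illustration, not code from this PR.

```java
import java.util.Set;
import java.util.TreeSet;

final class HiveBucketSketch {
    // Assumed hash for a BIGINT value: fold the high and low 32 bits together.
    static int hashBigint(long value) {
        return (int) ((value >>> 32) ^ value);
    }

    // Assumed bucket rule for one bucketing column: clear the sign bit, then take the modulus.
    static int bucketFor(long value, int bucketCount) {
        return (hashBigint(value) & Integer.MAX_VALUE) % bucketCount;
    }

    // Returns the set of buckets that receive at least one of the given ids.
    static Set<Integer> populatedBuckets(long[] ids, int bucketCount) {
        Set<Integer> buckets = new TreeSet<>();
        for (long id : ids) {
            buckets.add(bucketFor(id, bucketCount));
        }
        return buckets;
    }

    public static void main(String[] args) {
        // ids 1..4 with bucket_count = 4, as in the cars example earlier in this thread
        System.out.println(populatedBuckets(new long[] {1, 2, 3, 4}, 4)); // prints [0, 1, 2, 3]
        // two ids that collide in the same bucket leave three buckets empty
        System.out.println(populatedBuckets(new long[] {1, 5}, 4)); // prints [1]
    }
}
```

Under that assumption, the four-row cars insert populates every bucket, while an insert of ids 1 and 5 would leave buckets 0, 2, and 3 without rows.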

Fix

The fix ensures that Velox creates one writer (and therefore one output file) per bucket, matching Java worker behavior and Presto’s expectations.

Specifically:

During HiveDataSink::splitInputRowsAndEnsureWriters(), we now pre-create writers for all buckets (for each partition, if partitioned).

This guarantees that every bucket produces exactly one file, even if it contains zero rows.

As a result, computeFileNamesForMissingBuckets() is never triggered and finishInsert succeeds.
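
The difference between lazy and eager writer creation can be sketched as follows. This is a Java model of the idea only; the real change would live in Velox's C++ HiveDataSink, and the class BucketWriterSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class BucketWriterSketch {
    // bucket number -> rows buffered for that bucket's single output file
    private final Map<Integer, List<String>> writers = new TreeMap<>();

    // When preCreate is true, a writer exists for every bucket before any row arrives.
    BucketWriterSketch(int bucketCount, boolean preCreate) {
        if (preCreate) {
            for (int bucket = 0; bucket < bucketCount; bucket++) {
                writers.put(bucket, new ArrayList<>());
            }
        }
    }

    // Lazily creates the writer if it does not exist yet, then buffers the row.
    void write(int bucket, String row) {
        writers.computeIfAbsent(bucket, b -> new ArrayList<>()).add(row);
    }

    // One output file per writer, including writers that received zero rows.
    int fileCount() {
        return writers.size();
    }

    public static void main(String[] args) {
        BucketWriterSketch lazy = new BucketWriterSketch(4, false);
        lazy.write(1, "row-a");
        lazy.write(1, "row-b");
        System.out.println(lazy.fileCount()); // prints 1

        BucketWriterSketch eager = new BucketWriterSketch(4, true);
        eager.write(1, "row-a");
        System.out.println(eager.fileCount()); // prints 4
    }
}
```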

To Do

This is a Velox-side fix (C++ worker behavior).
The original PR is in Presto, but the correct fix belongs in Velox, so a separate Velox PR is required. Will create velox PR soon.
This change aligns C++ worker semantics with Java worker semantics and Hive’s bucketing contract.

With this fix locally, I am able to insert into bucketed hive tables with and without sidecar. I am now looking at resolving the unit test failure that came after these changes : #25115

aditi-pandit · 2026-02-06T23:14:52Z

@anandamideShakyan : Presto has a property hive.create-empty-bucket-files to control whether to create empty bucket files. Seems like this should always be false for native engine.

But in any case, doesn't the Presto server create the missing buckets on the co-ordinator in the TableFinish logic, and not in the worker? I feel it should be on the co-ordinator in TableFinish, as it's only after seeing all the worker files that we know which buckets are empty. The individual worker cannot make this decision.

This error seems like a local problem between Hive and Presto on the co-ordinator.

Please can you recheck if something else is missing.
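
The coordinator-side decision described above can be sketched like this: only once the bucket numbers reported by all workers are collected can the empty buckets be computed. This is an illustrative sketch with hypothetical names (MissingBucketSketch, missingBuckets), not the HiveMetadata implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class MissingBucketSketch {
    // Returns the bucket numbers in [0, bucketCount) for which no worker reported a file.
    static List<Integer> missingBuckets(int bucketCount, Collection<Integer> reportedBuckets) {
        Set<Integer> seen = new HashSet<>(reportedBuckets);
        List<Integer> missing = new ArrayList<>();
        for (int bucket = 0; bucket < bucketCount; bucket++) {
            if (!seen.contains(bucket)) {
                missing.add(bucket);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Workers together reported files for buckets 1 and 3 of a 4-bucket table.
        System.out.println(missingBuckets(4, Arrays.asList(1, 3))); // prints [0, 2]
    }
}
```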

anandamideShakyan · 2026-02-19T08:01:39Z

@aditi-pandit You were right, I am able to run insert queries successfully when I set hive.create-empty-bucket-files=false. I am getting this error while running the tests in the queryrunner:

java.lang.RuntimeException: I/O error getting native plan checker response

	at com.facebook.presto.tests.AbstractTestingPrestoClient.execute(AbstractTestingPrestoClient.java:127)
	at com.facebook.presto.tests.AbstractTestingPrestoClient.execute(AbstractTestingPrestoClient.java:92)
	at com.facebook.presto.tests.DistributedQueryRunner.execute(DistributedQueryRunner.java:894)
	at com.facebook.presto.tests.DistributedQueryRunner.execute(DistributedQueryRunner.java:868)
	at com.facebook.presto.nativetests.TestHivePartitionedInsertNative.testInsertIntoBucketedTables(TestHivePartitionedInsertNative.java:92)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:569)
	at org.testng.internal.invokers.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:135)
	at org.testng.internal.invokers.TestInvoker.invokeMethod(TestInvoker.java:673)
	at org.testng.internal.invokers.TestInvoker.invokeTestMethod(TestInvoker.java:220)
	at org.testng.internal.invokers.MethodRunner.runInSequence(MethodRunner.java:50)
	at org.testng.internal.invokers.TestInvoker$MethodInvocationAgent.invoke(TestInvoker.java:945)
	at org.testng.internal.invokers.TestInvoker.invokeTestMethods(TestInvoker.java:193)
	at org.testng.internal.invokers.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:146)
	at org.testng.internal.invokers.TestMethodWorker.run(TestMethodWorker.java:128)
	at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
	at org.testng.TestRunner.privateRun(TestRunner.java:808)
	at org.testng.TestRunner.run(TestRunner.java:603)
	at org.testng.SuiteRunner.runTest(SuiteRunner.java:429)
	at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:423)
	at org.testng.SuiteRunner.privateRun(SuiteRunner.java:383)
	at org.testng.SuiteRunner.run(SuiteRunner.java:326)
	at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
	at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:95)
	at org.testng.TestNG.runSuitesSequentially(TestNG.java:1249)
	at org.testng.TestNG.runSuitesLocally(TestNG.java:1169)
	at org.testng.TestNG.runSuites(TestNG.java:1092)
	at org.testng.TestNG.run(TestNG.java:1060)
	at com.intellij.rt.testng.IDEARemoteTestNG.run(IDEARemoteTestNG.java:65)
	at com.intellij.rt.testng.RemoteTestNGStarter.main(RemoteTestNGStarter.java:105)
Caused by: com.facebook.presto.spi.PrestoException: I/O error getting native plan checker response
	at com.facebook.presto.sidecar.nativechecker.NativePlanChecker.runValidation(NativePlanChecker.java:136)
	at com.facebook.presto.sidecar.nativechecker.NativePlanChecker.validateFragment(NativePlanChecker.java:92)

I checked the sidecar worker startup logs:

I20260219 13:07:33.486436 1918031 PrestoServer.cpp:690] [PRESTO_STARTUP] Server listening at :::52393 - https false
I20260219 13:07:33.486797 1918322 CoordinatorDiscoverer.cpp:44] Coordinator address changed to 127.0.0.1:52378
I20260219 13:07:33.486809 1918322 PeriodicServiceInventoryManager.cpp:80] Service Inventory changed to 127.0.0.1:52378
I20260219 13:07:33.489058 1918322 PeriodicServiceInventoryManager.cpp:126] Announcement succeeded: HTTP 202. State: active.
*** Aborted at 1771486681 (unix time) try "date -d @1771486681" if you are using GNU date ***
PC: @                0x0 __folly_leaf_frame_store
*** SIGSEGV (@0x0) received by PID 95477 (TID 0x16c18b000) stack trace: ***
    @        0x19371b744 _sigtramp
    @        0x10450d70c facebook::presto::VeloxQueryPlanConverterBase::toVeloxQueryPlan()
    @        0x10450d70c facebook::presto::VeloxQueryPlanConverterBase::toVeloxQueryPlan()
    @        0x1047e9bc4 facebook::presto::prestoToVeloxPlanConversion()
    @        0x104a95104 facebook::presto::PrestoServer::registerSidecarEndpoints()::$_4::operator()()
    @        0x104a95090 _ZNSt3__18__invokeB8ne190102IRZN8facebook6presto12PrestoServer24registerSidecarEndpointsEvE3$_4JPN8proxygen11HTTPMessageERNS_6vectorINS_10unique_ptrIN5folly5IOBufENS_14default_deleteISC_EEEENS_9allocatorISF_EEEEPNS6_15ResponseHandlerEEEEDTclclsr3stdE7declvalIT_EEspclsr3stdE7declvalIT0_EEEEOSM_DpOSN_
    @        0x104a95028 std::__1::__invoke_void_return_wrapper<>::__call<>()
    @        0x104a94fec _ZNSt3__110__function12__alloc_funcIZN8facebook6presto12PrestoServer24registerSidecarEndpointsEvE3$_4NS_9allocatorIS5_EEFvPN8proxygen11HTTPMessageERNS_6vectorINS_10unique_ptrIN5folly5IOBufENS_14default_deleteISE_EEEENS6_ISH_EEEEPNS8_15ResponseHandlerEEEclB8ne190102EOSA_SK_OSM_
    @        0x104a93e74 std::__1::__function::__func<>::operator()()
    @        0x104a1bd24 _ZNKSt3__110__function12__value_funcIFvPN8proxygen11HTTPMessageERNS_6vectorINS_10unique_ptrIN5folly5IOBufENS_14default_deleteIS8_EEEENS_9allocatorISB_EEEEPNS2_15ResponseHandlerEEEclB8ne190102EOS4_SF_OSH_
    @        0x104a1bcc4 std::__1::function<>::operator()()
    @        0x104a1bc20 _ZZN8facebook6presto4http22CallbackRequestHandler4wrapENSt3__18functionIFvPN8proxygen11HTTPMessageERNS3_6vectorINS3_10unique_ptrIN5folly5IOBufENS3_14default_deleteISB_EEEENS3_9allocatorISE_EEEEPNS5_15ResponseHandlerEEEEENKUlS7_SI_SK_NS3_10shared_ptrINS1_27CallbackRequestHandlerStateEEEE_clES7_SI_SK_SP_
    @        0x104a1bbb4 _ZNSt3__18__invokeB8ne190102IRZN8facebook6presto4http22CallbackRequestHandler4wrapENS_8functionIFvPN8proxygen11HTTPMessageERNS_6vectorINS_10unique_ptrIN5folly5IOBufENS_14default_deleteISC_EEEENS_9allocatorISF_EEEEPNS6_15ResponseHandlerEEEEEUlS8_SJ_SL_NS_10shared_ptrINS3_27CallbackRequestHandlerStateEEEE_JS8_SJ_SL_SQ_EEEDTclclsr3stdE7declvalIT_EEspclsr3stdE7declvalIT0_EEEEOST_DpOSU_
    @        0x104a1bb14 _ZNSt3__128__invoke_void_return_wrapperIvLb1EE6__callB8ne190102IJRZN8facebook6presto4http22CallbackRequestHandler4wrapENS_8functionIFvPN8proxygen11HTTPMessageERNS_6vectorINS_10unique_ptrIN5folly5IOBufENS_14default_deleteISE_EEEENS_9allocatorISH_EEEEPNS8_15ResponseHandlerEEEEEUlSA_SL_SN_NS_10shared_ptrINS5_27CallbackRequestHandlerStateEEEE_SA_SL_SN_SS_EEEvDpOT_
    @        0x104a1bad0 _ZNSt3__110__function12__alloc_funcIZN8facebook6presto4http22CallbackRequestHandler4wrapENS_8functionIFvPN8proxygen11HTTPMessageERNS_6vectorINS_10unique_ptrIN5folly5IOBufENS_14default_deleteISD_EEEENS_9allocatorISG_EEEEPNS7_15ResponseHandlerEEEEEUlS9_SK_SM_NS_10shared_ptrINS4_27CallbackRequestHandlerStateEEEE_NSH_ISS_EEFvS9_SK_SM_SR_EEclB8ne190102EOS9_SK_OSM_OSR_
    @        0x104a1a884 _ZNSt3__110__function6__funcIZN8facebook6presto4http22CallbackRequestHandler4wrapENS_8functionIFvPN8proxygen11HTTPMessageERNS_6vectorINS_10unique_ptrIN5folly5IOBufENS_14default_deleteISD_EEEENS_9allocatorISG_EEEEPNS7_15ResponseHandlerEEEEEUlS9_SK_SM_NS_10shared_ptrINS4_27CallbackRequestHandlerStateEEEE_NSH_ISS_EEFvS9_SK_SM_SR_EEclEOS9_SK_OSM_OSR_
    @        0x104a1c3c4 _ZNKSt3__110__function12__value_funcIFvPN8proxygen11HTTPMessageERNS_6vectorINS_10unique_ptrIN5folly5IOBufENS_14default_deleteIS8_EEEENS_9allocatorISB_EEEEPNS2_15ResponseHandlerENS_10shared_ptrIN8facebook6presto4http27CallbackRequestHandlerStateEEEEEclB8ne190102EOS4_SF_OSH_OSN_
    @        0x104a1c328 std::__1::function<>::operator()()
    @        0x104a19ba4 facebook::presto::http::CallbackRequestHandler::onEOM()
    @        0x104cd41e4 facebook::presto::http::filters::InternalAuthenticationFilter::onEOM()
    @        0x104cd522c proxygen::RequestHandlerAdaptor::onEOM()
    @        0x104e704cc proxygen::HTTPTransaction::processIngressEOM()
    @        0x104e701d4 proxygen::HTTPTransaction::onIngressEOM()
    @        0x104e395b0 proxygen::HTTPSession::onMessageComplete()
    @        0x104db39e8 proxygen::PassThroughHTTPCodecFilter::onMessageComplete()
    @        0x104d850dc proxygen::HTTP1xCodec::onMessageComplete()
    @        0x104d7d95c proxygen::HTTP1xCodec::onMessageCompleteCB()
    @        0x104d434f4 proxygen::http_parser_execute_options()
    @        0x104d7e504 proxygen::HTTP1xCodec::onIngressImpl()
    @        0x104d7e178 proxygen::HTTP1xCodec::onIngress()
    @        0x104db440c proxygen::PassThroughHTTPCodecFilter::onIngress()
    @        0x104e32bac proxygen::HTTPSession::processReadData()

It is failing at registerSidecarEndpoints(). Have I missed out any configuration that is causing the sidecar registration to fail?

anandamideShakyan · 2026-04-01T00:09:06Z

I was getting a SIGSEGV crash ("Error getting native plan checker response" in UI) when inserting into Hive bucketed unpartitioned tables with native sidecar enabled. After investigating with @pdabre12, we found that the native sidecar crashes at line 2228 in PrestoToVeloxQueryPlan.cpp when it tries to dereference partitioningScheme.bucketToPartition, which is null.

The root cause is a design mismatch: the Java planner intentionally creates PartitioningScheme with bucketToPartition = Optional.empty() because it's runtime information that gets populated during the scheduling phase. However, sidecar validation happens BEFORE scheduling, so bucketToPartition is never populated when the plan reaches the sidecar. During normal execution (sidecar disabled), the scheduler populates this field before sending to workers, which is why the issue only occurs with sidecar enabled. The C++ code assumes this pointer is always non-null and crashes when it's not.

We traced the flow from Java planner → SimplePlanFragment creation → sidecar validation → native conversion and confirmed that bucketToPartition remains null throughout the validation path, while it gets populated in the execution path that sidecar bypasses.
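
One way to picture the missing guard, as a sketch only: treat bucketToPartition as optional and skip the bucket-dependent step when it has not been assigned yet. The types below (SidecarValidationSketch, SchemeSketch) are hypothetical stand-ins, not Presto or Velox classes.

```java
import java.util.Optional;

final class SidecarValidationSketch {
    // Hypothetical stand-in for the planner's partitioning scheme.
    static final class SchemeSketch {
        final Optional<int[]> bucketToPartition;

        SchemeSketch(Optional<int[]> bucketToPartition) {
            this.bucketToPartition = bucketToPartition;
        }
    }

    // Checks for presence before reading the mapping instead of assuming it is always set.
    static String validate(SchemeSketch scheme) {
        if (!scheme.bucketToPartition.isPresent()) {
            return "skipped";
        }
        return "validated " + scheme.bucketToPartition.get().length + " buckets";
    }

    public static void main(String[] args) {
        // Before scheduling: the mapping is still empty.
        System.out.println(validate(new SchemeSketch(Optional.empty()))); // prints skipped
        // After scheduling: the mapping is populated.
        System.out.println(validate(new SchemeSketch(Optional.of(new int[] {0, 1, 2, 3})))); // prints validated 4 buckets
    }
}
```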

aditi-pandit · 2026-04-17T16:50:59Z

+
+            assertEquals(computeActual("SELECT * from " + tableName).getRowCount(), 0);
+
+            // make sure that we will get one file per bucket regardless of writer count configured


How are you validating there is only one file ? You could use a query with $path hidden column for it. https://prestodb.io/docs/0.272/connector/hive.html#extra-hidden-columns

anandamideShakyan · 2026-05-30

$path only exposes files that contain rows. Since this test inserts only two records, multiple records could hash to the same bucket and empty bucket files would not be visible through $path. To verify that all 11 bucket files were created, we'd need to inspect the table location directly (or insert data guaranteed to populate every bucket).

aditi-pandit · 2026-05-18T17:31:46Z

Velox based PR review.

@anandamideShakyan : Please fix the test issue found.

Summary
The PR moves core behavior in the right direction (enabling inserts into bucketed, unpartitioned Hive tables) and adds broad test coverage, but I found one new reliability issue in the new smoke tests.

Issues Found
🟡 Suggestion: presto-hive/src/test/java/com/facebook/presto/hive/TestHiveIntegrationSmokeTest.java (around testBucketedTable(...), added block near lines ~1000–1040 in this diff)
testBucketedTable uses a fixed table name ("test_bucketed_table") and does not use try/finally cleanup. Since this helper runs repeatedly across formats/settings, a mid-test failure can leak the table and cascade failures into subsequent invocations.
Suggested fix: use a unique table name per invocation (as done in testEmptyBucketedTable) and wrap the method body in try/finally with DROP TABLE IF EXISTS.
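
A minimal sketch of the suggested pattern, assuming a hypothetical helper rather than the actual test utilities in TestHiveIntegrationSmokeTest: the executor argument stands in for whatever runs SQL in the test, and TableCleanupSketch is an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.function.Consumer;

final class TableCleanupSketch {
    // Appends a random suffix so repeated invocations never share a table name.
    static String uniqueTableName(String prefix) {
        return prefix + "_" + UUID.randomUUID().toString().replace("-", "");
    }

    // Runs the test body against a fresh table name and always issues the DROP afterwards.
    static void withTemporaryTable(String prefix, Consumer<String> executor, Consumer<String> body) {
        String tableName = uniqueTableName(prefix);
        try {
            body.accept(tableName);
        }
        finally {
            executor.accept("DROP TABLE IF EXISTS " + tableName);
        }
    }

    public static void main(String[] args) {
        List<String> statements = new ArrayList<>();
        try {
            withTemporaryTable("test_bucketed_table", statements::add, tableName -> {
                throw new IllegalStateException("simulated mid-test failure");
            });
        }
        catch (IllegalStateException expected) {
            // The failure still propagates, but the cleanup statement was recorded.
        }
        System.out.println(statements.size()); // prints 1
    }
}
```

With this shape, a failure in one format/setting combination cannot leak a table into the next invocation.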

Positive Observations
Good functional change in HiveWriterFactory removing the hard blocker for bucketed-unpartitioned inserts.
Nice expansion of coverage across hive integration, product tests, and native tests.
Good cleanup/robustness improvement already applied in testEmptyBucketedTable (unique name + finally cleanup).

anandamideShakyan · 2026-05-19T00:06:22Z

@aditi-pandit I have fixed the above test issue.

Another thing is that the insertion fails in native execution because C++ workers add .parquet extension to target file names (e.g., "000000_0_<'queryId'>.parquet") while Java's getFileExtension() returns empty string for PARQUET format, causing the coordinator to look for files without extensions (e.g., "000000_0_<'queryId'>"). The coordinator's string-based matching in HiveMetadata.computeFileNamesForMissingBuckets() fails when it receives file names with extensions from native workers, triggering a VerifyException.

2026-05-18T15:47:46.461-0600	ERROR	SplitRunner-5-503	com.facebook.presto.execution.executor.TaskExecutor	Error processing Split 20260518_214746_00004_gfxuf.0.0.0.0-0  (start = 1.52748596380708E8, wall = 4 ms, cpu = 1 ms, wait = 0 ms, calls = 2)
com.google.common.base.VerifyException
	at com.google.common.base.Verify.verify(Verify.java:102)
	at com.facebook.presto.hive.HiveMetadata.computeFileNamesForMissingBuckets(HiveMetadata.java:2064)
	at com.facebook.presto.hive.HiveMetadata.computePartitionUpdatesForMissingBuckets(HiveMetadata.java:2022)
	at com.facebook.presto.hive.HiveMetadata.finishInsertInternal(HiveMetadata.java:2211)
	at com.facebook.presto.hive.HiveMetadata.finishInsert(HiveMetadata.java:2190)
	at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorMetadata.finishInsert(ClassLoaderSafeConnectorMetadata.java:497)
	at com.facebook.presto.metadata.MetadataManager.finishInsert(MetadataManager.java:1008)
	at com.facebook.presto.metadata.StatsRecordingMetadataManager.finishInsert(StatsRecordingMetadataManager.java:324)
	at com.facebook.presto.sql.planner.LocalExecutionPlanner.lambda$createTableFinisher$3(LocalExecutionPlanner.java:3671)
	at com.facebook.presto.operator.TableFinishOperator.getOutput(TableFinishOperator.java:290)
	at com.facebook.presto.operator.Driver.processInternal(Driver.java:448)
	at com.facebook.presto.operator.Driver.lambda$processFor$11(Driver.java:331)
	at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:757)
	at com.facebook.presto.operator.Driver.processFor(Driver.java:324)
	at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1078)
	at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:165)
	at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:619)
	at com.facebook.presto.$gen.Presto_null__testversion____20260518_214546_460.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:840)


2026-05-18T15:47:46.463-0600	ERROR	task-event-loop-6	com.facebook.presto.execution.StageExecutionStateMachine	Stage execution 20260518_214746_00004_gfxuf.0.0 failed
com.google.common.base.VerifyException
	at com.google.common.base.Verify.verify(Verify.java:102)
	at com.facebook.presto.hive.HiveMetadata.computeFileNamesForMissingBuckets(HiveMetadata.java:2064)
	at com.facebook.presto.hive.HiveMetadata.computePartitionUpdatesForMissingBuckets(HiveMetadata.java:2022)
	at com.facebook.presto.hive.HiveMetadata.finishInsertInternal(HiveMetadata.java:2211)
	at com.facebook.presto.hive.HiveMetadata.finishInsert(HiveMetadata.java:2190)
	at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorMetadata.finishInsert(ClassLoaderSafeConnectorMetadata.java:497)
	at com.facebook.presto.metadata.MetadataManager.finishInsert(MetadataManager.java:1008)
	at com.facebook.presto.metadata.StatsRecordingMetadataManager.finishInsert(StatsRecordingMetadataManager.java:324)
	at com.facebook.presto.sql.planner.LocalExecutionPlanner.lambda$createTableFinisher$3(LocalExecutionPlanner.java:3671)
	at com.facebook.presto.operator.TableFinishOperator.getOutput(TableFinishOperator.java:290)
	at com.facebook.presto.operator.Driver.processInternal(Driver.java:448)
	at com.facebook.presto.operator.Driver.lambda$processFor$11(Driver.java:331)
	at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:757)
	at com.facebook.presto.operator.Driver.processFor(Driver.java:324)
	at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1078)
	at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:165)
	at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:619)
	at com.facebook.presto.$gen.Presto_null__testversion____20260518_214546_460.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:840)

I've fixed this by changing the coordinator logic to extract and compare bucket numbers instead of exact file names, which works regardless of extension presence, but I'd like guidance on whether we should also align C++ to match Java's approach, or if the bucket-based matching solution is the preferred fix.
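
For reference, the bucket-based matching idea can be sketched as below. It assumes the bucket number is the zero-padded run of digits before the first underscore, as in the file name examples above; the class BucketFileNameSketch and the regex are illustrative, not the code in HiveMetadata.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BucketFileNameSketch {
    // Assumed layout: <zero-padded bucket>_<attempt>_<queryId>[.extension]
    private static final Pattern LEADING_BUCKET = Pattern.compile("^(\\d+)_");

    // Extracts the bucket number, ignoring whatever follows it (including any extension).
    static OptionalInt bucketNumber(String fileName) {
        Matcher matcher = LEADING_BUCKET.matcher(fileName);
        if (!matcher.find()) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(Integer.parseInt(matcher.group(1)));
    }

    public static void main(String[] args) {
        System.out.println(bucketNumber("000003_0_20260518_214746_00004_gfxuf.parquet")); // prints OptionalInt[3]
        System.out.println(bucketNumber("000003_0_20260518_214746_00004_gfxuf")); // prints OptionalInt[3]
        System.out.println(bucketNumber("not-a-bucket-file")); // prints OptionalInt.empty
    }
}
```

Comparing the extracted numbers makes the Java-written and native-written names equivalent, at the cost of no longer checking the rest of the name.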

aditi-pandit · 2026-05-19T22:06:09Z

@tdcmeehan : Please can you help review this PR.

anandamideShakyan · 2026-05-31T13:18:38Z

I have added some checks to skip validation for connector-specific partitioning schemes that contain a null bucketToPartition value and would otherwise fail validation. @tdcmeehan Please let me know if my changes are okay or if we can do it in a better way.

aditi-pandit · 2026-06-02T15:24:01Z

@anandamideShakyan : Please can you add a release note and documentation for this issue.

@steveburnett

steveburnett

Please add documentation for this to the Presto documentation. Perhaps here:

https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/connector/hive.rst

As I wrote in this comment:

"Consider adding an example of how to use this new ability, or at least a mention that this is now possible for users to do and why it's useful (as you wrote in the Description), to the documentation."

steveburnett · 2026-06-02T20:26:02Z

@anandamideShakyan : Please can you add a release note and documentation for this issue.

@steveburnett

Thank you for adding the release note @anandamideShakyan!

anandamideShakyan · 2026-06-02T22:03:31Z

@steveburnett I have added the documentation. PTAL and let me know if anything else is needed. Thanks.

steveburnett

LGTM! (docs)

Pull branch, local doc build, looks good. Thanks!

steveburnett

LGTM! (docs)

Documentation removed per discussion.

prestodb-ci added the from:IBM PR from IBM label May 18, 2025

anandamideShakyan mentioned this pull request May 18, 2025

Insert into bucketed but unpartitioned Hive table #25104

Closed

anandamideShakyan marked this pull request as ready for review May 18, 2025 16:29

anandamideShakyan requested a review from a team as a code owner May 18, 2025 16:29

anandamideShakyan requested a review from jaystarshot May 18, 2025 16:29

prestodb-ci requested review from a team, namya28 and pramodsatya and removed request for a team May 18, 2025 16:29

anandamideShakyan force-pushed the insert-bucketed-unpar-hive branch 4 times, most recently from 0e437dd to 38805a8 Compare May 26, 2025 07:21

anandamideShakyan force-pushed the insert-bucketed-unpar-hive branch from 38805a8 to 817f7df Compare June 25, 2025 07:33

anandamideShakyan requested a review from a team as a code owner June 25, 2025 07:33

pramodsatya reviewed Jun 25, 2025

anandamideShakyan force-pushed the insert-bucketed-unpar-hive branch from 817f7df to ed26ecc Compare June 27, 2025 22:41

anandamideShakyan force-pushed the insert-bucketed-unpar-hive branch from ed26ecc to e8591d3 Compare January 31, 2026 11:15

anandamideShakyan changed the title ~~Insert into bucketed but unpartitioned Hive table~~ feat(native): Insert into bucketed but unpartitioned Hive table Jan 31, 2026

anandamideShakyan force-pushed the insert-bucketed-unpar-hive branch from e8591d3 to f68fe3d Compare January 31, 2026 12:36

aditi-pandit reviewed Apr 17, 2026

anandamideShakyan force-pushed the insert-bucketed-unpar-hive branch from 99d12a5 to ce1f8d6 Compare April 26, 2026 21:04

anandamideShakyan force-pushed the insert-bucketed-unpar-hive branch from ce1f8d6 to 61b78a5 Compare May 18, 2026 14:11

anandamideShakyan force-pushed the insert-bucketed-unpar-hive branch from c2a68e7 to efa89ca Compare May 19, 2026 00:09

anandamideShakyan force-pushed the insert-bucketed-unpar-hive branch from efa89ca to d78bbe5 Compare May 30, 2026 23:08

tdcmeehan previously approved these changes Jun 1, 2026

anandamideShakyan requested a review from steveburnett June 2, 2026 20:15

steveburnett requested changes Jun 2, 2026

anandamideShakyan dismissed tdcmeehan’s stale review via 0b1c11d June 2, 2026 22:01

anandamideShakyan force-pushed the insert-bucketed-unpar-hive branch from 0b1c11d to e36e425 Compare June 2, 2026 22:05

steveburnett previously approved these changes Jun 3, 2026

tdcmeehan requested changes Jun 3, 2026

Comment thread presto-docs/src/main/sphinx/connector/hive.rst Outdated

anandamideShakyan dismissed steveburnett’s stale review via bab5bba June 3, 2026 16:35

anandamideShakyan force-pushed the insert-bucketed-unpar-hive branch from bab5bba to cefdf12 Compare June 3, 2026 16:37

steveburnett previously approved these changes Jun 3, 2026

tdcmeehan previously approved these changes Jun 3, 2026

Insert into bucketed but unpartitioned Hive table

b1183b2

anandamideShakyan force-pushed the insert-bucketed-unpar-hive branch from cefdf12 to b1183b2 Compare June 3, 2026 21:29

anandamideShakyan dismissed stale reviews from steveburnett and tdcmeehan via 5a63db2 June 4, 2026 02:33

anandamideShakyan force-pushed the insert-bucketed-unpar-hive branch from 5a63db2 to b1183b2 Compare June 4, 2026 02:34

tdcmeehan approved these changes Jun 4, 2026

anandamideShakyan merged commit fde4ee0 into prestodb:master Jun 5, 2026
300 of 315 checks passed


anandamideShakyan May 30, 2026

7 participants
