fix(plugin-hive): Write table statistics inline during CREATE TABLE with NOT NULL constraints#27687
fix(plugin-hive): Write table statistics inline during CREATE TABLE with NOT NULL constraints#27687mehradpk wants to merge 1 commit intoprestodb:masterfrom
Conversation
… NULL constraints HMS registers NOT NULL constraint via an internal alter_table call that invalidates the thrift connection state. The deferred UpdateStatisticsOperation then fails with TableNotFoundException when attempting to read statistics on the invalidated connection, causing the transaction to roll back. Move statistics initialization into CreateTableOperation.run() immediately after createTable(), before any NOT NULL constraint registration can invalidate the connection state.
Reviewer's GuideThis PR changes Hive table creation so that initial table statistics are written inline within CreateTableOperation, on the same metastore connection as createTable(), instead of being deferred to a separate UpdateStatisticsOperation, preventing Hive Metastore connection invalidation when NOT NULL constraints trigger internal alter_table calls. Sequence diagram for inline table statistics write during Hive CREATE TABLEsequenceDiagram
actor Client
participant PrestoCoordinator
participant SemiTransactionalHiveMetastore
participant CreateTableOperation
participant ExtendedHiveMetastore
Client->>PrestoCoordinator: SQL CREATE TABLE ... NOT NULL ...
PrestoCoordinator->>SemiTransactionalHiveMetastore: prepareAddTable()
SemiTransactionalHiveMetastore->>CreateTableOperation: new CreateTableOperation(metastoreContext, newTable, privileges, ignoreExisting, constraints, statisticsUpdate)
SemiTransactionalHiveMetastore->>ExtendedHiveMetastore: addTableOperations
PrestoCoordinator->>SemiTransactionalHiveMetastore: commit()
SemiTransactionalHiveMetastore->>CreateTableOperation: run(metastore)
CreateTableOperation->>ExtendedHiveMetastore: createTable(metastoreContext, newTable, privileges, constraints)
ExtendedHiveMetastore-->>CreateTableOperation: MetastoreOperationResult
CreateTableOperation->>ExtendedHiveMetastore: updateTableStatistics(metastoreContext, dbName, tableName, statisticsUpdate)
ExtendedHiveMetastore-->>CreateTableOperation: statistics updated
CreateTableOperation-->>SemiTransactionalHiveMetastore: operationResult
SemiTransactionalHiveMetastore-->>PrestoCoordinator: CREATE TABLE committed
PrestoCoordinator-->>Client: CREATE TABLE
Updated class diagram for SemiTransactionalHiveMetastore and CreateTableOperationclassDiagram
class SemiTransactionalHiveMetastore {
- List addTableOperations
- List updateStatisticsOperations
+ void prepareAddTable(MetastoreContext metastoreContext, HdfsContext context, TableAndMore tableAndMore)
+ void commit()
}
class TableAndMore {
+ Table getTable()
+ PrincipalPrivileges getPrincipalPrivileges()
+ boolean isIgnoreExisting()
+ List~TableConstraint~ getConstraints()
+ PartitionStatistics getStatisticsUpdate()
}
class CreateTableOperation {
- Table newTable
- PrincipalPrivileges privileges
- boolean ignoreExisting
- List~TableConstraint~ constraints
- String queryId
- MetastoreContext metastoreContext
- Optional~MetastoreOperationResult~ operationResult
- PartitionStatistics statisticsUpdate
+ CreateTableOperation(MetastoreContext metastoreContext, Table newTable, PrincipalPrivileges privileges, boolean ignoreExisting, List~TableConstraint~ constraints, PartitionStatistics statisticsUpdate)
+ String getDescription()
+ void run(ExtendedHiveMetastore metastore)
+ void undo(ExtendedHiveMetastore metastore)
}
class ExtendedHiveMetastore {
+ MetastoreOperationResult createTable(MetastoreContext metastoreContext, Table newTable, PrincipalPrivileges privileges, List~TableConstraint~ constraints)
+ void updateTableStatistics(MetastoreContext metastoreContext, String databaseName, String tableName, PartitionStatistics statisticsUpdate)
}
SemiTransactionalHiveMetastore "1" o-- "*" CreateTableOperation : addTableOperations
SemiTransactionalHiveMetastore "1" o-- "*" TableAndMore
CreateTableOperation --> ExtendedHiveMetastore : uses
CreateTableOperation --> Table : newTable
CreateTableOperation --> PrincipalPrivileges : privileges
CreateTableOperation --> PartitionStatistics : statisticsUpdate
CreateTableOperation --> MetastoreContext : metastoreContext
CreateTableOperation --> TableConstraint : constraints
SemiTransactionalHiveMetastore --> ExtendedHiveMetastore : commit operations
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- Previously
UpdateStatisticsOperationwas only enqueued for non-view tables, butCreateTableOperation.run()now unconditionally callsupdateTableStatistics; consider guarding this call (e.g., skipping for views) to preserve the prior behavior and avoid potential failures when creating Presto views. - The inline
updateTableStatisticsnow runs as part ofCreateTableOperation.run()and will cause the whole create to fail if statistics update throws; if the old deferred behavior tolerated stats failures without aborting table creation, you may want to explicitly handle exceptions here to keep that failure mode consistent.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Previously `UpdateStatisticsOperation` was only enqueued for non-view tables, but `CreateTableOperation.run()` now unconditionally calls `updateTableStatistics`; consider guarding this call (e.g., skipping for views) to preserve the prior behavior and avoid potential failures when creating Presto views.
- The inline `updateTableStatistics` now runs as part of `CreateTableOperation.run()` and will cause the whole create to fail if statistics update throws; if the old deferred behavior tolerated stats failures without aborting table creation, you may want to explicitly handle exceptions here to keep that failure mode consistent.
## Individual Comments
### Comment 1
<location path="presto-hive-metastore/src/main/java/com/facebook/presto/hive/metastore/SemiTransactionalHiveMetastore.java" line_range="1418-1427" />
<code_context>
- tableAndMore.getStatisticsUpdate(),
- false));
- }
+ // Statistics are written inline by CreateTableOperation to avoid a race with constraint registration.
+ // HMS constraint additions trigger an internal alter_table call that can invalidate the thrift
+ // connection before a deferred UpdateStatisticsOperation runs.
+ addTableOperations.add(new CreateTableOperation(
+ metastoreContext,
+ table,
+ tableAndMore.getPrincipalPrivileges(),
+ tableAndMore.isIgnoreExisting(),
+ tableAndMore.getConstraints(),
+ tableAndMore.getStatisticsUpdate()));
}
</code_context>
<issue_to_address>
**issue (bug_risk):** Restoring the `isPrestoView` guard may be needed to avoid writing stats for views
Previously, stats updates were only scheduled when `!isPrestoView(table)`. With this change, `CreateTableOperation` is always given `tableAndMore.getStatisticsUpdate()`, and `run()` will always call `updateTableStatistics`, changing behavior for views. If `statisticsUpdate` can be non-empty/non-null for views, we’ll now try to write table-level stats for them, which may have been intentionally avoided (per the original guard) and could cause extra metastore traffic or failures. Consider either restoring an `isPrestoView` check here or guaranteeing that `statisticsUpdate` is always a no-op for views before passing it to `CreateTableOperation`.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| // Statistics are written inline by CreateTableOperation to avoid a race with constraint registration. | ||
| // HMS constraint additions trigger an internal alter_table call that can invalidate the thrift | ||
| // connection before a deferred UpdateStatisticsOperation runs. | ||
| addTableOperations.add(new CreateTableOperation( | ||
| metastoreContext, | ||
| table, | ||
| tableAndMore.getPrincipalPrivileges(), | ||
| tableAndMore.isIgnoreExisting(), | ||
| tableAndMore.getConstraints(), | ||
| tableAndMore.getStatisticsUpdate())); |
There was a problem hiding this comment.
issue (bug_risk): Restoring the isPrestoView guard may be needed to avoid writing stats for views
Previously, stats updates were only scheduled when !isPrestoView(table). With this change, CreateTableOperation is always given tableAndMore.getStatisticsUpdate(), and run() will always call updateTableStatistics, changing behavior for views. If statisticsUpdate can be non-empty/non-null for views, we’ll now try to write table-level stats for them, which may have been intentionally avoided (per the original guard) and could cause extra metastore traffic or failures. Consider either restoring an isPrestoView check here or guaranteeing that statisticsUpdate is always a no-op for views before passing it to CreateTableOperation.
Description
CREATE TABLE
withNOT NULLconstraints fails withTableNotFoundException` on the Hive connector.Root cause: HMS registers NOT NULL constraints via an internal
alter_tablecall immediately after table creation. This invalidates the thrift connection state. The deferredUpdateStatisticsOperationthen attempts to read statistics on the invalidated connection and throwsTableNotFoundException, causing the entire transaction to roll back.Fix: Move statistics initialization into
CreateTableOperation.run()immediately aftercreateTable()returns. Statistics are written on the same connection before constraint registration can invalidate the state.Tables without constraints are unaffected, the statistics write is semantically equivalent to the previous deferred write for newly created empty tables.
Impact
Users can now create Hive tables with
NOT NULLconstraints without encounteringTableNotFoundExceptionerrors.Test Plan
Tested via Presto-CLI
Before fix:
After fix:
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.
Summary by Sourcery
Handle Hive table statistics initialization inline during table creation to avoid failures when NOT NULL constraints are present.
Bug Fixes:
Enhancements: