
fix(plugin-hive): Write table statistics inline during CREATE TABLE with NOT NULL constraints#27687

Open
mehradpk wants to merge 1 commit into prestodb:master from mehradpk:hive-constraint-statistics-race


@mehradpk mehradpk commented Apr 30, 2026

Description

`CREATE TABLE` with `NOT NULL` constraints fails with `TableNotFoundException` on the Hive connector.

Root cause: HMS registers NOT NULL constraints via an internal alter_table call immediately after table creation. This invalidates the thrift connection state. The deferred UpdateStatisticsOperation then attempts to read statistics on the invalidated connection and throws TableNotFoundException, causing the entire transaction to roll back.

Fix: Move statistics initialization into CreateTableOperation.run() immediately after createTable() returns. Statistics are written on the same connection before constraint registration can invalidate the state.
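The ordering described in the fix can be illustrated with a minimal sketch. It is not the actual Presto code: the `Metastore` interface, `RecordingMetastore`, and the simplified `CreateTableOperation` below are hypothetical stand-ins that only record call order, and the row-count parameter is an assumed placeholder for the real `PartitionStatistics` argument.

```java
import java.util.ArrayList;
import java.util.List;

public class InlineStatisticsSketch {
    // Hypothetical stand-in for the metastore; the real interface has different signatures.
    interface Metastore {
        void createTable(String db, String table);

        void updateTableStatistics(String db, String table, long rowCount);
    }

    // Records each call so the ordering can be inspected.
    static final class RecordingMetastore implements Metastore {
        final List<String> calls = new ArrayList<>();

        @Override
        public void createTable(String db, String table) {
            calls.add("createTable " + db + "." + table);
        }

        @Override
        public void updateTableStatistics(String db, String table, long rowCount) {
            calls.add("updateTableStatistics " + db + "." + table + " rows=" + rowCount);
        }
    }

    // Simplified analogue of CreateTableOperation: create and statistics write share one run().
    static final class CreateTableOperation {
        private final String db;
        private final String table;
        private final long initialRowCount;

        CreateTableOperation(String db, String table, long initialRowCount) {
            this.db = db;
            this.table = table;
            this.initialRowCount = initialRowCount;
        }

        void run(Metastore metastore) {
            metastore.createTable(db, table);
            // Written inline, right after createTable() returns, rather than by a later operation.
            metastore.updateTableStatistics(db, table, initialRowCount);
        }
    }

    // Runs one operation against the recording metastore and returns the observed call order.
    public static List<String> demo() {
        RecordingMetastore metastore = new RecordingMetastore();
        new CreateTableOperation("test_hive", "tab1", 0L).run(metastore);
        return metastore.calls;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

The point of the sketch is only that no other queued operation can run between the two calls, because both happen inside a single `run()`.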

Tables without constraints are unaffected: for newly created empty tables, the inline statistics write is semantically equivalent to the previous deferred write.

Impact

Users can now create Hive tables with NOT NULL constraints without encountering TableNotFoundException errors.

Test Plan

Tested via Presto-CLI

Before fix:

presto> CREATE TABLE hive_data.test_hive.tab1 (
    ->     c_custkey bigint NOT NULL
    -> );
Query 20260410_074001_00323_brxgg failed: Table 'test_hive.tab1' not found

After fix:

presto> CREATE TABLE hive_data.test_hive.tab1 (
    ->     c_custkey bigint NOT NULL
    -> );
CREATE TABLE

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.
  • If adding new dependencies, verified they have an OpenSSF Scorecard score of 5.0 or higher (or obtained explicit TSC approval for lower scores).

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

Hive Connector Changes
* Fix `CREATE TABLE` failure when using `NOT NULL` constraints. Statistics are now written inline during table creation to prevent race conditions with HMS constraint registration.

Summary by Sourcery

Handle Hive table statistics initialization inline during table creation to avoid failures when NOT NULL constraints are present.

Bug Fixes:

  • Prevent TableNotFoundException when creating Hive tables with NOT NULL constraints by writing table statistics on the same metastore connection immediately after table creation.

Enhancements:

  • Refine CreateTableOperation to accept and apply initial table statistics directly instead of relying on a deferred UpdateStatisticsOperation.

… NULL constraints

HMS registers NOT NULL constraints via an internal alter_table call that invalidates the thrift connection state. The deferred UpdateStatisticsOperation then fails with TableNotFoundException when attempting to read statistics on the invalidated connection, causing the transaction to roll back.

Move statistics initialization into CreateTableOperation.run() immediately after createTable(), before any NOT NULL constraint registration can invalidate the connection state.
@mehradpk mehradpk requested a review from a team as a code owner April 30, 2026 02:36
@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Apr 30, 2026
@prestodb-ci prestodb-ci requested review from a team, jkhaliqi and wanglinsong and removed request for a team April 30, 2026 02:36

sourcery-ai Bot commented Apr 30, 2026

Reviewer's Guide

This PR changes Hive table creation so that initial table statistics are written inline within CreateTableOperation, on the same metastore connection as createTable(), instead of being deferred to a separate UpdateStatisticsOperation. The write therefore completes before the internal alter_table call that HMS issues for NOT NULL constraints can invalidate the thrift connection state.

Sequence diagram for inline table statistics write during Hive CREATE TABLE

sequenceDiagram
    actor Client
    participant PrestoCoordinator
    participant SemiTransactionalHiveMetastore
    participant CreateTableOperation
    participant ExtendedHiveMetastore

    Client->>PrestoCoordinator: SQL CREATE TABLE ... NOT NULL ...
    PrestoCoordinator->>SemiTransactionalHiveMetastore: prepareAddTable()
    SemiTransactionalHiveMetastore->>CreateTableOperation: new CreateTableOperation(metastoreContext, newTable, privileges, ignoreExisting, constraints, statisticsUpdate)
    SemiTransactionalHiveMetastore->>ExtendedHiveMetastore: addTableOperations

    PrestoCoordinator->>SemiTransactionalHiveMetastore: commit()
    SemiTransactionalHiveMetastore->>CreateTableOperation: run(metastore)
    CreateTableOperation->>ExtendedHiveMetastore: createTable(metastoreContext, newTable, privileges, constraints)
    ExtendedHiveMetastore-->>CreateTableOperation: MetastoreOperationResult
    CreateTableOperation->>ExtendedHiveMetastore: updateTableStatistics(metastoreContext, dbName, tableName, statisticsUpdate)
    ExtendedHiveMetastore-->>CreateTableOperation: statistics updated
    CreateTableOperation-->>SemiTransactionalHiveMetastore: operationResult
    SemiTransactionalHiveMetastore-->>PrestoCoordinator: CREATE TABLE committed
    PrestoCoordinator-->>Client: CREATE TABLE

Updated class diagram for SemiTransactionalHiveMetastore and CreateTableOperation

classDiagram
    class SemiTransactionalHiveMetastore {
        - List addTableOperations
        - List updateStatisticsOperations
        + void prepareAddTable(MetastoreContext metastoreContext, HdfsContext context, TableAndMore tableAndMore)
        + void commit()
    }

    class TableAndMore {
        + Table getTable()
        + PrincipalPrivileges getPrincipalPrivileges()
        + boolean isIgnoreExisting()
        + List~TableConstraint~ getConstraints()
        + PartitionStatistics getStatisticsUpdate()
    }

    class CreateTableOperation {
        - Table newTable
        - PrincipalPrivileges privileges
        - boolean ignoreExisting
        - List~TableConstraint~ constraints
        - String queryId
        - MetastoreContext metastoreContext
        - Optional~MetastoreOperationResult~ operationResult
        - PartitionStatistics statisticsUpdate
        + CreateTableOperation(MetastoreContext metastoreContext, Table newTable, PrincipalPrivileges privileges, boolean ignoreExisting, List~TableConstraint~ constraints, PartitionStatistics statisticsUpdate)
        + String getDescription()
        + void run(ExtendedHiveMetastore metastore)
        + void undo(ExtendedHiveMetastore metastore)
    }

    class ExtendedHiveMetastore {
        + MetastoreOperationResult createTable(MetastoreContext metastoreContext, Table newTable, PrincipalPrivileges privileges, List~TableConstraint~ constraints)
        + void updateTableStatistics(MetastoreContext metastoreContext, String databaseName, String tableName, PartitionStatistics statisticsUpdate)
    }

    SemiTransactionalHiveMetastore "1" o-- "*" CreateTableOperation : addTableOperations
    SemiTransactionalHiveMetastore "1" o-- "*" TableAndMore
    CreateTableOperation --> ExtendedHiveMetastore : uses
    CreateTableOperation --> Table : newTable
    CreateTableOperation --> PrincipalPrivileges : privileges
    CreateTableOperation --> PartitionStatistics : statisticsUpdate
    CreateTableOperation --> MetastoreContext : metastoreContext
    CreateTableOperation --> TableConstraint : constraints
    SemiTransactionalHiveMetastore --> ExtendedHiveMetastore : commit operations

File-Level Changes

Change: Inline table statistics write inside CreateTableOperation immediately after table creation instead of deferring to a separate UpdateStatisticsOperation.
Details:
  • Extend CreateTableOperation constructor to accept a PartitionStatistics statisticsUpdate parameter and store it as a field
  • Update prepareAddTable to pass statisticsUpdate into CreateTableOperation and stop enqueuing a separate UpdateStatisticsOperation for new tables
  • In CreateTableOperation.run, after metastore.createTable, call metastore.updateTableStatistics on the same metastore context and connection using the captured statisticsUpdate
  • Document in comments that statistics must be written inline to avoid HMS internal alter_table calls for NOT NULL constraints invalidating the thrift connection state
Files:
  • presto-hive-metastore/src/main/java/com/facebook/presto/hive/metastore/SemiTransactionalHiveMetastore.java


@mehradpk mehradpk changed the title from "fix(hive): Write table statistics inline during CREATE TABLE with NOT NULL constraints" to "fix(plugin-hive): Write table statistics inline during CREATE TABLE with NOT NULL constraints" Apr 30, 2026

@sourcery-ai sourcery-ai Bot left a comment


Hey - I've found 1 issue, and left some high level feedback:

  • Previously UpdateStatisticsOperation was only enqueued for non-view tables, but CreateTableOperation.run() now unconditionally calls updateTableStatistics; consider guarding this call (e.g., skipping for views) to preserve the prior behavior and avoid potential failures when creating Presto views.
  • The inline updateTableStatistics now runs as part of CreateTableOperation.run() and will cause the whole create to fail if statistics update throws; if the old deferred behavior tolerated stats failures without aborting table creation, you may want to explicitly handle exceptions here to keep that failure mode consistent.
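The second point can be made concrete with a small sketch of a best-effort statistics write. Whether this policy is wanted depends on how the deferred path behaved, which the comment above leaves open. The `Metastore` interface and the method names here are hypothetical and are not the Presto API.

```java
import java.util.ArrayList;
import java.util.List;

public class TolerantStatisticsSketch {
    // Hypothetical stand-in for the metastore calls used during table creation.
    interface Metastore {
        void createTable(String qualifiedName);

        void updateTableStatistics(String qualifiedName);
    }

    // Returns the warnings collected while running; an empty list means the statistics write succeeded.
    static List<String> createWithBestEffortStatistics(Metastore metastore, String qualifiedName) {
        List<String> warnings = new ArrayList<>();
        // A failure here still propagates and aborts the create.
        metastore.createTable(qualifiedName);
        try {
            metastore.updateTableStatistics(qualifiedName);
        }
        catch (RuntimeException e) {
            // Assumed policy: a failed statistics write is recorded but does not abort the create.
            warnings.add("statistics not written for " + qualifiedName + ": " + e.getMessage());
        }
        return warnings;
    }

    // Exercises the helper with a metastore whose statistics write always fails.
    public static List<String> demo() {
        Metastore failingStatistics = new Metastore() {
            @Override
            public void createTable(String qualifiedName) {
                // Succeeds silently in this sketch.
            }

            @Override
            public void updateTableStatistics(String qualifiedName) {
                throw new IllegalStateException("simulated failure");
            }
        };
        return createWithBestEffortStatistics(failingStatistics, "test_hive.tab1");
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Without the `try`/`catch`, the simulated failure would propagate out of the create path, which is the behavior change the comment warns about.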


Comment on lines +1418 to +1427
            // Statistics are written inline by CreateTableOperation to avoid a race with constraint registration.
            // HMS constraint additions trigger an internal alter_table call that can invalidate the thrift
            // connection before a deferred UpdateStatisticsOperation runs.
            addTableOperations.add(new CreateTableOperation(
                    metastoreContext,
                    table,
                    tableAndMore.getPrincipalPrivileges(),
                    tableAndMore.isIgnoreExisting(),
                    tableAndMore.getConstraints(),
                    tableAndMore.getStatisticsUpdate()));

issue (bug_risk): Restoring the isPrestoView guard may be needed to avoid writing stats for views

Previously, stats updates were only scheduled when !isPrestoView(table). With this change, CreateTableOperation is always given tableAndMore.getStatisticsUpdate(), and run() will always call updateTableStatistics, changing behavior for views. If statisticsUpdate can be non-empty/non-null for views, we’ll now try to write table-level stats for them, which may have been intentionally avoided (per the original guard) and could cause extra metastore traffic or failures. Consider either restoring an isPrestoView check here or guaranteeing that statisticsUpdate is always a no-op for views before passing it to CreateTableOperation.
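One way to picture the suggested guard is the sketch below. It uses a hypothetical, simplified `Table` model and assumes that a view is marked by a `presto_view` = `true` table parameter; the real `isPrestoView` helper and the real `CreateTableOperation` may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class ViewGuardSketch {
    // Hypothetical, simplified table model: only a name and its parameters.
    static final class Table {
        final String name;
        final Map<String, String> parameters;

        Table(String name, Map<String, String> parameters) {
            this.name = name;
            this.parameters = parameters;
        }
    }

    // Assumption: a view is marked by a "presto_view" = "true" table parameter.
    static boolean isPrestoView(Table table) {
        return "true".equals(table.parameters.get("presto_view"));
    }

    // Returns the metastore calls that a guarded run() would make for the given table.
    static List<String> run(Table table) {
        List<String> calls = new ArrayList<>();
        calls.add("createTable " + table.name);
        if (!isPrestoView(table)) {
            // Statistics are written inline only for real tables, as the old guard did.
            calls.add("updateTableStatistics " + table.name);
        }
        return calls;
    }

    public static List<String> runForTable() {
        return run(new Table("tab1", Collections.emptyMap()));
    }

    public static List<String> runForView() {
        return run(new Table("view1", Collections.singletonMap("presto_view", "true")));
    }

    public static void main(String[] args) {
        System.out.println(runForTable());
        System.out.println(runForView());
    }
}
```

In this sketch the view path issues only the create call, matching the behavior of the earlier `!isPrestoView(table)` check.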
