Add config to specify metastore catalog name #24235

Merged: ZacBlanco merged 1 commit into prestodb:master from AnuragKDwivedi:catalog-name-to-metastore on Apr 23, 2025

@AnuragKDwivedi AnuragKDwivedi commented Dec 10, 2024

Contributor

Description

This PR introduces a new configuration property, hive.metastore.catalog.name, that can be set in Hive, Hudi, Delta, and Iceberg catalog properties. The property passes a catalog name to the metastore, so the metastore can manage and organize schemas and tables by catalog.

Because Presto now passes the catalog name, the metastore can support unique schema creation under different catalogs; it already treats the combination of catalog and schema as unique. The change also lets the metastore filter schemas at the metastore layer itself, which makes schema management more efficient.

Motivation and Context

Previously, due to the absence of the catalog name in metastore interactions, all schemas were created under the default "hive" catalog. This limitation made it impossible for users to filter or retrieve schemas associated with a specific catalog. The metastore lacked the ability to distinguish between schemas created under different catalogs.

With this update:

  • Schemas can now be managed and organized by catalog, leveraging the metastore's existing support for catalog-schema uniqueness.
  • Users can create schemas with the same name under different catalogs, enabling better schema organization and reducing naming conflicts.
  • Schema filtering at the metastore layer becomes possible, providing more accurate and efficient responses to schema queries.

This change addresses a long-standing limitation and significantly improves schema management in environments using Hive, Hudi, Delta, and Iceberg catalogs.

Example:
A user needs to connect to two different metastores that both have the same catalog name (foo) already registered. To do this, create two separate catalog properties files in Presto, for example foo-a-metastore.properties and foo-b-metastore.properties. In each file, set hive.metastore.catalog.name to foo and set hive.metastore.uri to the URI of the corresponding metastore.
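
As a rough sketch of this example, the two catalog files might look like the following. Only `hive.metastore.catalog.name` and `hive.metastore.uri` come from this description; the metastore hostnames are placeholders, and the `connector.name` value assumes the Hive connector.

```properties
# foo-a-metastore.properties
# connector.name is an assumption (Hive connector); the hostname is a placeholder.
connector.name=hive-hadoop2
hive.metastore.uri=thrift://metastore-a.example.com:9083
hive.metastore.catalog.name=foo

# foo-b-metastore.properties
# Same metastore catalog name, different metastore URI (placeholder hostname).
connector.name=hive-hadoop2
hive.metastore.uri=thrift://metastore-b.example.com:9083
hive.metastore.catalog.name=foo
```

With files like these, Presto would expose two catalogs named after the files (foo-a-metastore and foo-b-metastore), and each would send foo as the catalog name to its own metastore.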

Fixes: #22895

Impact

NA

Test Plan

CI passed

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

== RELEASE NOTES ==

General Changes
* Add configuration property ``hive.metastore.catalog.name`` to pass catalog names to the metastore, enabling catalog-based schema management and filtering.

@AnuragKDwivedi AnuragKDwivedi force-pushed the catalog-name-to-metastore branch from 00027a7 to e72d7ff on December 12, 2024 05:28
@tdcmeehan tdcmeehan added the from:IBM label on Dec 13, 2024
@prestodb-ci prestodb-ci requested review from a team, Dilli-Babu-Godari and ShahimSharafudeen and removed request for a team December 13, 2024 15:18
@prestodb-ci

Contributor

Saved that user @AnuragKDwivedi is from IBM

@steveburnett

Contributor

Consider adding documentation for this configuration property. Perhaps in https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/admin/properties.rst.

@AnuragKDwivedi AnuragKDwivedi force-pushed the catalog-name-to-metastore branch from 9b9b2d9 to c97e5bf on February 3, 2025 10:41
@AnuragKDwivedi AnuragKDwivedi marked this pull request as ready for review February 3, 2025 12:29
@AnuragKDwivedi AnuragKDwivedi marked this pull request as draft February 3, 2025 12:30
@steveburnett

Contributor

New release note guidelines as of last week: PR #24354 automatically adds links to this PR to the release notes. Please remove the manual PR link in the following format from the release note entries for this PR.

:pr:`12345`

I have updated the Release Notes Guidelines to remove the examples of manually adding the PR link.

@AnuragKDwivedi AnuragKDwivedi marked this pull request as ready for review February 4, 2025 05:43

@ZacBlanco ZacBlanco left a comment

Contributor

Just a few things on my first pass. Please take a look at all your usages of Optional and make sure there are no Optional.of[Nullable](Optional::get) occurrences. I found two while skimming.

Comment thread presto-common/src/main/java/com/facebook/presto/common/Utils.java Outdated
Comment thread presto-docs/src/main/sphinx/connector/deltalake.rst Outdated
Comment thread presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergHiveMetadata.java Outdated
Comment thread presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergHiveMetadata.java Outdated
@steveburnett

Contributor

Thanks for the release note! To improve the release note entry, please include the name of the configuration property.

steveburnett previously approved these changes Feb 26, 2025

@steveburnett steveburnett left a comment

Contributor

LGTM! (doc)

Pull branch, local doc build, looks good. Thanks for the doc!

private boolean readNullMaskedParquetEncryptedValueEnabled;
private boolean useParquetColumnNames;
private boolean zstdJniDecompressionEnabled;
private String catalogName;

Contributor

I feel like a sane default would be the name of the presto catalog. Would it be possible to set that here?

Contributor Author

If we set a default value here, Presto will start sending the catalog name even when the configuration is not explicitly defined in the properties. This could cause issues with different metastores that do not support catalog names as input. The intention was to keep this configurable and only pass it when the metastore can handle it.

Could you share the advantages you see in adding a default value here?

Contributor

The advantage is that users would not have to specify it in the first place when using HMS. Consider two Hive catalogs configured through hive1.properties and hive2.properties. If they were configured to use the same HMS, then in each catalog configuration you would need to set this property explicitly, as hive.metastore.catalog.name=hive1, and so on. If Presto already has the catalog name available, it would make sense to default the metastore catalog name to the configured Presto catalog name.

I do, however, understand the desire for passing the catalog name to be optional for the case where a metastore doesn't support the "catalog" feature. I'm not sure what the best course of action is, other than adding a boolean flag in addition to this property to support both having a sane default and verifying that the feature is enabled.
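
For illustration, the two-catalog scenario in this comment could be sketched as below, with the property set explicitly in each file because no default is applied. The metastore hostname is a placeholder and the `connector.name` value assumes the Hive connector; neither is taken from this PR.

```properties
# hive1.properties
# Both catalogs point at the same HMS (placeholder hostname).
connector.name=hive-hadoop2
hive.metastore.uri=thrift://shared-hms.example.com:9083
hive.metastore.catalog.name=hive1

# hive2.properties
connector.name=hive-hadoop2
hive.metastore.uri=thrift://shared-hms.example.com:9083
hive.metastore.catalog.name=hive2
```

If the property is omitted from a file, no catalog name is sent for that catalog, which matches the author's stated intent of keeping the behavior opt-in.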

Collaborator

Won't using a boolean hive.metastore.send-catalog-name=true in the core.config suffice?

Comment thread presto-common/src/main/java/com/facebook/presto/common/Utils.java Outdated
@linux-foundation-easycla

linux-foundation-easycla Bot commented Apr 21, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: AnuragKDwivedi / name: Anurag Kumar Dwivedi (23ee6bc)

@AnuragKDwivedi AnuragKDwivedi force-pushed the catalog-name-to-metastore branch 7 times, most recently from 79d55bd to 739f218 on April 22, 2025 10:08
ZacBlanco previously approved these changes Apr 22, 2025

@majetideepak majetideepak left a comment

Collaborator

Isn't the catalog name derived from the catalog filename? Why do we need another config?

Comment thread presto-docs/src/main/sphinx/connector/deltalake.rst Outdated

@majetideepak majetideepak changed the title from "Added new configuration to pass catalog name to metastore when creati…" to "Add config to specify metastore catalog name" on Apr 23, 2025

@majetideepak majetideepak left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification offline. I included the example you shared in the description.
Please update the commit title and description as well. That is what gets landed.

majetideepak previously approved these changes Apr 23, 2025
### Description:
This commit introduces a new configuration to allow users to specify a catalog name for the metastore in Presto. By using the `hive.metastore.catalog.name` property, users can manage and connect to multiple metastores with the same catalog name. This enhancement enables a more flexible and configurable approach to handling catalogs and schemas across different metastores and storage backends, especially in complex, multi-region, and multi-tenant environments.
@majetideepak

Collaborator

@ZacBlanco can you re-approve?

@ZacBlanco

Contributor

prestocpp-macos-build / prestocpp-macos-build-engine (pull_request) has known issues. Going to merge, since this change is highly unlikely to have caused more failures in the native build.

@mohsaka

mohsaka commented May 5, 2025

Contributor

@AnuragKDwivedi This PR requires a regeneration of the presto protocol. Opened an issue here: #25049

FYI @aditi-pandit @ZacBlanco @majetideepak

Labels

from:IBM PR from IBM

Development

Successfully merging this pull request may close these issues.

Feature Enhancement: Enable Presto Server to Transmit Catalog Name for Enhanced Functionality in the Metastore Layer

8 participants