Add config to specify metastore catalog name#24235
Conversation
00027a7 to
e72d7ff
Compare
|
Saved that user @AnuragKDwivedi is from IBM |
|
Consider adding documentation for this configuration property. Perhaps in https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/admin/properties.rst. |
9b9b2d9 to
c97e5bf
Compare
|
New release note guidelines as of last week: PR #24354 automatically adds links to this PR to the release notes. Please remove the manual PR link in the following format from the release note entries for this PR. I have updated the Release Notes Guidelines to remove the examples of manually adding the PR link. |
ZacBlanco
left a comment
There was a problem hiding this comment.
Just a few things on my first pass. Please take a look a all your usages of Optional and make sure there are no Optional.of[Nullable](Optional::get) occurrences. I found two while skimming
|
Thanks for the release note! To improve the release note entry, please include the name of the configuration property. |
cb29758 to
ab9a9b9
Compare
steveburnett
left a comment
There was a problem hiding this comment.
LGTM! (doc)
Pull branch, local doc build, looks good. Thanks for the doc!
| private boolean readNullMaskedParquetEncryptedValueEnabled; | ||
| private boolean useParquetColumnNames; | ||
| private boolean zstdJniDecompressionEnabled; | ||
| private String catalogName; |
There was a problem hiding this comment.
I feel like a sane default would be the name of the presto catalog. Would it be possible to set that here?
There was a problem hiding this comment.
If we set a default value here, Presto will start sending the catalog name even when the configuration is not explicitly defined in the properties. This could cause issues with different metastores that do not support catalog names as input. The intention was to keep this configurable and only pass it when the metastore can handle it.
Could you share the advantages you see in adding a default value here?
There was a problem hiding this comment.
The advantage is that users would not have to specify it in the first place when using HMS. Consider two hive catalog configured through hive1.properties and hive2.properties. If they were configured to use the same HMS, then inside the config files then in the catalog configuration you would need to set this property as hive.metastore.catalog.name=hive1, etc. If presto already has the catalog name available, it would seem to make sense to default the catalog name to the configured presto catalog name
I do however understand the desire for passing the catalog name to be optional for the case where a metastore doesn't support the "catalog" feature. I'm not sure what the best course of action is other than adding a boolean flag in addition to this one to support both having a sane default and verifying that the feature is enabled
There was a problem hiding this comment.
Won't using a boolean hive.metastore.send-catalog-name=true in the core.config suffice?
|
|
79d55bd to
739f218
Compare
majetideepak
left a comment
There was a problem hiding this comment.
Isn't the catalog name derived from the catalog filename? Why do we need another config?
| private boolean readNullMaskedParquetEncryptedValueEnabled; | ||
| private boolean useParquetColumnNames; | ||
| private boolean zstdJniDecompressionEnabled; | ||
| private String catalogName; |
There was a problem hiding this comment.
Won't using a boolean hive.metastore.send-catalog-name=true in the core.config suffice?
majetideepak
left a comment
There was a problem hiding this comment.
Thanks for the clarification offline. I included the example you shared in the description.
Please update the commit title and description as well. That is what gets landed.
739f218 to
531645b
Compare
### Description: This commit introduces a new configuration to allow users to specify a catalog name for the metastore in Presto. By using the `hive.metastore.catalog.name` property, users can manage and connect to multiple metastores with the same catalog name. This enhancement enables a more flexible and configurable approach to handling catalogs and schemas across different metastores and storage backends, especially in complex, multi-region, and multi-tenant environments.
|
@ZacBlanco can you re-approve? |
|
prestocpp-macos-build / prestocpp-macos-build-engine (pull_request) |
|
@AnuragKDwivedi This PR requires a regeneration of the presto protocol. Opened an issue here |
Description
This PR introduces a new configuration that can be applied to Hive, Hudi, Delta, and Iceberg catalog properties. The configuration enables the catalog name to be passed to the metastore, significantly enhancing the metastore's capabilities for managing and organizing schemas and tables based on the catalog name.
By passing the catalog name, the metastore can now support unique schema creation under different catalogs, as it already recognizes the combination of catalog and schema as unique. Additionally, this change allows the metastore to filter schemas at the metastore layer itself, making schema management more efficient.
Motivation and Context
Previously, due to the absence of the catalog name in metastore interactions, all schemas were created under the default "hive" catalog. This limitation made it impossible for users to filter or retrieve schemas associated with a specific catalog. The metastore lacked the ability to distinguish between schemas created under different catalogs.
With this update:
This change addresses a long-standing limitation and significantly improves schema management in environments using Hive, Hudi, Delta, and Iceberg catalogs.
Example:
A user needs to connect to two different metastores that both have the same catalog name (
foo) already registered. This can be done by creating two separate properties files in Presto. In each file, sethive.metastore.catalog.nametofooand specify different hive.metastore.uri values for each metastore. In this case, the user can create two properties files with names likefoo-a-metastore.propertiesandfoo-b-metastore.properties, and set the catalog name to foo within both files.Fixes: #22895
Impact
NA
Test Plan
CI passed
Contributor checklist