Skip to content

HHH-19559 add hibernate.multi_tenant.set_schema #10372

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

gavinking
Copy link
Member

@gavinking gavinking commented Jun 19, 2025

This is an alternative solution to the problem of schema-based multi-tenancy.

It introduces:

  1. hibernate.multi_tenant.set_schema, and
  2. CurrentTenantIdentifierResolver.schemaName(tenantIdentifier).

I need a review here @sebersole.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license
and can be relicensed under the terms of the LGPL v2.1 license in the future at the maintainers' discretion.
For more information on licensing, please check here.


https://hibernate.atlassian.net/browse/HHH-19559

@gavinking gavinking requested a review from sebersole June 19, 2025 18:46
@gavinking gavinking marked this pull request as draft June 19, 2025 18:46
*
* @since 7.1
*/
String SET_TENANT_SCHEMA = "hibernate.multi_tenant.set_schema";
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love the naming here at all; unfortunately the existing properties have irregular naming.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that was an unfortunate reality of a singular AvailableSettings with a zagillabillion settings. Related things weren't always "geographically" close to one another and naming tended to get disjointed. It was the main reason I started breaking them out.

But we can always "shadow deprecate" them - aka create new ones that replace the old while still recognizing the old. See DeprecationLogger#deprecatedSetting(java.lang.String, java.lang.String)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we can always "shadow deprecate" them - aka create new ones that replace the old while still recognizing the old.

Yeah, that would be reasonable, I think.

Comment on lines +58 to +81

try {
session.afterObtainConnection( connection );
}
catch (SQLException e) {
try {
releaseConnection( connection );
}
catch (SQLException re) {
e.addSuppressed( re );
}
throw e;
}
return connection;
}

@Override
public void releaseConnection(Connection connection) throws SQLException {
try {
session.beforeReleaseConnection( connection );
}
catch (SQLException e) {
log.warn( "Error before releasing JDBC connection", e );
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sebersole I'm anticipating that you might object to what I'm doing here. If so, I need a better suggestion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the idea is that a user would not need MultiTenantConnectionProvider right? They could just do the normal ConnectionProvider set up and, in conjunction with CurrentTenantIdentifierResolver, we'd just alter the schema/catalog used by the Session - right? Just want to make sure I'm understanding the intent here.

Assuming iiuc couldn't we just imply this? It would be better if there were an actual flag saying that the user wants schema/catalog tenancy. E.g.

hibernate.connection...
hibernate.tenancy.strategy=schema
hibernate.tenant_identifier_resolver=my.ResolverImpl

And that this would just be enough to infer what to do.

Have you looked at incorporating this into the LogicalConnection for the Session? Naively perhaps, that would seem like a better place.

Copy link
Member

@sebersole sebersole Jun 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There used to be a way that users specified the type of tenancy, but I think that got dropped. In a way, this PR adds that distinction back.

Another option would be to infer this when there is a ConnnectionProvider (and NOT a MultiTenantConnectionProvider) with a configured CurrentTenantIdentifierResolver. What I mean is that this may actually be enough:

hibernate.connection...
hibernate.tenant_identifier_resolver=my.ResolverImpl

Trying to get away from the need for a setting here. Or is there some nuance I'm missing that makes you think we need it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the idea is that a user would not need MultiTenantConnectionProvider right?

Yes.

They could just do the normal ConnectionProvider set up and, in conjunction with CurrentTenantIdentifierResolver, we'd just alter the schema/catalog used by the Session - right?

Yes.

It would be better if there were an actual flag saying that the user wants schema/catalog tenancy.

That's ... pretty much what I have implemented here, I believe. I just called the property something different.

Have you looked at incorporating this into the LogicalConnection for the Session? Naively perhaps, that would seem like a better place.

I have not. Will take a look at that.

OTOH, I think I would prefer this not be too deep down in the layers. It's a fairly high-level construct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OTOH, I think I would prefer this not be too deep down in the layers. It's a fairly high-level construct.

I'd argue that this is far deeper than LogicalConnection, which is one of the reasons I mentioned it. Plus LogicalConnection is the "eye of the needle" (in the good way) for Connection-related access for the Session.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's a defined tenant id,

You mean the resolver? Because I don't know that asking the resolver, if one, early is always feasible. Not saying it isn't; just saying I'm not sure it's not ;)

Ah, well, no, I meant even if there was one explicitly set on the session.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But given the 2 explicit options I'd defiitely prefer hibernate.tenancy.style=schema over hibernate.multi_tenant.set_schema=true.

Wouldn't it be a reasonable use case to use both datasource-based and schema-based multitenancy?

E.g. we could have:

tenant1.1 => datasource 1, schema1
tenant1.2 => datasource 1, schema2
tenant2.1 => datasource 2, schema1
tenant2.2 => datasource 2, schema2

Thinking about it, we could even throw discriminator-based multitenancy into the mix, if some tenants share the same datasource + schema...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be a reasonable use case to use both datasource-based and schema-based multitenancy?

Well this is already easily possible by just implementing your own MultiTenantConnectionProvider.

For such complicated scenarios, I think it's totally reasonable that you have to write a bit of code.

The problem I'm trying to solve here is to have a way to do schema-based multitenancy with any regular-ol' ConnectionProvider, without having to provide a MultiTenantConnectionProvider.

Copy link
Member

@yrodiere yrodiere Jun 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point was that if you introduce hibernate.tenancy.style=schema, you'll need to take into account that there's a custom style or whatever you want to call it, that goes beyond just datasource/schema/discriminator. Or that not setting a style is special case. Or that the datasource style actually encompasses all others. Or...

I get you're trying to make simple things simpler, I'd be wary about making complex things more complex or more confusing, because that looks like a very real risk here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point was that if you introduce hibernate.tenancy.style=schema, you'll need to take into account that there's a custom style or whatever you want to call it, that goes beyond just datasource/schema/discriminator

I think this is a good point, and goes to what I suggested with the 3 situations. Phrased in the terms here we have:

  1. column-based (@TenantId)
  2. explicit MultiTenantConnectionProvider-based
  3. ConnectionProvider + "adjustment" of the Connection for the Session

I get you're trying to make simple things simpler, I'd be wary about making complex things more complex or more confusing, because that looks like a very real risk here.

I'm not following this... What is the complex case that you think is getting more complex here?

@gavinking gavinking force-pushed the HHH-19559-alt branch 3 times, most recently from 22c10f6 to bdea4cf Compare June 19, 2025 19:03
Comment on lines +54 to +68
/**
* The name of the database schema for data belonging to the tenant with the
* given identifier.
* <p>
* Called when {@value org.hibernate.cfg.MultiTenancySettings#SET_TENANT_SCHEMA}
* is enabled.
*
* @param tenantIdentifier The tenant identifier
* @return The name of the database schema belonging to that tenant
*
* @see org.hibernate.cfg.MultiTenancySettings#SET_TENANT_SCHEMA
*/
default @NonNull String schemaName(@NonNull T tenantIdentifier) {
return tenantIdentifier.toString();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels... out of place? I'm always a bit weirded out by "optionally relevant" methods like this.

Beyond feelings, when I think about it... My main argument in this case would be that the tenant identifier can be provided explicitly when opening a session, so I'm not sure a CurrentTenantIdentifierResolver would necessarily always make sense in an application. In which case you'd have no way of leveraging this feature, except perhaps implementing all the other methods by throwing exceptions.

Can't we instead have an interface to turn an identifier name to an schema name, and ask people to have hibernate.multi_tenant.schema_resolver (or whatever) to an implementation? We can totally have a simple implementation that just takes the tenant identifier as the schema name.

That way, I think the "schema setting" part could be implemented similarly to discriminator-based multitenancy. Connections would be irrelevant.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way... I get the feeling this could be a much larger concept. Like, an interface that resolves a tenant ID into a "datasource ID" (passed to connection resolvers) and schema name and discriminator (for discriminator-based multitenancy, so that discriminators in the DB aren't necessarily exactly the tenant ID).
Not suggesting to do this here, but it does feel... clean? Maybe I'm just being weird.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels... out of place? I'm always a bit weirded out by "optionally relevant" methods like this.

Well that's what I meant when I said it's not completely conceptually clean.

But it's sure as hell convenient for the user. If I'm a user of this stuff I would much prefer to have all multitenancy-related shit located in just one class.

Can't we instead have an interface to turn an identifier name to an schema name, and ask people to have hibernate.multi_tenant.schema_resolver (or whatever) to an implementation? We can totally have a simple implementation that just takes the tenant identifier as the schema name.

I mean, sure, of course we can.

But it's just such a god damn clusterfuck as far as usability goes. How many properties are you going to have to set, and how many interfaces are you having to implement, to get this one trivial feature working?

Like, an interface that resolves a tenant ID into a "datasource ID" (passed to connection resolvers) and schema name and discriminator (for discriminator-based multitenancy, so that discriminators in the DB aren't necessarily exactly the tenant ID).

I guess I think this would be jumping the shark.

Copy link
Member

@yrodiere yrodiere Jun 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like, an interface that resolves a tenant ID into a "datasource ID" (passed to connection resolvers) and schema name and discriminator (for discriminator-based multitenancy, so that discriminators in the DB aren't necessarily exactly the tenant ID).

I guess I think this would be jumping the shark.

It's not that different from naming strategies, but sure, it's complex.

Anyway, my point was: you want to merge two responsibilities in the same interface to minimize complexity, but:

  1. These particular two responsibilities can be used independently (one without the other), so this creates complexity for those needing only one. Which I guess is fine if we consider people generally need both.
  2. The derivation of a schema name from a tenant ID is itself a responsibility that could possibly have close siblings, such as deriving a discriminator from a tenant ID.

I would add that finding the current tenant ID generally has to do with the higher layers of your application (e.g. inspecting user connection info), while turning a tenant ID to a schema name is clearly about the lower layers (DBA stuff). So it's really a bit weird to mix those.

If I had to start from scratch with a single entry point for the whole thing, with some cohesion, I'd expose something like this:

interface MultiTenancyStrategy {

    // Optionals could just be nullable. Whatever.

    Optional<MultiTenantConnectionProvider> createConnectionProvider( .... );

    Optional<String> tenantIdToSchemaName(String tenantId);

    // Could be a type parameter instead of Object, but IMO if we don't check consistency on startup, that's hiding problems under the carpet.

    Object tenantIdToDiscriminator(String tenantId);

    String discriminatorToTenantId(Object tenantId);

}

EDIT: With default methods, obviously.

Then you can have a default "discriminator" strategy that just takes the tenant ID as a discriminator, an opt-in (but built-in) "schema" strategy that takes the tenant ID as schema name, and custom strategies from everything else. And the entry point is clear and exposes everything you need to know.


That being said... I'm certainly not blocking you from doing that. I just disagree.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yrodiere Note that I'm not against having all these three things as (almost) completely independent features, in the sense that the user is allowed to turn each on and off independently. That's more-or-less what I've implemented. Though it's not completely realized because I didn't add support for setSchema() to ContextualJdbcConnectionAccess. Or rather: I added it then removed it.

But Steve on the other hand is pushing more in the direction of making these things mutually-exclusive, more like the model which existed implicitly on the old enum.

I'm fine either way; I just wouldn't add "can use them all together" as a requirement.

Copy link
Member

@yrodiere yrodiere Jun 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But Steve on the other hand is pushing more in the direction of making these things mutually-exclusive, more like the model which existed implicitly on the old enum.

Okay, well I certainly don't agree with that. IMO that's an artificial limitation that will go against very real use cases. Having some tenants with a dedicated database, and others with a shared one but separate schema, is really not that much of a stretch.

So users with this use case will have to ignore the feature you're adding and do everything in the connection provider.
And frameworks like Quarkus, where the connection provider needs to be implemented by Quarkus itself due to integrations with e.g. Agroal, will be implementing everything all over again in their own code.

It would be a real waste of effort.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yoann disagrees with me?!? No way ;)

The mapping of identifier in the app to tenant-name in the database can actually be a shared contract. We could even apply it in the @TenantId case. And can be as simple as a conversion:

interface TenantNameMapper {
    /**
     * Maps the name of the tenant used in the application (aka, CurrentTenantIdentifierResolver#resolveCurrentTenantIdentifier) to the corresponding database name which might be a schema name or 
    String toDatabaseName(Object identifier);
    Object toIdentifier(String databaseName);
}

TenantNameMapper#toDatabaseName allows taking, e.g., CurrentTenantIdentifierResolver#resolveCurrentTenantIdentifier and mapping it to a value that depends a bit on the overall strategy:

  1. For column-based, this would be the value stored in the column
  2. For MultiTenantConnectionProvider, this would become the valiue passed to its #getConnection method
  3. Here it would be the name of the schema we switch to.

#toIdentifier is obviously the inverse of this. And maybe not even needed depending

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I don't think I am "pushing more in the direction of making these things mutually-exclusive" so much as trying to recognize the bits that are reusable across the different situations and designing the SPI around that

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yoann disagrees with me?!? No way ;)

What can I say, I'm a sucker for traditions! And we both like this one ;)

interface TenantNameMapper {

This makes a lot of sense to me, and is more or less in line with my initial proposal:

Can't we instead have an interface to turn an identifier name to an schema name, and ask people to have hibernate.multi_tenant.schema_resolver (or whatever) to an implementation? We can totally have a simple implementation that just takes the tenant identifier as the schema name.

... but (to keep the tradition alive):

  1. toIdentifier is pointless in the schema case (I think?)

  2. Gavin doesn't like this. Quoting:

    I mean, sure, of course we can.

    But it's just such a god damn clusterfuck as far as usability goes. How many properties are you going to have to set, and how many interfaces are you having to implement, to get this one trivial feature working?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

toIdentifier is pointless in the schema case (I think?)

Yep, as I said:

#toIdentifier is obviously the inverse of this. And maybe not even needed depending

However, (1) it does not hurt and (2) I'd much prefer a single way to do that across all "strategies".

Gavin doesn't like this. Quoting:

In the normal case you don't even implement/provide this.

@@ -697,6 +699,32 @@ public JdbcConnectionAccess getJdbcConnectionAccess() {
return jdbcConnectionAccess;
}

private boolean useSchemaBasedMultiTenancy() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we are again :)

Comment on lines 681 to +682
if ( !factoryOptions.isMultiTenancyEnabled() ) {
// we might still be using schema-based multitenancy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry, what?

This might warrant a renaming of isMultiTenancyEnabled()? Perhaps isMultiTenancyEnabledOrNotIDontKnowAskSomeoneElse()?

Copy link
Member Author

@gavinking gavinking Jun 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yoann, that's the whole point of this.

The design that we inherited here was that "Mutitenancy" = you have a MultiTenantConnectionProvider. And that was just wrong, and I guess we need to get away from it.

The problem is that Hibernate has support for all sorts of ConnectionProviders, and you completely lose all of that as soon as you need a MultiTenantConnectionProvider. The only thing we sorta give you is support for the case of a JNDI-bound DataSource.

Copy link
Member Author

@gavinking gavinking Jun 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the old MultiTenancyStrategy enum we used to have, SCHEMA and DATABASE were synonyms and both meant "use a custom MultiTenantConnectionProvider". (And DISCRIMINATOR was simply unimplemented.) Here I'm introducing a totally new thing that we've never had before: a way to do schema-based multitenancy without a MultiTenantConnectionProvider.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know. Sorry, perhaps not the best time for humor.

More seriously, renaming isMultiTenancyEnabled() to isMultiTenantConnectionProviderEnabled would probably be clearer than adding comments like this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More seriously, renaming isMultiTenancyEnabled() to isMultiTenantConnectionProviderEnabled would probably be clearer than adding comments like this?

Sure, but that method's defined by an SPI so it would have to be done via the deprecation process. And note that isMultiTenancyEnabled() is already clearly documented to be about the presence of a MultiTenantConnectionProvider.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The design that we inherited here was that "Mutitenancy" = you have a MultiTenantConnectionProvider. And that was just wrong, and I guess we need to get away from it.

I disagree with this. And you even disagreed with it above.

There are a few situations here -

  1. You have different "Connection pools" based on tenant - this is the purpose of MultiTenantConnectionProvider
  2. You use column-based tenancy
  3. This new hybrid you are implementing which uses plain-old ConnectionProvider and simply switches the schema/catalog of the Connection based on the tenant-id associated with the Session

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sebersole The specific method isMultiTenancyEnabled() always meant that there was a MultiTenantConnectionProvider.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure where I said otherwise, but yep.

Comment on lines +58 to +81

try {
session.afterObtainConnection( connection );
}
catch (SQLException e) {
try {
releaseConnection( connection );
}
catch (SQLException re) {
e.addSuppressed( re );
}
throw e;
}
return connection;
}

@Override
public void releaseConnection(Connection connection) throws SQLException {
try {
session.beforeReleaseConnection( connection );
}
catch (SQLException e) {
log.warn( "Error before releasing JDBC connection", e );
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But given the 2 explicit options I'd defiitely prefer hibernate.tenancy.style=schema over hibernate.multi_tenant.set_schema=true.

Wouldn't it be a reasonable use case to use both datasource-based and schema-based multitenancy?

E.g. we could have:

tenant1.1 => datasource 1, schema1
tenant1.2 => datasource 1, schema2
tenant2.1 => datasource 2, schema1
tenant2.2 => datasource 2, schema2

Thinking about it, we could even throw discriminator-based multitenancy into the mix, if some tenants share the same datasource + schema...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants