fix(plugin-iceberg): Do not check access control for MV data table#27728
fix(plugin-iceberg): Do not check access control for MV data table#27728tdcmeehan wants to merge 2 commits intoprestodb:masterfrom
Conversation
Reviewer's GuideAdjusts Iceberg materialized view handling so storage tables are treated as system-managed: analyzer uses the MV schema for view query/refresh, bypasses access control when expanding the storage table and rewrites output field origins to the MV, introduces a configurable default storage schema for storage tables, and adds tests covering access control and schema placement behavior. Sequence diagram for non-legacy Iceberg materialized view query expansionsequenceDiagram
actor User
participant Coordinator as Coordinator
participant Analyzer as StatementAnalyzer_main
participant MVAnalyzer as StatementAnalyzer_materializedView
participant StorageAnalyzer as StatementAnalyzer_storageTable_AllowAll
User->>Coordinator: submit SELECT from materialized_view
Coordinator->>Analyzer: analyze query
Analyzer->>MVAnalyzer: analyze viewQuery with MV schema
MVAnalyzer-->>Analyzer: viewQuery analysis complete
Analyzer->>StorageAnalyzer: analyze storage dataTable with AllowAllAccessControl
StorageAnalyzer-->>Analyzer: Scope with storage table fields
Analyzer->>StorageAnalyzer: rewrite Field origins to materialized_view name
StorageAnalyzer-->>Analyzer: RelationType with MV-qualified fields
Analyzer-->>Coordinator: analysis with column references bound to MV
Coordinator-->>User: execute plan (storage table access control bypassed during expansion)
Sequence diagram for Iceberg materialized view refresh session schema selectionsequenceDiagram
actor Admin
participant Coordinator as Coordinator
participant Analyzer as StatementAnalyzer_refresh
Admin->>Coordinator: REFRESH MATERIALIZED VIEW view_name
Coordinator->>Analyzer: visitRefreshMaterializedView
alt legacy_materialized_views is true
Analyzer->>Analyzer: ownerSessionSchema = view.getSchema()
else non-legacy path
Analyzer->>Analyzer: ownerSessionSchema = viewName.getSchemaName()
end
Analyzer->>Analyzer: buildOwnerSession(ownerSessionSchema)
Analyzer->>Analyzer: analyze refreshQuery with owner session
Analyzer-->>Coordinator: refresh analysis complete
Coordinator-->>Admin: refresh executed using MV schema for name resolution
Updated class diagram for IcebergConfig and related materialized view storage schema handlingclassDiagram
class IcebergConfig {
-String materializedViewStoragePrefix
-String materializedViewDefaultStorageSchema
+String getMaterializedViewStoragePrefix()
+IcebergConfig setMaterializedViewStoragePrefix(String materializedViewStoragePrefix)
+String getMaterializedViewDefaultStorageSchema()
+IcebergConfig setMaterializedViewDefaultStorageSchema(String materializedViewDefaultStorageSchema)
+int getMaterializedViewMaxChangedPartitions()
+IcebergConfig setMaterializedViewMaxChangedPartitions(int materializedViewMaxChangedPartitions)
}
class IcebergSessionProperties {
<<utility>>
+String MATERIALIZED_VIEW_STORAGE_PREFIX
+String MATERIALIZED_VIEW_DEFAULT_STORAGE_SCHEMA
+String MATERIALIZED_VIEW_MAX_CHANGED_PARTITIONS
+IcebergSessionProperties(IcebergConfig icebergConfig)
+static String getMaterializedViewStoragePrefix(ConnectorSession session)
+static Optional~String~ getMaterializedViewDefaultStorageSchema(ConnectorSession session)
+static int getMaterializedViewMaxChangedPartitions(ConnectorSession session)
}
class IcebergAbstractMetadata {
-SchemaTableName getStorageTableName(ConnectorSession session, SchemaTableName viewName, PropertiesMap properties)
}
class ConnectorSession
class SchemaTableName {
+String getSchemaName()
+String getTableName()
}
class PropertiesMap {
}
IcebergSessionProperties --> IcebergConfig
IcebergSessionProperties ..> ConnectorSession
IcebergAbstractMetadata ..> ConnectorSession
IcebergAbstractMetadata ..> SchemaTableName
IcebergAbstractMetadata ..> IcebergSessionProperties
IcebergAbstractMetadata ..> PropertiesMap
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- In
StatementAnalyzer.processMaterializedView, consider avoidingnew AllowAllAccessControl()per analysis invocation (e.g., use a shared instance or a more targeted access-control wrapper) to reduce object churn and make the intent of bypassing only storage-table checks clearer. - The new
materializedViewDefaultStorageSchemaconfig is treated as nullable (and wrapped in anOptionallater); adding explicit@Nullableannotations on the field and getter would make the null contract clearer and help static analysis.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `StatementAnalyzer.processMaterializedView`, consider avoiding `new AllowAllAccessControl()` per analysis invocation (e.g., use a shared instance or a more targeted access-control wrapper) to reduce object churn and make the intent of bypassing only storage-table checks clearer.
- The new `materializedViewDefaultStorageSchema` config is treated as nullable (and wrapped in an `Optional` later); adding explicit `@Nullable` annotations on the field and getter would make the null contract clearer and help static analysis.
## Individual Comments
### Comment 1
<location path="presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergMaterializedViewsBase.java" line_range="4923-4932" />
<code_context>
+ public void testDefaultStorageSchemaConfig()
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test case where explicit `storage_schema` table property overrides the default-storage-schema config
To fully cover the precedence rules, please add a complementary test where the materialized view is created with an explicit `storage_schema` table property. That test should assert that:
- The storage table is created in the explicitly configured schema rather than the one from `materialized_view_default_storage_schema`.
- Queries and refresh operate correctly in this setup, including enforcing access control on the explicitly chosen storage schema.
This will help catch regressions where the connector mishandles the per-MV `storage_schema` vs. the default schema.
Suggested implementation:
```java
@Test
public void testStorageSchemaTablePropertyOverridesDefault()
{
assertUpdate("CREATE SCHEMA IF NOT EXISTS mv_storage_isolated");
assertUpdate("CREATE SCHEMA IF NOT EXISTS mv_storage_explicit");
assertUpdate("CREATE TABLE test_storage_schema_property_base (id BIGINT, name VARCHAR)");
assertUpdate("INSERT INTO test_storage_schema_property_base VALUES (1, 'Alice'), (2, 'Bob')", 2);
Session sessionWithDefaultSchema = Session.builder(getSession())
.setCatalogSessionProperty("iceberg", "materialized_view_default_storage_schema", "mv_storage_isolated")
.build();
assertUpdate(
sessionWithDefaultSchema,
"CREATE MATERIALIZED VIEW test_storage_schema_property_mv " +
"WITH (storage_schema = 'mv_storage_explicit') AS " +
"SELECT id, name FROM test_storage_schema_property_base");
// Ensure that the underlying storage table is created in the explicit schema,
// not in the default storage schema.
assertQuery(
"SELECT table_schema " +
"FROM iceberg.information_schema.tables " +
"WHERE table_name LIKE 'test_storage_schema_property_mv$%'",
"VALUES 'mv_storage_explicit'");
// Queries against the materialized view should work correctly.
assertQuery(
sessionWithDefaultSchema,
"SELECT * FROM test_storage_schema_property_mv",
"SELECT * FROM test_storage_schema_property_base");
// Refresh should also work correctly with the explicit storage schema.
assertUpdate(
sessionWithDefaultSchema,
"INSERT INTO test_storage_schema_property_base VALUES (3, 'Charlie')",
1);
assertUpdate(
sessionWithDefaultSchema,
"REFRESH MATERIALIZED VIEW test_storage_schema_property_mv",
1);
assertQuery(
sessionWithDefaultSchema,
"SELECT * FROM test_storage_schema_property_mv",
"SELECT * FROM test_storage_schema_property_base");
}
@Test
public void testDefaultStorageSchemaConfig()
```
To also cover access control behavior on the explicit storage schema (as mentioned in your review comment), you will likely need to:
1. Reuse the existing access control setup patterns in this test class (for example, using `getQueryRunner().getAccessControl()` with `deny`/`reset` and the imported `privilege` / `SELECT_COLUMN` helpers) to:
- Deny the appropriate privilege on the storage table (or schema, depending on how other MV tests are structured) in `mv_storage_explicit`.
- Assert that queries/refresh on `test_storage_schema_property_mv` fail with the expected access control error when that privilege is denied.
2. Add the corresponding `try/finally` or cleanup to ensure access control state is reset after the test, following the conventions of the other tests in `TestIcebergMaterializedViewsBase`.
Since the exact privilege helper usage and object naming for the storage table likely mirror other tests in this file (e.g., other MV tests that check access control on storage tables), you should adapt the deny/assert logic to match those existing patterns.
</issue_to_address>
### Comment 2
<location path="presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergMaterializedViewsBase.java" line_range="4990-4999" />
<code_context>
+ public void testStorageSchemaLockdownEndToEnd()
</code_context>
<issue_to_address>
**suggestion (testing):** Consider extending the lockdown test to cover explicit `storage_schema` usage and legacy MVs
The current end-to-end scenario validates the default `materialized_view_default_storage_schema` path. To fully exercise the new behavior, please also add tests for:
1) A MV that specifies an explicit `storage_schema` in a locked-down schema, to confirm the same access-control behavior when the storage location is set per-view.
2) The `legacy_materialized_views=true` configuration (if supported in this test base), to verify legacy MVs keep their existing semantics and are not affected by the new bypass/rewiring logic.
Even lightweight tests here would improve confidence that both configuration modes and MV types behave as intended.
Suggested implementation:
```java
assertUpdate("CREATE TABLE storage_lockdown_base (id BIGINT, value BIGINT)");
assertUpdate("INSERT INTO storage_lockdown_base VALUES (1, 100), (2, 200), (3, 300)", 3);
// Materialized view that explicitly sets its storage_schema to the locked-down schema
assertUpdate("CREATE MATERIALIZED VIEW storage_lockdown_explicit_storage_mv " +
"WITH (storage_schema = 'mv_locked_storage') " +
"SECURITY DEFINER AS " +
"SELECT id, value FROM storage_lockdown_base");
// Legacy materialized view, to ensure legacy semantics are preserved and not affected by rewiring
Session legacyMvSession = Session.builder(getSession())
.setCatalogSessionProperty("iceberg", "legacy_materialized_views", "true")
.build();
assertUpdate(legacyMvSession, "CREATE MATERIALIZED VIEW storage_lockdown_legacy_mv AS " +
"SELECT id, value FROM storage_lockdown_base");
Session sessionWithDefaultSchema = Session.builder(getSession())
.setCatalogSessionProperty("iceberg", "materialized_view_default_storage_schema", "mv_locked_storage")
.build();
assertUpdate(sessionWithDefaultSchema, "CREATE MATERIALIZED VIEW storage_lockdown_definer_mv SECURITY DEFINER AS " +
```
1. Wherever this test currently asserts behavior for `storage_lockdown_definer_mv` (access control on the locked-down storage schema, refresh behavior, query behavior, etc.), add equivalent assertions for:
- `storage_lockdown_explicit_storage_mv` to prove that explicitly setting `WITH (storage_schema = 'mv_locked_storage')` follows the same lockdown/bypass semantics.
- `storage_lockdown_legacy_mv` to prove that enabling `legacy_materialized_views=true` preserves legacy behavior and is not impacted by the new rewiring logic.
2. In the cleanup section of this test (likely a `finally` block), ensure you also drop the new objects:
- `DROP MATERIALIZED VIEW storage_lockdown_explicit_storage_mv`
- `DROP MATERIALIZED VIEW storage_lockdown_legacy_mv`
- (Existing cleanup for `storage_lockdown_definer_mv`, `storage_lockdown_base`, and `mv_locked_storage` should remain.)
3. If the test currently configures a custom `AccessControl` or specific privileges for the locked-down schema, confirm that those checks are exercised against all three MVs (default storage schema MV, explicit storage_schema MV, and legacy MV).
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| public void testDefaultStorageSchemaConfig() | ||
| { | ||
| assertUpdate("CREATE SCHEMA IF NOT EXISTS mv_storage_isolated"); | ||
| assertUpdate("CREATE TABLE test_default_storage_schema_base (id BIGINT, name VARCHAR)"); | ||
| assertUpdate("INSERT INTO test_default_storage_schema_base VALUES (1, 'Alice'), (2, 'Bob')", 2); | ||
|
|
||
| Session sessionWithDefaultSchema = Session.builder(getSession()) | ||
| .setCatalogSessionProperty("iceberg", "materialized_view_default_storage_schema", "mv_storage_isolated") | ||
| .build(); | ||
|
|
There was a problem hiding this comment.
suggestion (testing): Add a test case where explicit storage_schema table property overrides the default-storage-schema config
To fully cover the precedence rules, please add a complementary test where the materialized view is created with an explicit storage_schema table property. That test should assert that:
- The storage table is created in the explicitly configured schema rather than the one from
materialized_view_default_storage_schema. - Queries and refresh operate correctly in this setup, including enforcing access control on the explicitly chosen storage schema.
This will help catch regressions where the connector mishandles the per-MV storage_schema vs. the default schema.
Suggested implementation:
@Test
public void testStorageSchemaTablePropertyOverridesDefault()
{
assertUpdate("CREATE SCHEMA IF NOT EXISTS mv_storage_isolated");
assertUpdate("CREATE SCHEMA IF NOT EXISTS mv_storage_explicit");
assertUpdate("CREATE TABLE test_storage_schema_property_base (id BIGINT, name VARCHAR)");
assertUpdate("INSERT INTO test_storage_schema_property_base VALUES (1, 'Alice'), (2, 'Bob')", 2);
Session sessionWithDefaultSchema = Session.builder(getSession())
.setCatalogSessionProperty("iceberg", "materialized_view_default_storage_schema", "mv_storage_isolated")
.build();
assertUpdate(
sessionWithDefaultSchema,
"CREATE MATERIALIZED VIEW test_storage_schema_property_mv " +
"WITH (storage_schema = 'mv_storage_explicit') AS " +
"SELECT id, name FROM test_storage_schema_property_base");
// Ensure that the underlying storage table is created in the explicit schema,
// not in the default storage schema.
assertQuery(
"SELECT table_schema " +
"FROM iceberg.information_schema.tables " +
"WHERE table_name LIKE 'test_storage_schema_property_mv$%'",
"VALUES 'mv_storage_explicit'");
// Queries against the materialized view should work correctly.
assertQuery(
sessionWithDefaultSchema,
"SELECT * FROM test_storage_schema_property_mv",
"SELECT * FROM test_storage_schema_property_base");
// Refresh should also work correctly with the explicit storage schema.
assertUpdate(
sessionWithDefaultSchema,
"INSERT INTO test_storage_schema_property_base VALUES (3, 'Charlie')",
1);
assertUpdate(
sessionWithDefaultSchema,
"REFRESH MATERIALIZED VIEW test_storage_schema_property_mv",
1);
assertQuery(
sessionWithDefaultSchema,
"SELECT * FROM test_storage_schema_property_mv",
"SELECT * FROM test_storage_schema_property_base");
}
@Test
public void testDefaultStorageSchemaConfig()To also cover access control behavior on the explicit storage schema (as mentioned in your review comment), you will likely need to:
- Reuse the existing access control setup patterns in this test class (for example, using
getQueryRunner().getAccessControl()withdeny/resetand the importedprivilege/SELECT_COLUMNhelpers) to:- Deny the appropriate privilege on the storage table (or schema, depending on how other MV tests are structured) in
mv_storage_explicit. - Assert that queries/refresh on
test_storage_schema_property_mvfail with the expected access control error when that privilege is denied.
- Deny the appropriate privilege on the storage table (or schema, depending on how other MV tests are structured) in
- Add the corresponding
try/finallyor cleanup to ensure access control state is reset after the test, following the conventions of the other tests inTestIcebergMaterializedViewsBase.
Since the exact privilege helper usage and object naming for the storage table likely mirror other tests in this file (e.g., other MV tests that check access control on storage tables), you should adapt the deny/assert logic to match those existing patterns.
| public void testStorageSchemaLockdownEndToEnd() | ||
| { | ||
| assertUpdate("CREATE SCHEMA IF NOT EXISTS mv_locked_storage"); | ||
| assertUpdate("CREATE TABLE storage_lockdown_base (id BIGINT, value BIGINT)"); | ||
| assertUpdate("INSERT INTO storage_lockdown_base VALUES (1, 100), (2, 200), (3, 300)", 3); | ||
|
|
||
| Session sessionWithDefaultSchema = Session.builder(getSession()) | ||
| .setCatalogSessionProperty("iceberg", "materialized_view_default_storage_schema", "mv_locked_storage") | ||
| .build(); | ||
|
|
There was a problem hiding this comment.
suggestion (testing): Consider extending the lockdown test to cover explicit storage_schema usage and legacy MVs
The current end-to-end scenario validates the default materialized_view_default_storage_schema path. To fully exercise the new behavior, please also add tests for:
- A MV that specifies an explicit
storage_schemain a locked-down schema, to confirm the same access-control behavior when the storage location is set per-view. - The
legacy_materialized_views=trueconfiguration (if supported in this test base), to verify legacy MVs keep their existing semantics and are not affected by the new bypass/rewiring logic.
Even lightweight tests here would improve confidence that both configuration modes and MV types behave as intended.
Suggested implementation:
assertUpdate("CREATE TABLE storage_lockdown_base (id BIGINT, value BIGINT)");
assertUpdate("INSERT INTO storage_lockdown_base VALUES (1, 100), (2, 200), (3, 300)", 3);
// Materialized view that explicitly sets its storage_schema to the locked-down schema
assertUpdate("CREATE MATERIALIZED VIEW storage_lockdown_explicit_storage_mv " +
"WITH (storage_schema = 'mv_locked_storage') " +
"SECURITY DEFINER AS " +
"SELECT id, value FROM storage_lockdown_base");
// Legacy materialized view, to ensure legacy semantics are preserved and not affected by rewiring
Session legacyMvSession = Session.builder(getSession())
.setCatalogSessionProperty("iceberg", "legacy_materialized_views", "true")
.build();
assertUpdate(legacyMvSession, "CREATE MATERIALIZED VIEW storage_lockdown_legacy_mv AS " +
"SELECT id, value FROM storage_lockdown_base");
Session sessionWithDefaultSchema = Session.builder(getSession())
.setCatalogSessionProperty("iceberg", "materialized_view_default_storage_schema", "mv_locked_storage")
.build();
assertUpdate(sessionWithDefaultSchema, "CREATE MATERIALIZED VIEW storage_lockdown_definer_mv SECURITY DEFINER AS " +- Wherever this test currently asserts behavior for
storage_lockdown_definer_mv(access control on the locked-down storage schema, refresh behavior, query behavior, etc.), add equivalent assertions for:storage_lockdown_explicit_storage_mvto prove that explicitly settingWITH (storage_schema = 'mv_locked_storage')follows the same lockdown/bypass semantics.storage_lockdown_legacy_mvto prove that enablinglegacy_materialized_views=truepreserves legacy behavior and is not impacted by the new rewiring logic.
- In the cleanup section of this test (likely a
finallyblock), ensure you also drop the new objects:DROP MATERIALIZED VIEW storage_lockdown_explicit_storage_mvDROP MATERIALIZED VIEW storage_lockdown_legacy_mv- (Existing cleanup for
storage_lockdown_definer_mv,storage_lockdown_base, andmv_locked_storageshould remain.)
- If the test currently configures a custom
AccessControlor specific privileges for the locked-down schema, confirm that those checks are exercised against all three MVs (default storage schema MV, explicit storage_schema MV, and legacy MV).
jja725
left a comment
There was a problem hiding this comment.
change make sense, thanks for the fix
|
Thanks for the release note! Please revise the release note entry to start with a keyword from the Order of changes in the Release Notes Guidelines. See the error in the failing CI check. |
hantangwangd
left a comment
There was a problem hiding this comment.
Thanks @tdcmeehan, lgtm! Just one nit about the docs.
| ================================================== ============================================================= ======================== | ||
| ``iceberg.materialized-view-storage-prefix`` Prefix applied to auto-generated storage table names. ``__mv_storage__`` | ||
|
|
||
| ``iceberg.materialized-view-default-storage-schema`` Schema in which storage tables are created when the (the view's own |
There was a problem hiding this comment.
nit: looks like this line is not aligned, causing the table to fail to display.

Description
Treat Iceberg materialized view storage tables as system-managed:
the MV, not the storage table.
sessions.
iceberg.materialized-view-default-storage-schemaconfig to place storagetables in a single schema.
Direct queries against a storage table by name still go through normal access
control. Legacy materialized views are unchanged.
Motivation and Context
The non-legacy MV path checked SELECT on the storage table against the session user.
That let an unprivileged user bypass
SECURITY DEFINERby querying__mv_storage__<view>directly, and prevented operators from locking down thestorage schema without breaking legitimate MV reads.
Storage tables are not user-facing — operators should be able to lock down the
storage schema and have that be the entire access-control story.
Impact
iceberg.materialized-view-default-storage-schema(default: unset).Per-MV
storage_schemastill overrides.Test Plan
Test integration tests have been added.
Contributor checklist
Release Notes
Summary by Sourcery
Treat Iceberg materialized view storage tables as system-managed resources and adjust analysis, configuration, and tests accordingly.
New Features:
Bug Fixes:
Enhancements:
Documentation:
Tests: