Utility Version: 1.0 Release Date: 31-Jul-2025
A Python script that extracts permissions from Amazon Redshift (datashares or local databases) and generates AWS Lake Formation permission commands for seamless migration to a SageMaker Lakehouse architecture.
This tool automates the migration of data access permissions from Amazon Redshift to AWS Lake Formation by:
- Extracting user permissions from Redshift datashare databases or local databases
- Analyzing table-level grants using
SHOW GRANTS - Generating AWS CLI commands for Lake Formation permissions
- Supporting AWS Identity Center (IDC) authentication
- Creating rollback scripts for permission cleanup
- Identity Center Authentication: Supports AWS Identity Center (IDC) users/groups
- Permission Granularity: Schema-level or table-level permissions based on configuration
- Role Inheritance: Handles IDC roles with granted Redshift roles. This includes Redshift roles that have been granted permissions via other Redshift roles.
- Automated Scripts: Generates executable bash scripts with AWS CLI commands
- Rollback Support: Creates corresponding rollback scripts for permission cleanup
- Comprehensive Logging: Structured logging to both console and timestamped log files
- Debug Mode: Provides detailed logging and user ID lookup commands
- Python 3.6+
- Required Python packages:
psycopg2,configparser,boto3 - The Redshift cluster must use AWS Identity Center (IDC) authentication. Use the RedshiftIDCMigrationUtility for migrating from Redshift local users and groups/roles.
- An AWS IAM user with an access key to allow programmatic calls to AWS from the AWS CLI
- AWS CLI configured with appropriate permissions
- Admin access to Amazon Redshift cluster with Redshift authentification details stored in AWS Secret Manager
- Lake Formation permissions to grant/revoke access
- A Glue Data Catalog created from the Redshift producer cluster. See Registering a cluster to the AWS Glue Data Catalog
- Install dependencies:
pip install psycopg2-binary configparser boto3- Configure AWS CLI (if not already done):
aws configureCreate a config.ini file with the following structure:
[redshift]
host = your-redshift-cluster.region.redshift.amazonaws.com
port = 5439
dbname = your-database-name
# AWS Secrets Manager secret containing Redshift credentials
secret_name = redshift/credentials
[aws]
account_id = 123456789012
region = us-east-1
producer_catalog = your-producer-catalog-name
# AWS CLI profile to use (optional)
config_profile = your-aws-profile
# Identity Store ID for IDC user lookups
identity_store_id = d-1234567890
[parameters]
# Permission source: 'datashare' for datashare permissions, 'local' for local database permissions
permissions_type = datashare
datashare_database_name = your_datashare_db
# Set to true if the datashare database was created with the 'WITH PERMISSIONS' clause
datashare_object_level_permissions = truehost: Redshift cluster endpointport: Port number (default: 5439)dbname: Database name to connect tosecret_name: AWS Secrets Manager secret name containing Redshift credentials (must contain 'username' and 'password' keys)
-
account_id: AWS account ID for Lake Formation -
region: AWS region for API calls (used for Secrets Manager) -
producer_catalog: Producer catalog name in Lake Formation -
config_profile: AWS CLI profile name (optional, used for both AWS CLI and boto3) -
identity_store_id: Identity Store ID (required)
permissions_type: Permission source -datasharefor datashare permissions,localfor local database permissionsdatashare_database_name: Specific datashare database to process (used when permissions_type = datashare)datashare_object_level_permissions: Enable table-level permissions for datashare mode (true/false). Use true if the datashare was created 'WITH PERMISSIONS' (used when permissions_type = datashare)
Note: If python command doesn't work, try using python3 instead.
python rs_privs_to_lf.pypython rs_privs_to_lf.py --config /path/to/config.inipython rs_privs_to_lf.py --output-dir /path/to/outputpython rs_privs_to_lf.py --debug--config, -c: Path to configuration file (default:config.ini)--output-dir, -o: Base directory for generated scripts (default:output)--debug: Enable debug output and generate user ID lookup commands
- Processes only users/roles prefixed with the identity provider namespace (automatically detected from SVV_IDENTITY_PROVIDERS)
- Looks up actual IDC user/group IDs using AWS CLI
- Generates ARNs in format:
arn:aws:identitystore:::user/USER_IDorarn:aws:identitystore:::group/GROUP_ID - Handles role inheritance from granted Redshift roles in both datashare and local modes
- Extracts permissions from Redshift datashare databases
- Uses
datashare_database_nameparameter to specify target datashare - Processes users with USAGE permissions on datashare databases
- Uses
SHOW GRANTSfor table-level permission analysis
- Extracts permissions from the local Redshift database (specified in
dbname) - Finds IDC users and roles with table-level permissions on local database tables
- Excludes system schemas (
information_schema,pg_catalog,pg_internal) - Supports IDC role inheritance (roles granted to IDC roles)
- Uses
SHOW GRANTSfor table-level permission analysis - Ignores
datashare_database_nameanddatashare_object_level_permissionsparameters
- Grants
SELECTandDESCRIBEpermissions on all tables in each schema - Uses
TableWildcardsyntax for efficient bulk permissions - Based on database
USAGEpermissions
- Datashare mode: Analyzes individual table grants using
SHOW GRANTS - Local mode: Uses
SHOW GRANTSfor table-level permission analysis - Preserves specific permissions (
SELECT,INSERT,UPDATE,DELETE) - Automatically adds
DESCRIBEpermission - Filters out unsupported Lake Formation permissions (
RULE,TRIGGER, etc.) - Handles IDC role inheritance in both modes
The script generates timestamped files in the output directory:
lakeformation_permissions_YYYYMMDD_HHMMSS.sh: Main permission grant scriptrollback_lakeformation_YYYYMMDD_HHMMSS.sh: Rollback script to revoke permissionsget_user_id_commands_YYYYMMDD_HHMMSS.sh: Debug script for IDC user ID lookups (debug mode only)
logs/lakeformation_migration_YYYYMMDD_HHMMSS.log: Detailed execution logs with timestamps- Logs are automatically created in a
logs/directory (created if it doesn't exist) - Both console and file logging are enabled simultaneously
#!/bin/bash
# AWS LakeFormation permission commands generated from Redshift datashare permissions
# Generated on: 2024-01-15 10:30:45
# Authentication type: IDC
# Object-level permissions: true
aws lakeformation grant-permissions \
--catalog-id "123456789012" \
--principal DataLakePrincipalIdentifier=arn:aws:identitystore:::user/12345678-1234-1234-1234-123456789012 \
--resource '{"Table": {"CatalogId": "123456789012:producer-catalog", "DatabaseName": "schema_name", "Name": "table_name"}}' \
--permissions "SELECT" "DESCRIBE"The script includes comprehensive error handling for:
- Missing configuration parameters
- Redshift connection failures
- AWS CLI command failures
- Missing user/group IDs in Identity Center
- Invalid table grants
- Connection Errors: Verify Redshift credentials and network connectivity
- Missing User IDs: Ensure Identity Center users exist and are properly named
- Permission Denied: Verify AWS CLI permissions for Lake Formation operations
- Empty Results:
- Datashare mode: Check datashare database names and user permissions
- Local mode: Verify you're connected to the correct cluster and database
- Ensure identity provider namespace prefixed Redshift users/roles exist
- No Tables Found:
- Local mode: Confirm tables exist in non-system schemas
- Check if you're connected to the intended Redshift cluster
Enable debug mode to see:
- Current database connection details
- Available schemas in the database
- Detailed user and table information
- Sample table grants
- Enhanced logging with DEBUG level messages
- User ID lookup commands for Identity Center authentication
- Query execution details for troubleshooting
The utility provides comprehensive logging functionality:
- Dual Output: Logs to both console and timestamped log files
- Automatic Directory Creation: Creates
logs/directory if it doesn't exist - Structured Format: Timestamp, log level, and message for each entry
- Multiple Log Levels: INFO, ERROR, WARNING, and DEBUG (with --debug flag)
- Operation Tracking: Logs all major operations, errors, and progress updates
- Default:
logs/lakeformation_migration_YYYYMMDD_HHMMSS.login current directory - With Output Directory:
{output-dir}/logs/lakeformation_migration_YYYYMMDD_HHMMSS.log
- Utility startup and version information
- Configuration loading and validation
- AWS Secrets Manager operations
- Redshift database connections
- Permission extraction progress
- Script generation results
- Error messages and troubleshooting information
- Debug details (when --debug flag is used)
- Use least-privilege IAM policies
- Review generated scripts before execution
- Test in non-production environments first
- Ensure your AWS Secrets Manager secret contains the keys 'username' and 'password'
- The IAM role/user running the script needs secretsmanager:GetSecretValue permission
Authors: adamgatt@amazon.co.uk, ziadwali@amazon.fr
NOTE: This utility is continuously enhanced to close any gaps and add additional functionality. Please send your issues, feedback, and enhancement requests to the authors with subject line: “[issue/feedback/enhancement] Amazon Redshift Permissions to Lake Formation Migration Utility”