Description
Describe the Bug
All my dbt model runs crash around 5 minutes and 30 seconds into execution with the following error:
```
Error in GlueCursor execute: An error occurred (IllegalSessionStateException) when calling the GetStatement operation: Session unavailable, fail to call ReplServer.
```
The run writes files to the S3 bucket; however, the table never gets registered in the Glue Catalog.
The source tables are registered in the Glue Catalog and configured via sources.yml. Example for the conta table:
```yaml
- name: conta
  description: "Raw CDC data from DMS for conta"
  meta:
    external_location: s3://zap-ingestion/zapweb/conta/
  external:
    location: "s3://zap-ingestion/zapweb/conta/"
    file_format: "parquet"
```

These tables are built on top of compacted Parquet files generated by AWS DMS and processed with https://github.com/awslabs/aws-glue-blueprint-libs/tree/master/samples/compaction. The goal is to build a CDC process, and the first run of each table may process anywhere from 10 MB to 100 GB of data.
Steps to Reproduce
dbt Model (brz_conta.sql)
```sql
{{ config(
    tags=["brz"],
    materialized='incremental',
    incremental_strategy='append',
    pre_hook="SET spark.sql.parquet.datetimeRebaseModeInRead = LEGACY"
) }}

WITH source_data AS (
    SELECT
        CAST(dms_commit_ts AS TIMESTAMP) AS loaded_at,
        CAST(idconsumo AS STRING) AS idconsumo,
        CAST(idsubscricao AS STRING) AS idsubscricao,
        CAST(idprodutoinstantaneo AS STRING) AS idprodutoinstantaneo,
        CAST(datainicio AS STRING) AS datainicio,
        CAST(datafim AS STRING) AS datafim,
        CAST(valor AS STRING) AS valor,
        CAST(estado AS STRING) AS estado,
        CAST(nagra AS STRING) AS nagra,
        CAST(incarencia AS STRING) AS incarencia,
        CAST(ideventoconsumo AS STRING) AS ideventoconsumo,
        CAST(pontos AS STRING) AS pontos,
        CAST(quota AS STRING) AS quota,
        CAST(idstb AS STRING) AS idstb,
        CAST(idequipamento AS STRING) AS idequipamento,
        CAST(idcontaservico AS STRING) AS idcontaservico,
        CAST(datahorainicio AS STRING) AS datahorainicio,
        CAST(datahorafim AS STRING) AS datahorafim,
        CAST(datacriacao AS STRING) AS datacriacao
    FROM {{ source('zap_ingestion', 'consumo') }}
    {% if is_incremental() %}
    WHERE loaded_at > (SELECT MAX(loaded_at) FROM {{ this }})
    {% endif %}
)

SELECT * FROM source_data
```

dbt profiles.yml
```yaml
prod:
  type: glue
  query_tag: zap-deltalake
  role_arn: ROLE
  region: af-south-1
  workers: 2
  worker_type: G.1X
  session_provisioning_timeout_in_seconds: 300
  idle_timeout: 15
  create_new_session: true
  default_arguments: "--enable-auto-scaling=true, --enable-metrics=true, --enable-continuous-cloudwatch-log=true, --enable-continuous-log-filter=true, --enable-spark-ui=true, --spark-event-logs-path=s3://BUCKET/prodLogDir/"
  glue_version: "5.0"
  schema: ice_lake
  catalog_id: ID
  location: s3://BRZ_BUCKET
  datalake_formats: iceberg
  custom_iceberg_catalog_namespace: ""
  spark_conf:
    spark.sql.extensions: org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
    spark.sql.catalog.glue_catalog: org.apache.iceberg.spark.SparkCatalog
    spark.sql.catalog.glue_catalog.warehouse: s3://BRZ_BUCKET/ice_lake/
    spark.sql.warehouse: s3://BRZ_BUCKET/ice_lake/
    spark.sql.catalog.glue_catalog.catalog-impl: org.apache.iceberg.aws.glue.GlueCatalog
    spark.sql.catalog.glue_catalog.io-impl: org.apache.iceberg.aws.s3.S3FileIO
    spark.serializer: org.apache.spark.serializer.KryoSerializer
```

Expected Behavior
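As a side observation, the `default_arguments` value above is a single comma-separated string with spaces after each comma. A minimal, hypothetical parser (not dbt-glue's actual code) illustrates that such a string only maps cleanly onto Glue session arguments if whitespace around the separators is stripped:

```python
def parse_default_arguments(raw: str) -> dict[str, str]:
    """Split a '--key=value, --key=value' string into a dict.

    Hypothetical helper for illustration only; dbt-glue's real parsing
    may differ. Whitespace around commas is stripped so that both
    '--a=1,--b=2' and '--a=1, --b=2' yield the same result.
    """
    args = {}
    for token in raw.split(","):
        token = token.strip()
        if not token:
            continue
        key, _, value = token.partition("=")
        args[key] = value
    return args


session_args = parse_default_arguments(
    "--enable-auto-scaling=true, --enable-metrics=true, "
    "--spark-event-logs-path=s3://BUCKET/prodLogDir/"
)
print(session_args["--enable-auto-scaling"])  # true
```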
The model brz_conta should:
- Create an Iceberg table named `brz_conta` in the `ice_lake` schema.
- Register it in the Glue Catalog.
- Store data files in the specified S3 path.
Screenshots and Log Output
DBT Output
```
Glue adapter: Error in GlueCursor (session_id=dbt-glue__service-role/AWSGlueServiceRole__e0ea176d-fa40-4e95-9709-d4ca72d5508a) execute:
An error occurred (IllegalSessionStateException) when calling the GetStatement operation:
Session dbt-glue__service-role/... unavailable, fail to call ReplServer

Runtime Error in model brz_conta (models/bronze/brz_conta.sql):
module 'dbt.exceptions' has no attribute 'ExecutableError'
```
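The second error suggests the original Glue failure is being masked: the adapter apparently references `dbt.exceptions.ExecutableError`, an attribute this dbt-core version no longer exposes, so the attribute lookup itself fails and replaces the real message. A self-contained sketch of that masking effect, using a stand-in namespace rather than the real `dbt.exceptions` module:

```python
from types import SimpleNamespace

# Stand-in for an exceptions module that no longer exposes ExecutableError
# (illustration only; the real dbt.exceptions module differs).
fake_dbt_exceptions = SimpleNamespace(DbtRuntimeError=RuntimeError)


def raise_adapter_error(message: str):
    """Sketch: raising a removed exception class fails at attribute lookup,
    so the caller sees an AttributeError instead of the original message."""
    raise fake_dbt_exceptions.ExecutableError(message)


try:
    raise_adapter_error("Session unavailable, fail to call ReplServer")
except AttributeError as exc:
    # The original Glue message is lost; only the lookup failure surfaces.
    masked = str(exc)

print(masked)
```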
AWS Glue Interactive Sessions Logs
There are several successful downloads before the process fails:
```
25/07/10 13:17:56 ERROR AsyncFileDownloader: TID: 246 - Download failed for ParquetFileChunk(path=s3://zap-ingestion/zapweb/conta/20250527-194357672.parquet, downloadSize=30)
java.lang.NullPointerException: Cannot invoke "org.apache.hadoop.fs.FileSystem.hasPathCapability(org.apache.hadoop.fs.Path, String)" because "this.fileSystem" is null
	at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.hasPathCapability(EmrFileSystem.java:377) ~[emrfs-hadoop-assembly-2.69.0.jar:?]
ERROR 2025-07-10T13:17:56,872 311398 com.amazonaws.glue.is.LivyServerLauncher [main] 86 Got interrupted
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.InterruptedException: Session did not reach healthy state
```
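When the session dies mid-run like this, its final status and failure reason can also be pulled from the Glue API (`glue:GetSession`), independent of dbt. A hedged sketch; the `boto3` call shape follows the AWS Glue interactive sessions API, and the region matches the profile above:

```python
def summarize_session(response: dict) -> str:
    """Reduce a glue.get_session response to 'STATUS: reason'."""
    session = response.get("Session", {})
    return f"{session.get('Status', 'UNKNOWN')}: {session.get('ErrorMessage', '')}"


def check_session(session_id: str, region: str = "af-south-1") -> str:
    """Fetch and summarize a Glue interactive session's final state."""
    import boto3  # AWS SDK; deferred so summarize_session works offline

    client = boto3.client("glue", region_name=region)
    return summarize_session(client.get_session(Id=session_id))


# Offline demonstration with a response shaped like the API's output:
sample = {"Session": {"Id": "dbt-glue-example", "Status": "FAILED",
                      "ErrorMessage": "Session unavailable"}}
print(summarize_session(sample))  # FAILED: Session unavailable
```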
Spark Output Sample
```json
{
  "Event": "SparkListenerJobEnd",
  "Job ID": 3,
  "Completion Time": 1752153476899,
  "Job Result": {
    "Result": "JobFailed",
    "Exception": {
      "Message": "Job 3 cancelled because SparkContext was shut down"
    }
  },
  "Metrics Summary": {
    "Rows Written": 6143958,
    "Bytes Written": 259720476,
    "Input Records Read": 6143958,
    "Bytes Read": 617032200
  }
}
```

System Information
The output of `dbt --version`:

```
dbt-core==1.9.6
dbt-glue==1.9.4
```
The operating system you're using:
Docker image: ghcr.io/dbt-labs/dbt-spark:latest