dbt runs fail after ~350s using Glue 5.0 with IllegalSessionStateException #576

@ddafransim

Describe the Bug

All my dbt model runs crash around 5 minutes and 30 seconds into execution with the following error:

Error in GlueCursor execute: An error occurred (IllegalSessionStateException) when calling the GetStatement operation: Session unavailable, fail to call ReplServer.

The run writes data files to the S3 bucket; however, the table never gets registered in the Glue Catalog.

The source tables are registered in the Glue Catalog and configured via sources.yml. Example for the conta table:

- name: conta
  description: "Raw CDC data from DMS for conta"
  meta:
    external_location: s3://zap-ingestion/zapweb/conta/
  external:
    location: "s3://zap-ingestion/zapweb/conta/"
    file_format: "parquet"

These tables are built on top of compacted Parquet files generated by AWS DMS and processed with https://github.com/awslabs/aws-glue-blueprint-libs/tree/master/samples/compaction. The goal is to build a CDC process; the first run of each table may process anywhere from 10 MB to 100 GB of data.


Steps to Reproduce

dbt Model (brz_conta.sql)

{{ config(
    tags=["brz"],
    materialized='incremental',
    incremental_strategy='append',
    pre_hook="SET spark.sql.parquet.datetimeRebaseModeInRead = LEGACY"
) }}

WITH source_data AS (
    SELECT 
        CAST(dms_commit_ts AS TIMESTAMP) AS loaded_at,
        CAST(idconsumo AS STRING) AS idconsumo,
        CAST(idsubscricao AS STRING) AS idsubscricao,
        CAST(idprodutoinstantaneo AS STRING) AS idprodutoinstantaneo,
        CAST(datainicio AS STRING) AS datainicio,
        CAST(datafim AS STRING) AS datafim,
        CAST(valor AS STRING) AS valor,
        CAST(estado AS STRING) AS estado,
        CAST(nagra AS STRING) AS nagra,
        CAST(incarencia AS STRING) AS incarencia,
        CAST(ideventoconsumo AS STRING) AS ideventoconsumo,
        CAST(pontos AS STRING) AS pontos,
        CAST(quota AS STRING) AS quota,
        CAST(idstb AS STRING) AS idstb,
        CAST(idequipamento AS STRING) AS idequipamento,
        CAST(idcontaservico AS STRING) AS idcontaservico,
        CAST(datahorainicio AS STRING) AS datahorainicio,
        CAST(datahorafim AS STRING) AS datahorafim,
        CAST(datacriacao AS STRING) AS datacriacao
    FROM {{ source('zap_ingestion', 'consumo') }}
    {% if is_incremental() %}
        WHERE loaded_at > (SELECT MAX(loaded_at) FROM {{ this }})
    {% endif %}
)

SELECT * FROM source_data
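
For reference, the intended incremental behavior is a simple high-water-mark append: on incremental runs, only source rows with `loaded_at` greater than the target's current maximum are inserted. A minimal Python sketch of that logic (illustrative only; the `loaded_at` values are made up, and this is not the adapter's implementation):

```python
def incremental_append(target_rows, source_rows, is_incremental=True):
    """Append only source rows newer than the target's max loaded_at.

    Mirrors the is_incremental() filter in the model above:
    WHERE loaded_at > (SELECT MAX(loaded_at) FROM {{ this }}).
    """
    if is_incremental and target_rows:
        watermark = max(r["loaded_at"] for r in target_rows)
        new_rows = [r for r in source_rows if r["loaded_at"] > watermark]
    else:
        # First run (or full refresh): load everything.
        new_rows = list(source_rows)
    return target_rows + new_rows

target = [{"loaded_at": 1}, {"loaded_at": 2}]
source = [{"loaded_at": 2}, {"loaded_at": 3}]
# Only the loaded_at=3 row is appended; loaded_at=2 already exists.
print(incremental_append(target, source))
```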

dbt profiles.yml

prod:
  type: glue
  query_tag: zap-deltalake
  role_arn: ROLE
  region: af-south-1
  workers: 2
  worker_type: G.1X
  session_provisioning_timeout_in_seconds: 300
  idle_timeout: 15
  create_new_session: true
  default_arguments: "--enable-auto-scaling=true, --enable-metrics=true, --enable-continuous-cloudwatch-log=true, --enable-continuous-log-filter=true, --enable-spark-ui=true, --spark-event-logs-path=s3://BUCKET/prodLogDir/"
  glue_version: "5.0"
  schema: ice_lake
  catalog_id: ID
  location: s3://BRZ_BUCKET
  datalake_formats: iceberg
  custom_iceberg_catalog_namespace: ""
  spark_conf:
    spark.sql.extensions: org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
    spark.sql.catalog.glue_catalog: org.apache.iceberg.spark.SparkCatalog
    spark.sql.catalog.glue_catalog.warehouse: s3://BRZ_BUCKET/ice_lake/
    spark.sql.warehouse: s3://BRZ_BUCKET/ice_lake/
    spark.sql.catalog.glue_catalog.catalog-impl: org.apache.iceberg.aws.glue.GlueCatalog
    spark.sql.catalog.glue_catalog.io-impl: org.apache.iceberg.aws.s3.S3FileIO
    spark.serializer: org.apache.spark.serializer.KryoSerializer
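
Note that `default_arguments` is passed as a single comma-separated string of `--key=value` pairs. A minimal sketch of how such a string splits into a key/value map (an illustration of the format only, not the adapter's actual parser; it assumes values contain no commas):

```python
def parse_default_arguments(raw: str) -> dict:
    """Split a comma-separated '--key=value' string into a dict.

    Assumes each entry has exactly one '=' and values contain no commas.
    """
    args = {}
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue
        key, _, value = entry.partition("=")
        args[key] = value
    return args

raw = ("--enable-auto-scaling=true, --enable-metrics=true, "
       "--spark-event-logs-path=s3://BUCKET/prodLogDir/")
print(parse_default_arguments(raw))
```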

Expected Behavior

The model brz_conta should:

  • Create an Iceberg table named brz_conta in the ice_lake schema.
  • Register it in the Glue Catalog.
  • Store data files in the specified S3 path.

Screenshots and Log Output

dbt Output

Glue adapter: Error in GlueCursor (session_id=dbt-glue__service-role/AWSGlueServiceRole__e0ea176d-fa40-4e95-9709-d4ca72d5508a) execute:
An error occurred (IllegalSessionStateException) when calling the GetStatement operation:
Session dbt-glue__service-role/... unavailable, fail to call ReplServer

Runtime Error in model brz_conta (models/bronze/brz_conta.sql):
module 'dbt.exceptions' has no attribute 'ExecutableError'

AWS Glue Interactive Sessions Logs

There are several successful downloads before the process fails:

25/07/10 13:17:56 ERROR AsyncFileDownloader: TID: 246 - Download failed for ParquetFileChunk(path=s3://zap-ingestion/zapweb/conta/20250527-194357672.parquet, downloadSize=30)
java.lang.NullPointerException: Cannot invoke "org.apache.hadoop.fs.FileSystem.hasPathCapability(org.apache.hadoop.fs.Path, String)" because "this.fileSystem" is null
	at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.hasPathCapability(EmrFileSystem.java:377) ~[emrfs-hadoop-assembly-2.69.0.jar:?]

ERROR	2025-07-10T13:17:56,872	311398	com.amazonaws.glue.is.LivyServerLauncher	[main]	86	Got interrupted
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.InterruptedException: Session did not reach healthy state

Spark Output Sample

{
  "Event": "SparkListenerJobEnd",
  "Job ID": 3,
  "Completion Time": 1752153476899,
  "Job Result": {
    "Result": "JobFailed",
    "Exception": {
      "Message": "Job 3 cancelled because SparkContext was shut down"
    }
  },
  "Metrics Summary": {
    "Rows Written": 6143958,
    "Bytes Written": 259720476,
    "Input Records Read": 6143958,
    "Bytes Read": 617032200
  }
}
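
A quick sanity check over those metrics: the job read and wrote the same record count, and the bytes written are roughly 42% of the bytes read (plausibly just Parquet re-encoding after the casts), which suggests output was being produced normally right up until the SparkContext was shut down. Illustrative arithmetic only:

```python
# Arithmetic over the SparkListenerJobEnd metrics summary above.
rows = 6_143_958
bytes_written = 259_720_476
bytes_read = 617_032_200

ratio = bytes_written / bytes_read       # output size vs. input size
written_per_row = bytes_written / rows   # avg bytes per written row
read_per_row = bytes_read / rows         # avg bytes per read row

print(f"write/read ratio: {ratio:.2f}")  # ~0.42
print(f"bytes/row written: {written_per_row:.1f}, read: {read_per_row:.1f}")
```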

System Information

The output of dbt --version:

dbt-core==1.9.6
dbt-glue==1.9.4

The operating system you're using:

Docker image: ghcr.io/dbt-labs/dbt-spark:latest
