
Python models fail when using ref() and source() functions #635

@PhilHenson82

Describe the bug

I have been trying out the python_model materialization and have found issues with the source() and ref() functions, which do not resolve to the correct tables. I think this is because the glue__py_write_table macro in python_utils.sql replaces the standard dbt definitions of ref() and source() with the following:

    def ref(self, name):
        return self.table_function(name)
    def source(self, source_name, table_name):
        return self.table_function(source_name + "." + table_name)
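To make the failure concrete, here is a standalone sketch that copies the two overridden methods into a hypothetical `OverriddenLoader` class. `table_function` is stubbed to echo the name it receives instead of loading a table (judging by the `session.py ... in table` frame in the traceback below, the real one ends up in Spark's `SparkSession.table`, but that is an inference). The stub shows which names the override actually asks for.

```python
class OverriddenLoader:
    """Hypothetical stand-in for the object built by glue__py_write_table.

    table_function is stubbed so the example runs without Spark; it simply
    returns the table name it would have been asked to load.
    """

    def __init__(self, table_function):
        self.table_function = table_function

    # Copied from the override quoted above.
    def ref(self, name):
        return self.table_function(name)

    def source(self, source_name, table_name):
        return self.table_function(source_name + "." + table_name)


loader = OverriddenLoader(table_function=lambda name: name)

# The source *name* is used as if it were the schema.
print(loader.source("source_data_schema", "source_table"))
# -> source_data_schema.source_table (the table actually lives in dev_sources)

# The bare model name is used with no schema qualifier.
print(loader.ref("dim_my_table"))
# -> dim_my_table (the table actually lives in dev_schema)
```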

The overridden source() assumes that source_name is the same as the schema of the source table, but that does not have to be the case. For example, this is how the standard dbt source() mapping renders for my test:

sources = {"source_data_schema.source_table": "dev_sources.source_table"}

In the overridden version, the adapter tries to load a DataFrame for "source_data_schema.source_table" (which doesn't exist), whereas it should read from "dev_sources.source_table".
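As far as I can tell, the standard dbt-generated code joins the source() arguments into a key and looks that key up in the rendered `sources` mapping shown above, instead of concatenating names. A minimal sketch of a source() that keeps that behaviour, assuming the macro can access the same rendered mapping. `load_df` is a hypothetical parameter standing in for whatever loader the adapter uses (for example `spark.table`).

```python
from typing import Any, Callable, Dict

# Rendered mapping copied from the standard dbt output above:
# "<source name>.<table name>" -> fully qualified relation.
SOURCES: Dict[str, str] = {
    "source_data_schema.source_table": "dev_sources.source_table",
}


def source(*args: str, load_df: Callable[[str], Any],
           sources: Dict[str, str] = SOURCES) -> Any:
    """Hypothetical fix: resolve the logical source key before loading it."""
    key = ".".join(args)  # e.g. "source_data_schema.source_table"
    if key not in sources:
        raise KeyError(f"{key!r} is not in the rendered sources mapping")
    # Load the rendered relation, not the logical name.
    return load_df(sources[key])


# With an echo loader, the resolved relation name comes back.
print(source("source_data_schema", "source_table", load_df=lambda name: name))
# -> dev_sources.source_table
```

Looking the key up (rather than rebuilding a name) means the source's `schema:` setting, and any other rendering dbt applies, is honoured automatically.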

The overridden ref() does not use the rendered relation name either, which, at least in my case, should include the schema name. Again, the standard dbt output is:

refs = {"dim_my_table": "dev_schema.dim_my_table"}

But the overridden version uses only "dim_my_table" when fetching the DataFrame, which fails for me with TABLE_OR_VIEW_NOT_FOUND.
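The same idea applies to ref(). The sketch below uses a hypothetical in-memory `CATALOG` set and a `load_table` stub that raises an error shaped like Spark's TABLE_OR_VIEW_NOT_FOUND when a name is missing. The stub does not model Spark resolving an unqualified name against the current database; it only illustrates that the rendered, schema-qualified name is the one that exists.

```python
from typing import Dict, Set

# Hypothetical catalog standing in for the Glue Data Catalog in this example.
CATALOG: Set[str] = {"dev_schema.dim_my_table", "dev_sources.source_table"}

# Rendered mapping copied from the standard dbt output above.
REFS: Dict[str, str] = {"dim_my_table": "dev_schema.dim_my_table"}


def load_table(name: str) -> str:
    """Stub for a Spark table lookup; fails like Spark when the name is unknown."""
    if name not in CATALOG:
        raise LookupError(f"[TABLE_OR_VIEW_NOT_FOUND] The table or view {name} cannot be found.")
    return name


def ref(*args: str) -> str:
    """Hypothetical fix: resolve the rendered relation before loading it."""
    key = ".".join(args)
    return load_table(REFS[key])


print(ref("dim_my_table"))  # -> dev_schema.dim_my_table

try:
    load_table("dim_my_table")  # the bare name the override passes today
except LookupError as exc:
    print(exc)
```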

Steps To Reproduce

  1. Configure a sources.yml with a source name and a schema that differ:

       sources:
         - name: source_data_schema
           schema: dev_sources
           tables:
             - name: source_table

  2. Include a source from this in a python_model model
  3. Build the model
  4. Include a ref() to a model that is not in the default schema
  5. Build the model
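For reference, a minimal model covering steps 2 and 4 might look like the sketch below. It uses the standard dbt Python model entry point `model(dbt, session)`. The join key `id` is hypothetical, and the materialization config is omitted.

```python
def model(dbt, session):
    # Source declared with name "source_data_schema" but schema "dev_sources".
    source_df = dbt.source("source_data_schema", "source_table")

    # Model built outside the default schema (rendered as dev_schema.dim_my_table).
    dim_df = dbt.ref("dim_my_table")

    # "id" is a hypothetical join key; replace it with a real shared column.
    return source_df.join(dim_df, on="id", how="left")
```

With the override quoted above, the first call already fails, because it asks Spark for source_data_schema.source_table.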

Expected behavior

The model should build successfully and read the ref() and source() DataFrames from the correct tables.

Screenshots and log output

16:17:10 1 of 1 START python python_model model dev_schema.test_object ....... [RUN]
DEBUG: Using Glue session: dbt-glue__GlueInteractiveSessionRole__69fe653a-1a94-46c7-a985-7b7c54d0a8d7
DEBUG: Statement completed successfully. Output: {'Data': {'TextPlain': ''}, 'ExecutionCount': 2, 'Status': 'error', 'ErrorName': 'AnalysisException', 'ErrorValue': "[TABLE_OR_VIEW_NOT_FOUND] The table or view source_data_schema.source_table cannot be found. Verify the spelling and correctness of the schema and catalog.\nIf you did not qualify the name with a schema, verify the current_schema() output, or qualify the name with the correct schema and catalog.\nTo tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS.;\n'UnresolvedRelation [source_data_schema, source_table], [], false\n", 'Traceback': ['Traceback (most recent call last):\n', ' File "", line 120, in \n', ' File "", line 7, in model\n', ' File "", line 99, in source\n', ' File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 1667, in table\n return DataFrame(self._jsparkSession.table(tableName), self)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n', ' File "/usr/lib/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in call\n return_value = get_return_value(\n ^^^^^^^^^^^^^^^^^\n', ' File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py", line 185, in deco\n raise converted from None\n', "pyspark.errors.exceptions.captured.AnalysisException: [TABLE_OR_VIEW_NOT_FOUND] The table or view source_data_schema.source_table cannot be found. Verify the spelling and correctness of the schema and catalog.\nIf you did not qualify the name with a schema, verify the current_schema() output, or qualify the name with the correct schema and catalog.\nTo tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS.;\n'UnresolvedRelation [source_data_schema, source_table], [], false\n\n"]}
16:18:47 1 of 1 OK created python python_model model dev_schema.test_object .. [OK in 96.29s]
16:18:47
16:18:47 Finished running 1 python model model in 0 hours 1 minutes and 38.27 seconds (98.27s).
16:18:47
16:18:47 Completed successfully
16:18:47
16:18:47 Done. PASS=1 WARN=0 ERROR=0 SKIP=0 NO-OP=0 TOTAL=1

System information

The output of dbt --version:
(dbt_venv) [airflow@19c2acef9fa7 se_intel]$ dbt --version
Core:

  • installed: 1.10.13
  • latest: 1.10.13 - Up to date!

Plugins:

  • glue: 1.10.13 - Up to date!
  • spark: 1.9.3 - Up to date!

The operating system you're using:
Ubuntu

The output of python --version:
Python 3.11.7


Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests