Description
Describe the bug
I have been trying out the python_model materialization and found that the source() and ref() functions do not resolve correctly. I think this is because the glue__py_write_table macro in python_utils.sql replaces the standard dbt definitions of ref() and source() with the following:
def ref(self, name):
    return self.table_function(name)
def source(self, source_name, table_name):
    return self.table_function(source_name + "." + table_name)
The overridden source() function assumes that source_name is the same as the schema of the source table, but this does not have to be the case. For example, this is how my test renders in the standard source() function from dbt:
sources = {"source_data_schema.source_table": "dev_sources.source_table"}
In the overridden version the adapter tries to get a dataframe for "source_data_schema.source_table" (which doesn't exist), whereas it should read from "dev_sources.source_table".
The ref() function does not use the rendered name either, which, at least in my case, should include the schema name. Again, the vanilla output is as follows:
refs = {"dim_my_table": "dev_schema.dim_my_table"}
But the overridden version only uses "dim_my_table" when fetching the dataframe, which fails for me with TABLE_OR_VIEW_NOT_FOUND.
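To illustrate the behaviour I would expect, here is a minimal sketch of ref() and source() resolving through the rendered mappings instead of the raw arguments. It is not the adapter's actual code: the class name DbtObjSketch, the class-level refs and sources dictionaries, and the injected table_function stub are all assumptions made for illustration. The mapping values are the ones from this report.

```python
# Hypothetical stand-in for the object the macro generates; the class name and
# the injected table_function are assumptions for illustration only.
class DbtObjSketch:
    # Rendered mappings as dbt core compiles them (values taken from this report).
    refs = {"dim_my_table": "dev_schema.dim_my_table"}
    sources = {"source_data_schema.source_table": "dev_sources.source_table"}

    def __init__(self, table_function):
        # In a real session this would load a dataframe; here it is a stub.
        self.table_function = table_function

    def ref(self, name):
        # Look up the rendered relation rather than passing the bare model name.
        return self.table_function(self.refs[name])

    def source(self, source_name, table_name):
        # Key on "<source name>.<table name>", then read the rendered relation.
        return self.table_function(self.sources[source_name + "." + table_name])


# Stub loader that simply echoes the relation name it was asked to read.
dbt = DbtObjSketch(table_function=lambda relation: relation)
print(dbt.ref("dim_my_table"))
print(dbt.source("source_data_schema", "source_table"))
```

With lookups like these, the source name and the schema can differ, and a ref() to a model outside the default schema still resolves to its fully qualified name.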
Steps To Reproduce
- Configure a sources.yml with a name and a schema that are not equal:
  sources:
    - name: source_data_schema
      schema: dev_sources
- Reference a table from this source with source() in a python_model model
- Build the model
- Include a ref() to a model that is not in the default schema
- Build the model
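For reference, a model along the following lines should reproduce both failures. This is a sketch rather than my exact file: the source, table and model names are the ones used elsewhere in this report, and the two calls are combined into one model for brevity even though the steps above build them separately.

```python
# Minimal python model for the reproduction; names come from this report.
def model(dbt, session):
    # Source whose name (source_data_schema) differs from its schema (dev_sources).
    source_df = dbt.source("source_data_schema", "source_table")
    # Ref to a model that is built outside the default schema.
    dim_df = dbt.ref("dim_my_table")
    # Returning either dataframe is enough to trigger the lookups above.
    return source_df
```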
Expected behavior
The model should build successfully and read the ref() and source() dataframes correctly.
Screenshots and log output
16:17:10 1 of 1 START python python_model model dev_schema.test_object ....... [RUN]
DEBUG: Using Glue session: dbt-glue__GlueInteractiveSessionRole__69fe653a-1a94-46c7-a985-7b7c54d0a8d7
DEBUG: Statement completed successfully. Output: {'Data': {'TextPlain': ''}, 'ExecutionCount': 2, 'Status': 'error', 'ErrorName': 'AnalysisException', 'ErrorValue': "[TABLE_OR_VIEW_NOT_FOUND] The table or view source_data_schema.source_table cannot be found. Verify the spelling and correctness of the schema and catalog.\nIf you did not qualify the name with a schema, verify the current_schema() output, or qualify the name with the correct schema and catalog.\nTo tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS.;\n'UnresolvedRelation [source_data_schema, source_table], [], false\n", 'Traceback': ['Traceback (most recent call last):\n', ' File "", line 120, in \n', ' File "", line 7, in model\n', ' File "", line 99, in source\n', ' File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 1667, in table\n return DataFrame(self._jsparkSession.table(tableName), self)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n', ' File "/usr/lib/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in call\n return_value = get_return_value(\n ^^^^^^^^^^^^^^^^^\n', ' File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py", line 185, in deco\n raise converted from None\n', "pyspark.errors.exceptions.captured.AnalysisException: [TABLE_OR_VIEW_NOT_FOUND] The table or view source_data_schema.source_table cannot be found. Verify the spelling and correctness of the schema and catalog.\nIf you did not qualify the name with a schema, verify the current_schema() output, or qualify the name with the correct schema and catalog.\nTo tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS.;\n'UnresolvedRelation [source_data_schema, source_table], [], false\n\n"]}
16:18:47 1 of 1 OK created python python_model model dev_schema.test_object .. [OK in 96.29s]
16:18:47
16:18:47 Finished running 1 python model model in 0 hours 1 minutes and 38.27 seconds (98.27s).
16:18:47
16:18:47 Completed successfully
16:18:47
16:18:47 Done. PASS=1 WARN=0 ERROR=0 SKIP=0 NO-OP=0 TOTAL=1
System information
The output of dbt --version:
(dbt_venv) [airflow@19c2acef9fa7 se_intel]$ dbt --version
Core:
- installed: 1.10.13
- latest: 1.10.13 - Up to date!
Plugins:
- glue: 1.10.13 - Up to date!
- spark: 1.9.3 - Up to date!
The operating system you're using:
Ubuntu
The output of python --version:
Python 3.11.7