[WIP][SQL] Clarify schema mismatch types in insertInto error #51446
+40
−1
What changes were proposed in this pull request?
This PR improves the error reporting of INSERT INTO operations when there is a schema mismatch between a DataFrame and the target table. Specifically, it makes the error message accurate in cases where Spark, on a type mismatch, misreports the schema of the input DataFrame. Previously, the exception message did not clearly reflect the actual types involved, sometimes implying that the DataFrame column had a different type than it truly did. This patch ensures the reported types are correct and provides a clearer message, including:
Why are the changes needed?
This change addresses a confusing behavior during schema mismatches in insert operations. It improves the developer experience by giving precise and helpful diagnostics. This is especially important for debugging complex ETL pipelines or schema evolution issues.
Without this fix, developers may misinterpret the root cause of an error due to incorrect or vague type information in the exception message.
Does this PR introduce any user-facing change?
Yes.
This PR changes the error message users see when they attempt to insert a DataFrame into a table with mismatched schemas. While the functionality remains the same, the error message is more descriptive and accurate.
Before:
val df = Seq((2025, "Monaco GP")).toDF("race_year", "race_name") // race_year: INT
df.write.insertInto("target_table") // target_table expects race_year as STRING
Cannot safely cast 'race_year': string to int
After:
InsertInto schema mismatch at column 'race_year':
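For reference, the kind of positional comparison that would report both sides of the mismatch can be sketched as follows. This is a minimal illustration, not the code or the message format from this PR: `SchemaCheck` and `describeMismatches` are hypothetical names, the wording of the output is invented for the example, and the type of `race_name` in the target table is assumed to be STRING. It only needs spark-sql on the classpath, not a running SparkSession.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper for illustration; not part of Spark or of this PR.
object SchemaCheck {

  // insertInto resolves columns by position, so field i of the DataFrame
  // is compared with field i of the target table.
  def describeMismatches(input: StructType, target: StructType): Seq[String] =
    input.fields.zip(target.fields).collect {
      case (in, out) if in.dataType != out.dataType =>
        s"column '${in.name}': DataFrame has ${in.dataType.simpleString}, " +
          s"table expects ${out.dataType.simpleString}"
    }.toSeq

  // Schema of the DataFrame from the example above.
  val dfSchema: StructType = StructType(Seq(
    StructField("race_year", IntegerType),
    StructField("race_name", StringType)))

  // Assumed schema of target_table: race_year is STRING as stated above;
  // the type of race_name is an assumption.
  val tableSchema: StructType = StructType(Seq(
    StructField("race_year", StringType),
    StructField("race_name", StringType)))

  def main(args: Array[String]): Unit =
    // prints: column 'race_year': DataFrame has int, table expects string
    describeMismatches(dfSchema, tableSchema).foreach(println)
}
```

With a live session, the same helper could be called as `SchemaCheck.describeMismatches(df.schema, spark.table("target_table").schema)` before `insertInto` to see which side holds which type.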
How was this patch tested?
IntegerType and StringType, which previously misreported the input type.
(sql/catalyst, sql/core) still pass.
[SELF-TEST] InsertInto error message fix a1noh/spark#1