[SPARK-2890][SQL] Allow reading of data when case insensitive resolution could cause possible ambiguity. #2209
QA tests have started for PR 2209 at commit
QA tests have finished for PR 2209 at commit
Reading parquet files in
val deduplicatedFields = convertedFields.groupBy(_.name).map {
  case (fieldName, versions) if versions.size == 1 => versions.head
  case (fieldName, versions) if versions.size > 1 =>
    logWarning(s"Resolving attributes case insensitively is ambiguous for $fieldName")
Provide more information on which column (with the original column name) we will keep in the lowerCaseSchema?
I actually encountered the error with a jsonRDD, but yeah, it could happen with parquet files as well. Your comment about joins, though, makes me think that we should just get rid of this check entirely. We can throw an error when your query is invalid, but throwing an exception just because at some point in a query something could be ambiguous seems overly restrictive.
Sounds good. I was not sure how to correctly query those results with ambiguous schemas when I added that check. Seems a more informative logging entry is better than an exception.
QA tests have started for PR 2209 at commit
Tests timed out after a configured wait of
Jenkins, test this please.
  StructField(f.name.toLowerCase(), lowerCaseSchema(f.dataType), f.nullable)))
val convertedFields = fields.map(f =>
  StructField(f.name.toLowerCase, lowerCaseSchema(f.dataType), f.nullable))
val deduplicatedFields = convertedFields.groupBy(_.name).map {
Ahh, this reorders the schema and breaks things. Props to @andyk.
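To illustrate the reordering problem: `groupBy` returns a `Map`, and iterating a `Map` carries no guarantee about the original field order, so a schema rebuilt from it can come back shuffled. The sketch below is not the code from this PR; it uses a hypothetical `Field` case class as a stand-in for `StructField` and shows one order-preserving way to lower-case and de-duplicate names, keeping the first occurrence.

```scala
// Hypothetical, simplified stand-in for Spark SQL's StructField (illustration only).
final case class Field(name: String, nullable: Boolean)

object DedupSketch {
  // Lower-case every field name and keep only the first field seen for each
  // lower-cased name. LinkedHashMap iterates in insertion order, so the
  // surviving fields stay in their original relative order.
  def lowerCaseDedup(fields: Seq[Field]): Seq[Field] = {
    val seen = scala.collection.mutable.LinkedHashMap.empty[String, Field]
    fields.foreach { f =>
      val lowered = f.copy(name = f.name.toLowerCase)
      if (!seen.contains(lowered.name)) seen += (lowered.name -> lowered)
    }
    seen.values.toVector
  }
}
```

For example, `DedupSketch.lowerCaseDedup(Seq(Field("B", true), Field("a", false), Field("b", false)))` yields `b` then `a`, matching the input order.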
QA tests have started for PR 2209 at commit
QA tests have finished for PR 2209 at commit
QA tests have started for PR 2209 at commit
QA tests have finished for PR 2209 at commit
a703ff4 to 729cca4
QA tests have started for PR 2209 at commit
Tests timed out after a configured wait of
Jenkins will actually show you how long the tests took, which can be helpful in narrowing down why we're seeing these timeouts. In this case, it looks like the majority of the time is spent in certain Hive compatibility tests:
@JoshRosen I am hoping that #2164 will fix the test timeouts.
QA tests have started for PR 2209 at commit
QA tests have finished for PR 2209 at commit
Merged to master. Thanks for looking this over!
Throwing an error in the constructor makes it impossible to run queries, even when there is no actual ambiguity. Remove this check in favor of throwing an error in analysis when the query is actually ambiguous.
Also took the opportunity to add test cases that would have caught a subtle bug in my first attempt at fixing this and refactor some other test code.
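A rough sketch of the idea of deferring the check to analysis, under stated assumptions: `Attr` and `resolve` below are hypothetical names, not Spark's analyzer API. The point is only that an error is raised when a specific reference matches more than one attribute, so schemas containing names that differ only by case remain readable as long as the query never touches the ambiguous columns.

```scala
// Hypothetical attribute type; Spark's real analyzer uses richer classes.
final case class Attr(name: String)

object ResolveSketch {
  // Resolve a column reference case-insensitively against the available attributes.
  // Right(attr) for a unique match; Left(message) when nothing or several attributes match.
  def resolve(ref: String, attrs: Seq[Attr]): Either[String, Attr] =
    attrs.filter(_.name.equalsIgnoreCase(ref)) match {
      case Seq(single) => Right(single)
      case Seq()       => Left(s"Unresolved attribute: $ref")
      case many        => Left(s"Ambiguous reference to $ref: ${many.map(_.name).mkString(", ")}")
    }
}
```

With attributes `a`, `A`, and `b`, resolving `B` succeeds while resolving `a` reports an ambiguity, so only queries that actually reference the clashing names fail.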