[SPARK-36452][SQL]: Add the support in Spark for having group by map datatype column for the scenario that works in Hive #33679
Conversation
Can one of the admins verify this patch?
The branch was force-pushed from 2b4bf6f to fc95f3f, then from fc95f3f to 8db4d3a, and finally from 8db4d3a to e6505d1.
@@ -97,13 +97,18 @@ object InterpretedOrdering {
 object RowOrdering extends CodeGeneratorWithInterpretedFallback[Seq[SortOrder], BaseOrdering] {

   /**
-   * Returns true iff the data type can be ordered (i.e. can be sorted).
+   * Returns true if the data type can be ordered (i.e. can be sorted).
Review comment: "iff" is an abbreviation of "if and only if".
    */
-  def isOrderable(dataType: DataType): Boolean = dataType match {
+  def isOrderable(dataType: DataType,
Review comment: Should we fix #31967 first?
Author reply: @HyukjinKwon - Thanks for checking this PR. Yes, we can wait for PR #32552. The fix in this PR will work with GROUP BY, ORDER BY, and PARTITION BY in window functions.
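For context, a rough sketch of how the modified method inside Catalyst's RowOrdering might look. The real signature is truncated in the diff above, so the extra parameter here (allowMapType) and its behavior are assumptions, not the PR's actual code:

```scala
// Hypothetical sketch, assumed to live in Catalyst's ordering code where
// AtomicType and UserDefinedType are visible; not the PR's actual change.
import org.apache.spark.sql.types._

def isOrderable(dataType: DataType, allowMapType: Boolean = false): Boolean = dataType match {
  case NullType => true
  case _: AtomicType => true
  case struct: StructType => struct.fields.forall(f => isOrderable(f.dataType, allowMapType))
  case array: ArrayType => isOrderable(array.elementType, allowMapType)
  // Gated case: treat a map as acceptable (e.g. for grouping) only when the
  // caller explicitly allows it; ordinary sorting would keep the flag off.
  case map: MapType if allowMapType =>
    isOrderable(map.keyType, allowMapType) && isOrderable(map.valueType, allowMapType)
  case udt: UserDefinedType[_] => isOrderable(udt.sqlType, allowMapType)
  case _ => false
}
```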
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
What changes were proposed in this pull request?
Add support in Spark for GROUP BY on a map datatype column, for the scenario that works in Hive.
In Hive this scenario works fine, but in Spark the same query fails when the map column is used in the GROUP BY and is selected without any aggregation applied to it. A minimal sketch of the scenario is shown below.
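A minimal sketch of the scenario, assuming a hypothetical table named items with a map-typed column attrs (the names and the query are illustrative, not taken from the PR):

```scala
// Illustrative spark-shell session; table and column names are hypothetical.
spark.sql("CREATE TABLE items (id INT, attrs MAP<STRING, STRING>) USING parquet")

// The map column is a grouping key and is selected without any aggregation.
// Hive accepts this query; before this change Spark rejects it during analysis
// because map<string,string> is not an orderable data type.
spark.sql("SELECT id, attrs FROM items GROUP BY id, attrs").show()
```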
Why are the changes needed?
There is a need to support the scenario where a grouping expression can have a map type, as long as the aggregate expressions do not reference that map type; the distinction is illustrated below. This helps users migrate from Hive to Spark.
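To make the condition concrete, a hedged illustration of the distinction, reusing the hypothetical items table from the sketch above:

```scala
// Covered by this change: the map column appears only as a grouping key
// and as a plain projection, never inside an aggregate function.
spark.sql("SELECT attrs, count(*) FROM items GROUP BY attrs")

// Not covered: the aggregate expression itself references the map column,
// and functions such as max still require an orderable input type.
spark.sql("SELECT id, max(attrs) FROM items GROUP BY id")
```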
After the code change, the query that previously failed runs successfully in Spark.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added a unit test and also tested the scenario manually using spark-shell; a sketch of what such a test might look like follows.
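A hedged sketch of what such a unit test might look like, assuming a suite that mixes in SharedSparkSession with testImplicits available; the test name, data, and assertion are illustrative, not the PR's actual test:

```scala
test("SPARK-36452: group by on a map column selected without aggregation") {
  // Illustrative only; the PR's actual test may differ.
  import testImplicits._
  val df = Seq((1, Map("a" -> "x")), (1, Map("a" -> "x"))).toDF("id", "attrs")
  df.createOrReplaceTempView("items")
  val result = sql("SELECT id, attrs FROM items GROUP BY id, attrs")
  // The two identical rows should collapse into a single group.
  assert(result.collect().length === 1)
}
```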