[SPARK-32110][SQL] normalize special floating numbers in HyperLogLog++
### What changes were proposed in this pull request?
Currently, Spark treats 0.0 and -0.0 as semantically equal, while still retaining the difference between them so that users can see -0.0 when displaying the dataset.
The comparison expressions in Spark handle these special floating-point values and implement the correct semantics. However, Spark doesn't always use comparison expressions to compare values, so we need to normalize the special floating-point values before comparing them in these places:
1. GROUP BY
2. join keys
3. window partition keys
This PR fixes one more place that compares values without using comparison expressions: HyperLogLog++.
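The issue arises because HLL++ hashes the raw binary representation of a value, and 0.0 and -0.0 (as well as different NaN bit patterns) have distinct bit representations even though they compare as equal. A minimal Scala sketch of the kind of normalization involved (this is an illustration, not the actual Spark implementation; the `normalize` helper is hypothetical):

```scala
// Hypothetical sketch: collapse special floating-point values to a single
// canonical representative before they are hashed.
def normalize(d: Double): Double =
  if (d.isNaN) Double.NaN   // collapse every NaN bit pattern to the canonical NaN
  else if (d == 0.0) 0.0    // map -0.0 to 0.0 (they compare equal)
  else d

// 0.0 and -0.0 compare equal but have different bit patterns,
// so a hash of the raw bits would treat them as two distinct values:
val posZeroBits = java.lang.Double.doubleToRawLongBits(0.0)
val negZeroBits = java.lang.Double.doubleToRawLongBits(-0.0)
assert(posZeroBits != negZeroBits)

// After normalization, both map to the same bits:
assert(java.lang.Double.doubleToRawLongBits(normalize(-0.0)) ==
       java.lang.Double.doubleToRawLongBits(normalize(0.0)))
```

With this normalization applied before hashing, semantically equal values always contribute the same hash to the HLL++ sketch.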
### Why are the changes needed?
Fix the query result
### Does this PR introduce _any_ user-facing change?
Yes, the result of HyperLogLog++ becomes correct now.
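To illustrate the symptom (a Spark-free Scala sketch with assumed input values; the `normalize` helper is hypothetical): a distinct-counter keyed on raw bit patterns, as HLL++ effectively is, overcounts when the input contains 0.0 and -0.0 or multiple NaN bit patterns.

```scala
// Assumed inputs: two zeros and two NaN bit patterns.
val values = Seq(0.0, -0.0, Double.NaN,
  java.lang.Double.longBitsToDouble(0x7ff8000000000001L)) // a non-canonical NaN

// Counting distinct raw bit patterns sees four values...
val rawDistinct =
  values.map(java.lang.Double.doubleToRawLongBits).toSet.size

// ...but semantically there are only two: 0.0 and NaN.
def normalize(d: Double): Double =
  if (d.isNaN) Double.NaN else if (d == 0.0) 0.0 else d

val normalizedDistinct =
  values.map(v => java.lang.Double.doubleToRawLongBits(normalize(v))).toSet.size
```

Here `rawDistinct` is 4 while `normalizedDistinct` is 2, which is the answer an approximate distinct count such as HyperLogLog++ should produce.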
### How was this patch tested?
A new test case, plus a few more test cases that passed before this PR, to improve test coverage.
Closes #30673 from cloud-fan/bug.
Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 6fd2345)
Signed-off-by: Dongjoon Hyun <[email protected]>
Changed file: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/HyperLogLogPlusPlusSuite.scala