[SPARK-8803] handle special characters in elements in crosstab #7201
@@ -110,8 +110,12 @@ private[sql] object StatFunctions extends Logging {
logWarning("The maximum limit of 1e6 pairs have been collected, which may not be all of " +
"the pairs. Please try reducing the amount of distinct items in your columns.")
}
def cleanElement(element: Any): String = {
if (element == null) "" else element.toString
We should probably say "null" rather than an empty string for null.
Test build #36442 has finished for PR 7201 at commit
Jenkins, retest this please.
Test build #36452 has finished for PR 7201 at commit
@@ -85,6 +85,33 @@ class DataFrameStatSuite extends SparkFunSuite {
}
}

test("special crosstab elements (., '', null, ``)") {
val data = Seq(
("a", 1, "ho"),
Do you know what happens if one of the values is NaN?
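As a small illustration of the NaN question: on the JVM, `toString` on a non-finite `Double` returns a fixed label, so an element-to-string conversion like the `cleanElement` in this diff would yield an ordinary, non-empty string for such values. This is only a sketch; the object name `NonFiniteLabels` and the helper `label` are hypothetical and not part of Spark.

```scala
object NonFiniteLabels {
  // Hypothetical helper (not Spark code): converts an element to the string
  // that would become its label, using plain toString as the diff does for
  // non-null values.
  def label(element: Any): String = element.toString

  def main(args: Array[String]): Unit = {
    val values = Seq(Double.NaN, Double.PositiveInfinity, Double.NegativeInfinity)
    // Print the label each non-finite value would receive.
    values.foreach(v => println(label(v)))
  }
}
```

Whether such labels behave well further down the crosstab pipeline is a separate matter, one that tests with NaN and Infinity inputs would need to cover.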
Test build #36446 has finished for PR 7201 at commit
Test build #36466 has finished for PR 7201 at commit
lgtm
Test build #36469 has finished for PR 7201 at commit
Thanks. Merging in master & branch-1.4.
cc rxin
Having back ticks or null as elements causes problems.
Since elements become column names, we have to drop them from the element as back ticks are special characters.
Having null throws exceptions, we could replace them with empty strings.
Handling back ticks should be improved for 1.5
Author: Burak Yavuz <[email protected]>
Closes #7201 from brkyvz/weird-ct-elements and squashes the following commits:
e06b840 [Burak Yavuz] fix scalastyle
93a0d3f [Burak Yavuz] added tests for NaN and Infinity
9dba6ce [Burak Yavuz] address cr1
db71dbd [Burak Yavuz] handle special characters in elements in crosstab
(cherry picked from commit 9b23e92)
Signed-off-by: Reynold Xin <[email protected]>
Conflicts:
sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala
This fixes a bug introduced in the cherry-pick of #7201 which led to a NullPointerException when cross-tabulating a data set that contains null values.
Author: Josh Rosen <[email protected]>
Closes #7295 from JoshRosen/SPARK-8903 and squashes the following commits:
5489948 [Josh Rosen] [SPARK-8903] Fix bug in cherry-pick of SPARK-8803
cc @rxin
Having back ticks or null as elements causes problems.
Since elements become column names and back ticks are special characters, we have to drop back ticks from the elements.
Null elements throw exceptions; we could replace them with empty strings.
Handling of back ticks should be improved for 1.5.
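A minimal sketch of the two clean-up steps described here, written as standalone Scala rather than as the actual `StatFunctions` code. The object name `CrosstabNames` and the helper `cleanColumnName` are assumptions made for illustration, and the null replacement uses the literal string "null" suggested in review rather than the empty string proposed above.

```scala
object CrosstabNames {
  // Null elements become the literal string "null" (the reviewer's suggestion);
  // the description above proposes an empty string instead.
  def cleanElement(element: Any): String =
    if (element == null) "null" else element.toString

  // Hypothetical helper: drops back ticks so the value can be used as a column name.
  def cleanColumnName(name: String): String = name.replace("`", "")

  def main(args: Array[String]): Unit = {
    // The same kinds of special elements the new test case lists: ., '', null, ``
    val elements: Seq[Any] = Seq("a.b", "", null, "`c`")
    elements.map(e => cleanColumnName(cleanElement(e))).foreach(println)
  }
}
```

Dropping back ticks is lossy: two elements that differ only in back ticks would map to the same name, which may be part of why the description flags back-tick handling for improvement.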