[SPARK-8803] handle special characters in elements in crosstab #7201

Closed
brkyvz wants to merge 4 commits from brkyvz/weird-ct-elements

@brkyvz brkyvz commented Jul 2, 2015

cc @rxin

Having back ticks or null as elements causes problems.
Since elements become column names, we have to drop back ticks from the elements, because back ticks are special characters in column names.
Null elements throw exceptions; we could replace them with empty strings.

Handling back ticks should be improved for 1.5
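
For illustration, here is a minimal sketch of the two clean-up steps described above. It is not the code in this patch: `cleanColumnName` is a hypothetical helper name, and mapping null to the string "null" (instead of an empty string) follows the review suggestion further down and is an assumption here.

```scala
object CrosstabElementSketch {
  // Turn a raw crosstab element into a label. Calling toString on null would
  // throw a NullPointerException, so null is mapped to a fixed label first.
  // Using "null" rather than "" as the label is an assumption of this sketch.
  def cleanElement(element: Any): String =
    if (element == null) "null" else element.toString

  // Hypothetical helper: drop back ticks, since they are special characters
  // in column names.
  def cleanColumnName(name: String): String =
    name.replace("`", "")

  def main(args: Array[String]): Unit = {
    val raw: Seq[Any] = Seq("a`b", null, "", 1.5)
    // Each element becomes a string that is safe to use as a column name.
    raw.map(e => cleanColumnName(cleanElement(e))).foreach(println)
  }
}
```

Keeping the null handling and the back-tick stripping in separate functions means the label used for grouping and the name used for the output column can be changed independently.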

@@ -110,8 +110,12 @@ private[sql] object StatFunctions extends Logging {
      logWarning("The maximum limit of 1e6 pairs have been collected, which may not be all of " +
        "the pairs. Please try reducing the amount of distinct items in your columns.")
    }
    def cleanElement(element: Any): String = {
      if (element == null) "" else element.toString

We should probably use "null" rather than an empty string for null.

SparkQA commented Jul 2, 2015

Test build #36442 has finished for PR 7201 at commit db71dbd.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

rxin commented Jul 2, 2015

Jenkins, retest this please.

SparkQA commented Jul 3, 2015

Test build #36452 has finished for PR 7201 at commit 9dba6ce.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -85,6 +85,33 @@ class DataFrameStatSuite extends SparkFunSuite {
    }
  }

  test("special crosstab elements (., '', null, ``)") {
    val data = Seq(
      ("a", 1, "ho"),

Do you know what happens if one of the values is NaN?
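
As a rough check on the NaN question, here is a small sketch that assumes elements are turned into labels with `toString`, as in the `cleanElement` helper from the diff (the "null" label is an assumption). It only shows string conversion on the JVM, not how crosstab behaves end to end.

```scala
object NaNLabelSketch {
  // Same shape as the helper in the diff; the "null" label is an assumption.
  def cleanElement(element: Any): String =
    if (element == null) "null" else element.toString

  def main(args: Array[String]): Unit = {
    val values: Seq[Any] =
      Seq(Double.NaN, Double.NaN, Double.PositiveInfinity, Double.NegativeInfinity)
    // After conversion the labels are plain strings, so repeated NaN values
    // collapse to a single "NaN" label under distinct.
    val labels = values.map(cleanElement).distinct
    println(labels.mkString(", ")) // prints "NaN, Infinity, -Infinity"
  }
}
```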

SparkQA commented Jul 3, 2015

Test build #36446 has finished for PR 7201 at commit db71dbd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Jul 3, 2015

Test build #36466 has finished for PR 7201 at commit 93a0d3f.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

rxin commented Jul 3, 2015

lgtm

SparkQA commented Jul 3, 2015

Test build #36469 has finished for PR 7201 at commit e06b840.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

rxin commented Jul 3, 2015

Thanks. Merging in master & branch-1.4.

@asfgit asfgit closed this in 9b23e92 Jul 3, 2015
asfgit pushed a commit that referenced this pull request Jul 3, 2015
cc rxin

Having back ticks or null as elements causes problems.
Since elements become column names, we have to drop back ticks from the elements, because back ticks are special characters in column names.
Null elements throw exceptions; we could replace them with empty strings.

Handling back ticks should be improved for 1.5

Author: Burak Yavuz

Closes #7201 from brkyvz/weird-ct-elements and squashes the following commits:

e06b840 [Burak Yavuz] fix scalastyle
93a0d3f [Burak Yavuz] added tests for NaN and Infinity
9dba6ce [Burak Yavuz] address cr1
db71dbd [Burak Yavuz] handle special characters in elements in crosstab

(cherry picked from commit 9b23e92)
Signed-off-by: Reynold Xin

Conflicts:
	sql/core/src/main/scala/org/apache/spark/sql/execution/stat/StatFunctions.scala
asfgit pushed a commit that referenced this pull request Jul 8, 2015
This fixes a bug introduced in the cherry-pick of #7201 which led to a NullPointerException when cross-tabulating a data set that contains null values.

Author: Josh Rosen

Closes #7295 from JoshRosen/SPARK-8903 and squashes the following commits:

5489948 [Josh Rosen] [SPARK-8903] Fix bug in cherry-pick of SPARK-8803
@brkyvz brkyvz deleted the weird-ct-elements branch February 3, 2019 20:55