Skip to content

[Spark 3922] Refactor spark-core to use Utils.UTF_8 #2781

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed

[Spark 3922] Refactor spark-core to use Utils.UTF_8 #2781

wants to merge 2 commits into from

Conversation

zsxwing
Copy link
Member

@zsxwing zsxwing commented Oct 13, 2014

A global UTF8 constant is very helpful to handle encoding problems when converting between String and bytes. There are several solutions here:

  1. Add val UTF_8 = Charset.forName("UTF-8") to Utils.scala
  2. java.nio.charset.StandardCharsets.UTF_8 (require JDK7)
  3. io.netty.util.CharsetUtil.UTF_8
  4. com.google.common.base.Charsets.UTF_8
  5. org.apache.commons.lang.CharEncoding.UTF_8
  6. org.apache.commons.lang3.CharEncoding.UTF_8

IMO, I prefer option 1) because people can find it easily.

This is a PR for option 1) and only fixes Spark Core.

@zsxwing
Copy link
Member Author

zsxwing commented Oct 13, 2014

/cc @rxin, @JoshRosen

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@SparkQA
Copy link

SparkQA commented Oct 13, 2014

QA tests have started for PR 2781 at commit 80f4af8.

  • This patch merges cleanly.

@srowen
Copy link
Member

srowen commented Oct 13, 2014

I vote for com.google.common.base.Charsets.UTF_8 now, and java.nio.charset.StandardCharsets.UTF_8 when Spark moves to Java 7+. No need to define this constant yet again.

@SparkQA
Copy link

SparkQA commented Oct 13, 2014

QA tests have finished for PR 2781 at commit 80f4af8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21681/
Test FAILed.

@vanzin
Copy link
Contributor

vanzin commented Oct 13, 2014

Like @srowen, I prefer to use existing constants.

Also, it might be worth it to import the constant directly and just use UTF_8 (instead of something like Utils.UTF_8) so that switching to the JDK7 version later is less noisy.

(Edit: just fixed typo.)

@SparkQA
Copy link

SparkQA commented Oct 14, 2014

QA tests have started for PR 2781 at commit 80f4af8.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Oct 14, 2014

QA tests have finished for PR 2781 at commit 80f4af8.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 14, 2014

QA tests have started for PR 2781 at commit 80f4af8.

  • This patch merges cleanly.

@zsxwing
Copy link
Member Author

zsxwing commented Oct 14, 2014

Also, it might be worth it to import the constant directory and just use UTF_8 (instead of something like Utils.UTF_8) so that switching to the JDK7 version later is less noisy.

Good point. I updated to use com.google.common.base.Charsets.UTF_8 directly in the spark core. I override my previous commits because they were redundant.

@SparkQA
Copy link

SparkQA commented Oct 14, 2014

QA tests have finished for PR 2781 at commit 80f4af8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 23, 2014

QA tests have started for PR 2781 at commit 2d27423.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Oct 23, 2014

QA tests have finished for PR 2781 at commit 2d27423.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Copy link
Member Author

zsxwing commented Oct 27, 2014

retest this please

@rxin
Copy link
Contributor

rxin commented Oct 28, 2014

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Oct 28, 2014

Test build #22344 has started for PR 2781 at commit f974edd.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Oct 28, 2014

Test build #22344 has finished for PR 2781 at commit f974edd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22344/
Test PASSed.

@rxin
Copy link
Contributor

rxin commented Oct 28, 2014

LGTM. @JoshRosen any further comment?

@JoshRosen
Copy link
Contributor

This looks good to me. Thanks!

@rxin
Copy link
Contributor

rxin commented Oct 28, 2014

Merging in master. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants