Skip to content

[SPARK-4336] [SQL] support type auto detection from json string #3202

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

adrian-wang
Copy link
Contributor

This is for comment from @davies in #3012
By this patch, we can detect TimestampType and DateType from a json string.

@SparkQA
Copy link

SparkQA commented Nov 11, 2014

Test build #23205 has started for PR 3202 at commit 90e3624.

  • This patch merges cleanly.

@@ -131,6 +131,19 @@ protected[sql] object DataTypeConversions {
StructType(structType.getFields.map(asScalaStructField))
}

def guessTypeFromString(s:String): DataType = {
try {
val res = stringToTime(s)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function will be called against every string in json, so it should be very fast. It should fail fast in most cases if the s is not a time, the path in catch is extremely slow, see [1].

[1] http://stackoverflow.com/questions/299068/how-slow-are-java-exceptions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I will use regex to judge then.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

regex maybe is not fast enough, we still need some quick test to fail fast, such as check it contains some special characters?

Could you post some micro benchmark of it?

@SparkQA
Copy link

SparkQA commented Nov 11, 2014

Test build #23205 has finished for PR 3202 at commit 90e3624.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23205/
Test PASSed.

@marmbrus
Copy link
Contributor

Thanks for working on this! Since it looks like there is some concern about speed, and since we are trying to keep the PR queue small, I propose we close this issue for now and reopen once the performance comparisons are done.

@asfgit asfgit closed this in ca12608 Dec 17, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants