[SPARK-2674] [SQL] [PySpark] support datetime type for SchemaRDD #1601

Closed
wants to merge 4 commits into from

davies
Contributor

@davies davies commented Jul 26, 2014

Python datetime and time objects are converted into java.util.Calendar after serialization, and then into java.sql.Timestamp during inferSchema().

In javaToPython(), Timestamp is converted into Calendar, which then becomes a datetime in Python after pickling.
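
The round trip described above can be illustrated with a small pure-Python sketch. It does not reproduce the actual JVM-side conversion; it only assumes that the intermediate Calendar stage carries millisecond precision (as java.util.Calendar does) and treats naive datetimes as UTC. The helper names `to_epoch_millis` and `from_epoch_millis` are hypothetical.

```python
from datetime import datetime, timedelta

# Reference point for the millisecond count (naive datetimes are treated as UTC here).
EPOCH = datetime(1970, 1, 1)


def to_epoch_millis(dt):
    """Hypothetical stand-in for the datetime -> Calendar step (millisecond precision)."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


def from_epoch_millis(millis):
    """Hypothetical stand-in for the Calendar -> datetime step."""
    return EPOCH + timedelta(milliseconds=millis)


original = datetime(2014, 7, 26, 12, 30, 45, 123456)
restored = from_epoch_millis(to_epoch_millis(original))

# Under the millisecond assumption, digits below one millisecond are dropped.
print(original)
print(restored)
```

Under that assumption, a value that goes from Python to the JVM and back may differ from the original in its sub-millisecond digits, so tests that compare timestamps for equality are safer with values that have whole milliseconds.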

@SparkQA

SparkQA commented Jul 26, 2014

QA tests have started for PR 1601. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17219/consoleFull

@SparkQA

SparkQA commented Jul 26, 2014

QA results for PR 1601:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17219/consoleFull

@@ -395,6 +395,11 @@ class SchemaRDD(
arr.asInstanceOf[Array[Any]].map {
element => rowToMap(element.asInstanceOf[Row], struct)
}
case t: java.sql.Timestamp => {

This pair of extra brackets can be removed.

@SparkQA

SparkQA commented Jul 27, 2014

QA tests have started for PR 1601. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17236/consoleFull

@SparkQA

SparkQA commented Jul 27, 2014

QA results for PR 1601:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17236/consoleFull

java.util.{List,Set} => Seq
java.util.Map => Map

but it cannot convert Seq into java.util.Set, so set(), tuple(),
and array() cannot be handled gracefully (converted back to the
original type).

We can not access items in ArrayType by position, but this is not defined
for set().

Do we still want to support set()/tuple()/array()?
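
The loss of type information described above can be sketched in plain Python. This is only an illustration, not the PySpark code: `to_seq` is a hypothetical stand-in for the step that turns every Python collection into a Seq, modeled here as a plain list.

```python
from array import array


def to_seq(value):
    """Hypothetical model of the Python -> Seq step: every collection becomes a list."""
    if isinstance(value, (set, frozenset)):
        # A set has no positional order, so one must be invented (sorted here).
        return sorted(value)
    if isinstance(value, (list, tuple, array)):
        return list(value)
    return value


# Four different Python types collapse into the same representation.
inputs = [[1, 2, 3], (1, 2, 3), {1, 2, 3}, array("i", [1, 2, 3])]
converted = [to_seq(v) for v in inputs]

for before, after in zip(inputs, converted):
    print(type(before).__name__, "->", type(after).__name__, after)
```

Once every input is a list, nothing in the value itself says whether it started as a set, a tuple, or an array, which is why converting back to the original type is not possible without extra bookkeeping.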

@SparkQA

SparkQA commented Jul 28, 2014

QA tests have started for PR 1601. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17278/consoleFull

@SparkQA

SparkQA commented Jul 28, 2014

QA results for PR 1601:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17278/consoleFull

@davies
Contributor Author

davies commented Jul 28, 2014

Spark SQL does not support Set/List, so we would have to treat all sets from Python as Seq, and then they cannot be converted back. Alternatively, we could drop set support right now.

@mateiz @marmbrus Do we need to clean these up in this PR, or do it later in another issue?

@marmbrus
Contributor

Let's just remove it now. It should be as easy as adding an error and removing the tests in question.
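
One way "adding an error" could look is sketched below. This is an assumption about the shape of the check, not the code that was merged: `reject_sets` is a hypothetical helper that walks a Python value before schema inference and fails fast on any set it finds.

```python
def reject_sets(value):
    """Hypothetical pre-check: raise if a set appears anywhere in a nested value."""
    if isinstance(value, (set, frozenset)):
        raise TypeError("set is not supported; use a list instead")
    if isinstance(value, dict):
        for key, item in value.items():
            reject_sets(key)
            reject_sets(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            reject_sets(item)
    return value


row = {"name": "a", "tags": ["x", "y"]}
reject_sets(row)  # passes: no sets anywhere in the row

try:
    reject_sets({"name": "b", "tags": {"x", "y"}})
except TypeError as exc:
    print("rejected:", exc)
```

Failing at the boundary gives the user a clear message instead of a value that silently comes back as a different type.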

@SparkQA

SparkQA commented Jul 28, 2014

QA tests have started for PR 1601. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17299/consoleFull

@SparkQA

SparkQA commented Jul 28, 2014

QA results for PR 1601:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17299/consoleFull

@davies
Contributor Author

davies commented Jul 29, 2014

cc @kanzhang

@marmbrus
Contributor

I've merged this into master.

@asfgit asfgit closed this in f0d880e Jul 29, 2014
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
Author: Davies Liu <[email protected]>

Closes apache#1601 from davies/date and squashes the following commits:

f0599b0 [Davies Liu] remove tests for sets and tuple in sql, fix list of list
c9d607a [Davies Liu] convert datetype for runtime
709d40d [Davies Liu] remove brackets
96db384 [Davies Liu] support datetime type for SchemaRDD
@davies davies deleted the date branch September 15, 2014 22:16
5 participants