[SPARK-4573] [SQL] Add SettableStructObjectInspector support in "wrap" function #3429

Closed
wants to merge 8 commits into from

Conversation

chenghao-intel
Contributor

A Hive UDAF may create a customized object constructed by a SettableStructObjectInspector, so "wrap" needs to support it. This is critical for integrating Hive UDAFs with the refactored UDAF interface.

Adding more match cases has a performance cost in wrap/unwrap; I will address that in another PR.
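
For readers unfamiliar with the pattern described above, here is a minimal, self-contained sketch of what wrapping a row through a settable struct inspector can look like. It does not use the Hive API: `Inspector`, `SettableStructInspector`, `create`, `setField`, and this `wrap` are hypothetical stand-ins chosen for illustration, and the actual change in this PR may differ.

```scala
// Toy stand-ins for Hive's object inspector interfaces (hypothetical, not the Hive API).
object SettableStructSketch {
  sealed trait Inspector
  case object IntInspector extends Inspector
  case object StringInspector extends Inspector

  // A struct inspector that creates its own container and sets fields on it,
  // playing the role that a settable struct inspector plays in Hive.
  final class SettableStructInspector(val fields: Seq[(String, Inspector)]) extends Inspector {
    def create(): Array[Any] = new Array[Any](fields.length)
    def setField(struct: Array[Any], index: Int, value: Any): Array[Any] = {
      struct(index) = value
      struct
    }
  }

  // Convert a Catalyst-style value (a Seq stands for a row) into the
  // representation the given inspector expects.
  def wrap(value: Any, oi: Inspector): Any = (value, oi) match {
    case (null, _) => null
    case (row: Seq[_], s: SettableStructInspector) =>
      // Let the inspector build the container, then fill it field by field,
      // wrapping each field with that field's own inspector.
      val result = s.create()
      var i = 0
      while (i < s.fields.length) {
        s.setField(result, i, wrap(row(i), s.fields(i)._2))
        i += 1
      }
      result
    case (v: Int, IntInspector) => java.lang.Integer.valueOf(v)
    case (v: String, StringInspector) => v
    case (other, _) => sys.error(s"unsupported value $other for $oi")
  }
}
```

The point of the sketch is that the inspector, not the caller, decides what the struct container is, which is why a UDAF with a custom buffer object needs this path.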

@SparkQA

SparkQA commented Nov 24, 2014

Test build #23782 has started for PR 3429 at commit 72e4332.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 24, 2014

Test build #23782 has finished for PR 3429 at commit 72e4332.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23782/
Test FAILed.

@SparkQA

SparkQA commented Nov 25, 2014

Test build #23826 has started for PR 3429 at commit 932940d.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 25, 2014

Test build #23829 has started for PR 3429 at commit f1b6749.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 25, 2014

Test build #23826 has finished for PR 3429 at commit 932940d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23826/
Test FAILed.

@SparkQA

SparkQA commented Nov 25, 2014

Test build #23829 has finished for PR 3429 at commit f1b6749.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23829/
Test FAILed.

@SparkQA

SparkQA commented Nov 25, 2014

Test build #23844 has started for PR 3429 at commit 3ed284c.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 25, 2014

Test build #23844 has finished for PR 3429 at commit 3ed284c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23844/
Test FAILed.

@SparkQA

SparkQA commented Nov 26, 2014

Test build #23859 has started for PR 3429 at commit 2977e9b.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 26, 2014

Test build #23859 has finished for PR 3429 at commit 2977e9b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23859/
Test PASSed.

@marmbrus
Contributor

marmbrus commented Dec 2, 2014

Thanks for working on this and adding a bunch of tests. All of this is getting pretty complicated, so I think it would be good if you could add some more explanation to the Scaladoc of wrap/unwrap that discusses what types of object inspectors exist and which will be returned for what type of expression.

checkValues(d, unwrap(wrap(null, toInspector(Literal(d, dt))), toInspector(Literal(d, dt))))
}

test("wrap / unwrap #6") {
Contributor

Maybe instead of #1, #2, etc., name them "Maps", "Arrays", etc.

@SparkQA

SparkQA commented Dec 2, 2014

Test build #24046 has started for PR 3429 at commit f5a40e8.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Dec 2, 2014

Test build #24046 has finished for PR 3429 at commit f5a40e8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24046/
Test PASSed.

@SparkQA

SparkQA commented Dec 3, 2014

Test build #24084 has started for PR 3429 at commit 2b0561d.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Dec 3, 2014

Test build #24084 has finished for PR 3429 at commit 2b0561d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24084/
Test PASSed.

@chenghao-intel
Contributor Author

@marmbrus I've updated the Scaladoc a little bit. I know it's quite complicated now, but we need to get this right (to integrate with Hive UDFs seamlessly), since fixes for bugs like #2802 depend on it.

I will make another PR to refactor and improve the performance issue described at https://github.com/apache/spark/pull/3429/files#diff-f88c3e731fcb17b1323b778807c35b38R167 once this PR is merged.
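
The performance concern mentioned above comes from running a pattern match for every value that passes through wrap/unwrap. The thread does not say what the follow-up PR will look like. One possible direction, sketched here with hypothetical names (`Inspector`, `ListInspector`, `wrapperFor`) that are not taken from this PR, is to resolve the match once per inspector and reuse the resulting function for every row.

```scala
// Hypothetical sketch: pay for the pattern match once per inspector, not once per value.
object WrapperSketch {
  sealed trait Inspector
  case object IntInspector extends Inspector
  case object StringInspector extends Inspector
  final case class ListInspector(element: Inspector) extends Inspector

  // Returns a conversion function specialized for the given inspector.
  def wrapperFor(oi: Inspector): Any => Any = oi match {
    case IntInspector =>
      (v: Any) => if (v == null) null else java.lang.Integer.valueOf(v.asInstanceOf[Int])
    case StringInspector =>
      (v: Any) => v
    case ListInspector(element) =>
      // The element wrapper is resolved here, once, and captured by the closure.
      val wrapElement = wrapperFor(element)
      (v: Any) =>
        if (v == null) null
        else {
          val out = new java.util.ArrayList[Any]()
          v.asInstanceOf[Seq[Any]].foreach(e => out.add(wrapElement(e)))
          out
        }
  }
}
```

A caller would build the wrapper when the expression is initialized, for example `val w = WrapperSketch.wrapperFor(WrapperSketch.ListInspector(WrapperSketch.IntInspector))`, and then call `w(row)` per row.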

@marmbrus
Contributor

/cc @liancheng can you help review this?

I did a quick pass and this seems reasonable. @chenghao-intel is this ready to merge if @liancheng approves?

* Array[Byte]
* java.sql.Date
* java.sql.Timestamp
* Complicated Types =>
Contributor

"Complicated" => "Complex"

@liancheng
Contributor

In general this LGTM except for some minor styling comments, thanks!

@SparkQA

SparkQA commented Dec 18, 2014

Test build #24590 has started for PR 3429 at commit 9f0aff3.

  • This patch merges cleanly.

@chenghao-intel
Contributor Author

Thank you @liancheng, I've updated the code per your feedback. @marmbrus I think this PR is ready to be merged once Jenkins agrees too.

@SparkQA

SparkQA commented Dec 18, 2014

Test build #24590 has finished for PR 3429 at commit 9f0aff3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24590/
Test PASSed.

@marmbrus
Contributor

Thanks! Merged to master.

@asfgit asfgit closed this in ae9f128 Dec 19, 2014
case x: ByteObjectInspector if x.preferWritable() => x.get(data)
case x: HiveDecimalObjectInspector => HiveShim.toCatalystDecimal(x, data)
case x: BinaryObjectInspector if x.preferWritable() =>
  x.getPrimitiveWritableObject(data).copyBytes()
Contributor

It looks like this PR may have caused a build break for the hadoop1.0 profile:

https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/1245/AMPLAB_JENKINS_BUILD_PROFILE=hadoop1.0,label=centos/

[warn] Note: Recompile with -Xlint:unchecked for details.
[info] Compiling 21 Scala sources and 1 Java source to /home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop1.0/label/centos/sql/hive/target/scala-2.10/classes...
[error] /home/jenkins/workspace/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE/hadoop1.0/label/centos/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala:300: value copyBytes is not a member of org.apache.hadoop.io.BytesWritable
[error]         x.getPrimitiveWritableObject(data).copyBytes()
[error]                                            ^
[error] one error found
[error] (hive/compile:compile) Compilation failed
[error] Total time: 117 s, completed Dec 18, 2014 8:51:43 PM
[error] Got a return code of 1 on line 155 of the run-tests script.
Build step 'Execute shell' marked build as failure
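
The compile error above says that `copyBytes` is not a member of `BytesWritable` under the hadoop1.0 profile. `BytesWritable` does expose `getBytes` and `getLength`, and its backing array can be longer than the valid data, so one portable way to get the same result is to copy only the first `getLength` bytes. The sketch below shows that technique on a toy class; `PaddedBytes` and `validBytes` are hypothetical names, and the thread does not state whether the eventual hotfix took this approach.

```scala
object BytesCopySketch {
  // Toy stand-in for a Hadoop-style writable (hypothetical): the backing array
  // may be longer than the valid data, so callers must honor getLength.
  final class PaddedBytes(backing: Array[Byte], length: Int) {
    def getBytes: Array[Byte] = backing
    def getLength: Int = length
  }

  // Copy only the valid prefix, without relying on a copyBytes() method.
  def validBytes(w: PaddedBytes): Array[Byte] = {
    val result = new Array[Byte](w.getLength)
    System.arraycopy(w.getBytes, 0, result, 0, w.getLength)
    result
  }
}
```

Returning `getBytes` directly would be a bug in two ways: it would include the padding, and it would share the mutable backing array with the writable.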

Contributor

Since this didn't break the pull request builder and it's nighttime now (so we're probably not merging tons of stuff), I'm going to hold off on reverting this for a little bit to see if we can come up with a quick hotfix. Otherwise, I'll revert this commit when I get up tomorrow.

Contributor Author

Thank you @JoshRosen, I've created #3742 as a hotfix for this.

Contributor

I've merged that PR, so the build should be fixed. Thanks!

asfgit pushed a commit that referenced this pull request Dec 30, 2014
Since #3429 has been merged, the bug of wrapping to Writable for HiveGenericUDF is resolved, so we can safely remove the foldable check in `HiveGenericUdf.eval`, which was discussed in #2802.

Author: Cheng Hao <[email protected]>

Closes #3745 from chenghao-intel/generic_udf and squashes the following commits:

622ad03 [Cheng Hao] Remove the unnecessary code change in Generic UDF
@chenghao-intel chenghao-intel deleted the settable_oi branch January 21, 2015 12:47