Skip to content

[SPARK-4244] [SQL] Support Hive Generic UDFs with constant object inspector parameters #3109

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

chenghao-intel
Copy link
Contributor

Query SELECT named_struct(lower("AA"), "12", lower("Bb"), "13") FROM src LIMIT 1 will throw exception, some of the Hive Generic UDF/UDAF requires the input object inspector is ConstantObjectInspector, however, we won't get that before the expression optimization executed. (Constant Folding).

This PR is a work around to fix this. (As ideally, the output of LogicalPlan should be identical before and after Optimization).

@SparkQA
Copy link

SparkQA commented Nov 5, 2014

Test build #22926 has started for PR 3109 at commit a790467.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Nov 5, 2014

Test build #22926 has finished for PR 3109 at commit a790467.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22926/
Test FAILed.

@chenghao-intel
Copy link
Contributor Author

The failure should be fixed once #3114 merged.

@SparkQA
Copy link

SparkQA commented Nov 5, 2014

Test build #22942 has started for PR 3109 at commit 7b0a080.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Nov 5, 2014

Test build #22942 has finished for PR 3109 at commit 7b0a080.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22942/
Test FAILed.

@chenghao-intel
Copy link
Contributor Author

test this please

@SparkQA
Copy link

SparkQA commented Nov 11, 2014

Test build #23186 has started for PR 3109 at commit 7b0a080.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Nov 11, 2014

Test build #23186 has finished for PR 3109 at commit 7b0a080.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23186/
Test PASSed.

@@ -262,7 +262,7 @@ object NullPropagation extends Rule[LogicalPlan] {
*/
object ConstantFolding extends Rule[LogicalPlan] {
def apply(plan: LogicalPlan): LogicalPlan = plan transform {
case q: LogicalPlan => q transformExpressionsDown {
case q: LogicalPlan => q transformExpressionsUp {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change? It seem like this is strictly less efficient because it will call evaluate for each subtree, constructing literals each time instead of doing the evaluation in a single pass.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry now I read the PR description more carefully and I think I understand. However, this now seems kind of hacky to me. We are now leaving the analysis phase with a tree that isn't actually valid and then hoping that constant folding is going to fix things down the road. This breaks the contract that it should be safe to inspect dataTypes once analysis is done.

Is this query valid in Hive? If so, I'd almost rather add more rules/logic to analysis (maybe we need to have a notion of "resolved" for HiveUDFs and attempt to constant fold them if they aren't resolved? What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is valid in Hive. And you're right, I will try your suggestion, it seems more reasonable.

@SparkQA
Copy link

SparkQA commented Nov 17, 2014

Test build #23458 has started for PR 3109 at commit 2710d59.

  • This patch does not merge cleanly.

@SparkQA
Copy link

SparkQA commented Nov 17, 2014

Test build #23460 has started for PR 3109 at commit 487ff79.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Nov 17, 2014

Test build #23460 has finished for PR 3109 at commit 487ff79.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23460/
Test FAILed.

@SparkQA
Copy link

SparkQA commented Nov 17, 2014

Test build #23458 has finished for PR 3109 at commit 2710d59.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23458/
Test FAILed.

@chenghao-intel chenghao-intel changed the title [SPARK-4244] [SQL] Workaround for Hive Generic UDFs with constant object inspector parameter [SPARK-4244] [SQL] Support Hive Generic UDFs with constant object inspector parameters Nov 17, 2014
@chenghao-intel
Copy link
Contributor Author

The test failure depends on #3308, should be OK after #3308 merged.

@chenghao-intel
Copy link
Contributor Author

test this please

@SparkQA
Copy link

SparkQA commented Nov 18, 2014

Test build #23522 has started for PR 3109 at commit 487ff79.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Nov 18, 2014

Test build #23522 has finished for PR 3109 at commit 487ff79.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23522/
Test PASSed.

@@ -298,6 +298,8 @@ private[hive] trait HiveInspectors {
})
ObjectInspectorFactory.getStandardConstantMapObjectInspector(keyOI, valueOI, map)
}
case Literal(_, dt) => sys.error(s"Hive doesn't support the constant type [$dt].")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want to throw an error here? Why not just skip creating a constant object inspector?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should enumerate all of the possible constant data type in this function, this actually gives us a chance to check if we really missed one, just as previously, we did miss all of the constant type by specifying data type in matching (see #3114)

@marmbrus
Copy link
Contributor

Thanks for explaining. Merged to master and 1.2.

andrewor14 pushed a commit to andrewor14/spark that referenced this pull request Nov 21, 2014
…pector parameters

Query `SELECT named_struct(lower("AA"), "12", lower("Bb"), "13") FROM src LIMIT 1` will throw exception, some of the Hive Generic UDF/UDAF requires the input object inspector is `ConstantObjectInspector`, however, we won't get that before the expression optimization executed. (Constant Folding).

This PR is a work around to fix this. (As ideally, the `output` of LogicalPlan should be identical before and after Optimization).

Author: Cheng Hao <[email protected]>

Closes apache#3109 from chenghao-intel/optimized and squashes the following commits:

487ff79 [Cheng Hao] rebase to the latest master & update the unittest

(cherry picked from commit 84d79ee)
Signed-off-by: Michael Armbrust <[email protected]>
@chenghao-intel
Copy link
Contributor Author

I am closing it since it's already merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants