-
Notifications
You must be signed in to change notification settings - Fork 28.7k
[SPARK-4244] [SQL] Support Hive Generic UDFs with constant object inspector parameters #3109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Test build #22926 has started for PR 3109 at commit
|
Test build #22926 has finished for PR 3109 at commit
|
Test FAILed. |
The failure should be fixed once #3114 merged. |
Test build #22942 has started for PR 3109 at commit
|
Test build #22942 has finished for PR 3109 at commit
|
Test FAILed. |
test this please |
Test build #23186 has started for PR 3109 at commit
|
Test build #23186 has finished for PR 3109 at commit
|
Test PASSed. |
@@ -262,7 +262,7 @@ object NullPropagation extends Rule[LogicalPlan] { | |||
*/ | |||
object ConstantFolding extends Rule[LogicalPlan] { | |||
def apply(plan: LogicalPlan): LogicalPlan = plan transform { | |||
case q: LogicalPlan => q transformExpressionsDown { | |||
case q: LogicalPlan => q transformExpressionsUp { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change? It seem like this is strictly less efficient because it will call evaluate for each subtree, constructing literals each time instead of doing the evaluation in a single pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sorry now I read the PR description more carefully and I think I understand. However, this now seems kind of hacky to me. We are now leaving the analysis phase with a tree that isn't actually valid and then hoping that constant folding is going to fix things down the road. This breaks the contract that it should be safe to inspect dataTypes once analysis is done.
Is this query valid in Hive? If so, I'd almost rather add more rules/logic to analysis (maybe we need to have a notion of "resolved" for HiveUDFs and attempt to constant fold them if they aren't resolved? What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is valid in Hive. And you're right, I will try your suggestion, it seems more reasonable.
Test build #23458 has started for PR 3109 at commit
|
2710d59
to
487ff79
Compare
Test build #23460 has started for PR 3109 at commit
|
Test build #23460 has finished for PR 3109 at commit
|
Test FAILed. |
Test build #23458 has finished for PR 3109 at commit
|
Test FAILed. |
test this please |
Test build #23522 has started for PR 3109 at commit
|
Test build #23522 has finished for PR 3109 at commit
|
Test PASSed. |
@@ -298,6 +298,8 @@ private[hive] trait HiveInspectors { | |||
}) | |||
ObjectInspectorFactory.getStandardConstantMapObjectInspector(keyOI, valueOI, map) | |||
} | |||
case Literal(_, dt) => sys.error(s"Hive doesn't support the constant type [$dt].") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really want to throw an error here? Why not just skip creating a constant object inspector?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should enumerate all of the possible constant data type in this function, this actually gives us a chance to check if we really missed one, just as previously, we did miss all of the constant type by specifying data type in matching (see #3114)
Thanks for explaining. Merged to master and 1.2. |
…pector parameters Query `SELECT named_struct(lower("AA"), "12", lower("Bb"), "13") FROM src LIMIT 1` will throw exception, some of the Hive Generic UDF/UDAF requires the input object inspector is `ConstantObjectInspector`, however, we won't get that before the expression optimization executed. (Constant Folding). This PR is a work around to fix this. (As ideally, the `output` of LogicalPlan should be identical before and after Optimization). Author: Cheng Hao <[email protected]> Closes apache#3109 from chenghao-intel/optimized and squashes the following commits: 487ff79 [Cheng Hao] rebase to the latest master & update the unittest (cherry picked from commit 84d79ee) Signed-off-by: Michael Armbrust <[email protected]>
I am closing it since it's already merged. |
Query
SELECT named_struct(lower("AA"), "12", lower("Bb"), "13") FROM src LIMIT 1
will throw exception, some of the Hive Generic UDF/UDAF requires the input object inspector isConstantObjectInspector
, however, we won't get that before the expression optimization executed. (Constant Folding).This PR is a work around to fix this. (As ideally, the
output
of LogicalPlan should be identical before and after Optimization).