
[SPARK-4244] [SQL] Support Hive Generic UDFs with constant object inspector parameters #3109

Closed
wants to merge 1 commit into from
@@ -298,6 +298,8 @@ private[hive] trait HiveInspectors {
})
ObjectInspectorFactory.getStandardConstantMapObjectInspector(keyOI, valueOI, map)
}
+case Literal(_, dt) => sys.error(s"Hive doesn't support the constant type [$dt].")

Contributor

Do we really want to throw an error here? Why not just skip creating a constant object inspector?

Contributor Author

We should enumerate all of the possible constant data types in this function. Throwing here gives us a chance to catch a type we missed; previously, by specifying the data type in the match, we did miss all of the constant types (see #3114).

+case _ if expr.foldable => toInspector(Literal(expr.eval(), expr.dataType))
case _ => toInspector(expr.dataType)
}
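A minimal sketch of the dispatch discussed in this thread, written against hypothetical, simplified types (Expr, Literal, Lower, ColumnRef, ConstantInspector and RuntimeInspector are illustrative stand-ins, not the Spark or Hive classes). It shows the two added cases: a literal of a type nobody enumerated fails fast instead of silently falling back, and any other foldable expression is evaluated once and re-dispatched as a Literal so that it receives a constant inspector.

```scala
object ConstantInspectorSketch {
  // Hypothetical, simplified types; not the real Spark or Hive classes.
  sealed trait DataType
  case object StringType extends DataType
  case object IntType extends DataType
  case object BinaryType extends DataType // assumed to have no constant inspector here

  sealed trait Inspector
  final case class ConstantInspector(value: Any, dt: DataType) extends Inspector
  final case class RuntimeInspector(dt: DataType) extends Inspector

  sealed trait Expr {
    def dataType: DataType
    def foldable: Boolean
    def eval(): Any
  }
  final case class Literal(value: Any, dataType: DataType) extends Expr {
    def foldable: Boolean = true
    def eval(): Any = value
  }
  final case class ColumnRef(name: String, dataType: DataType) extends Expr {
    def foldable: Boolean = false
    def eval(): Any = throw new UnsupportedOperationException(s"$name is not a constant")
  }
  final case class Lower(child: Expr) extends Expr {
    def dataType: DataType = StringType
    def foldable: Boolean = child.foldable
    def eval(): Any = child.eval().asInstanceOf[String].toLowerCase
  }

  def toInspector(expr: Expr): Inspector = expr match {
    // Every literal type that has a constant inspector is listed explicitly.
    case Literal(v, StringType) => ConstantInspector(v, StringType)
    case Literal(v, IntType)    => ConstantInspector(v, IntType)
    // A literal type nobody enumerated fails loudly instead of being skipped.
    case Literal(_, dt) => sys.error(s"Hive doesn't support the constant type [$dt].")
    // Any other foldable expression is evaluated once and handled as a literal.
    case _ if expr.foldable => toInspector(Literal(expr.eval(), expr.dataType))
    // Everything else gets a regular, non-constant inspector.
    case _ => RuntimeInspector(expr.dataType)
  }
}
```

The fail-fast Literal(_, dt) case is the trade-off the author argues for: skipping an unknown type would silently lose the constant inspector, whereas an error surfaces the missing case during testing.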

14 changes: 6 additions & 8 deletions sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUdfs.scala
@@ -21,7 +21,7 @@ import org.apache.hadoop.hive.ql.udf.generic.GenericUDFUtils.ConversionHelper

import scala.collection.mutable.ArrayBuffer

-import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector
+import org.apache.hadoop.hive.serde2.objectinspector.{ObjectInspector, ConstantObjectInspector}
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory.ObjectInspectorOptions
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory
import org.apache.hadoop.hive.ql.exec.{UDF, UDAF}
@@ -108,9 +108,7 @@ private[hive] case class HiveSimpleUdf(functionClassName: String, children: Seq[
udfType != null && udfType.deterministic()
}

-override def foldable = {
-isUDFDeterministic && children.foldLeft(true)((prev, n) => prev && n.foldable)
-}
+override def foldable = isUDFDeterministic && children.forall(_.foldable)

// Create parameter converters
@transient
@@ -154,17 +152,17 @@ private[hive] case class HiveGenericUdf(functionClassName: String, children: Seq
protected lazy val argumentInspectors = children.map(toInspector)

@transient
-protected lazy val returnInspector = function.initialize(argumentInspectors.toArray)
+protected lazy val returnInspector =
+function.initializeAndFoldConstants(argumentInspectors.toArray)

@transient
protected lazy val isUDFDeterministic = {
val udfType = function.getClass().getAnnotation(classOf[HiveUDFType])
(udfType != null && udfType.deterministic())
}

-override def foldable = {
-isUDFDeterministic && children.foldLeft(true)((prev, n) => prev && n.foldable)
-}
+override def foldable =
+isUDFDeterministic && returnInspector.isInstanceOf[ConstantObjectInspector]

Contributor

I don't really understand all the contracts here, so please correct me if I'm missing something, but why does the return type have to be a Constant? It seems like if a UDF is deterministic it should be safe to fold as long as its children are foldable too, independent of the type of inspector it returns.

Contributor Author

The key change is that we need to get the folded result through the UDF's Hive method initializeAndFoldConstants, not through initialize; that's why I made the change in L155-L156. The UDF itself knows best how to constant-fold its computation when that is applicable, and the return value of initializeAndFoldConstants tells us whether the result can be folded and, if so, what the folded value is.
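The contract described in this reply can be modelled roughly as below. This is a simplified sketch under stated assumptions: SketchGenericUdf, Repeat, Rand and the inspector types are hypothetical stand-ins for Hive's GenericUDF and ConstantObjectInspector, and the folding rule used here (fold only when the UDF is deterministic and every argument inspector is constant) approximates, rather than reproduces, Hive's initializeAndFoldConstants. What it illustrates is the PR's new foldable rule: foldability is read off the inspector the UDF hands back instead of being recomputed from the children.

```scala
object FoldingSketch {
  // Hypothetical stand-ins for Hive's ObjectInspector / ConstantObjectInspector.
  sealed trait Inspector
  final case class ConstantInspector(value: Any) extends Inspector
  case object RuntimeInspector extends Inspector

  // Hypothetical stand-in for Hive's GenericUDF; not the real API.
  abstract class SketchGenericUdf {
    def deterministic: Boolean
    def evaluate(args: Seq[Any]): Any

    // Like `initialize`: describes the result without computing it.
    def initialize(args: Seq[Inspector]): Inspector = RuntimeInspector

    // Simplified analogue of `initializeAndFoldConstants`: if the UDF is
    // deterministic and every argument is constant, evaluate once and return
    // a constant inspector that carries the folded value.
    final def initializeAndFoldConstants(args: Seq[Inspector]): Inspector = {
      val oi = initialize(args)
      val constants = args.collect { case ConstantInspector(v) => v }
      if (deterministic && constants.size == args.size) ConstantInspector(evaluate(constants))
      else oi
    }
  }

  object Repeat extends SketchGenericUdf {
    val deterministic = true
    def evaluate(args: Seq[Any]): Any = args(0).asInstanceOf[String] * args(1).asInstanceOf[Int]
  }

  object Rand extends SketchGenericUdf {
    val deterministic = false
    def evaluate(args: Seq[Any]): Any = scala.util.Random.nextDouble()
  }

  // Mirrors the PR's new rule: foldable iff the UDF is deterministic and the
  // inspector returned by the folding initializer is constant.
  def foldable(udf: SketchGenericUdf, args: Seq[Inspector]): Boolean =
    udf.deterministic && udf.initializeAndFoldConstants(args).isInstanceOf[ConstantInspector]
}
```

In this model, a UDF can also decide on its own to return a constant inspector from initialize, and the foldable check picks that up without any change on the Spark side, which is the sense in which the UDF "knows better".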


@transient
protected lazy val deferedObjects =
@@ -0,0 +1 @@
{"aa":"10","aaaaaa":"11","aaaaaa":"12","bb12":"13","s14s14":"14"}
@@ -56,6 +56,14 @@ class HiveQuerySuite extends HiveComparisonTest with BeforeAndAfter {
Locale.setDefault(originalLocale)
}

+createQueryTest("constant object inspector for generic udf",
+"""SELECT named_struct(
+lower("AA"), "10",
+repeat(lower("AA"), 3), "11",
+lower(repeat("AA", 3)), "12",
+printf("Bb%d", 12), "13",
+repeat(printf("s%d", 14), 2), "14") FROM src LIMIT 1""")

createQueryTest("NaN to Decimal",
"SELECT CAST(CAST('NaN' AS DOUBLE) AS DECIMAL(1,1)) FROM src LIMIT 1")
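The expected answer for the new "constant object inspector for generic udf" test can be reproduced by folding each field-name expression by hand. The sketch below uses plain Scala string operations as stand-ins for Hive's lower, repeat and printf (an assumption that holds for these simple ASCII inputs, not a reimplementation of Hive); NamedStructKeysSketch is a hypothetical name. Both repeat(lower("AA"), 3) and lower(repeat("AA", 3)) fold to aaaaaa, which is why that key appears twice in the golden output, and, matching that output, field names are rendered lowercased (Bb12 becomes bb12).

```scala
object NamedStructKeysSketch {
  // Stand-ins for the Hive functions used in the test (assumed equivalent
  // for the simple ASCII inputs here).
  def lower(s: String): String = s.toLowerCase
  def repeat(s: String, n: Int): String = s * n
  def printf(fmt: String, args: Any*): String = fmt.format(args: _*)

  // Field-name expressions from the test, paired with their values.
  val fields: Seq[(String, String)] = Seq(
    lower("AA") -> "10",
    repeat(lower("AA"), 3) -> "11",
    lower(repeat("AA", 3)) -> "12",
    printf("Bb%d", 12) -> "13",
    repeat(printf("s%d", 14), 2) -> "14"
  )

  // Renders the struct the way the golden answer shows it, with
  // lowercased field names and duplicate keys kept in order.
  def render: String =
    fields.map { case (k, v) => "\"" + lower(k) + "\":\"" + v + "\"" }.mkString("{", ",", "}")
}
```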
