Skip to content

[SPARK-12258] [SQL] passing null into ScalaUDF (follow-up) #10266

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -1029,24 +1029,27 @@ case class ScalaUDF(
// such as IntegerType, its javaType is `int` and the returned type of user-defined
// function is Object. Trying to convert an Object to `int` will cause casting exception.
val evalCode = evals.map(_.code).mkString
val funcArguments = converterTerms.zipWithIndex.map {
case (converter, i) =>
val eval = evals(i)
val dt = children(i).dataType
s"$converter.apply(${eval.isNull} ? null : (${ctx.boxedType(dt)}) ${eval.value})"
}.mkString(",")
val callFunc = s"${ctx.boxedType(ctx.javaType(dataType))} $resultTerm = " +
s"(${ctx.boxedType(ctx.javaType(dataType))})${catalystConverterTerm}" +
s".apply($funcTerm.apply($funcArguments));"
val (converters, funcArguments) = converterTerms.zipWithIndex.map { case (converter, i) =>
val eval = evals(i)
val argTerm = ctx.freshName("arg")
val convert = s"Object $argTerm = ${eval.isNull} ? null : $converter.apply(${eval.value});"
(convert, argTerm)
}.unzip

evalCode + s"""
${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
Boolean ${ev.isNull};
val callFunc = s"${ctx.boxedType(dataType)} $resultTerm = " +
s"(${ctx.boxedType(dataType)})${catalystConverterTerm}" +
s".apply($funcTerm.apply(${funcArguments.mkString(", ")}));"

s"""
$evalCode
${converters.mkString("\n")}
$callFunc

${ev.value} = $resultTerm;
${ev.isNull} = $resultTerm == null;
boolean ${ev.isNull} = $resultTerm == null;
${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
if (!${ev.isNull}) {
${ev.value} = $resultTerm;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah that's it, the result type may be primitive and we should not assign null value to it, or NPE will happen.

Should we create a JIRA for it? I think it's a different bug comparing to the one you fixed in #10259

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems we are fine because we check if (!${ev.isNull}) first?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will not cause NPE, but a compilation error?

For example, if dataType is Integer, line 1049 will be int ev.value = null
This statement will trigger a compilation error incompatible types. Right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's Integer b = null; int a = (Integer) b; , then NPE

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can understand your fix, but I am trying to see what @cloud-fan said above. It sounds like he found another issue?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gatorsmile At line 1049, we are using the default value. For primitive types, it will not be null.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that he was trying to explain the NPE reported by @markhamstra.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uh, the value of ${ctx.defaultValue(dataType)} is not null but -1 when data type is Integer. I do not have more questions. Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for being vague, I was trying to explain why the NPE happened and @davies has fixed it.

}
"""
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1144,9 +1144,13 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {

// passing null into the UDF that could handle it
val boxedUDF = udf[java.lang.Integer, java.lang.Integer] {
(i: java.lang.Integer) => if (i == null) -10 else i * 2
(i: java.lang.Integer) => if (i == null) -10 else null
}
checkAnswer(df.select(boxedUDF($"age")), Row(44) :: Row(-10) :: Nil)
checkAnswer(df.select(boxedUDF($"age")), Row(null) :: Row(-10) :: Nil)

sqlContext.udf.register("boxedUDF",
(i: java.lang.Integer) => (if (i == null) -10 else null): java.lang.Integer)
checkAnswer(sql("select boxedUDF(null), boxedUDF(-1)"), Row(-10, null) :: Nil)

val primitiveUDF = udf((i: Int) => i * 2)
checkAnswer(df.select(primitiveUDF($"age")), Row(44) :: Row(null) :: Nil)
Expand Down