[SPARK-22772][SQL] Use splitExpressionsWithCurrentInputs to split codes in elt #19964

viirya · 2017-12-13T10:57:47Z

What changes were proposed in this pull request?

SPARK-22550, which fixed the 64KB JVM bytecode limit problem with elt, used buildCodeBlocks to split the generated code. However, we should use splitExpressionsWithCurrentInputs instead, because it handles both normal and whole-stage codegen (splitting is not supported under whole-stage codegen yet, so in that case it simply doesn't split the code).

How was this patch tested?

Existing tests.

viirya · 2017-12-13T11:01:56Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala

+        """.stripMargin,
+      foldFunctions = funcs => s"UTF8String $stringVal = ${funcs.last};",
+      makeFunctionCallback = f => prevFunc = s"$f(${ctx.INPUT_ROW}, $indexVal)",
+      mergeSplit = false)


We don't need to, and can't, merge the split functions in inner classes.

We don't need to because the split functions are not called in a sequence like this:

eltFunc_1(...) eltFunc_2(...) ...

The calls are embedded in the default branch of each split function, so the outer class does not call every split function of the inner classes in turn.

We can't merge them because makeSplitFunction would create an invalid merged function when used with the given foldFunctions:

private UTF8String eltFunc(InternalRow i, int index) {
  UTF8String stringVal = null;
  switch (index) {
    UTF8String stringVal = eltFunc_999(i, index);
    default:
      return nestedClassInstance.eltFunc_999(i, index);
  }
  return stringVal;
}
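To make the chaining described above concrete, here is a minimal sketch in plain Java. It is not the code Spark generates: the class name, the two function names, and the use of String[] in place of InternalRow and UTF8String are all assumptions made for illustration. It only shows the shape in which the last split function is the entry point and each default branch delegates to the previous one.

```java
// Illustrative only: hypothetical names, String[] stands in for InternalRow.
final class EltChainSketch {
    // First split function: handles indices 1 and 2. It is the end of the
    // chain, so an unmatched index yields null.
    static String eltFunc0(String[] row, int index) {
        switch (index) {
            case 1: return row[0];
            case 2: return row[1];
            default: return null;
        }
    }

    // Last split function and entry point: handles indices 3 and 4, and its
    // default branch delegates to the previous split function.
    static String eltFunc1(String[] row, int index) {
        switch (index) {
            case 3: return row[2];
            case 4: return row[3];
            default: return eltFunc0(row, index);
        }
    }

    public static void main(String[] args) {
        String[] row = {"a", "b", "c", "d"};
        System.out.println(eltFunc1(row, 3)); // matched directly
        System.out.println(eltFunc1(row, 1)); // matched via the default branch
        System.out.println(eltFunc1(row, 9)); // no match anywhere in the chain
    }
}
```

Because only the entry point is called from outside, there is no flat sequence of calls that a merge step could fold together, which is the point made in the comment above.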

Yes, but this way we can still hit the 64KB limit. Moreover, I think the current implementation is quite complex. What about making it similar to the other implementations, using a while loop instead of a switch?
That way we can ensure the 64KB limit won't be a problem, and the code would be easier to understand IMHO.
WDYT?

Why can we hit the 64KB limit?

I have thought about it. The other implementations need to introduce at least one global variable, as in the case-when case. If we can tolerate that, it is OK with me. Let's see what the other reviewers think.

Let's not complicate the already-complex splitExpressions. I'm OK with using some global variables to simplify the code.

@viirya I think we can hit it with a very large number of parameters to the function. I am not saying it is likely to happen, but IMHO it is feasible to make it happen.

OK. Let me replace it with simpler code. Thanks.
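For readers following the thread, one possible shape of the "simpler code with a global variable" idea is sketched below in plain Java. This is an assumption-laden illustration rather than the code that was merged: the class and method names are hypothetical, String[] stands in for InternalRow, and an instance field stands in for the global variable. Each split function reports whether it matched the index, and the caller invokes them in a flat sequence and stops at the first match.

```java
// Illustrative only: hypothetical names, not Spark's generated code.
final class EltFlagSketch {
    // Stands in for the class-level ("global") result variable.
    private String stringVal;

    // Each split function sets stringVal and returns true on a match.
    private boolean eltFunc0(String[] row, int index) {
        switch (index) {
            case 1: stringVal = row[0]; return true;
            case 2: stringVal = row[1]; return true;
            default: return false;
        }
    }

    private boolean eltFunc1(String[] row, int index) {
        switch (index) {
            case 3: stringVal = row[2]; return true;
            case 4: stringVal = row[3]; return true;
            default: return false;
        }
    }

    String elt(String[] row, int index) {
        stringVal = null; // a null result means "no index matched"
        boolean indexMatched;
        do {
            indexMatched = eltFunc0(row, index);
            if (indexMatched) continue; // jumps to the false condition and exits
            indexMatched = eltFunc1(row, index);
        } while (false);
        return stringVal;
    }

    public static void main(String[] args) {
        EltFlagSketch sketch = new EltFlagSketch();
        String[] row = {"a", "b", "c", "d"};
        System.out.println(sketch.elt(row, 4));
        System.out.println(sketch.elt(row, 7) == null);
    }
}
```

Since the split functions are now called one after another instead of from each other's default branches, the calls can be grouped or nested by a generic splitter without special handling, at the cost of the extra field.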

viirya · 2017-12-13T11:10:17Z

cc @cloud-fan @kiszk @mgaido91

SparkQA · 2017-12-13T13:54:48Z

Test build #84848 has finished for PR 19964 at commit c677aed.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-12-13T13:58:10Z

Test build #84847 has finished for PR 19964 at commit c40488e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

mgaido91 · 2017-12-13T14:22:15Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala

+    val NOT_MATCHED = -1
+    // 0 means the given index matches one of indices of strings in split function.
+    val MATCHED = 0
+    val resultState = ctx.freshName("eltResultState")


this can be a boolean instead of a byte IMHO

oh, right. :-)

cloud-fan · 2017-12-13T14:23:25Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala

+
+    // -1 means the given index doesn't match indices of strings in split function.
+    val NOT_MATCHED = -1
+    // 0 means the given index matches one of indices of strings in split function.


only 2 possible values, we can use boolean

Yeah, I missed it.

mgaido91 · 2017-12-13T14:36:55Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala

@@ -289,53 +289,56 @@ case class Elt(children: Seq[Expression])
    val index = indexExpr.genCode(ctx)
    val strings = stringExprs.map(_.genCode(ctx))
    val indexVal = ctx.freshName("index")
+    val resultState = ctx.freshName("eltResultState")


Maybe this can have a better name now that it is a boolean. I am not very good at naming, but something like indexFound, or anything you feel is appropriate.

mgaido91 · 2017-12-13T14:52:43Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala

+         |${index.code}
+         |final int $indexVal = ${index.value};
+         |${ctx.JAVA_BOOLEAN} $indexMatched = false;
+         |$stringVal = ${ctx.defaultValue(dataType)};


nit: I would prefer $stringVal = null here to make this explicit, because later we rely on stringVal being initialized to null. Anyway, the current implementation is correct. If we have a UT that checks that it returns null when it should, we should be safe.

I think we have the required tests in StringExpressionsSuite.

+1, since at the end we do final boolean ${ev.isNull} = ${ev.value} == null;.

cloud-fan · 2017-12-13T15:10:19Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala

    }

-    val cases = ctx.buildCodeBlocks(assignStringValue)


Now ctx.buildCodeBlocks doesn't need to be a separate method. Can we revert that change and inline buildCodeBlocks into splitExpressions?

splitExpressions is quite complicated. I think it is still good to keep buildCodeBlocks as a separate method. Maybe make it private?

sounds good
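As background for why a block-building step is worth keeping separate, the sketch below shows the general technique of grouping code snippets into blocks by size. It is a hypothetical helper written for this illustration, not Spark's CodegenContext implementation; the class name, method name, and the maxChars threshold are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a hypothetical size-based grouping of code snippets.
final class CodeBlockSketch {
    // Appends snippets to the current block and starts a new block once the
    // current one has grown beyond maxChars characters.
    static List<String> buildBlocks(List<String> snippets, int maxChars) {
        List<String> blocks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String snippet : snippets) {
            if (current.length() > maxChars) {
                blocks.add(current.toString());
                current.setLength(0);
            }
            current.append(snippet).append('\n');
        }
        if (current.length() > 0) {
            blocks.add(current.toString());
        }
        return blocks;
    }

    public static void main(String[] args) {
        List<String> snippets = new ArrayList<>();
        snippets.add("aaaa");
        snippets.add("bbbb");
        snippets.add("cccc");
        for (String block : buildBlocks(snippets, 6)) {
            System.out.print("block:\n" + block);
        }
    }
}
```

Each resulting block would then be wrapped in its own function by the caller; keeping the grouping in a small private method leaves the splitting logic itself easier to read.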

mgaido91 · 2017-12-13T15:33:01Z

LGTM, thanks

cloud-fan · 2017-12-13T16:19:24Z

LGTM

SparkQA · 2017-12-13T16:59:30Z

Test build #84857 has finished for PR 19964 at commit ac41620.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-12-13T17:16:52Z

Test build #84859 has finished for PR 19964 at commit f6a4a54.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-12-13T17:34:30Z

Test build #84860 has finished for PR 19964 at commit 1563a46.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2017-12-13T18:21:24Z

Test build #84863 has finished for PR 19964 at commit 69aab61.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-12-13T21:54:26Z

Thanks! Merged to master.

viirya mentioned this pull request Dec 13, 2017

[SPARK-22550][SQL] Fix 64KB JVM bytecode limit problem with elt #19778

Closed

viirya force-pushed the SPARK-22772 branch from c40488e to c677aed Compare December 13, 2017 11:08

Use splitExpressionsWithCurrentInputs to split codes in elt.

c677aed

Simplified version.

ac41620

Use boolean.

f6a4a54

Rename variable.

1563a46

Address comments.

69aab61

asfgit closed this in ba0e79f Dec 13, 2017

viirya deleted the SPARK-22772 branch December 27, 2023 18:35
