Skip to content

[SPARK-18016][SQL][CATALYST] Code Generation: Constant Pool Limit - Class Splitting #18075

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 9 commits into from

Conversation

bdrillard
Copy link

What changes were proposed in this pull request?

This pull-request exclusively includes the class splitting feature described in #16648. When code for a given class would grow beyond 1600k bytes, a private, nested sub-class is generated into which subsequent functions are inlined. Additional sub-classes are generated as the code threshold is met subsequent times. This code includes 3 changes:

  1. Includes helper maps, lists, and functions for keeping track of sub-classes during code generation (included in the CodeGenerator class). These helper functions allow nested classes and split functions to be initialized/declared/inlined to the appropriate locations in the various projection classes.
  2. Changes addNewFunction to return a string to support instances where a split function is inlined to a nested class and not the outer class (and so must be invoked using the class-qualified name). Uses of addNewFunction throughout the codebase are modified so that the returned name is properly used.
  3. Removes instances of the this keyword when used on data inside generated classes. All state declared in the outer class is by default global and accessible to the nested classes. However, if a reference to global state in a nested class is prepended with the this keyword, it would attempt to reference state belonging to the nested class (which would not exist), rather than the correct variable belonging to the outer class.

How was this patch tested?

Added a test case to the GeneratedProjectionSuite that increases the number of columns tested in various projections to a threshold that would previously have triggered a JaninoRuntimeException for the Constant Pool.

Note: This PR does not address the second Constant Pool issue with code generation (also mentioned in #16648): excess global mutable state. A second PR may be opened to resolve that issue.

@@ -629,7 +736,9 @@ class CodegenContext {

/**
* Splits the generated code of expressions into multiple functions, because function has
* 64kb code size limit in JVM
* 64kb code size limit in JVM. If the class the function is to be inlined to would grow beyond
* 1600kb, a private, netsted sub-class is declared, and the function is inlined to it, because
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: netsted -> nested?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -792,7 +887,18 @@ class CodegenContext {
addMutableState(javaType(expr.dataType), value,
s"$value = ${defaultValue(expr.dataType)};")

subexprFunctions += s"$fnName($INPUT_ROW);"
// Generate the code for this expression tree and wrap it in a function.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason to move this code block from the original place to here?

Copy link
Author

@bdrillard bdrillard May 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for the scope of this class splitting change. The previous pull-request that also made addMutableState return an accessor reference required the order to change so that correct variables could be declared prior to the code of fn, but for this pull-request, only ensuring that subexprFunctions gets the return of addNewFunction is necessary. I'll change the ordering back.

Copy link
Author

@bdrillard bdrillard May 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed: 90a907a#diff-8bcc5aea39c73d4bf38aef6f6951d42cR872
Note: a rebase with the two commits squashed will show the subexprFunctions as the only changed line, which is what we want.

@kiszk
Copy link
Member

kiszk commented May 24, 2017

Thank you. Absolutely, it is easier to review this change.

*/
private[sql] def declareAddedFunctions(): String = {
classFunctions("OuterClass").map {
case (funcName, funcCode) => funcCode
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: funcName -> _

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

} else {
val code = functions.map {
case (_, funcCode) =>
s"$funcCode"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s"$funcCode" -> funcCode

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -299,6 +297,9 @@ case class SampleExec(
| }
""".stripMargin.trim)

ctx.addMutableState(s"$samplerClass<UnsafeRow>", sampler,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we put this block at the original place?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change should stay the way it is. Notice that the initialization code for the addMutableState call on the line just below is the same code passed to the addNewFunction call. This means that when we create the new function, we may get back a class-qualified function name (which here we store in initSamplerFuncName), so the call to addMutableState must come after the addNewFunction call.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, got it. I overlooked the new dependency regarding initSamplerFuncName.

*/
private[sql] def initNestedClasses(): String = {
// Nested, private sub-classes have no mutable state (though they do reference the outer class'
// mutable state), so we declare and initialize them inline ot the OuterClass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ot -> at

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

val name = classInfo._1

classSize.update(name, classSize(name) + funcCode.length)
classFunctions.update(name, classFunctions(name) += funcCode)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not 100% sure, but is " += " required? How about " = "?

Copy link
Author

@bdrillard bdrillard May 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+= will be necessary since classFunctions(name) will return the ListBuffer[String] containing all functions belonging to the given class name from the classFunctions map, and we want to append the new funcCode to that buffer. Also, classFunctions.update is going to expect a ListBuffer[String] as its second argument, but the return type of assignment = is just Unit. += will append the element and then return the modified buffer (which is the behavior we want as per the API doc for ListBuffer).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for confusing you, I made a mistake in my comment. I wanted to say " + " instead of " = ".

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sure. The story seems pretty much the same, we still want the append operation that also returns the reference to the modified buffer, and that's given by +=. Also, it doesn't look like + is defined as an operation on a ListBuffer[A] and an element of type A (see ListBuffer).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your clarification.
In that case, is classFunctions(name) += funcCode enough instead of calling classFunctions.update?

Copy link
Author

@bdrillard bdrillard May 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, good point, since we're using a mutable buffer, we can update the referenced object directly even if its contained inside the map. Since += explicitly returns the reference to the modified buffer, it would probably be most straightforward to use

classFunctions(name).append(funcCode)

since append has Unit return type, and we don't need any results from appending the code to the class's buffer of function code.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a commit with that change if you think it checks out: c225f3a#diff-8bcc5aea39c73d4bf38aef6f6951d42cL290

@kiszk
Copy link
Member

kiszk commented May 25, 2017

Thanks, sound good to me for now.
cc @ueshin

Copy link
Member

@ueshin ueshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this!

I added some comments and I found some other places we might need to modify:

def addNewFunction(
funcName: String,
funcCode: String,
inlineToOuterClass: Boolean = false): String = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: indent. Add 2 more spaces.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// limit, 65,536. We cannot know how many constants will be inserted for a class, so we use a
// threshold of 1600k bytes to determine when a function should be inlined to a private, nested
// sub-class.
val classInfo = if (inlineToOuterClass) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: We can use val (className, classInstance) = if ... here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed (with some refactoring based on those variables being available in the scope earlier): 493113c#diff-8bcc5aea39c73d4bf38aef6f6951d42cL278


// Adds a new class. Requires the class' name, and its instance name.
private def addClass(className: String, classInstance: String): Unit = {
classes.prepend(Tuple2(className, classInstance))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: How about classes.prepend((className, classInstance)) or classes.prepend(className -> classInstance)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

})

// Create the collection.
val wrapperClass = classOf[mutable.WrappedArray[_]].getName
ctx.addMutableState(
s"$wrapperClass<InternalRow>",
ev.value,
s"this.${ev.value} = $wrapperClass$$.MODULE$$.make(this.$rowData);")
s"this.${ev.value} = $wrapperClass$$.MODULE$$.make($rowData);")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should remove one more this. here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

val name = classInfo._1

classSize.update(name, classSize(name) + funcCode.length)
classFunctions(name).append(funcCode)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about:

    classSize(name) += funcCode.length
    classFunctions(name) += funcCode

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose that's more concise/readable given the underlying mutable data structures: 493113c#diff-8bcc5aea39c73d4bf38aef6f6951d42cL292

@bdrillard
Copy link
Author

@ueshin As for the remaining this in objects.scala, 493113c#diff-e436c96ea839dfe446837ab2a3531f93L984
and the need for an additional nested classes declaration in GenerateColumnAccessor.scala, 493113c#diff-58a69e526de8182bcb4c840a8cb29e2dR225

@ueshin
Copy link
Member

ueshin commented May 31, 2017

ok to test

@SparkQA
Copy link

SparkQA commented May 31, 2017

Test build #77564 has finished for PR 18075 at commit 493113c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@bdrillard bdrillard force-pushed the class_splitting_only branch from 493113c to 7fe5e4a Compare May 31, 2017 17:01
@SparkQA
Copy link

SparkQA commented May 31, 2017

Test build #77597 has finished for PR 18075 at commit 7fe5e4a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@bdrillard
Copy link
Author

The earlier failure occurred when the stopEarly() function is registered to the OuterClass potentially twice. Using a Map of functions holding the function code and its name fixes the issue, as whenever a function of the same name would be added more than once, it updates the older value. The tests pass after the change.

@ueshin
Copy link
Member

ueshin commented Jun 1, 2017

LGTM for now, cc @cloud-fan.

@fbertsch
Copy link

We're really looking forward to this change! This bug is limiting a lot of the work we'd like to do with Spark. Any idea who we can ping to move this along?

@kiszk
Copy link
Member

kiszk commented Jun 13, 2017

@cloud-fan can you have a time to look at this?

@cloud-fan
Copy link
Contributor

reviewing


// A map holding the current size in bytes of each class to be generated.
private val classSize: mutable.Map[String, Int] =
mutable.Map[String, Int]("OuterClass" -> 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can create a variable for this "OuterClass" instead of hardcoding it in many places.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

classSize(className) += funcCode.length
classFunctions(className) += funcName -> funcCode

if (className.equals("OuterClass")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: className == "OuterClass"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mutable.Map[String, Int]("OuterClass" -> 0)

// Nested maps holding function names and their code belonging to each class.
private val classFunctions: mutable.Map[String, mutable.Map[String, String]] =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems we only need a map from class name to function codes? i.e. mutable.Map[String, mutable.ListBuffer[String]]

Copy link
Author

@bdrillard bdrillard Jun 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had originally thought so, but it turns out that there's at least one instance where the code for a given function name is updated during the code-generation process. The generated stopEarly function can actually be inserted twice, once returning a variable returning a different stopEarly variable each time. What would end up occurring is that two functions of the same signature would exist in the class, causing a compile error. So we need to use a map to make sure the implementation gets updated for a given function when necessary.

Note also that the old implementation of addedFunctions was a map also.

@@ -629,7 +730,9 @@ class CodegenContext {

/**
* Splits the generated code of expressions into multiple functions, because function has
* 64kb code size limit in JVM
* 64kb code size limit in JVM. If the class the function is to be inlined to would grow beyond
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the class the function -> If the class with the function?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the grammatically correct/hopefully more clear form of that same docstring: 678b4ad#diff-8bcc5aea39c73d4bf38aef6f6951d42cR727

private[sql] def initNestedClasses(): String = {
// Nested, private sub-classes have no mutable state (though they do reference the outer class'
// mutable state), so we declare and initialize them inline to the OuterClass.
classes.map {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: classes.filterKeys(_ != "OuterClass").map ...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -83,6 +83,58 @@ class GeneratedProjectionSuite extends SparkFunSuite {
assert(result === row2)
}

test("SPARK-18016: generated projections on wider table requiring class-splitting") {
val N = 4000
val wideRow1 = new GenericInternalRow((1 to N).toArray[Any])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: new GenericInternalRow(N)

Copy link
Author

@bdrillard bdrillard Jun 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. I've cleaned up the test you comment on, and the one above it, since they both have the same structure, just different values for N:
678b4ad#diff-a14107cf4a4c41671bba24a82f6042d9R36

val unsafeProj = UnsafeProjection.create(nestedSchema)
val unsafe: UnsafeRow = unsafeProj(nested)
(0 until N).foreach { i =>
val s = UTF8String.fromString((i + 1).toString)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create input data with 0 until N, or test it with 1 to N, to avoid this i + 1

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. See the above comment. Creating the data with 0 until N cleans up the indexing on i.

@@ -93,7 +93,7 @@ private[sql] trait ColumnarBatchScan extends CodegenSupport {
}

val nextBatch = ctx.freshName("nextBatch")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: inline this nextBatch

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, how about we make addNewFunction accepts funcNameHint and generate unique func name inside addNewFunction? Then users can just call: val nextBatch = ctx.addNewFunction("nextBatch", ...)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it depends on which implementation we think is cleaner. The freshName generated by the caller is typically used twice, once in the call to addNewFunction, but also immediately in the function code as the method name. If we use a name hint, we'd have to do a string replace inside addNewFunction to update the placeholder method name with the freshname. So it would seem either we keep

    val nextBatch = ctx.freshName("nextBatch")
    val nextBatchFuncName = ctx.addNewFunction(nextBatch,
      s"""
         |private void $nextBatch() throws java.io.IOException {
         |  long getBatchStart = System.nanoTime();
         |  if ($input.hasNext()) {
         |    $batch = ($columnarBatchClz)$input.next();
         |    $numOutputRows.add($batch.numRows());
         |    $idx = 0;
         |    ${columnAssigns.mkString("", "\n", "\n")}
         |  }
         |  $scanTimeTotalNs += System.nanoTime() - getBatchStart;
         |}""".stripMargin)

or we have

    val nextBatchHint = "nextBatch"
    val nextBatch = ctx.addNewFunction(nextBatchHint,
      s"""
         |private void $nextBatchHint() throws java.io.IOException {
         |  long getBatchStart = System.nanoTime();
         |  if ($input.hasNext()) {
         |    $batch = ($columnarBatchClz)$input.next();
         |    $numOutputRows.add($batch.numRows());
         |    $idx = 0;
         |    ${columnAssigns.mkString("", "\n", "\n")}
         |  }
         |  $scanTimeTotalNs += System.nanoTime() - getBatchStart;
         |}""".stripMargin)

where addNewFunction would do the proper replacement over the code for the method with a freshname generated from "nextBatch" as a name hint.

Or in every instance, we just duplicate the string hint without creating a variable for it in both the addNewFunction call and the method name:

    val nextBatch = ctx.addNewFunction("nextBatch",
      s"""
         |private void nextBatch() throws java.io.IOException {
         |  long getBatchStart = System.nanoTime();
         |  if ($input.hasNext()) {
         |    $batch = ($columnarBatchClz)$input.next();
         |    $numOutputRows.add($batch.numRows());
         |    $idx = 0;
         |    ${columnAssigns.mkString("", "\n", "\n")}
         |  }
         |  $scanTimeTotalNs += System.nanoTime() - getBatchStart;
         |}""".stripMargin)

Which would you prefer?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's keep it as it was

*
* @param funcName the class-unqualified name of the function
* @param funcCode the body of the function
* @param inlineToOuterClass whether the given code must be inlined to the `OuterClass`. This
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you give an example? I'm not very clear when we need this

Copy link
Author

@bdrillard bdrillard Jun 14, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, see the portion of doConsume in the Limit class where the stopEarly function is registered, https://github.com/apache/spark/pull/18075/files#diff-379cccace8699ca00b76ff5631222adeR73

In this section of code, the registration of the function is separate from the caller code, so unlike other changes in this patch, we have no way of informing the caller code what the potentially class-qualified name of the function would be if it were inlined to a nested class. Instead, the caller code for the function (in WholeStageCodegenExec), makes a hard assumption that stopEarly will be visible globally, that is, in the outer class. The caller is divorced from the function producer across classes, so it's not clear how to make a generated function name visible, but the hint to inline to just inline the function to the outer class fixes that issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me, as the stopEarly in Limit is going to override the stopEarly in BufferedRowIterator, we can only put it in outer class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, whole stage codegen is really tricky...

@cloud-fan
Copy link
Contributor

LGTM except some style comments

@SparkQA
Copy link

SparkQA commented Jun 14, 2017

Test build #78059 has finished for PR 18075 at commit 678b4ad.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -233,10 +222,118 @@ class CodegenContext {
// The collection of sub-expression result resetting methods that need to be called on each row.
val subexprFunctions = mutable.ArrayBuffer.empty[String]

def declareAddedFunctions(): String = {
addedFunctions.map { case (funcName, funcCode) => funcCode }.mkString("\n")
val outerClassName = "OuterClass"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: private val

@@ -33,10 +33,10 @@ class GeneratedProjectionSuite extends SparkFunSuite {

test("generated projections on wider table") {
val N = 1000
val wideRow1 = new GenericInternalRow((1 to N).toArray[Any])
val wideRow1 = new GenericInternalRow((0 until N).toArray[Any])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can be new GenericInternalRow(N)


test("SPARK-18016: generated projections on wider table requiring class-splitting") {
val N = 4000
val wideRow1 = new GenericInternalRow((0 until N).toArray[Any])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@cloud-fan
Copy link
Contributor

thanks, merging to master! you can address the remaining comments in your other PRs

@asfgit asfgit closed this in b32b212 Jun 15, 2017

/**
* Adds a function to the generated class. If the code for the `OuterClass` grows too large, the
* function will be inlined into a new private, nested class, and a class-qualified name for the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: class instance-qualified name

* can be necessary when a function is declared outside of the context
* it is eventually referenced and a returned qualified function name
* cannot otherwise be accessed.
* @return the name of the function, qualified by class if it will be inlined to a private,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

dataknocker pushed a commit to dataknocker/spark that referenced this pull request Jun 16, 2017
…lass Splitting

## What changes were proposed in this pull request?

This pull-request exclusively includes the class splitting feature described in apache#16648. When code for a given class would grow beyond 1600k bytes, a private, nested sub-class is generated into which subsequent functions are inlined. Additional sub-classes are generated as the code threshold is met subsequent times. This code includes 3 changes:

1. Includes helper maps, lists, and functions for keeping track of sub-classes during code generation (included in the `CodeGenerator` class). These helper functions allow nested classes and split functions to be initialized/declared/inlined to the appropriate locations in the various projection classes.
2. Changes `addNewFunction` to return a string to support instances where a split function is inlined to a nested class and not the outer class (and so must be invoked using the class-qualified name). Uses of `addNewFunction` throughout the codebase are modified so that the returned name is properly used.
3. Removes instances of the `this` keyword when used on data inside generated classes. All state declared in the outer class is by default global and accessible to the nested classes. However, if a reference to global state in a nested class is prepended with the `this` keyword, it would attempt to reference state belonging to the nested class (which would not exist), rather than the correct variable belonging to the outer class.

## How was this patch tested?

Added a test case to the `GeneratedProjectionSuite` that increases the number of columns tested in various projections to a threshold that would previously have triggered a `JaninoRuntimeException` for the Constant Pool.

Note: This PR does not address the second Constant Pool issue with code generation (also mentioned in apache#16648): excess global mutable state. A second PR may be opened to resolve that issue.

Author: ALeksander Eskilson <[email protected]>

Closes apache#18075 from bdrillard/class_splitting_only.
asfgit pushed a commit that referenced this pull request Jun 22, 2017
…ol Limit - Class Splitting

## What changes were proposed in this pull request?

This is a backport patch for Spark 2.1.x of the class splitting feature over excess generated code as was merged in #18075.

## How was this patch tested?

The same test provided in #18075 is included in this patch.

Author: ALeksander Eskilson <[email protected]>

Closes #18354 from bdrillard/class_splitting_2.1.
asfgit pushed a commit that referenced this pull request Jun 22, 2017
…ol Limit - Class Splitting

## What changes were proposed in this pull request?

This is a backport patch for Spark 2.2.x of the class splitting feature over excess generated code as was merged in #18075.

## How was this patch tested?

The same test provided in #18075 is included in this patch.

Author: ALeksander Eskilson <[email protected]>

Closes #18377 from bdrillard/class_splitting_2.2.
@bdrillard
Copy link
Author

The second part that follows this merged PR is up as #19518.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants