Skip to content

[SPARK-35398][SQL] Simplify the way to get classes from ClassBodyEvaluator in CodeGenerator.updateAndGetCompilationStats method #32536

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,6 @@
package org.apache.spark.sql.catalyst.expressions.codegen

import java.io.ByteArrayInputStream
import java.util.{Map => JavaMap}

import scala.collection.JavaConverters._
import scala.collection.mutable
Expand All @@ -28,8 +27,7 @@ import scala.util.control.NonFatal
import com.google.common.cache.{CacheBuilder, CacheLoader}
import com.google.common.util.concurrent.{ExecutionError, UncheckedExecutionException}
import org.codehaus.commons.compiler.{CompileException, InternalCompilerException}
import org.codehaus.commons.compiler.util.reflect.ByteArrayClassLoader
import org.codehaus.janino.{ClassBodyEvaluator, SimpleCompiler}
import org.codehaus.janino.ClassBodyEvaluator
import org.codehaus.janino.util.ClassFile

import org.apache.spark.{TaskContext, TaskKilledException}
Expand Down Expand Up @@ -1434,15 +1432,7 @@ object CodeGenerator extends Logging {
*/
private def updateAndGetCompilationStats(evaluator: ClassBodyEvaluator): ByteCodeStats = {
// First retrieve the generated classes.
val classes = {
val scField = classOf[ClassBodyEvaluator].getDeclaredField("sc")
scField.setAccessible(true)
val compiler = scField.get(evaluator).asInstanceOf[SimpleCompiler]
val loader = compiler.getClassLoader.asInstanceOf[ByteArrayClassLoader]
val classesField = loader.getClass.getDeclaredField("classes")
classesField.setAccessible(true)
classesField.get(loader).asInstanceOf[JavaMap[String, Array[Byte]]].asScala
}
val classes = evaluator.getBytecodes.asScala
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems fine, but have you checked if this stat value does not change before/after this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I did some manual tests to check this.
For example, add a afterEach() method to CodeGenerationSuite to record the CodegenMetrics.METRIC_GENERATED_CLASS_BYTECODE_SIZE.getSnapshot.getValues after each case and the stat value not change before/after this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the same way, the stat values of CodegenMetrics.METRIC_GENERATED_METHOD_BYTECODE_SIZE has not changed before/after this PR.

Copy link
Contributor Author

@LuciferYang LuciferYang May 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maropu If we want to do some code checking, maybe we can enhanced the case metrics are recorded on compile in CodeGenerationSuite as follows:

test("metrics are recorded on compile") {
    ...
    val metricGeneratedClassByteCodeSizeSnapshot =
      CodegenMetrics.METRIC_GENERATED_CLASS_BYTECODE_SIZE.getSnapshot.getValues
    val metricGeneratedMethodByteCodeSizeSnapshot =
      CodegenMetrics.METRIC_GENERATED_METHOD_BYTECODE_SIZE.getSnapshot.getValues
    GenerateOrdering.generate(Add(Literal(123), Literal(1)).asc :: Nil)
    ...
    // Make sure the stat content doesn't change before/after SPARK-35398
    assert(CodegenMetrics.METRIC_GENERATED_CLASS_BYTECODE_SIZE.getSnapshot.getValues
      .diff(metricGeneratedClassByteCodeSizeSnapshot)
      .sameElements(Array(740, 1293)))
    assert(CodegenMetrics.METRIC_GENERATED_METHOD_BYTECODE_SIZE.getSnapshot.getValues
      .diff(metricGeneratedMethodByteCodeSizeSnapshot)
      .sameElements(Array(5, 5, 6, 7, 10, 15, 46)))
  }

The new assertion can be passed before and after this pr, however, if we update the version of janino or change the codegen of Add, we may need to update the content of the assertion because the size of the generated code may change.
For example CodegenMetrics.METRIC_GENERATED_CLASS_BYTECODE_SIZE with janino 3.1.4 are Array(740, 1293), but with janino 3.0.16 are Array(687, 1036), so I'm not sure if we need to add these assertions in this pr.

Copy link
Member

@maropu maropu May 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maropu If we want to do some code checking, maybe we can enhanced the case metrics are recorded on compile in CodeGenerationSuite as follows:

Ah, I see. Thank for the explanation, @LuciferYang. Could you add a new test unit for the assert with the prefix SPARK-35398: ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok


// Then walk the classes to get at the method bytecode.
val codeAttr = Utils.classForName("org.codehaus.janino.util.ClassFile$CodeAttribute")
Expand Down