
[SPARK-23034][SQL] Override nodeName for all *ScanExec operators #20226


Closed · wants to merge 8 commits
@@ -86,6 +86,9 @@ case class RowDataSourceScanExec(

def output: Seq[Attribute] = requiredColumnsIndex.map(fullOutput)

override val nodeName: String =
  s"Scan FileSource ${tableIdentifier.map(_.unquotedString).getOrElse(relation)}"
Contributor:
DataSourceScanExec.nodeName is defined as s"Scan $relation ${tableIdentifier.map(_.unquotedString).getOrElse("")}", do we really need to override it here?

tejasapatil (Author), Feb 6, 2018:

My intent was to be able to distinguish between RowDataSourceScan and FileSourceScan. Removing those overrides.

s"Scan FileSource ${tableIdentifier.map(_.unquotedString).getOrElse(relation)}"

override lazy val metrics =
Map("numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of output rows"))

@@ -175,6 +178,9 @@ case class FileSourceScanExec(
}
}

override val nodeName: String =
s"Scan FileSource ${tableIdentifier.map(_.unquotedString).getOrElse(relation.location)}"

override def vectorTypes: Option[Seq[String]] =
relation.fileFormat.vectorTypes(
requiredSchema = requiredSchema,
@@ -103,6 +103,8 @@ case class ExternalRDDScanExec[T](
override lazy val metrics = Map(
"numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of output rows"))

override val nodeName: String = s"Scan ExternalRDD ${output.map(_.name).mkString("[", ",", "]")}"
Contributor:

I don't think including the output in the node name is a good idea.

tejasapatil (Author):

My intention here was to be able to distinguish between ExternalRDDScanExec nodes. If we remove the output part from nodeName, then these nodes would all be named Scan ExternalRDD, which is generic.


protected override def doExecute(): RDD[InternalRow] = {
val numOutputRows = longMetric("numOutputRows")
val outputDataType = outputObjAttr.dataType
@@ -116,7 +118,7 @@
}

override def simpleString: String = {
s"Scan $nodeName${output.mkString("[", ",", "]")}"
s"Scan ${super.nodeName}${output.mkString("[", ",", "]")}"
}
}
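The division of labor the thread lands on can be sketched as follows (a standalone toy helper, not the PR's code): nodeName stays a short, stable label, while simpleString appends the output attributes so EXPLAIN can still tell two ExternalRDD scans apart.

```scala
// Toy version of the split: the node label is generic, the EXPLAIN line is not.
def simpleString(nodeName: String, output: Seq[String]): String =
  s"Scan $nodeName${output.mkString("[", ",", "]")}"

// simpleString("ExternalRDD", Seq("id", "name")) == "Scan ExternalRDD[id,name]"
```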

@@ -169,10 +171,12 @@ case class LogicalRDD(
case class RDDScanExec(
output: Seq[Attribute],
rdd: RDD[InternalRow],
- override val nodeName: String,
+ name: String,
override val outputPartitioning: Partitioning = UnknownPartitioning(0),
override val outputOrdering: Seq[SortOrder] = Nil) extends LeafExecNode {

override val nodeName: String = s"Scan RDD $name ${output.map(_.name).mkString("[", ",", "]")}"
Contributor:

ditto

tejasapatil (Author):

Removed the output. The name in there would help in identifying the nodes uniquely.


override lazy val metrics = Map(
"numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of output rows"))

@@ -189,6 +193,6 @@ case class RDDScanExec(
}

override def simpleString: String = {
s"Scan $nodeName${Utils.truncatedString(output, "[", ",", "]")}"
s"$nodeName${Utils.truncatedString(output, "[", ",", "]")}"
}
}
@@ -30,6 +30,8 @@ case class LocalTableScanExec(
output: Seq[Attribute],
@transient rows: Seq[InternalRow]) extends LeafExecNode {

override val nodeName: String = s"Scan LocalTable"

override lazy val metrics = Map(
"numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of output rows"))

@@ -27,6 +27,7 @@ import org.apache.spark.sql.catalyst.expressions.codegen._
import org.apache.spark.sql.catalyst.plans.physical.Partitioning
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.aggregate.HashAggregateExec
import org.apache.spark.sql.execution.columnar.InMemoryTableScanExec
import org.apache.spark.sql.execution.joins.{BroadcastHashJoinExec, SortMergeJoinExec}
import org.apache.spark.sql.execution.metric.SQLMetrics
import org.apache.spark.sql.internal.SQLConf
@@ -45,7 +46,12 @@ trait CodegenSupport extends SparkPlan {
case _: SortMergeJoinExec => "smj"
case _: RDDScanExec => "rdd"
case _: DataSourceScanExec => "scan"
- case _ => nodeName.toLowerCase(Locale.ROOT)
tejasapatil (Author), Jan 18, 2018:

This caused one of the tests to fail, as the generated nodeName was not a single word (like before) but something like scan in-memory my_table..., which does not compile with codegen. The change is to retain only the alphanumeric characters of the nodeName when generating variablePrefix.

+ case _: LocalTableScanExec => "local_scan"
+ case _: InMemoryTableScanExec => "in_mem_scan"
+ case _ =>
+   // Java variable names can only contain alphanumeric characters, underscores and `$`
+   // (the use of the latter two is discouraged)
+   nodeName.toLowerCase(Locale.ROOT).replaceAll("\\P{Alnum}", "")
}
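A runnable sketch of that fallback branch, with hypothetical sample names, showing how a multi-word node name is reduced to a legal Java identifier prefix:

```scala
import java.util.Locale

// \P{Alnum} matches every non-alphanumeric character, so spaces, hyphens,
// dots and underscores are all stripped before the name is used in codegen.
def variablePrefix(nodeName: String): String =
  nodeName.toLowerCase(Locale.ROOT).replaceAll("\\P{Alnum}", "")

// variablePrefix("Scan In-memory my_table") == "scaninmemorymytable"
// variablePrefix("Scan HiveTable db.tbl")   == "scanhivetabledbtbl"
```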

/**
@@ -36,6 +36,8 @@ case class InMemoryTableScanExec(
@transient relation: InMemoryRelation)
extends LeafExecNode with ColumnarBatchScan {

override val nodeName: String = s"Scan In-memory ${relation.tableName.getOrElse("")}"

override protected def innerChildren: Seq[QueryPlan[_]] = Seq(relation) ++ super.innerChildren

override def vectorTypes: Option[Seq[String]] =
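A hedged usage sketch (it assumes a SparkSession named spark and an existing table test_table): after this change, a cached-table scan should surface in the physical plan under the new name, which is what the Thrift server tests further down assert over JDBC.

```scala
// Cache the table, then check that the scan renders as "Scan In-memory ..."
// in the executed plan rather than as the bare class name.
spark.sql("CACHE TABLE test_table")
val plan = spark.sql("SELECT * FROM test_table").queryExecution.executedPlan
assert(plan.treeString.contains("Scan In-memory"))
```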
@@ -38,6 +38,9 @@ case class DataSourceV2ScanExec(
@transient reader: DataSourceV2Reader)
extends LeafExecNode with DataSourceReaderHolder with ColumnarBatchScan {

override val nodeName: String =
s"Scan DataSourceV2 ${fullOutput.map(_.name).mkString("[", ",", "]")}"

override def canEqual(other: Any): Boolean = other.isInstanceOf[DataSourceV2ScanExec]

override def producedAttributes: AttributeSet = AttributeSet(fullOutput)
@@ -178,6 +178,6 @@ Join Cross

== Physical Plan ==
BroadcastNestedLoopJoin BuildRight, Cross
- :- LocalTableScan [col1#x, col2#x]
+ :- Scan LocalTable [col1#x, col2#x]
+- BroadcastExchange IdentityBroadcastMode
- +- LocalTableScan [col1#x, col2#x]
+ +- Scan LocalTable [col1#x, col2#x]
12 changes: 6 additions & 6 deletions sql/core/src/test/resources/sql-tests/results/operators.sql.out
@@ -233,7 +233,7 @@ struct<plan:string>
-- !query 28 output
== Physical Plan ==
*Project [null AS (CAST(concat(a, CAST(1 AS STRING)) AS DOUBLE) + CAST(2 AS DOUBLE))#x]
- +- Scan OneRowRelation[]
+ +- Scan RDD OneRowRelation [][]


-- !query 29
@@ -243,7 +243,7 @@ struct<plan:string>
-- !query 29 output
== Physical Plan ==
*Project [-1b AS concat(CAST((1 - 2) AS STRING), b)#x]
- +- Scan OneRowRelation[]
+ +- Scan RDD OneRowRelation [][]


-- !query 30
@@ -253,7 +253,7 @@ struct<plan:string>
-- !query 30 output
== Physical Plan ==
*Project [11b AS concat(CAST(((2 * 4) + 3) AS STRING), b)#x]
- +- Scan OneRowRelation[]
+ +- Scan RDD OneRowRelation [][]


-- !query 31
@@ -263,7 +263,7 @@ struct<plan:string>
-- !query 31 output
== Physical Plan ==
*Project [4a2.0 AS concat(concat(CAST((3 + 1) AS STRING), a), CAST((CAST(4 AS DOUBLE) / CAST(2 AS DOUBLE)) AS STRING))#x]
- +- Scan OneRowRelation[]
+ +- Scan RDD OneRowRelation [][]


-- !query 32
@@ -273,7 +273,7 @@ struct<plan:string>
-- !query 32 output
== Physical Plan ==
*Project [true AS ((1 = 1) OR (concat(a, b) = ab))#x]
- +- Scan OneRowRelation[]
+ +- Scan RDD OneRowRelation [][]


-- !query 33
@@ -283,7 +283,7 @@ struct<plan:string>
-- !query 33 output
== Physical Plan ==
*Project [false AS ((concat(a, c) = ac) AND (2 = 3))#x]
- +- Scan OneRowRelation[]
+ +- Scan RDD OneRowRelation [][]


-- !query 34
@@ -476,7 +476,7 @@ class StreamSuite extends StreamTest {
.mkString("\n")
assert(explainString.contains("StateStoreRestore"))
assert(explainString.contains("StreamingRelation"))
- assert(!explainString.contains("LocalTableScan"))
+ assert(!explainString.contains("Scan LocalTable"))

// Test StreamingQuery.display
val q = df.writeStream.queryName("memory_explain").outputMode("complete").format("memory")
@@ -493,15 +493,15 @@
val explainWithoutExtended = q.explainInternal(false)
// `extended = false` only displays the physical plan.
assert("LocalRelation".r.findAllMatchIn(explainWithoutExtended).size === 0)
assert("LocalTableScan".r.findAllMatchIn(explainWithoutExtended).size === 1)
assert("Scan LocalTable".r.findAllMatchIn(explainWithoutExtended).size === 1)
// Use "StateStoreRestore" to verify that it does output a streaming physical plan
assert(explainWithoutExtended.contains("StateStoreRestore"))

val explainWithExtended = q.explainInternal(true)
// `extended = true` displays 3 logical plans (Parsed/Analyzed/Optimized) and 1 physical
// plan.
assert("LocalRelation".r.findAllMatchIn(explainWithExtended).size === 3)
assert("LocalTableScan".r.findAllMatchIn(explainWithExtended).size === 1)
assert("Scan LocalTable".r.findAllMatchIn(explainWithExtended).size === 1)
// Use "StateStoreRestore" to verify that it does output a streaming physical plan
assert(explainWithExtended.contains("StateStoreRestore"))
} finally {
@@ -278,7 +278,7 @@ class HiveThriftBinaryServerSuite extends HiveThriftJdbcTest {
val plan = statement.executeQuery("explain select * from test_table")
plan.next()
plan.next()
- assert(plan.getString(1).contains("InMemoryTableScan"))
+ assert(plan.getString(1).contains("Scan In-memory"))

val rs1 = statement.executeQuery("SELECT key FROM test_table ORDER BY KEY DESC")
val buf1 = new collection.mutable.ArrayBuffer[Int]()
@@ -364,7 +364,7 @@ class HiveThriftBinaryServerSuite extends HiveThriftJdbcTest {
val plan = statement.executeQuery("explain select key from test_map ORDER BY key DESC")
plan.next()
plan.next()
- assert(plan.getString(1).contains("InMemoryTableScan"))
+ assert(plan.getString(1).contains("Scan In-memory"))

val rs = statement.executeQuery("SELECT key FROM test_map ORDER BY KEY DESC")
val buf = new collection.mutable.ArrayBuffer[Int]()
@@ -62,6 +62,8 @@ case class HiveTableScanExec(

override def conf: SQLConf = sparkSession.sessionState.conf

override def nodeName: String = s"Scan HiveTable ${relation.tableMeta.qualifiedName}"

override lazy val metrics = Map(
"numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of output rows"))

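A matching hedged check for the Hive scan (assumes a Hive-enabled session and an existing table default.src; qualifiedName should give the unquoted db.table form):

```scala
// The Hive table scan should now print as "Scan HiveTable default.src".
val df = spark.table("default.src")
assert(df.queryExecution.executedPlan.treeString.contains("Scan HiveTable default.src"))
```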