Commit 2c55985

committed
update comments
1 parent 4a2311c commit 2c55985

2 files changed

+10
-7
lines changed

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStage.scala

Lines changed: 4 additions & 1 deletion
@@ -33,7 +33,10 @@ import org.apache.spark.util.ThreadUtils

 /**
  * In adaptive execution mode, an execution plan is divided into multiple QueryStages. Each
- * QueryStage is a sub-tree that runs in a single stage.
+ * QueryStage is a sub-tree that runs in a single stage. Before executing current stage, we will
+ * first submit all its child stages, wait for their completions and collect their statistics.
+ * Based on the collected data, we can potentially optimize the execution plan in current stage,
+ * change the number of reducer and do other optimizations.
  */
 abstract class QueryStage extends UnaryExecNode {
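
The flow described in the updated QueryStage comment (submit child stages, wait for them, collect statistics, then adjust the plan) can be illustrated with a minimal sketch. This is not Spark's implementation: `StageStats`, `runChildStages`, `chooseNumReducers`, and the `targetBytesPerReducer` parameter are hypothetical names introduced only for illustration, and the reducer count is derived from a simple total-bytes heuristic that is assumed here rather than taken from the commit.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object QueryStageFlowSketch {
  // Hypothetical stand-in for the statistics a finished child stage reports.
  final case class StageStats(bytesByPartition: Vector[Long])

  // Submit every child stage concurrently, then block until all have completed
  // and return their statistics in submission order.
  def runChildStages(children: Seq[() => StageStats])(
      implicit ec: ExecutionContext): Seq[StageStats] = {
    val submitted = children.map(child => Future(child()))
    Await.result(Future.sequence(submitted), 1.minute)
  }

  // Assumed heuristic: one reducer per targetBytesPerReducer of shuffle data,
  // with at least one reducer.
  def chooseNumReducers(stats: Seq[StageStats], targetBytesPerReducer: Long): Int = {
    require(targetBytesPerReducer > 0, "targetBytesPerReducer must be positive")
    val totalBytes = stats.flatMap(_.bytesByPartition).sum
    math.max(1, math.ceil(totalBytes.toDouble / targetBytesPerReducer).toInt)
  }
}
```

The point of the sketch is the ordering: the current stage decides its reducer count only after every child stage has finished, because the decision depends on their collected statistics.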

sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStageInput.scala

Lines changed: 6 additions & 6 deletions
@@ -25,13 +25,13 @@ import org.apache.spark.sql.catalyst.plans.physical.{HashPartitioning, Partition
 import org.apache.spark.sql.execution._

 /**
- * QueryStageInput is the leaf node of a QueryStage and serves as its input. It is responsible for
- * changing the output partition based on the need of its QueryStage. It gets the ShuffledRowRDD
+ * QueryStageInput is the leaf node of a QueryStage and serves as its input. A QueryStage knows
+ * its child stages by collecting all the QueryStageInputs. For a ShuffleQueryStageInput, it
+ * controls how to read the ShuffledRowRDD generated by its child stage. It gets the ShuffledRowRDD
  * from its child stage and creates a new ShuffledRowRDD with different partitions by specifying
- * an optional array of partition start indices. For example, a ShuffledQueryStage can be reused
- * by two different QueryStages. One QueryStageInput can let the first task read partition 0 to 3,
- * while in another stage, the QueryStageInput can let the first task read partition 0 to 1.
- * A QueryStage knows its child stages by collecting all the QueryStageInputs.
+ * an array of partition start indices. For example, a ShuffledQueryStage can be reused by two
+ * different QueryStages. One QueryStageInput can let the first task read partition 0 to 3, while
+ * in another stage, the QueryStageInput can let the first task read partition 0 to 1.
  */
 abstract class QueryStageInput extends LeafExecNode {
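
To make the "array of partition start indices" in the QueryStageInput comment concrete, here is a small sketch of how such an array could map tasks to ranges of shuffle partitions. It assumes that task i reads from its start index up to, but not including, the next task's start index, and that the last task reads through the final shuffle partition. `partitionRanges` and the total of 8 shuffle partitions are hypothetical and are not taken from the commit.

```scala
object PartitionStartIndicesSketch {
  // For each task, returns the half-open range [start, end) of shuffle
  // partitions it reads, given the start index of every task.
  def partitionRanges(numShufflePartitions: Int, startIndices: Array[Int]): Seq[(Int, Int)] = {
    require(startIndices.nonEmpty && startIndices.head == 0,
      "start indices must begin at partition 0")
    require(startIndices.sliding(2).forall {
      case Array(a, b) => a < b
      case _ => true // a single start index has nothing to compare against
    }, "start indices must be strictly increasing")
    require(startIndices.last < numShufflePartitions,
      "last start index must refer to an existing partition")
    // Each range ends where the next one starts; the last ends at the total.
    val ends = startIndices.drop(1) :+ numShufflePartitions
    startIndices.zip(ends).toSeq
  }
}
```

With 8 shuffle partitions, start indices `Array(0, 4)` give the first task partitions 0 to 3, while `Array(0, 2, 4, 6)` give the first task partitions 0 to 1. That mirrors the example in the comment, where two QueryStages read the same shuffle output with different groupings.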
