Commit 942847f

omgteam authored and mengxr committed
Bug Fix: without unpersist method in RandomForest.scala
While training Gradient Boosting Decision Trees on large-scale sparse data, Spark spilled large amounts of data to disk. The cause is the following bug:

In version 1.1.0, the train method in DecisionTree.scala persists treeInput in memory but never unpersists it, which causes heavy disk usage.

In the current GitHub version (possibly 1.2.0), the train method in RandomForest.scala persists baggedInput but likewise never unpersists it. After adding the unpersist call, training works correctly.

https://issues.apache.org/jira/browse/SPARK-3918

Author: omgteam <[email protected]>

Closes apache#2775 from omgteam/master and squashes the following commits:

815d543 [omgteam] adjust tab to spaces
1a36f83 [omgteam] Bug: fix without unpersist baggedInput in RandomForest.scala
1 parent 92e017f commit 942847f

1 file changed: +2 -0 lines changed

mllib/src/main/scala/org/apache/spark/mllib/tree/RandomForest.scala

Lines changed: 2 additions & 0 deletions
@@ -176,6 +176,8 @@ private class RandomForest (
       timer.stop("findBestSplits")
     }
 
+    baggedInput.unpersist()
+
     timer.stop("total")
 
     logInfo("Internal timing for DecisionTree:")
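
The change above pairs an existing persist call with a matching unpersist once training no longer needs the cached data. A minimal sketch of that pattern outside MLlib might look like the following. The helper name `withPersisted`, the object name, the toy data, the local master setting, and the `MEMORY_AND_DISK` storage level are all assumptions for illustration; they are not taken from RandomForest.scala.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object UnpersistSketch {
  // Hypothetical helper (not part of Spark): caches `input` while `body` runs,
  // then releases the cached blocks even if `body` throws.
  // Assumes `input` has no storage level assigned yet.
  def withPersisted[T, R](input: RDD[T],
                          level: StorageLevel = StorageLevel.MEMORY_AND_DISK)
                         (body: RDD[T] => R): R = {
    input.persist(level)
    try body(input)
    finally input.unpersist()
  }

  def main(args: Array[String]): Unit = {
    // Local master chosen only so the sketch can run without a cluster.
    val conf = new SparkConf().setAppName("unpersist-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val data = sc.parallelize(1 to 1000)
      // Several passes over the same cached RDD, loosely mirroring an
      // iterative training loop that rereads its input.
      val total = withPersisted(data) { cached =>
        (1 to 3).map(_ => cached.map(_.toLong).reduce(_ + _)).sum
      }
      println(s"sum over three passes: $total")
      // After the helper returns, the RDD should no longer be marked as cached.
      println(s"storage level reset: ${data.getStorageLevel == StorageLevel.NONE}")
    } finally {
      sc.stop()
    }
  }
}
```

Wrapping the release in `finally` is a design choice of this sketch; the commit itself simply calls `baggedInput.unpersist()` after the training loop finishes.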
