Commit 70a75f3

updated forest vs boosting comparison
1 parent d1de753 commit 70a75f3

1 file changed: +3 -3 lines changed

docs/mllib-ensembles.md

Lines changed: 3 additions & 3 deletions
@@ -16,10 +16,10 @@ Both use [decision trees](mllib-decision-tree.html) as their base models.

 Both [Gradient-Boosted Trees (GBTs)](mllib-ensembles.html#Gradient-Boosted-Trees-(GBTS)) and [Random Forests](mllib-ensembles.html#Random-Forests) are algorithms for learning ensembles of trees, but the training processes are different. There are several practical trade-offs:

-* GBTs may be able to achieve the same accuracy using fewer trees, so the model produced may be smaller (faster for test time prediction).
 * GBTs train one tree at a time, so they can take longer to train than random forests. Random Forests can train multiple trees in parallel.
-* On the other hand, it is often reasonable to use smaller trees with GBTs than with Random Forests, and training smaller trees takes less time.
-* Random Forests can be less prone to overfitting. Training more trees in a Random Forest reduces the likelihood of overfitting, but training more trees with GBTs increases the likelihood of overfitting.
+* On the other hand, it is often reasonable to use smaller (shallower) trees with GBTs than with Random Forests, and training smaller trees takes less time.
+* Random Forests can be less prone to overfitting. Training more trees in a Random Forest reduces the likelihood of overfitting, but training more trees with GBTs increases the likelihood of overfitting. (In statistical language, Random Forests reduce variance by using more trees, whereas GBTs reduce bias by using more trees.)
+* Random Forests can be easier to tune since performance improves monotonically with the number of trees (whereas performance can start to decrease for GBTs if the number of trees grows too large).

 In short, both algorithms can be effective, and the choice should be based on the particular dataset.
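
Note on the variance claim in the changed text: the statement that Random Forests reduce variance by using more trees can be made concrete under a standard simplifying assumption that the commit itself does not state. Suppose the $B$ trees give identically distributed predictions $T_b(x)$ at a point $x$, each with variance $\sigma^2$ and pairwise correlation $\rho$. Expanding the variance of the average gives:

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right)
  = \frac{1}{B^2}\Bigl(B\,\sigma^2 + B(B-1)\,\rho\,\sigma^2\Bigr)
  = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
```

For $\rho < 1$ the second term shrinks as $B$ grows, so under this assumption adding trees never increases the variance of the averaged prediction, and the variance approaches the floor $\rho\,\sigma^2$. Averaging identically distributed trees leaves the bias unchanged, which is why the text attributes bias reduction to GBTs instead.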
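
Illustration of the bias/variance contrast in the changed text: the following is a minimal standard-library sketch, not MLlib code. Every name in it (`fit_stump`, `boost`, `bag`, `curve`, `LEARNING_RATE`) is hypothetical, the data is synthetic, and both ensembles use depth-1 trees on a single feature. The averaged ensemble is plain bagging without feature subsampling, so it only approximates a Random Forest.

```python
import math
import random

LEARNING_RATE = 0.5  # assumed shrinkage for the boosted ensemble


def fit_stump(xs, targets):
    """Fit a depth-1 least-squares regression tree on 1-D inputs.

    Returns (threshold, left_value, right_value); inputs <= threshold go left.
    """
    pairs = sorted(zip(xs, targets))
    n = len(pairs)
    total = sum(t for _, t in pairs)
    # Fallback when no split helps: predict the overall mean everywhere.
    best_gain, best = total * total / n, (math.inf, total / n, total / n)
    left_sum = 0.0
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        right_sum = total - left_sum
        # Maximizing this quantity is equivalent to minimizing squared error.
        gain = left_sum ** 2 / i + right_sum ** 2 / (n - i)
        if gain > best_gain:
            best_gain = gain
            best = (pairs[i - 1][0], left_sum / i, right_sum / (n - i))
    return best


def predict_stump(stump, x):
    threshold, left, right = stump
    return left if x <= threshold else right


def boost(xs, ys, rounds):
    """Sequential training: each stump is fitted to the current residuals."""
    stumps, residuals = [], list(ys)
    for _ in range(rounds):
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        residuals = [r - LEARNING_RATE * predict_stump(stump, x)
                     for x, r in zip(xs, residuals)]
    return stumps


def boosted_predict(stumps, x):
    return LEARNING_RATE * sum(predict_stump(s, x) for s in stumps)


def bag(xs, ys, n_trees, rng):
    """Independent training: each stump sees its own bootstrap sample."""
    n, stumps = len(xs), []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps


def bagged_predict(stumps, x):
    return sum(predict_stump(s, x) for s in stumps) / len(stumps)


def curve(predict_with, stumps, xs, ys):
    """Mean squared error using the first k stumps, for k = 1..len(stumps)."""
    return [
        sum((predict_with(stumps[:k], x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
        for k in range(1, len(stumps) + 1)
    ]


rng = random.Random(0)


def sample(n):
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [math.sin(2 * math.pi * x) + rng.gauss(0.0, 0.3) for x in xs]
    return xs, ys


train_x, train_y = sample(80)
test_x, test_y = sample(200)

boost_stumps = boost(train_x, train_y, rounds=50)
bag_stumps = bag(train_x, train_y, 50, rng)

boost_train = curve(boosted_predict, boost_stumps, train_x, train_y)
boost_test = curve(boosted_predict, boost_stumps, test_x, test_y)
bag_test = curve(bagged_predict, bag_stumps, test_x, test_y)
member_test = [curve(bagged_predict, [s], test_x, test_y)[0] for s in bag_stumps]

for k in (1, 10, 50):
    print(f"trees={k:2d}  boosted train MSE={boost_train[k - 1]:.4f}  "
          f"boosted test MSE={boost_test[k - 1]:.4f}  "
          f"bagged test MSE={bag_test[k - 1]:.4f}")
```

Two properties hold by construction in this sketch. The boosted training error never rises as stumps are added, because each stump is fitted to the remaining residuals (the bias-reduction side of the claim). The averaged ensemble's error is never worse than the mean error of its members, because squared error is convex. Whether the boosted test error actually turns upward within 50 rounds depends on the data, the noise level, and the learning rate, so the printed numbers should be read as one example run and not as a benchmark.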
