[SPARK-6278][MLLIB] Mention the change of objective in linear regression #4978

mengxr · 2015-03-11T07:51:34Z

As discussed in the RC3 vote thread, we should mention the change of objective in linear regression in the migration guide. @srowen

mengxr · 2015-03-11T07:51:57Z

docs/mllib-guide.md

@@ -107,6 +107,7 @@ In the `spark.mllib` package, there were several breaking changes.  The first ch
    * In `DecisionTree`, the deprecated class method `train` has been removed.  (The object/static `train` methods remain.)
    * In `Strategy`, the `checkpointDir` parameter has been removed.  Checkpointing is still supported, but the checkpoint directory must be set before calling tree and tree ensemble training.
 * `PythonMLlibAPI` (the interface between Scala/Java and Python for MLlib) was a public API but is now private, declared `private[python]`.  This was never meant for external use.
+* In linear regression (including Lasso and ridge regression), we scaled the squared loss by 0.5. So in order to produce the same result as in 1.2, the step size you chose needs to be scaled by 2.


Apache is out-of-sync again. This is the only change in this PR.

chose -> choose. It could also be made clearer by saying the step size needs to be multiplied by 2 (to me, 'scaled' could mean either divide or multiply).

SparkQA · 2015-03-11T07:52:44Z

Test build #28465 has started for PR 4978 at commit f87ae71.

This patch merges cleanly.

SparkQA · 2015-03-11T09:25:10Z

Test build #28465 has finished for PR 4978 at commit f87ae71.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
- class KMeansModel (val clusterCenters: Array[Vector]) extends Saveable with Serializable

AmplabJenkins · 2015-03-11T09:25:14Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28465/

SparkQA · 2015-03-12T08:52:33Z

Test build #28506 has started for PR 4978 at commit bfd6cff.

This patch merges cleanly.

SparkQA · 2015-03-12T09:48:54Z

Test build #28506 has finished for PR 4978 at commit bfd6cff.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-03-12T09:48:58Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28506/

srowen · 2015-03-12T12:16:56Z

docs/mllib-guide.md

@@ -102,6 +102,7 @@ In the `spark.mllib` package, there were several breaking changes.  The first ch
    * In `DecisionTree`, the deprecated class method `train` has been removed.  (The object/static `train` methods remain.)
    * In `Strategy`, the `checkpointDir` parameter has been removed.  Checkpointing is still supported, but the checkpoint directory must be set before calling tree and tree ensemble training.
 * `PythonMLlibAPI` (the interface between Scala/Java and Python for MLlib) was a public API but is now private, declared `private[python]`.  This was never meant for external use.
+* In linear regression (including Lasso and ridge regression), the squared loss is now divided by 2. So in order to produce the same result as in 1.2, the step size you choose needs to be multiplied by 2.


Hm, it also occurred to me that if the step size doubles, it affects the regularization parameter as well. Doesn't the regularization parameter have to be half as large in order to get the same result? I'm probably overlooking something about the formulation, but I didn't see the reg param updated in a96b727, and if the loss term was halved with all else left equal, the regularization term is now relatively twice as large, right?

Right. The L2 regularization term didn't change. So to reproduce the exact 1.2 result, we need to halve the regularization constant and multiply the step size by 2.
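The scaling argument above can be checked numerically. The following is a minimal plain-Python sketch, not MLlib code: it assumes a per-example objective of `loss_scale * (w.x - y)^2 + (reg / 2) * ||w||^2`, a constant step size, and made-up data. The helper names `sgd_step` and `train` are hypothetical. It shows that halving the loss, doubling the step size, and halving the regularization constant yields the same updates as the unscaled objective, while doubling the step size alone does not.

```python
def sgd_step(w, x, y, step, reg, loss_scale):
    """One gradient step on loss_scale * (w.x - y)^2 + (reg / 2) * ||w||^2."""
    diff = sum(wi * xi for wi, xi in zip(w, x)) - y
    # The gradient of loss_scale * diff^2 with respect to w is 2 * loss_scale * diff * x;
    # the gradient of (reg / 2) * ||w||^2 is reg * w.
    return [wi - step * (2.0 * loss_scale * diff * xi + reg * wi)
            for wi, xi in zip(w, x)]


def train(data, step, reg, loss_scale, iterations=50):
    """Run plain per-example gradient descent starting from the zero vector."""
    w = [0.0] * len(data[0][0])
    for _ in range(iterations):
        for x, y in data:
            w = sgd_step(w, x, y, step, reg, loss_scale)
    return w


# Illustrative data only: (features, label) pairs.
data = [([1.0, 2.0], 3.0), ([2.0, 0.5], 1.0), ([0.0, 1.0], 2.0)]

# Unscaled squared loss (the 1.2-style objective under the assumptions above).
old = train(data, step=0.05, reg=0.1, loss_scale=1.0)
# Halved loss with step size doubled and regularization constant halved.
new = train(data, step=0.10, reg=0.05, loss_scale=0.5)
# Halved loss with only the step size doubled.
step_only = train(data, step=0.10, reg=0.1, loss_scale=0.5)

print(max(abs(a - b) for a, b in zip(old, new)))        # agrees up to rounding
print(max(abs(a - b) for a, b in zip(old, step_only)))  # visibly different
```

Per step, the scaled run applies `2a * (g/2 + (r/2) * w) = a * (g + r * w)`, which is exactly the unscaled update; with the regularization constant left unchanged, the penalty term is effectively doubled.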

SparkQA · 2015-03-12T23:08:10Z

Test build #28539 has started for PR 4978 at commit fb3bbe6.

This patch merges cleanly.

SparkQA · 2015-03-13T00:26:28Z

Test build #28539 has finished for PR 4978 at commit fb3bbe6.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-03-13T00:26:32Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28539/

mengxr · 2015-03-13T17:28:17Z

Merged into master and branch-1.3.

As discussed in the RC3 vote thread, we should mention the change of objective in linear regression in the migration guide. srowen

Author: Xiangrui Meng <[email protected]>

Closes #4978 from mengxr/SPARK-6278 and squashes the following commits:

fb3bbe6 [Xiangrui Meng] mention regularization parameter
bfd6cff [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-6278
375fd09 [Xiangrui Meng] address Sean's comments
f87ae71 [Xiangrui Meng] mention step size change

(cherry picked from commit 7f13434)
Signed-off-by: Xiangrui Meng <[email protected]>

mention step size change

f87ae71

mengxr reviewed Mar 11, 2015

mengxr added 2 commits March 12, 2015 01:47

address Sean's comments

375fd09

Merge remote-tracking branch 'apache/master' into SPARK-6278

bfd6cff

srowen reviewed Mar 12, 2015

mention regularization parameter

fb3bbe6

asfgit closed this in 7f13434 Mar 13, 2015
