-
Notifications
You must be signed in to change notification settings - Fork 28.7k
[SPARK-6278][MLLIB] Mention the change of objective in linear regression #4978
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -107,6 +107,7 @@ In the `spark.mllib` package, there were several breaking changes. The first ch | |||
* In `DecisionTree`, the deprecated class method `train` has been removed. (The object/static `train` methods remain.) | |||
* In `Strategy`, the `checkpointDir` parameter has been removed. Checkpointing is still supported, but the checkpoint directory must be set before calling tree and tree ensemble training. | |||
* `PythonMLlibAPI` (the interface between Scala/Java and Python for MLlib) was a public API but is now private, declared `private[python]`. This was never meant for external use. | |||
* In linear regression (including Lasso and ridge regression), we scaled the squared loss by 0.5. So in order to produce the same result as in 1.2, the step size you chose needs to be scaled by 2. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apache is out-of-sync again. This is the only change in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
chose -> choose, and I suppose it could be made a little more crystal-clear by saying the step size needs to be multiplied by 2 ('scaled' somehow could mean divide or multiply, to me).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
Test build #28465 has started for PR 4978 at commit
|
Test build #28465 has finished for PR 4978 at commit
|
Test PASSed. |
Test build #28506 has started for PR 4978 at commit
|
Test build #28506 has finished for PR 4978 at commit
|
Test FAILed. |
@@ -102,6 +102,7 @@ In the `spark.mllib` package, there were several breaking changes. The first ch | |||
* In `DecisionTree`, the deprecated class method `train` has been removed. (The object/static `train` methods remain.) | |||
* In `Strategy`, the `checkpointDir` parameter has been removed. Checkpointing is still supported, but the checkpoint directory must be set before calling tree and tree ensemble training. | |||
* `PythonMLlibAPI` (the interface between Scala/Java and Python for MLlib) was a public API but is now private, declared `private[python]`. This was never meant for external use. | |||
* In linear regression (including Lasso and ridge regression), the squared loss is now divided by 2. So in order to produce the same result as in 1.2, the step size you choose needs to be multiplied by 2. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, it also occurred to me that if the step size doubles, then it affects the regularization parameter as well. Doesn't it have to be half as large as well in order to get the same result? I'm probably overlooking something about the formulation, but I didn't see the reg param updated in a96b727 and if the loss term was halved, leaving all else equal, the regularization term is relatively twice as large right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. The L2 regularization term didn't change. So to generate the exact result, we need to reduce the regularization constant by half and multiply the step size by 2.
Test build #28539 has started for PR 4978 at commit
|
Test build #28539 has finished for PR 4978 at commit
|
Test PASSed. |
Merged into master and branch-1.3. |
As discussed in the RC3 vote thread, we should mention the change of objective in linear regression in the migration guide. srowen Author: Xiangrui Meng <[email protected]> Closes #4978 from mengxr/SPARK-6278 and squashes the following commits: fb3bbe6 [Xiangrui Meng] mention regularization parameter bfd6cff [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-6278 375fd09 [Xiangrui Meng] address Sean's comments f87ae71 [Xiangrui Meng] mention step size change (cherry picked from commit 7f13434) Signed-off-by: Xiangrui Meng <[email protected]>
As discussed in the RC3 vote thread, we should mention the change of objective in linear regression in the migration guide. @srowen