Skip to content

[SPARK-6278][MLLIB] Mention the change of objective in linear regression #4978

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

mengxr
Copy link
Contributor

@mengxr mengxr commented Mar 11, 2015

As discussed in the RC3 vote thread, we should mention the change of objective in linear regression in the migration guide. @srowen

@@ -107,6 +107,7 @@ In the `spark.mllib` package, there were several breaking changes. The first ch
* In `DecisionTree`, the deprecated class method `train` has been removed. (The object/static `train` methods remain.)
* In `Strategy`, the `checkpointDir` parameter has been removed. Checkpointing is still supported, but the checkpoint directory must be set before calling tree and tree ensemble training.
* `PythonMLlibAPI` (the interface between Scala/Java and Python for MLlib) was a public API but is now private, declared `private[python]`. This was never meant for external use.
* In linear regression (including Lasso and ridge regression), we scaled the squared loss by 0.5. So in order to produce the same result as in 1.2, the step size you chose needs to be scaled by 2.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apache is out-of-sync again. This is the only change in this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chose -> choose, and I suppose it could be made a little more crystal-clear by saying the step size needs to be multiplied by 2 ('scaled' somehow could mean divide or multiply, to me).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@SparkQA
Copy link

SparkQA commented Mar 11, 2015

Test build #28465 has started for PR 4978 at commit f87ae71.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Mar 11, 2015

Test build #28465 has finished for PR 4978 at commit f87ae71.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class KMeansModel (val clusterCenters: Array[Vector]) extends Saveable with Serializable

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28465/
Test PASSed.

@SparkQA
Copy link

SparkQA commented Mar 12, 2015

Test build #28506 has started for PR 4978 at commit bfd6cff.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Mar 12, 2015

Test build #28506 has finished for PR 4978 at commit bfd6cff.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28506/
Test FAILed.

@@ -102,6 +102,7 @@ In the `spark.mllib` package, there were several breaking changes. The first ch
* In `DecisionTree`, the deprecated class method `train` has been removed. (The object/static `train` methods remain.)
* In `Strategy`, the `checkpointDir` parameter has been removed. Checkpointing is still supported, but the checkpoint directory must be set before calling tree and tree ensemble training.
* `PythonMLlibAPI` (the interface between Scala/Java and Python for MLlib) was a public API but is now private, declared `private[python]`. This was never meant for external use.
* In linear regression (including Lasso and ridge regression), the squared loss is now divided by 2. So in order to produce the same result as in 1.2, the step size you choose needs to be multiplied by 2.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, it also occurred to me that if the step size doubles, then it affects the regularization parameter as well. Doesn't it have to be half as large as well in order to get the same result? I'm probably overlooking something about the formulation, but I didn't see the reg param updated in a96b727 and if the loss term was halved, leaving all else equal, the regularization term is relatively twice as large right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. The L2 regularization term didn't change. So to generate the exact result, we need to reduce the regularization constant by half and multiply the step size by 2.

@SparkQA
Copy link

SparkQA commented Mar 12, 2015

Test build #28539 has started for PR 4978 at commit fb3bbe6.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Mar 13, 2015

Test build #28539 has finished for PR 4978 at commit fb3bbe6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28539/
Test PASSed.

@mengxr
Copy link
Contributor Author

mengxr commented Mar 13, 2015

Merged into master and branch-1.3.

@asfgit asfgit closed this in 7f13434 Mar 13, 2015
asfgit pushed a commit that referenced this pull request Mar 13, 2015
As discussed in the RC3 vote thread, we should mention the change of objective in linear regression in the migration guide. srowen

Author: Xiangrui Meng <[email protected]>

Closes #4978 from mengxr/SPARK-6278 and squashes the following commits:

fb3bbe6 [Xiangrui Meng] mention regularization parameter
bfd6cff [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-6278
375fd09 [Xiangrui Meng] address Sean's comments
f87ae71 [Xiangrui Meng] mention step size change

(cherry picked from commit 7f13434)
Signed-off-by: Xiangrui Meng <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants