[SPARK-4907][MLlib] Inconsistent loss and gradient in LeastSquaresGradient compared with R #3746
Conversation
Test build #24656 has started for PR 3746 at commit
Seems reasonable to me.
Test build #24656 has finished for PR 3746 at commit
Test FAILed.
Test build #24657 has started for PR 3746 at commit
Test build #24657 has finished for PR 3746 at commit
Test PASSed.
In my opinion, I don't think the choice between a 1/m and a 1/2m factor in the cost function is the critical difference.
@bryanyang0528 I don't think anyone's suggesting that the extra factor of 1/2 is more or less correct or desirable per se. The solution doesn't depend on the absolute value of the loss function, only on where its minimum lies. But the question here is consistency with the loss function as implemented by other software packages, so that the absolute value can be compared for the same settings of learning rate, regularization parameter, etc.
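To make this concrete, here is a minimal sketch in plain Scala (a hypothetical toy example, not MLlib code) showing that the 1/2 factor leaves the minimizer unchanged while the reported loss values differ by exactly a factor of 2:

```scala
object LossScalingDemo {
  // Squared-error loss for a one-feature model y ≈ w * x,
  // with or without the extra 1/2 factor.
  def loss(w: Double, xs: Array[Double], ys: Array[Double], half: Boolean): Double = {
    val n = xs.length
    val sse = xs.zip(ys).map { case (x, y) =>
      val d = w * x - y
      d * d
    }.sum
    if (half) sse / (2.0 * n) else sse / n
  }

  def main(args: Array[String]): Unit = {
    val xs = Array(1.0, 2.0, 3.0)
    val ys = Array(2.0, 4.1, 5.9)
    // Grid search over w; both conventions pick the same minimizer.
    val ws = (0 to 400).map(_ * 0.01)
    val wNoHalf = ws.minBy(w => loss(w, xs, ys, half = false))
    val wHalf = ws.minBy(w => loss(w, xs, ys, half = true))
    println(s"argmin without 1/2: $wNoHalf, with 1/2: $wHalf") // identical
    // At any fixed w, the two loss values differ by exactly 2x.
    println(s"ratio: ${loss(wHalf, xs, ys, half = false) / loss(wHalf, xs, ys, half = true)}")
  }
}
```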
@srowen I agree that the absolute value needs to be comparable with other software. Maybe we could add a parameter to control the extra factor?
@bryanyang0528 The learning rate issue is a different story here. With modern optimization algorithms like L-BFGS and OWLQN, a learning rate is not required; the algorithms find the step size in the line-search step. As @srowen pointed out, the statistical properties of the model will differ from those of other packages without the 1/2 factor. At Alpine Data Labs, I implemented a generalized linear model with elastic net (mixing L1 and L2 penalties) using OWLQN in Spark, and I can train and get exactly the same coefficients and the same statistical properties for the model, including std error, p-value, t-value, residual plot, QQ plot, etc. For many of our customers in the financial industry, those statistics are very important, and it's a requirement to get the same solution as the well-known reference implementation in R, with scalability. Without this PR, the coefficients can be the same, but the statistics will differ. Although I have only limited time for contributing to open source, I'll try to have most of this work available in Spark 1.3.
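For readers unfamiliar with the elastic net mentioned above, here is a minimal sketch (an assumed form following the glmnet paper linked below, not the Alpine or MLlib implementation) of the penalty term, where alpha mixes the L1 and L2 pieces: alpha = 1 gives the lasso and alpha = 0 gives ridge regression.

```scala
// Elastic-net penalty: lambda * (alpha * ||w||_1 + (1 - alpha)/2 * ||w||_2^2),
// following Eq. (1) in the glmnet paper. Names here are illustrative only.
def elasticNetPenalty(weights: Array[Double], lambda: Double, alpha: Double): Double = {
  require(alpha >= 0.0 && alpha <= 1.0, "alpha must be in [0, 1]")
  val l1 = weights.map(math.abs).sum          // L1 term: sum of |w_j|
  val l2 = weights.map(w => w * w).sum        // L2 term: sum of w_j^2
  lambda * (alpha * l1 + (1.0 - alpha) / 2.0 * l2)
}
```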
@dbtsai Thank you for your clear explanation; it helps me a lot!
LGTM. Merged into master. Thanks! |
In most academic papers and algorithm implementations, people use
L = 1/(2n) ||A * weights - y||^2 instead of L = 1/n ||A * weights - y||^2
for the least-squares loss. See Eq. (1) in http://web.stanford.edu/~hastie/Papers/glmnet.pdf
Since MLlib uses a different convention, the residuals will differ, and
all the statistical properties will differ from those of the glmnet package in R.
The model coefficients will still be the same under this change.
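For illustration, here is a minimal sketch in plain Scala (an assumed paraphrase, not the actual MLlib patch) of the per-example loss and gradient under the 1/(2n) convention; the 1/2 factor cancels the 2 produced by differentiating diff^2, so the gradient simplifies to diff * x:

```scala
// With diff = weights^T x - y, the per-example contributions are
// loss_i = diff^2 / 2 and gradient_i = diff * x; averaging over n
// examples then yields L = 1/(2n) ||A * weights - y||^2.
def leastSquaresLossAndGradient(
    features: Array[Double],
    label: Double,
    weights: Array[Double]): (Array[Double], Double) = {
  require(features.length == weights.length, "dimension mismatch")
  // diff = weights^T x - y
  val diff = features.zip(weights).map { case (x, w) => x * w }.sum - label
  val gradient = features.map(_ * diff) // d/dw (diff^2 / 2) = diff * x
  (gradient, diff * diff / 2.0)         // 1/2 factor matches the glmnet convention
}
```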