
Commit ee07541

srowen authored and mengxr committed

SPARK-2748 [MLLIB] [GRAPHX] Loss of precision for small arguments to Math.exp, Math.log

In a few places in MLlib, an expression of the form `log(1.0 + p)` is evaluated. When p is so small that `1.0 + p == 1.0`, the result is 0.0, while the correct answer is very near `p`. This is why `Math.log1p` exists. Similarly, one instance of `exp(m) - 1` in GraphX has the analogous remedy `Math.expm1`. Although the errors occur only for very small arguments, such arguments are entirely possible given these expressions' use in machine learning algorithms. Also note the related PR for Python: #1652

Author: Sean Owen <[email protected]>

Closes #1659 from srowen/SPARK-2748 and squashes the following commits:

c5926d4 [Sean Owen] Use log1p, expm1 for better precision for tiny arguments

1 parent 7c5fc28
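
A quick standalone illustration of the claim (plain Scala; `scala.math` forwards to `java.lang.Math`):

    val p = 1e-20
    math.log(1.0 + p) // 0.0: 1.0 + p rounds to exactly 1.0 in double precision
    math.log1p(p)     // ~1.0e-20, very near p as desired
    math.exp(p) - 1   // 0.0, for the same reason
    math.expm1(p)     // ~1.0e-20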

File tree: 2 files changed (+8, −6 lines)


graphx/src/main/scala/org/apache/spark/graphx/util/GraphGenerators.scala

Lines changed: 4 additions & 2 deletions
@@ -100,8 +100,10 @@ object GraphGenerators {
    */
   private def sampleLogNormal(mu: Double, sigma: Double, maxVal: Int): Int = {
     val rand = new Random()
-    val m = math.exp(mu + (sigma * sigma) / 2.0)
-    val s = math.sqrt((math.exp(sigma*sigma) - 1) * math.exp(2*mu + sigma*sigma))
+    val sigmaSq = sigma * sigma
+    val m = math.exp(mu + sigmaSq / 2.0)
+    // expm1 is exp(m)-1 with better accuracy for tiny m
+    val s = math.sqrt(math.expm1(sigmaSq) * math.exp(2*mu + sigmaSq))
     // Z ~ N(0, 1)
     var X: Double = maxVal
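
Here `m` and `s` are the mean and standard deviation of the log-normal distribution. A standalone sketch, separate from the commit, of why expm1 matters when sigma is tiny:

    // With sigma = 1e-9, sigmaSq = 1e-18 is far below machine epsilon (~2.2e-16),
    // so exp(sigmaSq) rounds to exactly 1.0 and the naive standard deviation collapses to 0.
    val mu = 0.0
    val sigma = 1e-9
    val sigmaSq = sigma * sigma
    math.sqrt((math.exp(sigmaSq) - 1) * math.exp(2*mu + sigmaSq)) // 0.0, wrong
    math.sqrt(math.expm1(sigmaSq) * math.exp(2*mu + sigmaSq))     // ~1.0e-9, correct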

mllib/src/main/scala/org/apache/spark/mllib/optimization/Gradient.scala

Lines changed: 4 additions & 4 deletions
@@ -68,9 +68,9 @@ class LogisticGradient extends Gradient {
     val gradient = brzData * gradientMultiplier
     val loss =
       if (label > 0) {
-        math.log(1 + math.exp(margin))
+        math.log1p(math.exp(margin)) // log1p is log(1+p) but more accurate for small p
       } else {
-        math.log(1 + math.exp(margin)) - margin
+        math.log1p(math.exp(margin)) - margin
       }

     (Vectors.fromBreeze(gradient), loss)

@@ -89,9 +89,9 @@ class LogisticGradient extends Gradient {
     brzAxpy(gradientMultiplier, brzData, cumGradient.toBreeze)

     if (label > 0) {
-      math.log(1 + math.exp(margin))
+      math.log1p(math.exp(margin))
     } else {
-      math.log(1 + math.exp(margin)) - margin
+      math.log1p(math.exp(margin)) - margin
     }
   }
 }
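
A standalone illustration, not part of the commit, of the effect on the logistic loss for a strongly negative margin:

    val margin = -40.0
    math.log(1 + math.exp(margin)) // 0.0: exp(-40) ~ 4.25e-18 vanishes in the 1 + p sum
    math.log1p(math.exp(margin))   // ~4.25e-18, the true loss to within rounding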
