[SPARK-22033][CORE] BufferHolder, other size checks should account for the specific VM array size limitations #19266


Closed
wants to merge 3 commits

Conversation

srowen
Member

@srowen srowen commented Sep 18, 2017

What changes were proposed in this pull request?

Try to avoid allocating an array bigger than Integer.MAX_VALUE - 8, which is the actual max size on some JVMs, in several places
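For illustration, a minimal sketch of the guard pattern involved (the constant name ARRAY_MAX appears in the patch, but checkedArrayLength is a hypothetical helper used here only to show the idea):

val ARRAY_MAX: Int = Int.MaxValue - 8 // some JVMs cannot allocate arrays all the way up to Int.MaxValue

// Hypothetical helper: validate a requested length before allocating.
def checkedArrayLength(requested: Long): Int = {
  require(requested <= ARRAY_MAX, s"Cannot allocate an array of $requested elements; max is $ARRAY_MAX")
  requested.toInt
}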

How was this patch tested?

Existing tests

…hich is the actual max size on some JVMs, in several places
@srowen
Member Author

srowen commented Sep 18, 2017

CC @maropu

@maropu
Member

maropu commented Sep 18, 2017

I thought, if this limit depends heavily on JVM implementations, wouldn't it be better to put the limit in a global constant somewhere (e.g., ARRAY_INT_MAX in spark.util.Utils or another place)? As another option, how about adding an internal option to SparkConf for it?

Also, how about making a parent JIRA ticket to track similar issues, since it seems difficult to cover all the possible places in this PR.

@srowen
Member Author

srowen commented Sep 18, 2017

Yeah, agreed, it could be some global constant. I don't think it should be configurable. Ideally it would be determined from the JVM, but I don't know a way to do that.

In many cases, assuming Int.MaxValue is the max array size when it's really Int.MaxValue - 8 doesn't matter much. For example, arguably I should leave the ML changes alone here, because in the very rare case that a matrix size falls somewhere between Int.MaxValue - 8 and Int.MaxValue, it will fail anyway, and that's not avoidable given the user input. It's also, arguably, more conservative not to assume that anything beyond Int.MaxValue - 8 will always fail, and not to "proactively" fail at this cutoff.

However, I think there are a smallish number of identifiable cases where Spark can very much avoid the failure (like BufferHolder), and they're the instances where an array size keeps doubling. Maybe we can stick to those clear cases, especially any that seem to have triggered the original error?

Those cases are few enough and related enough that I'm sure they're just one issue, not several.
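As a concrete illustration of the doubling case described above, a hedged sketch (not the actual BufferHolder code) of growing a byte buffer while capping the new length at the JVM-safe limit:

val ARRAY_MAX = Int.MaxValue - 8

def grow(current: Array[Byte], neededSize: Int): Array[Byte] = {
  if (neededSize > ARRAY_MAX) {
    throw new UnsupportedOperationException(
      s"Cannot grow buffer to $neededSize bytes; max is $ARRAY_MAX")
  }
  if (neededSize <= current.length) {
    current
  } else {
    // Double the length, but never beyond the cap; naive doubling past
    // Int.MaxValue - 8 is what can fail on some JVMs.
    val newLength =
      math.min(math.max(current.length.toLong * 2, neededSize.toLong), ARRAY_MAX.toLong).toInt
    val grown = new Array[Byte](newLength)
    System.arraycopy(current, 0, grown, 0, current.length)
    grown
  }
}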

@@ -39,7 +39,7 @@
private final long length;

public LongArray(MemoryBlock memory) {
assert memory.size() < (long) Integer.MAX_VALUE * 8: "Array size > 4 billion elements";
assert memory.size() < (long) Integer.MAX_VALUE * 8: "Array size > 2.1 billion elements";

@ascii766164696D ascii766164696D Sep 18, 2017


assert memory.size() <= (long) (Integer.MAX_VALUE - 8) * 8

Member


Do we need the same assert below?

public static MemoryBlock fromLongArray(final long[] array) {


maybe also add the exact number in the error message instead of 2.1 billion

Member Author


I wasn't sure whether the JVM array size limit applied here, because this represents a non-native array. I still wanted to fix the comment while I was at it. If an array exists, its length is valid, so I didn't think that part of MemoryBlock represented an issue.

@@ -30,11 +30,15 @@
HashMapGrowthStrategy DOUBLING = new Doubling();

class Doubling implements HashMapGrowthStrategy {

private static final int ARRAY_MAX = Integer.MAX_VALUE - 8;

@ascii766164696D ascii766164696D Sep 18, 2017


Maybe worth adding a comment explaining why this value is chosen as the max, like here:
http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/java/util/ArrayList.java#l223

}
val capacity = if (otherElements != null) otherElements.length + 2 else 2
if (newSize > capacity) {
var newArrayLen = 8
var newArrayLen = 8L
while (newSize - 2 > newArrayLen) {


I think we can remove - 2 now since newArrayLen is a Long


@ascii766164696D ascii766164696D Sep 19, 2017


Ah, I see that it's reserved. It wasn't clear to me why the - 2; it seemed like a magic number, and I see it in other places too.
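For context, a hedged sketch of the growth loop under discussion, assuming a CompactBuffer-style structure that keeps its first two elements in dedicated fields (which is what the reserved "- 2" accounts for), with the counter widened to Long so the doubling cannot overflow before the cap is applied:

val ARRAY_MAX = Int.MaxValue - 8

// newSize counts all elements; two of them live in fields rather than the
// backing array, so the array only needs newSize - 2 slots.
def newArrayLength(newSize: Int): Int = {
  var newArrayLen = 8L // Long, so repeated doubling cannot wrap around Int
  while (newSize - 2 > newArrayLen) {
    newArrayLen *= 2
  }
  math.min(newArrayLen, ARRAY_MAX.toLong).toInt
}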

@SparkQA

SparkQA commented Sep 18, 2017

Test build #81882 has finished for PR 19266 at commit 8dbfa30.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -96,5 +96,5 @@ private[spark] class PartitionedPairBuffer[K, V](initialCapacity: Int = 64)
}

private object PartitionedPairBuffer {
val MAXIMUM_CAPACITY = Int.MaxValue / 2 // 2 ^ 30 - 1
val MAXIMUM_CAPACITY = (Int.MaxValue - 8) / 2


Maybe worth adding the Int type explicitly, even though it's already an Int.

Also, the comment at line 28 should be changed to 1073741819, i.e. (Int.MaxValue - 8) / 2.
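For reference, a hedged sketch of why the cap is halved here: the buffer stores each record in two adjacent slots of its backing array, so the array length is 2 * capacity and must itself stay under the JVM array limit (growArray below is illustrative, not the exact PartitionedPairBuffer code):

val MAXIMUM_CAPACITY: Int = (Int.MaxValue - 8) / 2 // 1073741819

def growArray(data: Array[AnyRef], capacity: Int): (Array[AnyRef], Int) = {
  require(capacity < MAXIMUM_CAPACITY, s"Can't grow buffer beyond $MAXIMUM_CAPACITY elements")
  // Doubling is clamped so that 2 * newCapacity never exceeds Int.MaxValue - 8.
  val newCapacity = math.min(capacity * 2L, MAXIMUM_CAPACITY.toLong).toInt
  val newData = new Array[AnyRef](2 * newCapacity)
  System.arraycopy(data, 0, newData, 0, 2 * capacity)
  (newData, newCapacity)
}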

@@ -344,7 +344,7 @@ class Word2Vec extends Serializable with Logging {
val newSentences = sentences.repartition(numPartitions).cache()
val initRandom = new XORShiftRandom(seed)

if (vocabSize.toLong * vectorSize >= Int.MaxValue) {
if (vocabSize.toLong * vectorSize >= Int.MaxValue - 8) {


I think this should be just >
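In other words, with a strict > the boundary value itself stays allowed; a hedged sketch of the suggested check (vocabSize and vectorSize are the surrounding Word2Vec fields):

// A table of exactly Int.MaxValue - 8 floats is still allocatable, so only reject sizes above it.
if (vocabSize.toLong * vectorSize > Int.MaxValue - 8) {
  throw new RuntimeException(
    s"vocabSize ($vocabSize) * vectorSize ($vectorSize) is too large; " +
    s"the vector table cannot exceed ${Int.MaxValue - 8} values")
}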

@@ -304,8 +304,8 @@ class BlockMatrix @Since("1.3.0") (
s"Int.MaxValue. Currently numRows: ${numRows()}")
require(numCols() < Int.MaxValue, "The number of columns of this matrix should be less than " +
s"Int.MaxValue. Currently numCols: ${numCols()}")
require(numRows() * numCols() < Int.MaxValue, "The length of the values array must be " +
s"less than Int.MaxValue. Currently numRows * numCols: ${numRows() * numCols()}")
require(numRows() * numCols() < Int.MaxValue - 8, "The length of the values array must be " +


<=

…mediately; stick to 'avoidable' allocations that are too large
@SparkQA

SparkQA commented Sep 19, 2017

Test build #81921 has finished for PR 19266 at commit 4594b55.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 20, 2017

Test build #81984 has finished for PR 19266 at commit 9bb9e77.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

@jiangxb1987 jiangxb1987 left a comment


LGTM

@srowen
Member Author

srowen commented Sep 23, 2017

Merged to master

@asfgit asfgit closed this in 50ada2a Sep 23, 2017
@srowen srowen deleted the SPARK-22033 branch September 24, 2017 16:35

// Some JVMs can't allocate arrays of length Integer.MAX_VALUE; actual max is somewhat
// smaller. Be conservative and lower the cap a little.
private static final int ARRAY_MAX = Integer.MAX_VALUE - 8;
Member


Just curious, how do we arrive at this value of -8?

Also cc @liufengdb

Member


Why not also fix line 54?


@srowen I think we can use Integer.MAX_VALUE - 7 instead of Integer.MAX_VALUE - 8 to make the size align with words; otherwise, this check will fail: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java#L170.

This is the reason why all the size inputs to the methods are rounded, for example, https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java#L216.
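For illustration, the rounding referred to looks roughly like this (a sketch of word-alignment arithmetic, not the exact UnsafeRowWriter code):

// Round a byte count up to the next multiple of 8 (one word).
def roundToWord(numBytes: Int): Int = (numBytes + 7) / 8 * 8

// Int.MaxValue - 8 is not itself a multiple of 8, so a buffer capped at exactly
// that size would fail a sizeInBytes % 8 == 0 check; Int.MaxValue - 7 is 8-aligned.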

Member Author


@gatorsmile have a look at the JIRA for some detail; you can see a similar limit in the JDK at for example http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/java/util/ArrayList.java#l229

You are right, I think around line 54 needs to be something straightforward like:

long totalSize = initialSize + bitsetWidthInBytes + 8L * row.numFields();
if (totalSize > ARRAY_MAX) { ...error... }
this.buffer = new byte[(int) totalSize];

Yes, I agree with your new JIRA @liufengdb, though I think we'll need to go the other way, to Integer.MAX_VALUE - 15, where the value must be divisible by 8.
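For what it's worth, Integer.MAX_VALUE - 15 is the largest multiple of 8 that stays at or below the Integer.MAX_VALUE - 8 allocation cap, e.g.:

val ARRAY_MAX = Int.MaxValue - 8      // JVM-safe allocation cap: 2147483639
val WORD_ALIGNED_MAX = ARRAY_MAX & ~7 // largest multiple of 8 under the cap: 2147483632 (Int.MaxValue - 15)
assert(WORD_ALIGNED_MAX % 8 == 0 && WORD_ALIGNED_MAX <= ARRAY_MAX)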

@gatorsmile
Member

@srowen Thanks!

@liufengdb Could you submit a separate PR to fix the issues and also please include the test cases?
