[SPARK-11078] Ensure spilling tests actually spill #9124


Closed
wants to merge 5 commits

Conversation

andrewor14
Contributor

#9084 uncovered that many tests that test spilling don't actually spill. This is a follow-up patch to fix that to ensure our unit tests actually catch potential bugs in spilling. The size of this patch is inflated by the refactoring of ExternalSorterSuite, which had a lot of duplicate code and logic.

Andrew Or added 4 commits October 14, 2015 14:33
This commit does several things:
- remove noisy warning in GrantEverythingMemoryManager
- remove duplicate code in ExternalSorterSuite
- add a force spill threshold to make it easier to verify spilling
- ensure spilling tests actually spill in ExternalSorterSuite
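The "force spill threshold" hook can be illustrated with a minimal sketch, assuming a hypothetical `SpillableBuffer` class (this is not Spark's actual `ExternalSorter`, which manages real memory and writes spills to disk): once the in-memory buffer reaches the threshold, it spills, so a test can set a tiny threshold and assert that spilling happened.

```scala
// Hypothetical sketch of the force-spill-threshold idea. In Spark the real
// sorter spills to disk under memory pressure; here we just record batches
// so a test can observe how many spills occurred.
class SpillableBuffer[T](forceSpillThreshold: Int) {
  private val inMemory = scala.collection.mutable.ArrayBuffer.empty[T]
  private val spilled = scala.collection.mutable.ArrayBuffer.empty[Seq[T]]

  def insert(elem: T): Unit = {
    inMemory += elem
    // The testing hook: spill deterministically once we hit the threshold,
    // instead of relying on actual memory pressure.
    if (inMemory.size >= forceSpillThreshold) spill()
  }

  private def spill(): Unit = {
    spilled += inMemory.toSeq // in Spark this would write a file to disk
    inMemory.clear()
  }

  def spillCount: Int = spilled.size

  // Read back everything: spilled batches first, then in-memory remainder.
  def iterator: Iterator[T] = spilled.iterator.flatten ++ inMemory.iterator
}
```

A test would construct the buffer with a threshold far below the number of inserted elements and assert `spillCount > 0`, which is exactly the property the old tests failed to check.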
@SparkQA

SparkQA commented Oct 15, 2015

Test build #43747 has finished for PR 9124 at commit 1b7fa3d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 15, 2015

Test build #1904 has finished for PR 9124 at commit 1b7fa3d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

sc = new SparkContext("local-cluster[1,1,1024]", "test", conf)

def createCombiner(i: String): ArrayBuffer[String] = ArrayBuffer[String](i)
def mergeValue(buffer: ArrayBuffer[String], i: String): ArrayBuffer[String] = buffer += i
def mergeCombiners(buffer1: ArrayBuffer[String], buffer2: ArrayBuffer[String])
: ArrayBuffer[String] = buffer1 ++= buffer2
Contributor


nit:

def mergeCombiners(
    buffer1: ArrayBuffer[String],
    buffer2: ArrayBuffer[String]): ArrayBuffer[String]

Contributor Author


oops I screwed that up

@cloud-fan
Contributor

LGTM overall

@andrewor14
Contributor Author

OK, merging.

@asfgit asfgit closed this in 3b364ff Oct 15, 2015
@andrewor14 andrewor14 deleted the spilling-tests branch October 15, 2015 21:57
@SparkQA

SparkQA commented Oct 16, 2015

Test build #43812 has finished for PR 9124 at commit 7590b77.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

markhamstra pushed a commit to markhamstra/spark that referenced this pull request Oct 16, 2015
Author: Andrew Or <[email protected]>

Closes apache#9124 from andrewor14/spilling-tests.
@srowen
Member

srowen commented Jun 24, 2016

@andrewor14 looks like we still have some failures of the form...

- spilling with compression *** FAILED ***
  java.lang.Exception: Test failed with compression using codec org.apache.spark.io.LZ4CompressionCodec:

assertion failed: expected groupByKey to spill, but did not
  at scala.Predef$.assert(Predef.scala:170)
  at org.apache.spark.TestUtils$.assertSpilled(TestUtils.scala:170)
  at org.apache.spark.util.collection.ExternalAppendOnlyMapSuite.org$apache$spark$util$collection$ExternalAppendOnlyMapSuite$$testSimpleSpilling(ExternalAppendOnlyMapSuite.scala:253)
  at org.apache.spark.util.collection.ExternalAppendOnlyMapSuite$$anonfun$10$$anonfun$apply$mcV$sp$8.apply(ExternalAppendOnlyMapSuite.scala:218)
  at org.apache.spark.util.collection.ExternalAppendOnlyMapSuite$$anonfun$10$$anonfun$apply$mcV$sp$8.apply(ExternalAppendOnlyMapSuite.scala:216)
  at scala.collection.immutable.Stream.foreach(Stream.scala:594)
  ...

Do you think it's possible that there's a race in reporting the metrics here, so that tasks are spilling but the spills haven't been reported to the listener by the time the number of spilled tasks is checked?

I could try putting in an `eventually` to let it check repeatedly for a short time. Any better ideas, or reasons to think that's not it?
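The `eventually` pattern mentioned above (ScalaTest provides one in its `Eventually` trait) can be sketched in plain Scala as a retry loop that re-runs an assertion until it passes or a deadline elapses. The helper name and timings here are illustrative, not Spark's actual test utilities.

```scala
// Minimal retry-until-timeout helper in the spirit of ScalaTest's
// `eventually`: keep re-running `body` (which throws on failure) until it
// succeeds or the deadline passes; the final attempt propagates the error.
def eventually[T](timeoutMs: Long = 1000, intervalMs: Long = 10)(body: => T): T = {
  val deadline = System.currentTimeMillis() + timeoutMs
  while (System.currentTimeMillis() < deadline) {
    try return body
    catch { case _: Throwable => Thread.sleep(intervalMs) }
  }
  body // last attempt: if it still fails, the assertion error surfaces
}
```

Wrapping the spill-metrics check in such a loop would tolerate a short delay between the spill happening and the listener observing it, at the cost of masking the race rather than removing it.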

@andrewor14
Contributor Author

I see, that's possible. The right thing to do here is to add a wait of some sort in the listener to block until we have received the stage-completed event. We've done this in some other test listeners as well (I can't remember which ones off the top of my head).
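The wait-in-the-listener idea can be sketched with a `CountDownLatch`: the listener counts the latch down when the stage-completed event arrives, and the test blocks on it before reading spill metrics. The `StageCompleted` and `BlockingStageListener` types below are stand-ins, not Spark's actual `SparkListener` API.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}

// Stand-in for Spark's stage-completed listener event.
final case class StageCompleted(stageId: Int)

class BlockingStageListener {
  private val latch = new CountDownLatch(1)

  // Called on the event-delivery thread when the stage finishes.
  def onStageCompleted(event: StageCompleted): Unit = latch.countDown()

  // Called by the test thread; blocks until the event has been delivered
  // (returns false if the timeout elapses first).
  def awaitStageCompleted(timeoutMs: Long): Boolean =
    latch.await(timeoutMs, TimeUnit.MILLISECONDS)
}
```

Unlike a bare `eventually` retry, this makes the test deterministic: the metrics are read only after the event that populates them has definitely been processed.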

4 participants