[SPARK-5214][Core] Add EventLoop and change DAGScheduler to an EventLoop #4016

Closed. zsxwing wants to merge 9 commits.

Member

@zsxwing zsxwing commented Jan 13, 2015

This PR adds a simple EventLoop and uses it to replace the Actor in DAGScheduler. EventLoop is a general class that supports posting events from multiple threads and handling them in a single event thread.
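
The idea in the description can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `SimpleEventLoop`, `post`, `SimpleEventLoopDemo`, and the choice of `LinkedBlockingDeque` are assumptions made for the example. Producers on any thread enqueue events, and one dedicated daemon thread dequeues and handles them.

```scala
import java.util.concurrent.{BlockingQueue, CountDownLatch, LinkedBlockingDeque, TimeUnit}
import java.util.concurrent.atomic.AtomicBoolean
import scala.util.control.NonFatal

// Hypothetical, simplified event loop: many posting threads, one handling thread.
abstract class SimpleEventLoop[E](name: String) {
  private val queue: BlockingQueue[E] = new LinkedBlockingDeque[E]()
  private val stopped = new AtomicBoolean(false)

  private val eventThread = new Thread(name) {
    setDaemon(true)
    override def run(): Unit = {
      try {
        while (!stopped.get) {
          val event = queue.take() // blocks until an event is available
          try {
            onReceive(event)
          } catch {
            case NonFatal(e) => onError(e) // report handler failures, keep looping
          }
        }
      } catch {
        case _: InterruptedException => // interrupted by stop(); exit quietly
      }
    }
  }

  def start(): Unit = eventThread.start()

  // Must not be called from the event thread itself in this sketch (join would block).
  def stop(): Unit = {
    if (stopped.compareAndSet(false, true)) {
      eventThread.interrupt()
      eventThread.join()
    }
  }

  /** Safe to call from any thread. */
  def post(event: E): Unit = queue.put(event)

  /** Always invoked on the single event thread. */
  protected def onReceive(event: E): Unit
  protected def onError(e: Throwable): Unit
}

object SimpleEventLoopDemo {
  // Posts `perThread` events from each of `threads` threads; returns how many were handled.
  def handledCount(threads: Int, perThread: Int): Int = {
    val done = new CountDownLatch(threads * perThread)
    var handled = 0 // only written by the event thread
    val loop = new SimpleEventLoop[Int]("demo-loop") {
      override protected def onReceive(event: Int): Unit = { handled += 1; done.countDown() }
      override protected def onError(e: Throwable): Unit = ()
    }
    loop.start()
    val posters = (1 to threads).map { _ =>
      new Thread(new Runnable { def run(): Unit = (1 to perThread).foreach(loop.post) })
    }
    posters.foreach(_.start())
    posters.foreach(_.join())
    done.await(10, TimeUnit.SECONDS) // wait until every event has been handled
    loop.stop()
    handled
  }
}
```

Because `onReceive` only ever runs on the event thread, the handler in the demo can update plain state without locks; the latch and the `join` in `stop()` are what make the final read from the calling thread safe.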

@zsxwing
Member Author

zsxwing commented Jan 13, 2015

cc @rxin

@SparkQA

SparkQA commented Jan 13, 2015

Test build #25466 has started for PR 4016 at commit 3b2e59c.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 13, 2015

Test build #25468 has started for PR 4016 at commit 1f73eac.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 13, 2015

Test build #25466 has finished for PR 4016 at commit 3b2e59c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class EventLoop[E](name: String) extends Logging

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25466/
Test PASSed.

@SparkQA

SparkQA commented Jan 13, 2015

Test build #25468 has finished for PR 4016 at commit 1f73eac.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class EventLoop[E](name: String) extends Logging

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25468/
Test FAILed.

@zsxwing
Member Author

zsxwing commented Jan 13, 2015

Jenkins, retest this please.

@SparkQA

SparkQA commented Jan 13, 2015

Test build #25469 has started for PR 4016 at commit 1f73eac.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 13, 2015

Test build #25469 has finished for PR 4016 at commit 1f73eac.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class EventLoop[E](name: String) extends Logging

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25469/
Test PASSed.

* An event loop to receive events from the caller and process all events in the event thread. It
* will start an exclusive event thread to process all events.
*/
abstract class EventLoop[E](name: String) extends Logging {
Contributor

this should be private[spark]

@rxin
Contributor

rxin commented Jan 14, 2015

Looks pretty good to me. Since it is an important component, might be worth getting more pairs of eyes to look at it.

cc @aarondav, @kayousterhout, @markhamstra

}

def stop(): Unit = {
eventThread.interrupt()
Contributor

Generally, the interrupt() flag may be cleared and an exception never thrown, so I don't think we should rely only on this mechanism to stop the thread (in particular if the event loop is inside the onReceive() method). Can we also set a volatile variable which is checked in the while loop?
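
A minimal sketch of the suggestion above, under assumed names (`StopFlagSketch`, `Worker`, and `handle` are hypothetical and not part of Spark): the handler swallows the `InterruptedException`, which clears the thread's interrupted status, so the loop relies on a volatile flag checked on every iteration to exit.

```scala
import java.util.concurrent.{CountDownLatch, LinkedBlockingQueue, TimeUnit}

object StopFlagSketch {
  // Hypothetical worker used only to illustrate the flag-plus-interrupt pattern.
  final class Worker {
    @volatile private var stopped = false
    private val queue = new LinkedBlockingQueue[String]()
    val handlerEntered = new CountDownLatch(1)

    private val thread = new Thread("stop-flag-sketch") {
      override def run(): Unit = {
        try {
          while (!stopped) { // the flag is checked on every iteration
            val event = queue.take()
            handle(event)
          }
        } catch {
          case _: InterruptedException => // interrupted while blocked in take()
        }
      }
    }

    // A handler that swallows the interrupt, clearing the interrupted status.
    private def handle(event: String): Unit = {
      handlerEntered.countDown()
      try {
        Thread.sleep(60000)
      } catch {
        case _: InterruptedException => () // interrupt is lost here
      }
    }

    def start(): Unit = thread.start()
    def post(event: String): Unit = queue.put(event)

    def stop(): Unit = {
      stopped = true     // set the flag first ...
      thread.interrupt() // ... then wake the thread if it is blocked
    }

    // Returns true if the worker thread terminated within `millis`.
    def awaitTermination(millis: Long): Boolean = {
      thread.join(millis)
      !thread.isAlive
    }
  }

  def stopsDespiteSwallowedInterrupt(): Boolean = {
    val w = new Worker
    w.start()
    w.post("event")
    w.handlerEntered.await(10, TimeUnit.SECONDS) // make sure stop() hits the handler
    w.stop()
    w.awaitTermination(10000)
  }
}
```

With the interrupt alone, the loop in this sketch would go back to `take()` after the handler returns and block indefinitely. Setting the flag before interrupting means that whichever path the thread takes out of the handler, the next loop check sees `stopped == true`.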

import org.scalatest.concurrent.Eventually._
import org.scalatest.FunSuite

class EventLoopSuite extends FunSuite {
Contributor

Could you expand the tests for this class to include things like:

  • Throwing an error within onError()
  • Stopping the event loop if the onReceive() is inside something like a Lock#acquireUninterruptibly()
  • Post events from different threads and make sure nothing throws a ConcurrentModificationException :)

Currently the implementation is simple, but I'd like to make sure that future changes don't break some property of event loops which the DAGScheduler rarely or never exercises.

Member Author

> Stopping the event loop if the onReceive() is inside something like a Lock#acquireUninterruptibly()

It cannot interrupt acquireUninterruptibly. I have not found any place in Spark using it. What test are you suggesting?

Contributor

Right, I mean make sure that the event loop is eventually stopped even if stop() is called while onReceive() is busy-waiting and clearing the interrupted flag, or making an uninterruptible wait on a lock. However, it may be too much trouble to set up such a condition, and the new boolean flag probably works :)

Member Author

I added test("EventLoop: onReceive swallows InterruptException") to test clearing the interrupted flag.

@markhamstra
Contributor

Yes, of course, we can do this -- it's more-or-less going back to what we had before: 2539c06

I'm not seeing a lot of discussion about why we are considering this and the broader replacement of Akka. Is there some more discussion or motivation than what appears in the two JIRAs, SPARK-5124 and SPARK-5214?

@rxin
Contributor

rxin commented Jan 14, 2015

We are not removing Akka or considering removing Akka yet. This is just building things in a way that lets us remove the dependency if we want to in the future. If we do consider it, we will definitely have a broader discussion. If we ever do that, it would be to make networking easier (both debugging and deployment) and to enable our users to use Akka. Using Akka, especially a different version of it, for an app on top of Spark is a mess right now; Spark not depending on Akka will make it easier for applications on top of Spark to use Akka.

@rxin
Contributor

rxin commented Jan 14, 2015

BTW I'm not sure why we ever did 2539c06 in the first place, other than making things more Scala-y.

@aarondav
Contributor

I hope that this is the beginning of a long, drawn-out series of commits that Akka-ize and de-Akka-ize various components of Spark, followed by the short but bloody Holy Akka Crusades, ultimately resulting in a complete rewrite of Spark in Erlang.

@SparkQA

SparkQA commented Jan 14, 2015

Test build #25526 has started for PR 4016 at commit 227bf33.

  • This patch merges cleanly.

}

def stop(): Unit = {
if (stopped.compareAndSet(false ,true)) {
Contributor

nit: comma space

@SparkQA

SparkQA commented Jan 14, 2015

Test build #25534 has started for PR 4016 at commit 5cfac83.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 14, 2015

Test build #25526 has finished for PR 4016 at commit 227bf33.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25526/
Test FAILed.

@SparkQA

SparkQA commented Jan 14, 2015

Test build #25529 has finished for PR 4016 at commit 460f7b3.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25529/
Test FAILed.

@SparkQA

SparkQA commented Jan 14, 2015

Test build #25534 has finished for PR 4016 at commit 5cfac83.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25534/
Test PASSed.

val eventLoop = new EventLoop[Int]("test") {

override def onReceive(event: Int): Unit = {
receivedEventsCount += 1
Contributor

Is += safe on a volatile int? I wouldn't think it's actually rewritten as a compareAndSwap loop.

Member Author

> Is += safe on a volatile int?

It's safe. onReceive is always called in the event thread, so there is no concurrency here.

Member Author

volatile is used to make sure we can read receivedEventsCount correctly outside the event thread.
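
To illustrate the distinction discussed in this thread, here is a small sketch with hypothetical names (`VolatileCounterSketch`, `Counters`, and `demo` are not from the PR). A `@volatile` field incremented with `+=` is a read followed by a write, which is not atomic, so it is only reliable when a single thread writes it. With several writers, something like `AtomicInteger` is needed instead.

```scala
import java.util.concurrent.atomic.AtomicInteger

object VolatileCounterSketch {
  final class Counters {
    // Written by exactly one thread; @volatile only guarantees visibility to readers.
    @volatile var singleWriter = 0
    // Written by several threads; incrementAndGet is an atomic read-modify-write.
    val multiWriter = new AtomicInteger(0)
  }

  // Returns (singleWriter total, multiWriter total) after all threads finish.
  def demo(increments: Int, writers: Int): (Int, Int) = {
    val c = new Counters
    val single = new Thread(new Runnable {
      def run(): Unit = (1 to increments).foreach(_ => c.singleWriter += 1)
    })
    val many = (1 to writers).map { _ =>
      new Thread(new Runnable {
        def run(): Unit = (1 to increments).foreach(_ => c.multiWriter.incrementAndGet())
      })
    }
    val all = single +: many
    all.foreach(_.start())
    all.foreach(_.join())
    (c.singleWriter, c.multiWriter.get)
  }
}
```

In the test under review, the event thread plays the role of the single writer, which is why `+=` on the volatile counter does not lose updates there.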

@aarondav
Contributor

LGTM, code-wise. You might run the unit tests many times locally to try to find race conditions. For similar situations, I've used something like

echo START: `date` > run_log; while sbt/sbt 'core/test-only *EventLoopSuite'; do echo SUCCESS: `date` >> run_log; done; echo FAILED: `date` >> run_log

and just let it run for a while.

@zsxwing
Member Author

zsxwing commented Jan 15, 2015

> echo START: `date` > run_log; while sbt/sbt 'core/test-only *EventLoopSuite'; do echo SUCCESS: `date` >> run_log; done; echo FAILED: `date` >> run_log

Cool. I ran it for 10 minutes on my machine and it succeeded every time.

}

Contributor

extra line?

@SparkQA

SparkQA commented Jan 16, 2015

Test build #25630 has started for PR 4016 at commit aefa1ce.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Jan 16, 2015

Test build #25630 has finished for PR 4016 at commit aefa1ce.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25630/
Test PASSed.

@rxin
Contributor

rxin commented Jan 20, 2015

Thanks. Merging in master.

@asfgit asfgit closed this in e69fb8c Jan 20, 2015
@zsxwing zsxwing deleted the event-loop branch January 20, 2015 02:33
bomeng pushed a commit to Huawei-Spark/spark that referenced this pull request Jan 21, 2015
This PR adds a simple `EventLoop` and uses it to replace the Actor in DAGScheduler. `EventLoop` is a general class that supports posting events from multiple threads and handling them in a single event thread.

Author: zsxwing <[email protected]>

Closes apache#4016 from zsxwing/event-loop and squashes the following commits:

aefa1ce [zsxwing] Add protected to on*** methods
5cfac83 [zsxwing] Remove null check of eventProcessLoop
dba35b2 [zsxwing] Add a test that onReceive swallows InterruptException
460f7b3 [zsxwing] Use volatile instead of Atomic things in unit tests
227bf33 [zsxwing] Add a stop flag and some tests
37f79c6 [zsxwing] Fix docs
55fb6f6 [zsxwing] Add private[spark] to EventLoop
1f73eac [zsxwing] Fix the import order
3b2e59c [zsxwing] Add EventLoop and change DAGScheduler to an EventLoop