[SPARK-5338][MESOS] Add cluster mode support for Mesos #5144
Closed
Commits (36), showing changes from all commits
e3facdd  Add Mesos Cluster dispatcher (tnachen)
67cbc18  Rename StandaloneRestClient to RestClient and add sbin scripts (tnachen)
9986731  Kill drivers when shutdown (tnachen)
880bc27  Add Mesos Cluster UI to display driver results (tnachen)
7179495  Change Driver page output and add logging (tnachen)
e775001  Support fetching remote uris in driver runner. (tnachen)
4b2f5ef  Specify user jar in command to be replaced with local. (tnachen)
5b7a12b  WIP: Making a cluster mode a mesos framework.
0fa7780  Launch task through the mesos scheduler
b8e7181  Adds a shutdown latch to keep the deamon running
825afa0  Supports more spark-submit parameters
d57d77d  Add documentation (tnachen)
8ec76bc  Fix Mesos dispatcher UI. (tnachen)
6887e5e  Support looking at SPARK_EXECUTOR_URI env variable in schedulers (tnachen)
543a98d  Schedule multiple jobs (tnachen)
febfaba  Bound the finished drivers in memory (tnachen)
3d4dfa1  Adds support to kill submissions
371ce65  Handle cluster mode recovery and state persistence. (tnachen)
e0f33f7  Add supervise support and persist retries. (tnachen)
7f214c2  Fix RetryState visibility (tnachen)
862b5b5  Support asking driver status when it's retrying. (tnachen)
920fc4b  Fix scala style issues. (tnachen)
a46ad66  Allow zk cli param override. (tnachen)
7252612  Fix tests. (tnachen)
20f7284  Address review comments (tnachen)
df355cd  Add metrics to mesos cluster scheduler. (tnachen)
6ff8e5c  Address comments and add logging. (tnachen)
17f93a2  Fix head of line blocking in scheduling drivers. (tnachen)
f7d8046  Change app name to spark cluster. (tnachen)
c6c6b73  Pass spark properties to mesos cluster tasks. (tnachen)
1553230  Address review comments. (tnachen)
fd5259d  Address review comments. (tnachen)
e324ac1  Fix merge. (tnachen)
390c491  Fix zk conf key for mesos zk engine. (tnachen)
e24b512  Persist submitted driver. (tnachen)
069e946  Fix rebase. (tnachen)
core/src/main/scala/org/apache/spark/deploy/mesos/MesosClusterDispatcher.scala (116 additions, 0 deletions)
@@ -0,0 +1,116 @@
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.deploy.mesos

import java.util.concurrent.CountDownLatch

import org.apache.spark.deploy.mesos.ui.MesosClusterUI
import org.apache.spark.deploy.rest.mesos.MesosRestServer
import org.apache.spark.scheduler.cluster.mesos._
import org.apache.spark.util.SignalLogger
import org.apache.spark.{Logging, SecurityManager, SparkConf}
/*
 * A dispatcher that is responsible for managing and launching drivers, and is intended to be
 * used for Mesos cluster mode. The dispatcher is a long-running process started by the user in
 * the cluster independently of Spark applications.
 * It contains a [[MesosRestServer]] that listens for requests to submit drivers and a
 * [[MesosClusterScheduler]] that processes these requests by negotiating with the Mesos master
 * for resources.
 *
 * A typical lifecycle for a new driver is the following:
 * - The driver is submitted via spark-submit, which talks to the [[MesosRestServer]]
 * - The [[MesosRestServer]] queues the driver request with the [[MesosClusterScheduler]]
 * - The [[MesosClusterScheduler]] receives resource offers and launches the queued drivers
 *
 * The dispatcher supports both Mesos fine-grained and coarse-grained modes, as the mode is
 * configurable per launched driver.
 * This class is needed since Mesos doesn't manage frameworks, so the dispatcher acts as
 * a daemon that launches drivers as Mesos frameworks upon request. The dispatcher is started and
 * stopped by sbin/start-mesos-dispatcher and sbin/stop-mesos-dispatcher respectively.
 */
private[mesos] class MesosClusterDispatcher(
    args: MesosClusterDispatcherArguments,
    conf: SparkConf)
  extends Logging {

  private val publicAddress = Option(conf.getenv("SPARK_PUBLIC_DNS")).getOrElse(args.host)
  private val recoveryMode = conf.get("spark.mesos.deploy.recoveryMode", "NONE").toUpperCase()
  logInfo("Recovery mode in Mesos dispatcher set to: " + recoveryMode)

  private val engineFactory = recoveryMode match {
    case "NONE" => new BlackHoleMesosClusterPersistenceEngineFactory
    case "ZOOKEEPER" => new ZookeeperMesosClusterPersistenceEngineFactory(conf)

[Review comment] This will cause a compilation warning complaining that the match is not exhaustive. If the user provides a random string then this will fail at run time with a bad message. Did you mean to make …

    case _ => throw new IllegalArgumentException("Unsupported recovery mode: " + recoveryMode)
  }

  private val scheduler = new MesosClusterScheduler(engineFactory, conf)

  private val server = new MesosRestServer(args.host, args.port, conf, scheduler)
  private val webUi = new MesosClusterUI(
    new SecurityManager(conf),
    args.webUiPort,
    conf,
    publicAddress,
    scheduler)

  private val shutdownLatch = new CountDownLatch(1)

  def start(): Unit = {
    webUi.bind()
    scheduler.frameworkUrl = webUi.activeWebUiUrl
    scheduler.start()
    server.start()
  }

  def awaitShutdown(): Unit = {
    shutdownLatch.await()
  }

  def stop(): Unit = {
    webUi.stop()
    server.stop()
    scheduler.stop()
    shutdownLatch.countDown()
  }
}

private[mesos] object MesosClusterDispatcher extends Logging {
  def main(args: Array[String]) {
    SignalLogger.register(log)
    val conf = new SparkConf
    val dispatcherArgs = new MesosClusterDispatcherArguments(args, conf)
    conf.setMaster(dispatcherArgs.masterUrl)
    conf.setAppName(dispatcherArgs.name)
    dispatcherArgs.zookeeperUrl.foreach { z =>
      conf.set("spark.mesos.deploy.recoveryMode", "ZOOKEEPER")
      conf.set("spark.mesos.deploy.zookeeper.url", z)
    }
    val dispatcher = new MesosClusterDispatcher(dispatcherArgs, conf)
    dispatcher.start()
    val shutdownHook = new Thread() {
      override def run() {
        logInfo("Shutdown hook is shutting down dispatcher")
        dispatcher.stop()
        dispatcher.awaitShutdown()
      }
    }
    Runtime.getRuntime.addShutdownHook(shutdownHook)
    dispatcher.awaitShutdown()
  }
}
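
For context on the inline review comment above about the non-exhaustive match: a minimal standalone sketch, not part of this patch, of what happens when dispatching on a free-form configuration string. Scala only performs exhaustiveness checking on sealed types, so with a String match the practical risk is a bare scala.MatchError at run time; the catch-all case added in the diff turns that into a descriptive IllegalArgumentException instead. The names below (RecoveryModeSketch, selectEngine) and the returned strings are illustrative only, not Spark APIs.

object RecoveryModeSketch {
  def selectEngine(recoveryMode: String): String = recoveryMode.toUpperCase() match {
    case "NONE"      => "black-hole persistence engine"
    case "ZOOKEEPER" => "zookeeper persistence engine"
    // Without this catch-all, an unrecognized value (for example the typo "ZOOKEPER") fails
    // at run time with a bare scala.MatchError that says nothing about the valid options.
    case other => throw new IllegalArgumentException("Unsupported recovery mode: " + other)
  }

  def main(args: Array[String]): Unit = {
    println(selectEngine("zookeeper"))  // prints "zookeeper persistence engine"
    // selectEngine("recovery")         // would throw IllegalArgumentException with a clear message
  }
}
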
[Review comment] Technically not needed since we already do another level of validation in the RestSubmissionClient, but OK to keep.