[WIP]SPARK-1706: Allow multiple executors per worker in Standalone mode #636

CodingCat · 2014-05-04T22:08:08Z

https://issues.apache.org/jira/browse/SPARK-1706

In the current implementation, a user has to start multiple workers on a server in order to run multiple executors on it, which introduces additional overhead from the extra JVM processes.

In this patch, I changed the scheduling logic in the master so that the user can start multiple executor processes under the same worker JVM process.

Other small changes include:

Rename memoryPerSlave in ApplicationDescription to memoryPerExecutor, as "Slave" is overloaded to mean both worker and executor in the documents (I think we had some discussion on this before?).

@pwendell, I think we don't need to change anything in the scheduler part, since executors are indexed by executorId rather than by host IP address?

AmplabJenkins · 2014-05-04T22:12:57Z

Merged build triggered.

AmplabJenkins · 2014-05-04T22:13:06Z

Merged build started.

AmplabJenkins · 2014-05-04T22:14:24Z

Merged build finished.

AmplabJenkins · 2014-05-04T22:14:24Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14649/

AmplabJenkins · 2014-05-04T22:17:57Z

Merged build triggered.

AmplabJenkins · 2014-05-04T22:18:06Z

Merged build started.

AmplabJenkins · 2014-05-04T23:48:46Z

Merged build finished.

AmplabJenkins · 2014-05-04T23:48:47Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14650/

CodingCat · 2014-05-05T00:23:46Z

Can anyone help re-trigger the test? Jenkins does not like me...

AmplabJenkins · 2014-05-05T01:42:57Z

Merged build triggered.

AmplabJenkins · 2014-05-05T01:43:07Z

Merged build started.

AmplabJenkins · 2014-05-05T03:13:53Z

Merged build finished.

AmplabJenkins · 2014-05-05T03:13:53Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14654/

lianhuiwang · 2014-05-05T06:41:16Z

core/src/main/scala/org/apache/spark/deploy/master/Master.scala

+        var pos = 0
+        while (toAssign > 0) {
+          val assignedCore = math.min(coreNumPerExecutor, toAssign)
+          if (usableWorkers(pos).coresFree - assigned(pos).sum > 0) {


if (usableWorkers(pos).coresFree - assigned(pos).sum > 0) {
should be changed to: if (usableWorkers(pos).coresFree - assigned(pos).sum >= assignedCore) {
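The effect of the suggested predicate can be seen in a small, self-contained sketch. It is not the patch's code: the Worker case class, the assignCores helper, and the idleRounds guard are hypothetical names introduced here for illustration, and only the loop shape and the >= assignedCore check come from the snippet above.

```scala
import scala.collection.mutable.ArrayBuffer

object CoreAssignmentSketch {
  // Hypothetical stand-in for the master's view of a worker; not Spark's WorkerInfo.
  final case class Worker(id: String, coresFree: Int)

  // Walk the workers round-robin and carve off one executor-sized chunk of
  // cores at a time. Returns, per worker, the core counts of its executors.
  def assignCores(
      usableWorkers: IndexedSeq[Worker],
      coreNumPerExecutor: Int,
      coresWanted: Int): IndexedSeq[List[Int]] = {
    require(coreNumPerExecutor > 0, "coreNumPerExecutor must be positive")
    val assigned = IndexedSeq.fill(usableWorkers.length)(ArrayBuffer.empty[Int])
    // Never try to hand out more cores than the workers have in total.
    var toAssign = math.min(coresWanted, usableWorkers.map(_.coresFree).sum)
    var pos = 0
    var idleRounds = 0 // consecutive workers that could not fit the next chunk
    while (toAssign > 0 && idleRounds < usableWorkers.length) {
      val assignedCore = math.min(coreNumPerExecutor, toAssign)
      val remaining = usableWorkers(pos).coresFree - assigned(pos).sum
      // The reviewer's predicate: the worker must fit the whole chunk,
      // not merely have one free core.
      if (remaining >= assignedCore) {
        assigned(pos) += assignedCore
        toAssign -= assignedCore
        idleRounds = 0
      } else {
        idleRounds += 1
      }
      pos = (pos + 1) % usableWorkers.length
    }
    assigned.map(_.toList)
  }
}
```

With workers A (3 free cores) and B (4 free cores), 2 cores per executor, and 6 cores wanted, this places one executor on A and two on B. With the original > 0 check, A would be handed a second 2-core executor while having only 1 core left.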

AmplabJenkins · 2014-05-05T11:27:57Z

Merged build triggered.

AmplabJenkins · 2014-05-05T11:28:05Z

Merged build started.

AmplabJenkins · 2014-05-05T12:58:51Z

Merged build finished.

AmplabJenkins · 2014-05-05T12:58:51Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14663/

AmplabJenkins · 2014-05-05T13:07:57Z

Merged build triggered.

AmplabJenkins · 2014-05-05T13:08:05Z

Merged build started.

AmplabJenkins · 2014-05-05T13:42:20Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-05-05T13:42:20Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14665/

CodingCat · 2014-05-05T13:43:26Z

Please hold off on reviewing this PR; I found a bug and am fixing it.

AmplabJenkins · 2014-05-05T16:47:57Z

Merged build triggered.

AmplabJenkins · 2014-05-05T16:48:07Z

Merged build started.

AmplabJenkins · 2014-05-05T18:18:53Z

Merged build finished.

AmplabJenkins · 2014-05-05T18:18:53Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14666/

CodingCat · 2014-05-05T19:54:54Z

I think it is ready for review, but I'm unsure why the test always gets stuck on a certain case in ReplSuite.

AmplabJenkins · 2014-05-08T21:23:49Z

Merged build finished.

AmplabJenkins · 2014-05-08T21:23:49Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14820/

AmplabJenkins · 2014-05-09T03:37:58Z

Merged build triggered.

AmplabJenkins · 2014-05-09T03:38:05Z

Merged build started.

lianhuiwang · 2014-05-09T04:36:39Z

core/src/main/scala/org/apache/spark/deploy/ApplicationDescription.scala

@@ -28,6 +28,7 @@ private[spark] class ApplicationDescription(
  extends Serializable {

  val user = System.getProperty("user.name", "<unknown>")
-
+  // only valid when spark.executor.multiPerWorker is set to true
+  var maxCorePerExecutor = maxCores


I think that in var maxCorePerExecutor = maxCores the two variables mean different things. maxCores is the total number of cores for an application, but maxCorePerExecutor is the number of cores per executor. In schedule(), the app's leftCoreToAssign comes from the maxCores value, so the two variables cannot be equal.

It's just an initial value.

Yes, I know. But in ApplicationInfo.scala the coresLeft value is the same as the value of desc.maxCores, so in schedule leftCoreToAssign is actually equal to maxCorePerExecutor. I think that is not right, because leftCoreToAssign is the total cores of all executors and maxCorePerExecutor is the cores of one executor. I don't know whether this is clear.

Why do you think "in schedule leftCoreToAssign is actually equal to maxCorePerExecutor"? It's the minimum of app.coresLeft and the sum of all workers' free cores: var leftCoreToAssign = math.min(app.coresLeft, usableWorkers.map(_.coresFree).sum)

Yes, but in ApplicationInfo, app.coresLeft is equal to app.maxCores. So in schedule, when the sum of all workers' free cores is greater than app.coresLeft, leftCoreToAssign is actually equal to maxCorePerExecutor.

coresLeft implementation: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/master/ApplicationInfo.scala#L84

https://github.com/CodingCat/spark/blob/SPARK-1706/core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala#L63

Yes, thanks, I see.
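To make the quantities in this exchange concrete, here is a minimal sketch of the leftCoreToAssign expression quoted above. The App and Worker case classes are hypothetical, simplified stand-ins (not Spark's ApplicationInfo or WorkerInfo), and defining coresLeft as maxCores minus the cores already granted is an assumption about the linked implementation.

```scala
object LeftCoreSketch {
  // Hypothetical, simplified stand-ins for the master's bookkeeping.
  final case class App(maxCores: Int, coresGranted: Int) {
    // Assumed definition: the application-wide budget that is still unassigned.
    def coresLeft: Int = maxCores - coresGranted
  }
  final case class Worker(coresFree: Int)

  // Expression quoted in the thread: bounded both by the application's
  // remaining budget and by what the usable workers can offer in total.
  def leftCoreToAssign(app: App, usableWorkers: Seq[Worker]): Int =
    math.min(app.coresLeft, usableWorkers.map(_.coresFree).sum)
}
```

Under these assumptions the result is a total across all executors of the application. It coincides with the application's full maxCores only when nothing has been granted yet and the workers have at least that many free cores.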

AmplabJenkins · 2014-05-09T05:08:46Z

Merged build finished.

AmplabJenkins · 2014-05-09T05:08:47Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14837/

AmplabJenkins · 2014-05-09T10:57:58Z

Merged build triggered.

AmplabJenkins · 2014-05-09T10:58:07Z

Merged build started.

AmplabJenkins · 2014-05-09T11:37:58Z

Merged build triggered.

AmplabJenkins · 2014-05-09T11:38:08Z

Merged build started.

AmplabJenkins · 2014-05-09T12:28:50Z

Merged build finished.

AmplabJenkins · 2014-05-09T12:28:50Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14846/

AmplabJenkins · 2014-05-09T12:57:58Z

Merged build triggered.

AmplabJenkins · 2014-05-09T12:58:03Z

Merged build started.

AmplabJenkins · 2014-05-09T13:08:48Z

Merged build finished.

AmplabJenkins · 2014-05-09T13:08:49Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14847/

AmplabJenkins · 2014-05-09T14:28:45Z

Merged build finished.

AmplabJenkins · 2014-05-09T14:28:45Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14848/

Resubmit of #636 with a totally different algorithm.

https://issues.apache.org/jira/browse/SPARK-1706

In the current implementation, a user has to start multiple workers on a server in order to run multiple executors on it, which introduces additional overhead from the extra JVM processes.

In this patch, I changed the scheduling logic in the master so that the user can start multiple executor processes under the same worker JVM process.

1. The user configures spark.executor.maxCoreNumPerExecutor to suggest the maximum number of cores he/she would like to allocate to each executor.
2. The Master assigns executors to workers, mainly considering memoryPerExecutor and worker.freeMemory, and tries to allocate as many cores as possible to each executor: ```min(min(memoryPerExecutor, worker.freeCore), maxLeftCoreToAssign)``` where ```maxLeftCoreToAssign = maxExecutorCanAssign * maxCoreNumPerExecutor```

---------------------------------------

Other small changes include:

Rename memoryPerSlave in ApplicationDescription to memoryPerExecutor, as "Slave" is overloaded to mean both worker and executor in the documents (I think we had some discussion on this before?).

Author: CodingCat <[email protected]>

Closes #731 from CodingCat/SPARK-1706-2 and squashes the following commits:

6dee808 [CodingCat] change filter predicate
fbeb7e5 [CodingCat] address the comments
940cb42 [CodingCat] avoid unnecessary allocation
b8ca561 [CodingCat] revert a change
45967b4 [CodingCat] remove unused method
2eeff77 [CodingCat] stylistic fixes
12a1b32 [CodingCat] change the semantic of coresPerExecutor to exact core number
f035423 [CodingCat] stylistic fix
d9c1685 [CodingCat] remove unused var
f595bd6 [CodingCat] recover some unintentional changes
63b3df9 [CodingCat] change the description of the parameter in the submit script
4cf61f1 [CodingCat] improve the code and docs
ff011e2 [CodingCat] start multiple executors on the worker by rewriting startExeuctor logic
2c2bcc5 [CodingCat] fix wrong usage info
497ec2c [CodingCat] address andrew's comments
878402c [CodingCat] change the launching executor code
f64a28d [CodingCat] typo fix
387f4ec [CodingCat] bug fix
35c462c [CodingCat] address Andrew's comments
0b64fea [CodingCat] fix compilation issue
19d3da7 [CodingCat] address the comments
5b81466 [CodingCat] remove outdated comments
ec7d421 [CodingCat] test commit
e5efabb [CodingCat] more java docs and consolidate canUse function
a26096d [CodingCat] stylistic fix
a5d629a [CodingCat] java doc
b34ec0c [CodingCat] make master support multiple executors per worker
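The maxLeftCoreToAssign bound in the description above can be sketched as follows. This is only an illustration under stated assumptions: the Worker case class and both helper functions are hypothetical, maxExecutorCanAssign is assumed to be derived from the worker's free memory divided by memoryPerExecutor, and the final clamp to the worker's free cores is an assumption rather than the patch's exact formula.

```scala
object MemoryBoundSketch {
  // Hypothetical stand-in for a worker's free resources; not Spark's WorkerInfo.
  final case class Worker(freeMemoryMb: Int, coresFree: Int)

  // Assumption: how many executors fit on the worker is decided by memory alone.
  def maxExecutorCanAssign(w: Worker, memoryPerExecutorMb: Int): Int = {
    require(memoryPerExecutorMb > 0, "memoryPerExecutorMb must be positive")
    w.freeMemoryMb / memoryPerExecutorMb
  }

  // maxLeftCoreToAssign = maxExecutorCanAssign * maxCoreNumPerExecutor,
  // then clamped to the cores the worker actually has free (assumption).
  def maxCoresOnWorker(
      w: Worker,
      memoryPerExecutorMb: Int,
      maxCoreNumPerExecutor: Int): Int = {
    val maxLeftCoreToAssign =
      maxExecutorCanAssign(w, memoryPerExecutorMb) * maxCoreNumPerExecutor
    math.min(w.coresFree, maxLeftCoreToAssign)
  }
}
```

For a worker with 4096 MB and 16 cores free, 1024 MB per executor, and at most 2 cores per executor, memory allows 4 executors, so at most 8 of the 16 free cores would be used.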


lianhuiwang reviewed May 5, 2014

CodingCat changed the title ~~SPARK-1706: Allow multiple executors per worker in Standalone mode~~ [WIP]SPARK-1706: Allow multiple executors per worker in Standalone mode May 5, 2014

CodingCat changed the title ~~[WIP]SPARK-1706: Allow multiple executors per worker in Standalone mode~~ SPARK-1706: Allow multiple executors per worker in Standalone mode May 5, 2014

style fix

1be60da

bug fix

072dd61

lianhuiwang reviewed May 9, 2014

leftCoreToAssign should be limited by mem also

f936a42

capture noEnoughMemWorker every iteration

5b21f87

limit leftCoreToAssign with Mem

ca70401

CodingCat closed this May 11, 2014

CodingCat mentioned this pull request May 11, 2014

SPARK-1706: Allow multiple executors per worker in Standalone mode #731

Closed

bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019

Fix If statement CPO conformance playbook (apache#636)

5d64c3a

rshkv pushed a commit to rshkv/spark that referenced this pull request Feb 27, 2020

Merge pull request apache#636 from palantir/yh/fix-maven

cc41115

[SPARK-30572][BUILD] Add a fallback Maven repository
