-
Notifications
You must be signed in to change notification settings - Fork 28.8k
[WIP]SPARK-1706: Allow multiple executors per worker in Standalone mode #636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Merged build triggered. |
Merged build started. |
Merged build finished. |
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14649/ |
Merged build triggered. |
Merged build started. |
Merged build finished. |
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14650/ |
anyone can help to re-trigger the test?????Jenkins does not like me.... |
Merged build triggered. |
Merged build started. |
Merged build finished. |
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14654/ |
var pos = 0 | ||
while (toAssign > 0) { | ||
val assignedCore = math.min(coreNumPerExecutor, toAssign) | ||
if (usableWorkers(pos).coresFree - assigned(pos).sum > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (usableWorkers(pos).coresFree - assigned(pos).sum > 0) {
that should update:if (usableWorkers(pos).coresFree - assigned(pos).sum >= assignedCore) {
Merged build triggered. |
Merged build started. |
Merged build finished. |
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14663/ |
Merged build triggered. |
Merged build started. |
Merged build finished. All automated tests passed. |
All automated tests passed. |
please put a hold on the review of this PR, found a bug, fixing it |
Merged build triggered. |
Merged build started. |
Merged build finished. |
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14666/ |
I think it is ready for review, but I'm unsure why the test is always stuck in a certain case in ReplSuite |
Merged build finished. |
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14820/ |
Merged build triggered. |
Merged build started. |
@@ -28,6 +28,7 @@ private[spark] class ApplicationDescription( | |||
extends Serializable { | |||
|
|||
val user = System.getProperty("user.name", "<unknown>") | |||
|
|||
// only valid when spark.executor.multiPerWorker is set to true | |||
var maxCorePerExecutor = maxCores |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think in var maxCorePerExecutor = maxCores the two variables are different. maxCores is total core's value of a application. but maxCorePerExecutor is cores of per executor. in schedule() app's leftCoreToAssign come from maxCores value.so two variables cannot be equal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's just an initial value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes i know. but in ApplicationInfo.scala the coresLeft value is same to the value of desc.maxCores. in schedule leftCoreToAssign actually is equto to maxCorePerExecutor. so i think there are not right because leftCoreToAssign is total cores of all executors and maxCorePerExecutor is cores of one executor. i donot know whether you understand it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why you think "in schedule leftCoreToAssign actually is equto to maxCorePerExecutor", it's the minimum value between (app.coresLeft) and the sum of all worker free cores... var leftCoreToAssign = math.min(app.coresLeft, usableWorkers.map(_.coresFree).sum)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes,but in ApplicationInfo,app.coresLeft is equal to app.maxCores. so in schedule when the sum of all worker free cores is greater than app.coresLeft, now leftCoreToAssign actually is equal to maxCorePerExecutor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes thanks i see.
Merged build finished. |
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14837/ |
Merged build triggered. |
Merged build started. |
Merged build triggered. |
Merged build started. |
Merged build finished. |
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14846/ |
Merged build triggered. |
Merged build started. |
Merged build finished. |
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14847/ |
Merged build finished. |
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14848/ |
resubmit of #636 for a totally different algorithm https://issues.apache.org/jira/browse/SPARK-1706 In current implementation, the user has to start multiple workers in a server for starting multiple executors in a server, which introduces additional overhead due to the more JVM processes... In this patch, I changed the scheduling logic in master to enable the user to start multiple executor processes within the same JVM process. 1. user configure spark.executor.maxCoreNumPerExecutor to suggest the maximum core he/she would like to allocate to each executor 2. Master assigns the executors to the workers with the major consideration on the memoryPerExecutor and the worker.freeMemory, and tries to allocate as many as possible cores to the executor ```min(min(memoryPerExecutor, worker.freeCore), maxLeftCoreToAssign)``` where ```maxLeftCoreToAssign = maxExecutorCanAssign * maxCoreNumPerExecutor``` --------------------------------------- Other small changes include change memoryPerSlave in ApplicationDescription to memoryPerExecutor, as "Slave" is overrided to represent both worker and executor in the documents... (we have some discussion on this before?) Author: CodingCat <[email protected]> Closes #731 from CodingCat/SPARK-1706-2 and squashes the following commits: 6dee808 [CodingCat] change filter predicate fbeb7e5 [CodingCat] address the comments 940cb42 [CodingCat] avoid unnecessary allocation b8ca561 [CodingCat] revert a change 45967b4 [CodingCat] remove unused method 2eeff77 [CodingCat] stylistic fixes 12a1b32 [CodingCat] change the semantic of coresPerExecutor to exact core number f035423 [CodingCat] stylistic fix d9c1685 [CodingCat] remove unused var f595bd6 [CodingCat] recover some unintentional changes 63b3df9 [CodingCat] change the description of the parameter in the submit script 4cf61f1 [CodingCat] improve the code and docs ff011e2 [CodingCat] start multiple executors on the worker by rewriting startExeuctor logic 2c2bcc5 [CodingCat] fix wrong usage info 497ec2c [CodingCat] address andrew's comments 878402c [CodingCat] change the launching executor code f64a28d [CodingCat] typo fix 387f4ec [CodingCat] bug fix 35c462c [CodingCat] address Andrew's comments 0b64fea [CodingCat] fix compilation issue 19d3da7 [CodingCat] address the comments 5b81466 [CodingCat] remove outdated comments ec7d421 [CodingCat] test commit e5efabb [CodingCat] more java docs and consolidate canUse function a26096d [CodingCat] stylistic fix a5d629a [CodingCat] java doc b34ec0c [CodingCat] make master support multiple executors per worker
[SPARK-30572][BUILD] Add a fallback Maven repository
https://issues.apache.org/jira/browse/SPARK-1706
In current implementation, the user has to start multiple workers in a server for starting multiple executors in a server, which introduces additional overhead due to the more JVM processes...
In this patch, I changed the scheduling logic in master to enable the user to start multiple executor processes within the same JVM process.
Other small changes include
@pwendell, I think we don't need to change anything in scheduler part, as we indexed the executor by executorId instead of host IP address?