[SPARK-3015] Block on cleaning tasks to prevent Akka timeouts #1931
Conversation
QA tests have started for PR 1931. This patch merges cleanly.
QA results for PR 1931:
QA tests have started for PR 1931. This patch merges cleanly.
QA results for PR 1931:
The previous code used the length of the referenceBuffer, which is the number of elements registered for clean-up, rather than the number of elements registered AND de-referenced. What we want is the length of the referenceQueue. However, Java does not expose this, so we must access it through reflection. Since this is potentially expensive, we need to limit the number of times we access the queue length this way.
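For illustration, here is a minimal sketch of that reflective access. The private field name `queueLength` is an OpenJDK implementation detail (not a public API), so this is best-effort and may fail on other JVMs; the helper name is hypothetical, not Spark source:

```scala
import java.lang.ref.ReferenceQueue

object ReferenceQueueLength {
  // Best-effort read of the queue's internal length. "queueLength" is a
  // private OpenJDK field, so this can fail on other JVMs; return -1
  // instead of crashing the cleaner.
  def apply(queue: ReferenceQueue[_]): Long = {
    try {
      val field = classOf[ReferenceQueue[_]].getDeclaredField("queueLength")
      field.setAccessible(true)
      field.getLong(queue)
    } catch {
      case _: Exception => -1L
    }
  }
}
```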
QA tests have started for PR 1931. This patch merges cleanly.
QA results for PR 1931:
  }
} catch {
  case e: Exception =>
    logDebug("Failed to access reference queue's length through reflection: " + e)
Add a note on why this is logDebug and not logWarning/logError.
It's a little ugly that the ContextCleaner class is being polluted with so many parameters and all the temporary queue-length code. Wouldn't it be much cleaner if we made a custom ReferenceQueue with a length() method that does this reflection on itself to find the queue length? Then all the iteration-counter, queue-length-checking, and error-message-printing code could live inside that ReferenceQueue implementation, cleanly separated from the main context cleaner logic.
Yeah, sounds good. I guess we'll use a
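A hypothetical sketch of the encapsulation suggested above (the class name `CleanupQueue` is an assumption, and `queueLength` is the OpenJDK-private field; neither is Spark source). Note that this queue-length logic was ultimately dropped from the PR:

```scala
import java.lang.ref.ReferenceQueue

class CleanupQueue[T] extends ReferenceQueue[T] {
  // Resolve the Field once so the reflective cost is not paid on every call.
  private val lengthField: Option[java.lang.reflect.Field] =
    try {
      val f = classOf[ReferenceQueue[_]].getDeclaredField("queueLength")
      f.setAccessible(true)
      Some(f)
    } catch {
      case _: Exception => None
    }

  /** Best-effort queue length; -1 if the field could not be resolved. */
  def length: Long = lengthField.map(_.getLong(this)).getOrElse(-1L)
}
```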
This simplifies the PR significantly. Now it's strictly a bug fix.
I have removed the logic of logging queue length as a warning. This significantly simplifies the PR and fulfills its original purpose as a bug fix. We can add back some notion of warning later on if there is interest.
QA tests have started for PR 1931 at commit
QA tests have finished for PR 1931 at commit
Great - I like this version better!
More detail on the issue is described in [SPARK-3015](https://issues.apache.org/jira/browse/SPARK-3015), but the TL;DR is that if we send too many blocking Akka messages that depend on each other in quick succession, a few of them time out and ultimately kill the executors. As of #1498, we broadcast each RDD whether or not it is persisted. This means that if we create many RDDs (each of which becomes a broadcast) and the driver performs a GC that cleans up all of these broadcast blocks, we end up sending many `RemoveBroadcast` messages in parallel, triggering the chain of blocking messages at high frequency.

We do not know the Akka-level root cause yet, so this is intended to be a temporary solution until we identify the real issue. I have done some preliminary testing with blocking enabled and observed that the queue length remains quite low (< 1000) even under very intensive workloads.

In the long run, we should do something more sophisticated to allow a limited degree of parallelism, e.g. by batching clean-up tasks or processing them in a sliding window. In the longer run, we should clean up the whole `BlockManager*` message-passing interface to avoid unnecessarily awaiting on futures created from Akka asks.

@tdas @pwendell @mengxr

Author: Andrew Or <[email protected]>

Closes #1931 from andrewor14/reference-blocking and squashes the following commits:

d0f7195 [Andrew Or] Merge branch 'master' of github.com:apache/spark into reference-blocking
ce9daf5 [Andrew Or] Remove logic for logging queue length
111192a [Andrew Or] Add missing space in log message (minor)
a183b83 [Andrew Or] Switch order of code blocks (minor)
9fd1fe6 [Andrew Or] Remove outdated log
104b366 [Andrew Or] Use the actual reference queue length
0b7e768 [Andrew Or] Block on cleaning tasks by default + log error on queue full

(cherry picked from commit c9da466)
Signed-off-by: Patrick Wendell <[email protected]>
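For context, a simplified sketch of the behavior the patch enables (this is not the exact Spark source; `sendRemoveBroadcast` and the 30-second timeout are hypothetical stand-ins for the real master call and configured timeout):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object CleanupSketch {
  // Hypothetical stand-in for the real BlockManagerMaster call.
  def sendRemoveBroadcast(broadcastId: Long): Future[Unit] =
    Future { /* ask the master to drop the broadcast's blocks */ }

  // With blocking enabled, each cleanup task awaits its ack before the
  // cleaner moves on, so a GC burst cannot flood Akka with parallel asks.
  def doCleanupBroadcast(broadcastId: Long, blocking: Boolean): Unit = {
    val future = sendRemoveBroadcast(broadcastId)
    if (blocking) {
      Await.result(future, 30.seconds)
    }
  }
}
```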
 */
private val blockOnCleanupTasks = sc.conf.getBoolean(
  "spark.cleaner.referenceTracking.blocking", false)
These changes will not solve the problem here. See:
BlockManagerMasterActor.scala#L165
private def removeShuffle(shuffleId: Int): Future[Seq[Boolean]] = {
// Nothing to do in the BlockManagerMasterActor data structures
import context.dispatcher
val removeMsg = RemoveShuffle(shuffleId)
Future.sequence(
blockManagerInfo.values.map { bm =>
      // The akkaTimeout is already applied to this ask
bm.slaveActor.ask(removeMsg)(akkaTimeout).mapTo[Boolean]
}.toSeq
)
}
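To make the reviewer's point concrete, here is a sketch of the driver-side counterpart (simplified, with hypothetical stubs for the actor plumbing; `askDriverWithReply`, the message class, and the timeout value are assumptions, not verbatim Spark source). The `blocking` flag only awaits the aggregate future returned by the master actor; each per-slave ask in the snippet above still carries its own `akkaTimeout`:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

object RemoveShuffleSketch {
  // Hypothetical stubs so the sketch is self-contained; not Spark source.
  case class RemoveShuffle(shuffleId: Int)
  def askDriverWithReply[T](message: Any): T = ???
  val akkaTimeout = 30.seconds

  // `blocking` only awaits the aggregate future from the master actor;
  // the per-slave asks inside the master still time out independently.
  def removeShuffle(shuffleId: Int, blocking: Boolean): Unit = {
    val future = askDriverWithReply[Future[Seq[Boolean]]](RemoveShuffle(shuffleId))
    if (blocking) {
      Await.result(future, akkaTimeout)
    }
  }
}
```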
@andrewor14 is that `/**` still needed?