[SPARK-4777][CORE] Some block memory after unrollSafely not count into used memory(memoryStore.entrys or unrollMemory) #3629
Can one of the admins verify this patch? |
add to whitelist |
Test build #24321 has started for PR 3629 at commit
|
Test build #24321 has finished for PR 3629 at commit
|
Test FAILed. |
Test build #24345 has started for PR 3629 at commit
|
Test build #24345 has finished for PR 3629 at commit
|
Test FAILed. |
Test build #24352 has started for PR 3629 at commit
|
Test build #24352 has finished for PR 3629 at commit
|
Test PASSed. |
It is already resolved in [SPARK-3000][CORE] drop old blocks to disk in parallel when free memory is not enough for caching new blocks #2134 |
Hi @suyanNone, you don't need to close this PR, since #2134 is not merged yet and the bug still exists in the current code. You can reopen this PR. |
@liyezhang556520 I'm not familiar with the pull request process = =. OK, I will reopen it... |
Reopen |
Test build #24491 has started for PR 3629 at commit
|
Test build #24491 has finished for PR 3629 at commit
|
Test FAILed. |
@suyanNone Having taken a detailed look over the patch, I believe the correct thing to do here is to just remove the memory release in |
@andrewor14 When other threads run unrollSafely, they sum reservedUnrollMemoryForThisThread over all threads, which counts valueA (because valueA is only released after the task completes). MemoryStore also counts valueA. Did I miss something in the code? |
Sorry, I don't understand what you're saying. Why will there be double counting? Can you list the steps that causes this? (What is Value A?) |
Hi @andrewor14, I think @suyanNone's explanation is correct. What @suyanNone means is that if we just remove the memory release in unrollSafely, the memory marked as unrolled will never be released, even after the corresponding block is actually put into memory. You can check in ensureFreeMemory method, the |
@andrewor14 Sorry for my poor English; @liyezhang556520 has explained it well. Here I only discuss what happens if we just remove the memory release in In MemoryStore, memory has 2 uses; one is the actual free memory in the memory store = maxMemory - entrys.size - currentUnrollMemory (= unrollMemoryMap.size).
Thread A: Thread A's reserved unroll memory, let's suppose it equals 20MB
If thread A is now between step 2 and step 3, the 20MB is counted both in Thread B begins to put a block in memory; it is affected in 3 places by Thread A's unreleased unroll memory (20MB), wherever freeMemory or currentUnrollMemory is used. We can say trueCurrentUnrollMemory should = currentUnrollMemory - 20MB
first place:
second place
third place
|
I see, thanks for your detailed explanations @suyanNone @liyezhang556520. If the problem is that we double count after we put the block in memory, shouldn't we release the pending memory after we actually put the block (i.e. after this line), not before? |
Also, the other issue with this patch is that |
CacheManager.putInBlockManager does something like:
In my code:
right? |
34cfbe8 to 3541759
  // we release the memory claimed by this thread later on when the task finishes.
  if (keepUnrolling) {
    val amountToRelease = currentUnrollMemoryForThisThread - previousMemoryReserved
-   releaseUnrollMemoryForThisThread(amountToRelease)
+   releaseUnrollMemoryForThisThread(amountToRelease, true)
Instead of introducing a random boolean flag here, I would just move the acquire pending memory code into unrollSafely:
    if (keepUnrolling) {
      val amountToRelease = currentUnrollMemoryForThisThread - previousMemoryReserved
      // Mark the unroll memory as pending so that we release
      // it later as soon as we finish caching the block
      releaseUnrollMemoryForThisThread(amountToRelease)
      reservePendingUnrollMemoryForThisThread(amountToRelease)
    }
Then somewhere down there you'll have to define reservePendingUnrollMemoryForThisThread.
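For illustration only, here is a minimal sketch of what such a helper could look like next to the existing reserve/release pair. It is not the code that was merged: the class name UnrollAccounting, the map name pendingUnrollMemoryMap, the method releasePendingUnrollMemoryForThisThread, and the explicit threadId parameter are all assumptions made so the example is self-contained.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for the unroll bookkeeping in MemoryStore.
// Thread ids are passed explicitly here (an assumption made for easy testing);
// the real methods would look up the current thread themselves.
class UnrollAccounting {
  private val accountingLock = new Object
  // thread id -> bytes reserved while a block is still being unrolled
  private val unrollMemoryMap = mutable.HashMap[Long, Long]()
  // thread id -> bytes of an unrolled block that has not been put yet (assumed name)
  private val pendingUnrollMemoryMap = mutable.HashMap[Long, Long]()

  def reserveUnrollMemoryForThisThread(threadId: Long, memory: Long): Unit =
    accountingLock.synchronized {
      unrollMemoryMap(threadId) = unrollMemoryMap.getOrElse(threadId, 0L) + memory
    }

  def releaseUnrollMemoryForThisThread(threadId: Long, memory: Long): Unit =
    accountingLock.synchronized {
      val remaining = unrollMemoryMap.getOrElse(threadId, 0L) - memory
      if (remaining <= 0L) unrollMemoryMap.remove(threadId)
      else unrollMemoryMap(threadId) = remaining
    }

  // Keep the just-released unroll memory visible to other threads until the put happens.
  def reservePendingUnrollMemoryForThisThread(threadId: Long, memory: Long): Unit =
    accountingLock.synchronized {
      pendingUnrollMemoryMap(threadId) = pendingUnrollMemoryMap.getOrElse(threadId, 0L) + memory
    }

  // Drop all pending memory for the thread, e.g. right before the block is stored.
  def releasePendingUnrollMemoryForThisThread(threadId: Long): Unit =
    accountingLock.synchronized {
      pendingUnrollMemoryMap.remove(threadId)
    }

  // Pending memory still counts as occupied from the point of view of other threads.
  def currentUnrollMemory: Long = accountingLock.synchronized {
    unrollMemoryMap.values.sum + pendingUnrollMemoryMap.values.sum
  }
}
```

In this sketch, moving 40 units from the unroll map to the pending map leaves currentUnrollMemory unchanged, which is the property the suggestion above relies on.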
@suyanNone I have left mostly documentation and code style comments. The only slightly less trivial thing is to revert the |
@andrewor14, I have already refined the patch according to the comments, please review~ |
Test build #28042 has started for PR 3629 at commit
|
Test build #28042 has finished for PR 3629 at commit
|
Test PASSed. |
@@ -381,6 +395,8 @@ private[spark] class MemoryStore(blockManager: BlockManager, maxMemory: Long)
    }

    // Take into account the amount of memory currently occupied by unrolling blocks
    // and minus the pending unroll memory for that block on current thread.
    val threadId = Thread.currentThread().getId
where is this variable used?
28408f2 to 809cc41
@andrewor14
If still need to put it at the end of may change the current code |
Test build #28065 has started for PR 3629 at commit
|
Test build #28065 has finished for PR 3629 at commit
|
Test PASSed. |
ping @andrewor14 |
Ok LGTM I'm merging into master finally. Thanks @suyanNone and @liyezhang556520 for uncovering this tricky issue. |
Some memory is not counted as used by either memoryStore or unrollMemory.
After Thread A finishes unrollSafely, it releases 40MB of unrollMemory (that 40MB can then be used by other threads). Thread A then waits to acquire accountingLock so it can tryToPut blockA (30MB). Until Thread A gets accountingLock, blockA's memory is counted in neither unrollMemory nor memoryStore.currentMemory.
IIUC, freeMemory should subtract that block's memory.
So, mark the released memory as pending, and release it in tryToPut before ensureSpace.
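The scenario above can be checked with a small arithmetic model. This is a sketch under stated assumptions, not MemoryStore itself: StoreState, PendingUnrollDemo, and the 100MB maxMemory are hypothetical, and freeMemory is modelled as maxMemory minus stored blocks, unroll memory, and pending unroll memory. Only the 40MB and 30MB figures come from the description.

```scala
// All sizes are in MB. Hypothetical, simplified model of the bookkeeping.
final case class StoreState(
    maxMemory: Long,
    currentMemory: Long,       // memory held by blocks already stored
    unrollMemory: Long,        // memory reserved by in-progress unrolls
    pendingUnrollMemory: Long  // unrolled but not yet stored (the proposed fix)
) {
  def freeMemory: Long = maxMemory - currentMemory - unrollMemory - pendingUnrollMemory
}

object PendingUnrollDemo {
  val blockSize = 30L // blockA
  val unrolled = 40L  // unroll memory Thread A reserved for blockA

  // Thread A is still unrolling blockA; maxMemory = 100 is an assumed value.
  val whileUnrolling = StoreState(100L, 0L, unrolled, 0L)

  // Before the fix: Thread A released its unroll memory and is waiting for
  // accountingLock, so nothing accounts for blockA during this window.
  val gapBeforeFix = whileUnrolling.copy(unrollMemory = 0L)

  // With the fix: the released amount is tracked as pending instead.
  val gapAfterFix = whileUnrolling.copy(unrollMemory = 0L, pendingUnrollMemory = unrolled)

  // tryToPut releases the pending memory and then stores the block.
  val afterPut = gapAfterFix.copy(currentMemory = blockSize, pendingUnrollMemory = 0L)
}
```

In this model gapBeforeFix reports all 100MB as free even though blockA still occupies memory, while gapAfterFix keeps reporting 60MB until the block is stored.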