-
Notifications
You must be signed in to change notification settings - Fork 9.1k
YARN-11838: YARN ConcurrentModificationException When Refreshing Node Attributes #7828
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
Conversation
@shameersss1 Thanks for your contribution. I haven't sat down and looked at the larger code yet, but a couple of questions:
|
💔 -1 overall
This message was automatically generated. |
Thanks @sjlee for the review. Please find the answers inline
The method ReadLock ensures that
Since it is raise condition - Replication through unit test is difficult without inducing artificial sleeps in the core code flow. Ye, the unit test passes even without this change as well. The purpose of this unit test is more of protective measure.
As per my analysis |
The unit test failure seems flaky and not reurun it passed locally,
|
@slfan1989 @TaoYang526 @zeekling could you please review ? |
I see that you're copying the key set while holding the read lock to avoid the issue. I do think it is one correct way to address the issue. That's a valid fix. My only point would be that guarding the logging call might be a cheaper and still correct fix, as it avoids copying. I don't think a keySet() call would cause iteration so that is still safe without the read lock. Let me know what you think. |
The lock is required for creating copy (since it will iterate). The only advantage i see with copying is that the log statement will be consistent with what we process. If we don't copy and some other threads might modify host.attribute after we execute the LOG statement - Will this lead to inconsistent logging and processing. On side note - I don't anticipate a host using a large number of attributes in which case the copy might become expensive @sjlee Any thoughts on this ? |
// other threads might access host.attributes | ||
readLock.lock(); | ||
try { | ||
newNodeToAttributesMap.put(hostName, new HashSet<>(host.attributes.keySet())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Judging from the stack, the problem occurred when the log was printed, and the wrong line was modified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is that.
- The LOG statement which prints
newNodeToAttributesMap
tries to iteratehost.attribute
host.attribute
gets modified by some other thread - leading to concurrent modification exception.
There are two ways to solve this
- As you said to readLock before LOG statement so that
host.attribute
does not get modified during LOG statement - Create a defensive copy of
host.attribute
(under read lock because the modification can happen at that time as well).
The rationale behind using option 2 to avoid logging inconsistency- Assume that we readLock before LOG statement. Once the LOG statement is executed, some other thread modifies the host.attribute
this will lead to we logging something and processing something else.
Creating a defensive copy make sure that we don't change value. i.e what is LOGed gets processed as well.
@shameersss1 Thanks for fixing this issue, LGTM. |
@slfan1989 - Gentle reminder for review |
@shameersss1 Should you be adding unit tests to cover the following two edge cases? Or those are already covered?
|
The locking is inconsistent with the other methods in the class which uses try{}finally{} block to release the lock, hence i don't see any concerns here. |
Description of PR
Refer YARN-11838 for more details.
The issue is that.
There are two ways to solve this
The rationale behind using option 2 to avoid logging inconsistency- Assume that we readLock before LOG statement. Once the LOG statement is executed, some other thread modifies the host.attribute this will lead to we logging something and processing something else.
Creating a defensive copy make sure that we don't change value. i.e what is LOGed gets processed as well.
How was this patch tested?
Added unit test
For code changes:
LICENSE
,LICENSE-binary
,NOTICE-binary
files?