Avoid deadlock on server shutdown #279
Merged
Summary
This PR solves the issue of deadlock during concurrent shutdown of the mongo_server and config_manager nodes. I came across this issue while running pre-release tests.
This solution is still up for debate (see the Question section at the bottom).
Steps to Reproduce
1. Set up the mongodb_store package on ROS Noetic or Melodic.
2. Run config_manager.test a couple of times: rostest mongodb_store config_manager.test --text
In some cases, the mongo_server node hangs during shutdown and requires SIGKILL to exit (after ~20 seconds). This behavior causes pre-release tests to fail due to timeout.
Cause
During shutdown, the mongo_server issues a shutdown command to its mongod subprocess. At the same time, the config_manager attempts to close its MongoClient, which sends some cleanup commands to the mongod server. This somehow causes a deadlock and prevents the mongo_server node from exiting cleanly. In fact, any concurrent command to the mongod process during shutdown seems to cause the deadlock.
Current Solution
This was solved by controlling the node shutdown sequence through the ready flag in mongodb_server.py (see commit).
Question
Several other nodes create MongoClient instances and do not close them (mongodb_store_node, replicator_node, etc.).
So here we have two options:
1. Add MongoClient closing/cleanup to all the other nodes instantiating it, through rospy.on_shutdown (like we now have in the config_manager node).
2. Remove MongoClient closing/cleanup from the config_manager node, as is the case in other nodes. Resources in the node are freed anyway when it is shut down, and the daemon should periodically clean up expired sessions.
What do you think?
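To make option 1 concrete, here is a minimal sketch of what registering the cleanup through rospy.on_shutdown could look like in each node. It is not code from this PR: the helper name register_client_cleanup, its register_hook parameter, and the FakeClient stand-in are hypothetical and exist only so the sketch runs without ROS or a database. In a real node, client would be the node's pymongo MongoClient and the hook would default to rospy.on_shutdown.

```python
def register_client_cleanup(client, register_hook=None):
    """Arrange for client.close() to be called when the node shuts down.

    `client` is anything with a close() method (a pymongo MongoClient in a
    real node). `register_hook` is hypothetical: it defaults to
    rospy.on_shutdown and is injectable so the sketch runs without ROS.
    """
    if register_hook is None:
        # Imported lazily so the demo below does not need ROS installed.
        import rospy
        register_hook = rospy.on_shutdown

    def _close_client():
        try:
            client.close()
        except Exception as exc:
            # A failing cleanup should not prevent the node from exiting.
            print("MongoClient cleanup failed: %s" % exc)

    register_hook(_close_client)
    return _close_client


class FakeClient(object):
    """Stand-in for pymongo.MongoClient, used only for this demo."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


# Demo: collect hooks in a list instead of handing them to rospy.
hooks = []
fake = FakeClient()
register_client_cleanup(fake, register_hook=hooks.append)

# Simulate node shutdown by running every registered hook.
for hook in hooks:
    hook()
```

In a node this would reduce to a single call such as register_client_cleanup(self._mongo_client), where the attribute name is again an assumption. Since closing the client sends cleanup commands to mongod (see Cause), this option relies on the shutdown ordering from this PR in every node that adopts it, whereas option 2 avoids those commands entirely.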