Skip to content

Avoid deadlock on server shutdown #279

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Sep 19, 2022

Conversation

ggorjup
Copy link
Collaborator

@ggorjup ggorjup commented Sep 14, 2022

Summary

This PR solves the issue of deadlock during concurrent shutdown of mongo_server and config_manager nodes. I came across this issue while running pre-release tests.

This solution is still up for debate (see Question section at the bottom).

Steps to Reproduce

  1. Clone and build the mongodb_store package on ROS Noetic or Melodic.
  2. Run the config_manager.test a couple of times:
    rostest mongodb_store config_manager.test --text

In some cases, the mongo_server node hangs during shutdown and requires SIGKILL to exit (after ~20 seconds). This behavior causes pre-release tests to fail due to timeout.

Cause

During shutdown, the mongo_server issues a shutdown command to its mongod subprocess. At the same time, the config_manager attempts to close its MongoClient, which sends some cleanup commands to the mongod server. This somehow causes a deadlock and prevents the mongo_server node to exit cleanly. In fact, any concurrent command to the mongod process during shutdown seems to cause the deadlock.

Current Solution

This was solved by controlling the node shutdown sequence through the ready flag in the mongodb_server.py (see commit).

Question

Several other nodes create MongoClient instances and do not close them (mongodb_store_node, replicator_node, etc.).
So here we have two options:

  1. Include the MongoClient closing/cleanup into all the other nodes instantiating it, through rospy.on_shutdown (like we now have in the config_manager node).
  2. Remove the MongoClient closing/cleanup from the config_manager node, as is the case in other nodes. Resources in the node are freed anyway when it is shut down, and the daemon should periodically clean up expired sessions.

What do you think?

@ggorjup ggorjup mentioned this pull request Sep 15, 2022
@hawesie
Copy link
Member

hawesie commented Sep 15, 2022

  1. seems like the simpler option to me.

@ggorjup
Copy link
Collaborator Author

ggorjup commented Sep 16, 2022

Sure, that makes sense - I updated the PR with respect to option 2.
If we decide for option 1 at some point in the future, the initial commit will still be available for reference.

@ggorjup ggorjup merged commit 8875567 into strands-project:noetic-devel Sep 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants