
rping: Fix spurious failure return codes #1622


Open
wants to merge 1 commit into
base: master

jakemoroni
Contributor

The CM event thread processes events in a loop with no explicit termination. When the last CM event is received, the main thread proceeds to clean up and destroy the CM event channel. If this occurs after the CM event thread has processed the last event, but before it reaches rdma_get_cm_event again, then the subsequent call to rdma_get_cm_event will fail and cause the process to exit with a failure code even though the test was actually successful.

This causes flakiness in test scripts that use rping for basic functional testing.

Fix this by using an eventfd+poll to explicitly signal the CM event thread for termination.
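The eventfd+poll approach can be sketched roughly as follows. This is a minimal illustration, not the code from this patch: a pipe stands in for the CM event channel fd, and every name (`demo_ctx`, `event_thread`, `run_demo`) is hypothetical. The assumption is that the event thread polls both the channel fd and a dedicated eventfd, and only calls into the blocking read once poll reports the channel readable, so it never touches a channel that the main thread has already destroyed.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical context; event_fd is a pipe standing in for the CM channel fd. */
struct demo_ctx {
	int event_fd;   /* stand-in for the CM event channel fd */
	int stop_fd;    /* eventfd used only to request termination */
	sem_t handled;  /* posted once per handled event, as a handler would */
	int processed;  /* number of events the thread handled */
	int failed;     /* set if the thread hit an unexpected error */
};

static void *event_thread(void *arg)
{
	struct demo_ctx *ctx = arg;
	struct pollfd fds[2] = {
		{ .fd = ctx->event_fd, .events = POLLIN },
		{ .fd = ctx->stop_fd, .events = POLLIN },
	};
	char ev;

	for (;;) {
		if (poll(fds, 2, -1) < 0) {
			if (errno == EINTR)
				continue;
			ctx->failed = 1;
			sem_post(&ctx->handled);
			return NULL;
		}
		/* Explicit termination request: exit cleanly, not as an error. */
		if (fds[1].revents & POLLIN)
			return NULL;
		/* Only read once poll says an event is pending. */
		if (fds[0].revents & POLLIN) {
			if (read(ctx->event_fd, &ev, 1) != 1) {
				ctx->failed = 1;
				sem_post(&ctx->handled);
				return NULL;
			}
			ctx->processed++;
			sem_post(&ctx->handled);
		}
	}
}

/* Returns the number of events handled, or -1 on failure. */
int run_demo(int n_events)
{
	struct demo_ctx ctx = { .processed = 0, .failed = 0 };
	uint64_t one = 1;
	int pipefd[2];
	pthread_t tid;
	char ev = 'e';
	int i;

	assert(n_events >= 0);
	if (pipe(pipefd) < 0)
		return -1;
	ctx.event_fd = pipefd[0];
	ctx.stop_fd = eventfd(0, 0);
	if (ctx.stop_fd < 0 || sem_init(&ctx.handled, 0, 0) < 0)
		return -1;
	if (pthread_create(&tid, NULL, event_thread, &ctx) != 0)
		return -1;

	/* Deliver each event and wait until the thread reports it handled. */
	for (i = 0; i < n_events && !ctx.failed; i++) {
		if (write(pipefd[1], &ev, 1) != 1)
			break;
		while (sem_wait(&ctx.handled) < 0 && errno == EINTR)
			;
	}

	/* Signal termination and join BEFORE tearing down the "channel". */
	if (write(ctx.stop_fd, &one, sizeof(one)) != sizeof(one))
		ctx.failed = 1;
	pthread_join(tid, NULL);

	close(pipefd[0]);
	close(pipefd[1]);
	close(ctx.stop_fd);
	sem_destroy(&ctx.handled);
	return ctx.failed ? -1 : ctx.processed;
}
```

Calling `run_demo(3)` from a `main` (built with `-pthread`) should return 3. The ordering in `run_demo` is what matters for the race described above: the thread is told to stop and joined first, and only then are the descriptors closed.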

Tested by running 4096 parallel rping processes.

Fixes: 6f640ff ("r7019: Introduce event channels.")

Signed-off-by: Jacob Moroni <[email protected]>
@jgunthorpe
Member

/azp run


Azure Pipelines successfully started running 1 pipeline(s).
