[rocprofiler-sdk] - unable to collect PMC data#4590
Open
Fix the failure to collect PMC data when running rocprofv3 --pmc with roccap play. The failure caused an SSH disconnect and node destabilization, with three root causes: duplicate /dev/kfd opens in rocplaycap child processes, race conditions during HSA runtime teardown, and a signal handler deadlock. This change fixes profiler initialization, the teardown races, and the signal handler so that PMC data collection with roccap play is stable. Note: companion fixes for rocplaycap will be submitted separately.
Problem
When running rocprofv3 -A absolute --pmc SQ_WAVES -- roccap play <trace>, PMC counter data could not be collected, and the command could cause an SSH disconnect and destabilize the node.
Motivation
Enable rocprofv3 --pmc to work correctly with roccap play (AQL trace replay) without causing node destabilization or data loss.
Technical Details
See ticket ROCM-1214
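The description names a signal handler deadlock as one root cause. The actual change is not reproduced here; purely as an illustration of the general technique, a common way to avoid this class of deadlock is to keep the handler async-signal-safe, so that it only records the signal in a lock-free atomic, and to run the real teardown later from normal thread context. The sketch below assumes that pattern; every name in it (`sketch`, `on_signal`, `install`, `drain`) is hypothetical and is not part of rocprofiler-sdk.

```cpp
#include <atomic>
#include <csignal>

// Hypothetical illustration only; not rocprofiler-sdk code.
namespace sketch {

// The only state the handler touches. Lock-free atomics are safe to use
// from a signal handler; a mutex-protected structure would not be.
static_assert(ATOMIC_INT_LOCK_FREE == 2, "std::atomic<int> must be lock-free");
std::atomic<int> g_pending_signal{0};

// Handler: no locks, no allocation, no I/O. It only records the signal number.
extern "C" void on_signal(int signo) {
    g_pending_signal.store(signo, std::memory_order_relaxed);
}

// Install the handler for one signal; returns true on success.
bool install(int signo) {
    return std::signal(signo, on_signal) != SIG_ERR;
}

// Call from normal thread context (for example a polling loop or an exit path).
// Runs `teardown` if a signal was recorded since the last call and returns
// that signal number; returns 0 when nothing is pending.
template <typename Teardown>
int drain(Teardown&& teardown) {
    int signo = g_pending_signal.exchange(0, std::memory_order_acq_rel);
    if (signo != 0) {
        teardown();  // Safe to take locks here: we are outside the handler.
    }
    return signo;
}

}  // namespace sketch
```

A caller would invoke `sketch::install(SIGTERM)` once at startup and then call `sketch::drain(...)` periodically, passing whatever cleanup routine needs locks. Because the handler never blocks, it cannot deadlock against a lock held by the thread it interrupted.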
JIRA ID
Resolves ROCM-1214
Test Plan
Run the following command on a server with rocplaycap AQL trace replay:
Verify:
dmesg is clean
Test Result
ROCm 7.1 — VALIDATED
ROCm 7.2 — PENDING