[SPARK-52776][CORE] Do not split the comm field in ProcfsMetricsGetter #51457
Thanks for open sourcing the fix!
Looks good to me.
### What changes were proposed in this pull request?

We are fixing an issue in `ProcfsMetricsGetter` when parsing the `/proc/<pid>/stat` file. The current implementation will split the comm field by spaces if it contains them, thereby causing subsequent numbers to be shifted. The comm field, and only the comm field, is in parentheses so we can resolve this issue by ignoring everything between the first open parenthesis and last closing parenthesis when splitting the stat file.

### Why are the changes needed?

These changes are needed to prevent a comm field with spaces from causing incorrect calculations for vmem/rssmem metrics. Please see [JIRA](https://issues.apache.org/jira/projects/SPARK/issues/SPARK-52776) for details.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added a unit test to test for irregular characters in the comm field

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #51457 from max2718281/procfs.

Authored-by: Maxime Xu <[email protected]>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
(cherry picked from commit cf097a5)
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
Merged to master and 4.0. Can you create a backport for 3.5, please? It looks like there are some conflicts.
Created #51481.
…Getter

### What changes were proposed in this pull request?

This is a backport of #51457.

We are fixing an issue in `ProcfsMetricsGetter` when parsing the `/proc/<pid>/stat` file. The current implementation will split the comm field by spaces if it contains them, thereby causing subsequent numbers to be shifted. The comm field, and only the comm field, is in parentheses so we can resolve this issue by ignoring everything between the first open parenthesis and last closing parenthesis when splitting the stat file.

### Why are the changes needed?

These changes are needed to prevent a comm field with spaces from causing incorrect calculations for vmem/rssmem metrics. Please see [JIRA](https://issues.apache.org/jira/projects/SPARK/issues/SPARK-52776) for details.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added a unit test to test for irregular characters in the comm field

### Was this patch authored or co-authored using generative AI tooling?

No

### Original PR Info

Closes #51457 from max2718281/procfs.

Authored-by: Maxime Xu <maxxulinkedin.com>
(cherry picked from commit cf097a5)

Closes #51481 from max2718281/procfs-3.5.

Authored-by: Maxime Xu <[email protected]>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
### What changes were proposed in this pull request?

We are fixing an issue in `ProcfsMetricsGetter` when parsing the `/proc/<pid>/stat` file. The current implementation will split the comm field by spaces if it contains them, thereby causing subsequent numbers to be shifted. The comm field, and only the comm field, is in parentheses so we can resolve this issue by ignoring everything between the first open parenthesis and last closing parenthesis when splitting the stat file.

### Why are the changes needed?

These changes are needed to prevent a comm field with spaces from causing incorrect calculations for vmem/rssmem metrics. Please see [JIRA](https://issues.apache.org/jira/projects/SPARK/issues/SPARK-52776) for details.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added a unit test to test for irregular characters in the comm field

### Was this patch authored or co-authored using generative AI tooling?

No
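
For readers who want to see the field shift concretely, here is a minimal Scala sketch of the parsing idea described above. It is not the merged code: `StatLineSketch`, `naiveSplit`, and `splitStat` are hypothetical names, and the stat line is synthetic. The only outside fact it relies on is the field layout from proc(5), where `vsize` and `rss` sit at zero-based positions 22 and 23 once the comm field is counted as a single field.

```scala
// Illustrative sketch only. StatLineSketch and its methods are hypothetical
// names chosen for this example, not the code merged in this PR.
object StatLineSketch {

  // Naive approach: split the whole line on spaces. A comm value that
  // contains spaces produces extra tokens and shifts every later field.
  def naiveSplit(stat: String): Array[String] = stat.split(" ")

  // Paren-aware approach: everything between the first '(' and the last ')'
  // is the comm field; only the text outside it is split on spaces.
  def splitStat(stat: String): Array[String] = {
    val open = stat.indexOf('(')
    val close = stat.lastIndexOf(')')
    require(open >= 0 && close > open, s"unexpected stat line: $stat")
    val pid = stat.substring(0, open).trim
    val comm = stat.substring(open + 1, close)
    val rest = stat.substring(close + 1).trim.split(" ")
    Array(pid, comm) ++ rest
  }

  // Synthetic stat line: pid, comm, state, 19 zero placeholders, then
  // vsize and rss, so that vsize and rss land at indices 22 and 23.
  val sampleLine: String = {
    val tail = (Seq("S") ++ Seq.fill(19)("0") ++ Seq("4096000", "250")).mkString(" ")
    s"1234 (my app (v2)) $tail"
  }

  def main(args: Array[String]): Unit = {
    val naive = naiveSplit(sampleLine)
    val fixed = splitStat(sampleLine)
    // The comm field became three tokens, so index 22 no longer holds vsize.
    println(s"naive index 22: ${naive(22)}")
    println(s"fixed comm: ${fixed(1)}, vsize: ${fixed(22)}, rss pages: ${fixed(23)}")
  }
}
```

Using the last closing parenthesis rather than the first is what lets a comm value such as `my app (v2)` survive intact, since the fields after comm never contain parentheses.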