Initial import of cuda.core.system
#1393
base: main
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
> from cuda.bindings cimport _nvml as nvml
> def get_driver_version() -> tuple[int, int]:
This is a bit confusing. This is an existing API that returns the CUDA version. It really should be called get_cuda_version to avoid confusion, but that would be a breaking change. There is a new API to return the driver version called get_gpu_driver_version below, but that naming isn't great.
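One possible migration path for the naming concern, sketched here only as an illustration (the return value is a placeholder, and neither function name is confirmed as final), would be to introduce get_cuda_version and keep get_driver_version as a deprecated alias:

```python
import warnings


def get_cuda_version() -> tuple[int, int]:
    # Placeholder body for illustration; the real implementation queries
    # the CUDA version supported by the installed driver via NVML.
    return (13, 0)


def get_driver_version() -> tuple[int, int]:
    """Deprecated alias kept for backward compatibility; the value it
    returns has always been the CUDA version, not the GPU driver's."""
    warnings.warn(
        "get_driver_version() returns the CUDA version; "
        "use get_cuda_version() instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return get_cuda_version()
```

This keeps existing callers working while steering new code toward the clearer name.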
/ok to test
FYI, cuda.core supports any cuda-bindings/cuda-python 12.x and 13.x, many of which do not have the NVML bindings available. So, we need a version guard here before importing anything that would expect the bindings to exist, and raise an exception in such cases.
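A version guard along these lines could work before touching the NVML bindings; note the (13, 0) cutoff below is a placeholder assumption, not the actual minimum version that ships the bindings:

```python
def nvml_bindings_available(bindings_version: str,
                            minimum: tuple[int, int] = (13, 0)) -> bool:
    """Return True if the given cuda-bindings version string (e.g. from
    importlib.metadata.version("cuda-bindings")) is new enough to ship
    the NVML bindings.  The (13, 0) cutoff is an assumed placeholder."""
    major, minor = (int(part) for part in bindings_version.split(".")[:2])
    return (major, minor) >= minimum
```

At import time, cuda.core.system could call this and raise a clear exception before any import of the NVML bindings is attempted.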
Ah, good reminder. I guess that precludes cimport'ing anything from cuda.bindings._nvml, since _nvml is a moving target. Will just take that out for now...
Pull request overview
This PR introduces the new cuda.core.system module that provides system-level GPU information via NVML (NVIDIA Management Library). It replaces the previous singleton System class with a more comprehensive module that offers both backward-compatible functions and new NVML-powered device management capabilities.
Key changes:
- Replaces the singleton System class with module-level functions (get_num_devices(), get_driver_version(), etc.)
- Adds a comprehensive Device class with NVML-backed properties for device information (architecture, memory, PCI info, etc.)
- Implements automatic NVML initialization on module import with version-gated availability
- Provides utility functions for formatting bytes and unpacking bitmasks
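As a rough illustration of what those utilities do (a hedged sketch only; the actual implementations in utils.pyx are Cython and may differ in signatures, unit choices, and formatting rules):

```python
def format_bytes(n: int) -> str:
    """Sketch: render a byte count with a binary-prefix unit."""
    value = float(n)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if value < 1024 or unit == "TiB":
            break
        value /= 1024
    return f"{int(value)} B" if unit == "B" else f"{value:.2f} {unit}"


def unpack_bitmask(words: list[int], bits_per_word: int = 64) -> list[int]:
    """Sketch: return the indices of set bits across a list of mask
    words, as NVML affinity masks are delivered word-by-word."""
    indices = []
    for word_index, word in enumerate(words):
        for bit in range(bits_per_word):
            if (word >> bit) & 1:
                indices.append(word_index * bits_per_word + bit)
    return indices
```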
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| cuda_core/tests/test_memory.py | Updates API calls from ccx_system.num_devices property to ccx_system.get_num_devices() function |
| cuda_core/tests/system/test_system_utils.py | Adds comprehensive tests for utility functions (format_bytes, unpack_bitmask) |
| cuda_core/tests/system/test_system_system.py | Adds tests for system-level functions (driver versions, device count, process name) |
| cuda_core/tests/system/test_system_device.py | Adds extensive tests for Device class properties (architecture, memory, PCI info, etc.) |
| cuda_core/tests/system/test_nvml_context.py | Adds tests for NVML initialization state management across processes |
| cuda_core/tests/system/conftest.py | Defines NVML version requirements and skip marker for unsupported versions |
| cuda_core/tests/system/__init__.py | Empty __init__.py file for the test package |
| cuda_core/cuda/core/experimental/system/utils.pyx | Implements utility functions for byte formatting and bitmask unpacking |
| cuda_core/cuda/core/experimental/system/system.pyx | Implements system-level query functions with NVML and fallback support |
| cuda_core/cuda/core/experimental/system/device.pyx | Implements Device class with comprehensive GPU properties via NVML |
| cuda_core/cuda/core/experimental/system/_nvml_context.pyx | Implements thread-safe, per-process NVML initialization logic |
| cuda_core/cuda/core/experimental/system/__init__.py | Module entry point with version-gated NVML imports and initialization |
| cuda_core/cuda/core/experimental/_system.py | Removes deprecated singleton System class |
| cuda_core/cuda/core/experimental/__init__.py | Updates imports to use the new system module instead of the System singleton |
| cuda_bindings/cuda/bindings/_nvml.pyx | Adds enums and fixes BAR1Memory property naming (breaking change) |
Force-pushed f8f8e32 to 7719fc5.
Auto-sync is disabled for ready-for-review pull requests in this repository. Workflows must be run manually.
Force-pushed 7719fc5 to 0cff9c7.
(Marked as draft as a reminder to not merge until after the 0.5.0 release...)
Prerequisites to get this PR to pass:
- This is the first landing of cuda.core.system, with all of the features in the nvutil prototype (which sort of has an arbitrary collection of the most core things in NVML, but is a reasonable starting point for a first PR).
- This requires a generator change (not yet merged) to include the AUTO_LOWPP_* classes in the .pxd file so they can be cimport'ed. I know we don't usually do that, but it seems important to be able to use those high-level bindings and not repeat ourselves. ABI stability there should be OK -- I don't anticipate needing to change anything on the .pxd side of those classes.
- Following the nvutil design, this initializes NVML immediately upon import of cuda.core.system. That feels convenient and may be the right choice, but it will be hard to walk back. A question the NVML docs don't answer for me: are there any use cases where you would want to init/shutdown NVML repeatedly? The cuda.bindings.nvml tests do this, so I know it works. Is there any harm in init'ing and never shutting down? We could add an atexit handler, but I don't know if it's required.
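On the atexit question: if a paired shutdown turns out to be desirable, the wiring could be as small as this sketch, with init_fn/shutdown_fn standing in for the real NVML binding calls (names assumed, not the actual API):

```python
import atexit


def install_nvml_lifecycle(init_fn, shutdown_fn):
    """Sketch: initialize NVML at import time and register a matching
    shutdown to run at interpreter exit.  Whether shutdown is strictly
    required is exactly the open question above; registering the
    handler costs little either way."""
    init_fn()
    atexit.register(shutdown_fn)
```

The shutdown handler fires only at interpreter exit, so repeated imports within one process are unaffected.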