[BUG FIX] Fix GPU detection in test infrastructure for WSL2 #2653
Merged
duburcqa merged 4 commits on Apr 5, 2026
Add nvidia-smi fallback for _get_gpu_indices() and _torch_get_gpu_idx() when /proc/driver/nvidia/gpus/ is unavailable (e.g. WSL2). Signed-off-by: Lidang-Jiang <lidangjiang@gmail.com>
duburcqa
requested changes
Apr 4, 2026
duburcqa
left a comment
Collaborator
A try/except that falls back to the already existing default is sufficient. Do not use nvidia-smi.
…lback Replace nvidia-smi fallback with simple try/except around existing /proc/driver/nvidia/gpus/ access, falling back to existing defaults. Signed-off-by: Lidang-Jiang <lidangjiang@gmail.com>
Lidang-Jiang
commented
Apr 4, 2026
Lidang-Jiang
left a comment
Contributor
Author
You're right, much simpler. Removed all nvidia-smi fallback and switched to try/except with existing defaults in 9a244a9.
duburcqa
reviewed
Apr 4, 2026
… Linux Print a user-facing warning explaining that multi-GPU support will be disabled when the NVIDIA proc interface is not found, as requested by reviewer. Signed-off-by: Lidang-Jiang <lidangjiang@gmail.com>
Signed-off-by: Lidang-Jiang <lidangjiang@gmail.com>
duburcqa
approved these changes
Apr 5, 2026
Summary
- Fix _torch_get_gpu_idx() crashing with FileNotFoundError on WSL2 due to missing /proc/driver/nvidia/gpus/
- Wrap /proc/driver/nvidia/gpus/ access in try/except, falling back to existing defaults
- Apply the same handling to nvidia_gpu_indices in _get_gpu_indices()
Root cause
WSL2 provides full CUDA support (nvidia-smi works, torch.cuda.is_available() returns True), but the native Linux NVIDIA driver procfs interface at /proc/driver/nvidia/gpus/ does not exist. Two functions in tests/conftest.py rely on this path:
- _get_gpu_indices(): guarded by os.path.exists(), silently falls back to (0,) (correct for a single GPU)
- _torch_get_gpu_idx(): calls os.listdir() without an existence check, causing a crash
Fix
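As a minimal sketch of the pattern (not the actual tests/conftest.py code): the helper names, the proc_dir parameter, the index logic, and the warning text below are assumptions made for illustration. A missing directory is caught in one place, and each caller returns its own default.

```python
import os
import warnings

# Path named in this PR; it does not exist on WSL2.
NVIDIA_PROC_DIR = "/proc/driver/nvidia/gpus/"


def list_gpu_entries(proc_dir=NVIDIA_PROC_DIR):
    """Hypothetical helper: sorted entry names under proc_dir, or None if it is missing."""
    try:
        return sorted(os.listdir(proc_dir))
    except FileNotFoundError:
        # Assumed wording; the PR only states that a user-facing warning is printed.
        warnings.warn(f"{proc_dir} not found; multi-GPU support will be disabled.")
        return None


def gpu_indices(proc_dir=NVIDIA_PROC_DIR):
    """Simplified stand-in for _get_gpu_indices(): the default is (0,)."""
    entries = list_gpu_entries(proc_dir)
    if entries is None:
        return (0,)
    return tuple(range(len(entries)))


def gpu_idx_for(entry_name, proc_dir=NVIDIA_PROC_DIR):
    """Simplified stand-in for _torch_get_gpu_idx(): the default is -1."""
    entries = list_gpu_entries(proc_dir)
    if entries is None or entry_name not in entries:
        return -1
    return entries.index(entry_name)
```

Catching FileNotFoundError rather than a bare Exception keeps unrelated failures, such as permission errors, visible.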
Wrap both /proc/driver/nvidia/gpus/ accesses in try/except FileNotFoundError, falling back to the already existing defaults ((0,) and -1). Native Linux behavior is unchanged.
Verification
Tested on WSL2 (Windows 11, NVIDIA GeForce RTX 3050 Ti Laptop GPU, Driver 581.83, CUDA 13.0).
Before (original code on WSL2)
After (fixed code on WSL2)
pytest --backend gpu on WSL2 (after fix)
Test plan
- pytest --backend gpu passes on WSL2 (previously crashed)
- pytest --backend cpu passes (regression test)
- /proc/driver/nvidia/gpus/ path still used when available (no behavior change)
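The WSL2 and native Linux cases in this plan can also be approximated on a machine without either setup by patching os.listdir. The sketch below is illustrative only: count_gpu_entries is a hypothetical stand-in, not a function from tests/conftest.py, and the entry names are made up.

```python
import os
from unittest import mock

NVIDIA_PROC_DIR = "/proc/driver/nvidia/gpus/"


def count_gpu_entries(default=1):
    """Hypothetical stand-in: count entries under the NVIDIA proc directory."""
    try:
        return len(os.listdir(NVIDIA_PROC_DIR))
    except FileNotFoundError:
        return default


def check_missing_dir_falls_back():
    # Simulate WSL2: listing the directory raises FileNotFoundError.
    with mock.patch("os.listdir", side_effect=FileNotFoundError(NVIDIA_PROC_DIR)):
        return count_gpu_entries()


def check_present_dir_is_used():
    # Simulate native Linux with two entries under the proc directory.
    with mock.patch("os.listdir", return_value=["0000:01:00.0", "0000:02:00.0"]):
        return count_gpu_entries()
```

The first check should return the default and the second the number of listed entries, mirroring the "previously crashed" and "no behavior change" items.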