[BUG FIX] Fix GPU detection in test infrastructure for WSL2 #2653

Merged
duburcqa merged 4 commits into Genesis-Embodied-AI:main from Lidang-Jiang:fix/wsl2-gpu-detection
Apr 5, 2026
Conversation

@Lidang-Jiang Lidang-Jiang commented Apr 4, 2026

Contributor

Summary

  • Fix _torch_get_gpu_idx() crashing with FileNotFoundError on WSL2 due to missing /proc/driver/nvidia/gpus/
  • Wrap /proc/driver/nvidia/gpus/ access in try/except, falling back to existing defaults
  • Remove unused variable nvidia_gpu_indices in _get_gpu_indices()

Root cause

WSL2 provides full CUDA support (nvidia-smi works, torch.cuda.is_available() returns True), but the native Linux NVIDIA driver's procfs interface at /proc/driver/nvidia/gpus/ does not exist there. Two functions in tests/conftest.py rely on this path:

  1. _get_gpu_indices(): guarded by os.path.exists(), so it silently falls back to (0,) (correct for a single GPU)
  2. _torch_get_gpu_idx(): calls os.listdir() without an existence check, so it raises FileNotFoundError
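
The difference between the two access patterns can be reproduced without a GPU. The following is a minimal sketch, not the code from tests/conftest.py: the helper names `guarded_listing` and `unguarded_listing` are hypothetical, and a temporary path that does not exist stands in for /proc/driver/nvidia/gpus/ on WSL2.

```python
import os
import tempfile

# Hypothetical stand-in for /proc/driver/nvidia/gpus/ on a system where it is absent:
# a path inside a fresh temporary directory that is never created.
missing_dir = os.path.join(tempfile.mkdtemp(), "gpus")


def guarded_listing(path):
    """Check for existence first and return an empty list when the path is absent."""
    if os.path.exists(path):
        return sorted(os.listdir(path))
    return []


def unguarded_listing(path):
    """List the directory directly; os.listdir raises if the path is absent."""
    return sorted(os.listdir(path))


# The guarded pattern degrades quietly.
guarded_result = guarded_listing(missing_dir)

# The unguarded pattern raises FileNotFoundError, which is the crash described above.
try:
    unguarded_listing(missing_dir)
    unguarded_error = None
except FileNotFoundError as exc:
    unguarded_error = exc
```
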

Fix

Wrap both /proc/driver/nvidia/gpus/ accesses in try/except FileNotFoundError, falling back to the already existing defaults ((0,) and -1). Native Linux behavior is unchanged.
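
As an illustration of the pattern, here is a minimal sketch assuming the defaults named above, (0,) and -1. The function names `list_gpu_entries`, `gpu_indices_sketch`, and `gpu_idx_sketch` are hypothetical, and the way indices are derived from the directory listing is a placeholder, since the actual logic in tests/conftest.py is not shown in this PR description.

```python
import os

NVIDIA_PROC_DIR = "/proc/driver/nvidia/gpus/"  # path named in the PR description


def list_gpu_entries(proc_dir=NVIDIA_PROC_DIR):
    """Return the sorted directory entries, or None when the directory is missing."""
    try:
        return sorted(os.listdir(proc_dir))
    except FileNotFoundError:
        # Expected on WSL2, where the NVIDIA proc interface is not exposed.
        return None


def gpu_indices_sketch(proc_dir=NVIDIA_PROC_DIR):
    """Fall back to (0,) when the proc interface is unavailable or empty."""
    entries = list_gpu_entries(proc_dir)
    if not entries:
        return (0,)
    # Placeholder enumeration (assumption): one index per directory entry.
    return tuple(range(len(entries)))


def gpu_idx_sketch(entry_name, proc_dir=NVIDIA_PROC_DIR):
    """Fall back to -1 when the proc interface is unavailable or the entry is unknown."""
    entries = list_gpu_entries(proc_dir)
    if entries is None or entry_name not in entries:
        return -1
    # Placeholder lookup (assumption): position of the entry in the sorted listing.
    return entries.index(entry_name)
```

Taking `proc_dir` as a parameter is a choice made for this sketch only; it lets both branches be exercised against a temporary directory on any machine.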

Verification

Tested on WSL2 (Windows 11, NVIDIA GeForce RTX 3050 Ti Laptop GPU, Driver 581.83, CUDA 13.0).

Before (original code on WSL2)
============================================================
BEFORE: Original _get_gpu_indices() and _torch_get_gpu_idx()
============================================================
Platform: linux
/proc/driver/nvidia/gpus/ exists: False

torch.cuda.is_available(): True
torch.cuda.get_device_properties(0).name: NVIDIA GeForce RTX 3050 Ti Laptop GPU

_get_gpu_indices() = (0,)  (fell through to default (0,), did NOT detect via /proc)

_torch_get_gpu_idx(0) = CRASH! FileNotFoundError: [Errno 2] No such file or directory: '/proc/driver/nvidia/gpus/'

After (fixed code on WSL2)
============================================================
AFTER: Fixed _get_gpu_indices() and _torch_get_gpu_idx()
============================================================
Platform: linux
/proc/driver/nvidia/gpus/ exists: False

torch.cuda.is_available(): True
torch.cuda.get_device_properties(0).name: NVIDIA GeForce RTX 3050 Ti Laptop GPU

_get_gpu_indices() = (0,)  (try/except caught FileNotFoundError, fell through to default)

_torch_get_gpu_idx(0) = -1  (try/except caught FileNotFoundError, fell through to default)

pytest --backend gpu on WSL2 (after fix)
============================= test session starts ==============================
platform linux -- Python 3.11.14, pytest-9.0.2, pluggy-1.6.0 -- python3
rootdir: worktree-genesis-wsl2
configfile: pyproject.toml
plugins: xdist-3.8.0, anyio-4.13.0, syrupy-5.1.0, forked-1.6.0
collecting ... collected 1 item

tests/test_sensor_camera.py::test_destroy_unbuilt_scene_with_camera
[Genesis] [INFO] Running on [NVIDIA GeForce RTX 3050 Ti Laptop GPU] with backend gs.cuda. Device memory: 4.00 GB.
[Genesis] [INFO] Genesis initialized. version: 0.4.3, precision: 32
PASSED

============================== 1 passed in 7.87s ===============================

Test plan

  • pytest --backend gpu passes on WSL2 (previously crashed)
  • pytest --backend cpu passes (regression test)
  • Native Linux: /proc/driver/nvidia/gpus/ path still used when available (no behavior change)

Add nvidia-smi fallback for _get_gpu_indices() and _torch_get_gpu_idx()
when /proc/driver/nvidia/gpus/ is unavailable (e.g. WSL2).

Signed-off-by: Lidang-Jiang <lidangjiang@gmail.com>

@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@duburcqa duburcqa left a comment

Collaborator

A try/except that falls back to the already existing defaults is sufficient. Do not use nvidia-smi.

…lback

Replace nvidia-smi fallback with simple try/except around existing
/proc/driver/nvidia/gpus/ access, falling back to existing defaults.

Signed-off-by: Lidang-Jiang <lidangjiang@gmail.com>

@Lidang-Jiang Lidang-Jiang left a comment

Contributor Author

You're right, that is much simpler. I removed the nvidia-smi fallback entirely and switched to try/except with the existing defaults in 9a244a9.

Comment thread tests/conftest.py
… Linux

Print a user-facing warning explaining that multi-GPU support will be
disabled when the NVIDIA proc interface is not found, as requested by
reviewer.

Signed-off-by: Lidang-Jiang <lidangjiang@gmail.com>
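
One way such a warning could look is sketched below. This is an assumption-laden illustration, not the merged code: the function name `warn_if_proc_interface_missing`, the message wording, and the choice of stderr are all hypothetical.

```python
import os
import sys


def warn_if_proc_interface_missing(proc_dir="/proc/driver/nvidia/gpus/", stream=None):
    """Print a warning and return True when the NVIDIA proc interface is absent."""
    if os.path.isdir(proc_dir):
        return False
    # Hypothetical wording; the actual message in the PR is not shown on this page.
    print(
        f"WARNING: NVIDIA proc interface not found at {proc_dir}; "
        "multi-GPU support will be disabled.",
        file=stream if stream is not None else sys.stderr,
    )
    return True
```
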
@duburcqa duburcqa merged commit 7bf5f6a into Genesis-Embodied-AI:main Apr 5, 2026
23 checks passed