Description
NVIDIA Open GPU Kernel Modules Version
580.82.07 Release Build
Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.
- I confirm that this does not happen with the proprietary driver package.
Operating System and Version
Ubuntu 24.04.2 LTS
Kernel Release
6.14.0-29-generic #29~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC x86_64 GNU/Linux
Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
- I am running on a stable kernel release.
Hardware: GPU
RTX 4060 Ti, RTX 3060, RTX 2060, V100, RTX 6000 Ada
Describe the bug
A segmentation fault occurs in vkCreateComputePipelines in the NVIDIA Vulkan implementation when it is given a SPIR-V kernel with a particular access pattern. Tested with several NVIDIA driver versions, most recently release 580.82.07 of open-gpu-kernel-modules (the latest at the time of testing).
To Reproduce
To reproduce the error, run a simple Vulkan compute application whose SPIR-V kernel/shader code contains the following:
- An atomic exchange on a workgroup memory location (Workgroup memory scope).
- The access index of the atomic exchange is conditionally chosen from one of two storage buffers (uses the VariablePointers OpCapability).
- The result of the atomic exchange is used in another instruction (memory dependency).
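As a rough illustration of this access pattern, here is a CPU-side analogy written with C11 atomics. It is not the SPIR-V kernel from the reproducer and does not touch the Vulkan API; the names `wg_mem`, `WG_SIZE`, `exchange_pattern`, `buf_a`, and `buf_b` are hypothetical and chosen only for this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define WG_SIZE 64u /* hypothetical length of the workgroup-like array */

/* Stand-in for an array in Workgroup memory (assumption for this sketch). */
static _Atomic uint32_t wg_mem[WG_SIZE];

/* Models the three ingredients of the report on the CPU. */
uint32_t exchange_pattern(const uint32_t *buf_a, const uint32_t *buf_b,
                          int use_a, uint32_t new_value)
{
    /* The index source is picked at run time from one of two buffers,
     * which is the kind of pointer selection VariablePointers allows. */
    const uint32_t *src = use_a ? buf_a : buf_b;
    uint32_t idx = src[0];
    assert(idx < WG_SIZE);

    /* Atomic exchange on the workgroup-like location. */
    uint32_t old = atomic_exchange(&wg_mem[idx], new_value);

    /* The value returned by the exchange feeds a later operation. */
    return old + 1u;
}
```

On a CPU this compiles and runs without problems; it only shows the shape of the data flow that the shader expresses.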
The combination of these three conditions results in a segfault when the Vulkan pipeline is created through the vkCreateComputePipelines host API. The crash occurs only with NVIDIA GPU drivers (tested from 550.xx to 580.xx), with both the proprietary driver and the open-gpu-kernel-modules.
vkCreateComputePipelines(device, pipelineCache, createInfoCount, pCreateInfos, pAllocator, pPipelines) returns VkResult
Segmentation fault (core dumped)
Note: If the atomic exchange is replaced by an atomic load followed by an atomic store, the host API behaves normally and the program can be executed.
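The substitution mentioned in the note can be sketched on the CPU with C11 atomics. This is an analogy only, not the shader code from the reproducer; `slots`, `SLOT_COUNT`, and `load_then_store` are hypothetical names. Note that a separate load and store is not semantically equivalent to a single exchange when several threads touch the same location, so this is a diagnostic workaround rather than a drop-in replacement.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SLOT_COUNT 64u /* hypothetical array length */

/* Stand-in for the shared memory location (assumption for this sketch). */
static _Atomic uint32_t slots[SLOT_COUNT];

/* Returns the previous value and writes the new one, using two separate
 * atomic operations instead of one atomic exchange. Another thread could
 * write between the load and the store, so the pair is not atomic as a whole. */
uint32_t load_then_store(uint32_t idx, uint32_t new_value)
{
    assert(idx < SLOT_COUNT);
    uint32_t old = atomic_load(&slots[idx]);
    atomic_store(&slots[idx], new_value);
    return old;
}
```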
Bug Incidence
Always
nvidia-bug-report.log.gz
More Info
A minimal reproducible environment is provided for the issue; see nvidia-issue-reproducer.zip.