Segmentation fault in NVIDIA Vulkan implementation's vkCreateComputePipelines when running a valid SPIR-V shader with a particular access pattern #929

@acasadevall

Description

NVIDIA Open GPU Kernel Modules Version

580.82.07 Release Build

Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.

  • I confirm that this does not happen with the proprietary driver package.

Operating System and Version

Ubuntu 24.04.2 LTS

Kernel Release

6.14.0-29-generic #29~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC x86_64 GNU/Linux

Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.

  • I am running on a stable kernel release.

Hardware: GPU

RTX 4060 Ti, RTX 3060, RTX 2060, V100, RTX 6000 Ada

Describe the bug

A segmentation fault occurs in vkCreateComputePipelines in the NVIDIA Vulkan implementation when creating a pipeline from a SPIR-V kernel with a particular access pattern. The crash was reproduced on several NVIDIA driver versions, most recently 580.82.07 from open-gpu-kernel-modules (the latest release at the time of testing).

To Reproduce

Run a simple Vulkan compute application whose SPIR-V kernel/shader code contains all of the following:

  • An atomic exchange on a workgroup memory location (Workgroup memory scope).
  • The access index of the atomic exchange is conditionally chosen from one of two storage buffers (uses VariablePointers OpCapability).
  • The result of the atomic exchange is used in another instruction (memory dependency).

Together, these three conditions cause a segfault in the vkCreateComputePipelines host API call when the Vulkan pipeline is created. The crash occurs only with NVIDIA GPU drivers (tested from 550.xx to 580.xx), both with the proprietary kernel modules and with open-gpu-kernel-modules.
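The three conditions listed above can be illustrated with a CPU-side analogue written with C11 atomics. This is only a sketch of the access pattern, not the shader from the reproducer: `shared_mem`, `exchange_pattern`, `buf_a`, and `buf_b` are hypothetical names, and a plain global array stands in for workgroup memory.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SHARED_SIZE 4

/* Hypothetical stand-in for workgroup (shared) memory. */
_Atomic uint32_t shared_mem[SHARED_SIZE];

/* CPU analogue of the reported pattern (assumed shape, not the real shader):
 *   1. an atomic exchange on a shared-memory slot,
 *   2. the slot index read through a pointer selected between two buffers,
 *   3. the exchanged-out value consumed by a later operation. */
uint32_t exchange_pattern(const uint32_t *buf_a, const uint32_t *buf_b,
                          int use_a, uint32_t new_value)
{
    /* Pointer select between two buffers, loosely analogous to what
     * the VariablePointers capability permits in SPIR-V. */
    const uint32_t *src = use_a ? buf_a : buf_b;
    uint32_t idx = src[0] % SHARED_SIZE;

    /* Atomic exchange: store new_value and fetch the previous value. */
    uint32_t old = atomic_exchange(&shared_mem[idx], new_value);

    /* The previous value feeds a later computation. */
    return old + 1u;
}
```

On a CPU this code is unremarkable; the report is that the equivalent SPIR-V makes pipeline creation crash, so the sketch only shows which operations are combined.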

vkCreateComputePipelines(device, pipelineCache, createInfoCount, pCreateInfos, pAllocator, pPipelines) returns VkResult
Segmentation fault (core dumped)

Note: If the atomic exchange is replaced by an atomic load followed by an atomic store, the host API call behaves normally and the program runs.
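The replacement described in the note can be sketched in C11 atomics as follows. This is a CPU-side illustration under assumptions, not the reproducer's shader code, and `load_store_workaround` is a hypothetical name.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: emulates an exchange with two separate atomic
 * operations, mirroring the "atomic load + atomic store" replacement. */
uint32_t load_store_workaround(_Atomic uint32_t *slot, uint32_t new_value)
{
    uint32_t old = atomic_load(slot);   /* read the current value */
    atomic_store(slot, new_value);      /* then overwrite it */
    return old;                         /* caller still receives the old value */
}
```

Each operation is atomic on its own, but the pair is not a single atomic step: another thread can write to the slot between the load and the store. The substitution is therefore useful for narrowing down the crash, and it is not in general a drop-in replacement for a true atomic exchange under contention.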

Bug Incidence

Always

nvidia-bug-report.log.gz

More Info

A minimal environment for reproducing the issue is provided in nvidia-issue-reproducer.zip.
