drmOpen("nvidia", NULL) returns -1 or garbage value (reopened) #263

@YusufKhan-gamedev

Description

NVIDIA Open GPU Kernel Modules Version

ce3d74f

Does this happen with the proprietary driver (of the same version) as well?

I cannot test this

Operating System and Version

Description: Fedora release 36 (Thirty Six)

Kernel Release

Linux fedora 5.17.9-300.fc36.x86_64 #1 SMP PREEMPT Wed May 18 15:08:23 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Hardware: GPU

It's an RTX 2060 from GIGABYTE. I am not going to install the proprietary tool that is suggested.

Describe the bug

Opening the GPU device returns a file descriptor of -1 or 3.

To Reproduce

Try to open /dev/dri/cardX (for example /dev/dri/card0), or call the drmOpen() function.

Bug Incidence

Always

nvidia-bug-report.log.gz

More Info

[ 4.751792] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 5: 0x80000000 MHz
[ 4.751793] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 5: 0x80000000 MHz
[ 4.751793] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 5: 0x80000000 MHz
[ 4.751793] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 5: 0x80000000 MHz
[ 4.751794] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 5: 0x80000000 MHz
[ 4.751794] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 5: 0x80000000 MHz
[ 4.751795] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 5: 0x80000000 MHz
[ 4.751795] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 5: 0x80000000 MHz
[ 4.751796] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 5: 0x80000000 MHz
[ 4.751796] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 5: 0x80000000 MHz
[ 4.751797] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 5: 0x80000000 MHz
[ 4.751797] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 5: 0x80000000 MHz
[ 4.751798] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 5: 0x80000000 MHz
[ 4.751798] ACPI: [Firmware Bug]: No valid BIOS _PSS frequency found for processor 5
[ 4.751799] ACPI: [Firmware Bug]: BIOS needs update for CPU frequency support
[ 4.751827] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 6: 0x80000000 MHz
[ 4.751827] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 6: 0x80000000 MHz
[ 4.751828] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 6: 0x80000000 MHz
[ 4.751828] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 6: 0x80000000 MHz
[ 4.751829] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 6: 0x80000000 MHz
[ 4.751829] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 6: 0x80000000 MHz
[ 4.751830] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 6: 0x80000000 MHz
[ 4.751830] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 6: 0x80000000 MHz
[ 4.751830] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 6: 0x80000000 MHz
[ 4.751831] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 6: 0x80000000 MHz
[ 4.751831] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 6: 0x80000000 MHz
[ 4.751832] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 6: 0x80000000 MHz
[ 4.751832] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 6: 0x80000000 MHz
[ 4.751833] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 6: 0x80000000 MHz
[ 4.751833] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 6: 0x80000000 MHz
[ 4.751834] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 6: 0x80000000 MHz
[ 4.751834] ACPI: [Firmware Bug]: No valid BIOS _PSS frequency found for processor 6
[ 4.751835] ACPI: [Firmware Bug]: BIOS needs update for CPU frequency support
[ 4.751863] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 7: 0x80000000 MHz
[ 4.751863] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 7: 0x80000000 MHz
[ 4.751864] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 7: 0x80000000 MHz
[ 4.751864] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 7: 0x80000000 MHz
[ 4.751865] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 7: 0x80000000 MHz
[ 4.751865] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 7: 0x80000000 MHz
[ 4.751866] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 7: 0x80000000 MHz
[ 4.751866] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 7: 0x80000000 MHz
[ 4.751866] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 7: 0x80000000 MHz
[ 4.751867] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 7: 0x80000000 MHz
[ 4.751867] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 7: 0x80000000 MHz
[ 4.751868] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 7: 0x80000000 MHz
[ 4.751868] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 7: 0x80000000 MHz
[ 4.751869] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 7: 0x80000000 MHz
[ 4.751869] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 7: 0x80000000 MHz
[ 4.751870] ACPI: [Firmware Bug]: Invalid BIOS _PSS frequency found for processor 7: 0x80000000 MHz
[ 4.751870] ACPI: [Firmware Bug]: No valid BIOS _PSS frequency found for processor 7
[ 4.751871] ACPI: [Firmware Bug]: BIOS needs update for CPU frequency support
[ 5.385291] nvidia-gpu 0000:01:00.3: i2c timeout error e0000000
[ 5.385294] ucsi_ccg 0-0008: i2c_transfer failed -110
[ 5.385295] ucsi_ccg 0-0008: ucsi_ccg_init failed - -110
[ 5.385298] ucsi_ccg: probe of 0-0008 failed with error -110
[ 5.398888] kauditd_printk_skb: 136 callbacks suppressed
[ 5.398889] audit: type=1130 audit(1653589722.262:145): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-udev-settle comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 5.444839] audit: type=1130 audit(1653589722.308:146): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-fsck@dev-disk-by\x2duuid-cd5cf0c9\x2db7ce\x2d41da\x2dbcf1\x2dae0ccb7c629a comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 5.459774] audit: type=1130 audit(1653589722.323:147): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-fsck@dev-disk-by\x2duuid-5B81\x2d8B7D comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 5.463346] EXT4-fs (sda2): mounted filesystem with ordered data mode. Quota mode: none.
[ 5.487808] audit: type=1130 audit(1653589722.351:148): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dracut-shutdown comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 5.511918] audit: type=1130 audit(1653589722.375:149): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=plymouth-read-write comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 5.519858] audit: type=1130 audit(1653589722.383:150): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=import-state comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 5.567885] audit: type=1130 audit(1653589722.431:151): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-tmpfiles-setup comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 5.570480] audit: type=1334 audit(1653589722.433:152): prog-id=60 op=LOAD
[ 5.570670] audit: type=1334 audit(1653589722.434:153): prog-id=61 op=LOAD
[ 5.570726] audit: type=1334 audit(1653589722.434:154): prog-id=62 op=LOAD
[ 5.602867] RPC: Registered named UNIX socket transport module.
[ 5.602869] RPC: Registered udp transport module.
[ 5.602870] RPC: Registered tcp transport module.
[ 5.602870] RPC: Registered tcp NFSv4.1 backchannel transport module.
[ 5.771489] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
[ 5.771491] Bluetooth: BNEP filters: protocol multicast
[ 5.771494] Bluetooth: BNEP socket layer initialized
[ 5.905379] NET: Registered PF_QIPCRTR protocol family
[ 6.526274] iwlwifi 0000:00:14.3: Conflict between TLV & NVM regarding enabling LAR (TLV = enabled NVM =disabled)
[ 6.712661] iwlwifi 0000:00:14.3: Conflict between TLV & NVM regarding enabling LAR (TLV = enabled NVM =disabled)
[ 9.192208] e1000e 0000:00:1f.6 eno2: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[ 9.192258] IPv6: ADDRCONF(NETDEV_CHANGE): eno2: link becomes ready
[ 9.768045] thermal cooling_device11: Setting cooling device state is deprecated
[ 11.529700] rfkill: input handler disabled
[ 11.930058] Bluetooth: RFCOMM TTY layer initialized
[ 11.930063] Bluetooth: RFCOMM socket layer initialized
[ 11.930096] Bluetooth: RFCOMM ver 1.11
[ 15.899439] logitech-hidpp-device 0003:046D:1025.0007: HID++ 1.0 device connected.
[ 28.879886] rfkill: input handler enabled
[ 249.288102] nvidia-modeset: Unloading
[ 249.304665] NVOC: __nvoc_objDelete: Child class OBJIOVASPACE not freed from parent class OBJVMM.Allocator 00000000d4fbfba6 released with memory allocations
[ 249.304686] [NvPort] *************************************************
[ 249.304686] NvPort memory tracking information for allocator 00000000d4fbfba6:
[ 249.304687] ACTIVE: 1 allocations, 644 bytes allocated (616 useful, 28 meta)
[ 249.304688] TOTAL: 150 allocations, 512133 bytes allocated (507933 useful, 4200 meta)
[ 249.304689] PEAK: 148 allocations, 511980 bytes allocated (507836 useful, 4144 meta)
[ 249.304689] [NvPort] *************************************************
[ 249.304702] nvidia-nvlink: Unregistered Nvlink Core, major device number 234
[ 249.326369] nvidia-nvlink: Nvlink Core is being initialized, major device number 234
[ 249.326373] NVRM getCpuCounts: RmInitCpuCounts: physical 0x8 logical 0x8
[ 249.326722] NVRM rmapiControlCacheInit: using cache mode 1
[ 249.327021] nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=none,decodes=none:owns=io+mem
[ 249.327024] NVRM _threadNodeInitTime: Bad threadStateDatabase.timeout.flags: 0x0!
[ 249.327038] NVRM halmgrGetHalForGpu_IMPL: Matching PMC_BOOT_42 = 0x164a1000 to HAL_IMPL_TU104
[ 249.327070] NVRM _threadNodeInitTime: Bad threadStateDatabase.timeout.flags: 0x0!
[ 249.375107] NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64 515.43.04 Release Build (yusufkhan@) Tue May 24 06:08:38 PM EDT 2022
[ 1317.099154] intel_powerclamp: Start idle injection to reduce power
[ 1318.444129] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #202!!!
[ 1318.445139] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #202!!!
[ 1318.446128] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #202!!!
[ 1318.447127] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #202!!!
[ 1318.448126] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #202!!!
[ 1319.405115] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #202!!!
[ 1319.406135] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #202!!!
[ 1319.407115] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #202!!!
[ 1320.797081] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #202!!!
[ 1320.798091] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #202!!!
[ 1326.121176] intel_powerclamp: Stop forced idle injection
[ 1343.137619] intel_powerclamp: Start idle injection to reduce power
[ 1355.185450] intel_powerclamp: Stop forced idle injection
[ 1372.202022] intel_powerclamp: Start idle injection to reduce power
[ 1384.226791] intel_powerclamp: Stop forced idle injection
[ 1401.243395] intel_powerclamp: Start idle injection to reduce power
[ 1411.275389] intel_powerclamp: Stop forced idle injection
[ 1637.846061] nvidia-modeset: Loading NVIDIA UNIX Open Kernel Mode Setting Driver for x86_64 515.43.04 Release Build (yusufkhan@) Tue May 24 06:08:29 PM EDT 2022
[ 1637.846069] NVRM _threadNodeInitTime: Bad threadStateDatabase.timeout.flags: 0x0!
[ 1637.846071] NVRM rmapiAllocWithSecInfo: client:0x0 parent:0x0 object:0x0 class:0x0
[ 1637.846074] NVRM _threadNodeInitTime: Bad threadStateDatabase.timeout.flags: 0x0!
[ 1637.846083] NVRM rmapiAllocWithSecInfo: allocation complete
[ 1637.847149] nvidia_drm: unknown parameter 'NVreg_RmMsg' ignored
[ 1637.847478] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[ 1637.847686] NVRM _threadNodeInitTime: Bad threadStateDatabase.timeout.flags: 0x0!
[ 1637.847688] NVRM _threadNodeInitTime: Bad threadStateDatabase.timeout.flags: 0x0!
[ 1637.847759] nvidia 0000:01:00.0: Direct firmware load for nvidia/515.43.04/gsp.bin failed with error -2
[ 1637.847771] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x61:0x0:1610)
[ 1637.847787] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[ 1637.847824] [drm:nv_drm_load [nvidia_drm]] ERROR [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NvKmsKapiDevice
[ 1637.847891] [drm:nv_drm_probe_devices [nvidia_drm]] ERROR [nvidia-drm] [GPU ID 0x00000100] Failed to register device
[ 2690.068695] NVRM _threadNodeInitTime: Bad threadStateDatabase.timeout.flags: 0x0!
[ 2690.068700] NVRM _threadNodeInitTime: Bad threadStateDatabase.timeout.flags: 0x0!
[ 2690.068702] NVRM _threadNodeInitTime: Bad threadStateDatabase.timeout.flags: 0x0!
[ 2690.068710] NVRM _threadNodeInitTime: Bad threadStateDatabase.timeout.flags: 0x0!
[ 2690.068711] NVRM _threadNodeInitTime: Bad threadStateDatabase.timeout.flags: 0x0!
[ 2690.068712] NVRM rm_get_firmware_version: rm_get_firmware_version: Failed to query gpu build versions, status=0x40

The program I attempted to run on this was

#include <xf86drm.h>
#include <nvidia.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <fcntl.h>

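int main () {
	/* Open the DRM card node directly and print the descriptor. */
	int fd = open("/dev/dri/card0", O_RDWR);
	printf("%i", fd);
	/* Request an allocation with memory_size = 1 via the helper from the nvidia-next branch. */
	struct nvidia_gem_alloc_nvkms_memory_params params;
	params.memory_size = 1;
	nvidia_gem_alloc_nvkms_memory(fd, params);
	return params.handle;
}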

with my nvidia-next libdrm branch https://gitlab.freedesktop.org/YusufKhan-gamedev/drm/-/tree/nvidia-next

Please note that the dmesg output didn't change immediately after I ran that program.
