No devices found. Waiting indefinitely #1552

@alrex66608

SOS, help!!!

version_info:
os : Ubuntu24.04
Kubernetes version : 1.28.15
Containerd version : 1.7.28
k8s-device-plugin : v0.17.1

The GPU is detected normally:
Image

Image

I have already configured the runtime:
nvidia-ctk runtime configure --runtime=containerd
systemctl daemon-reload
systemctl restart containerd
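
One way to double-check the commands above is to look at the containerd config they are supposed to modify. The sketch below is not part of the original report: `check_nvidia_runtime` is a hypothetical helper, and the default path `/etc/containerd/config.toml` is an assumption (some setups keep the config elsewhere, so pass the actual path as the first argument).

```shell
# Hypothetical helper (not from the original report): report whether a
# containerd config declares an "nvidia" runtime and which runtime is default.
# The default path below is an assumption; pass another path as $1 if needed.
check_nvidia_runtime() {
  config="${1:-/etc/containerd/config.toml}"

  # Look for a runtime table whose name ends in "runtimes.nvidia]".
  if grep -q 'runtimes\.nvidia\]' "$config"; then
    echo "nvidia runtime: present"
  else
    echo "nvidia runtime: missing"
  fi

  # Extract the first default_runtime_name value, if any.
  default=$(sed -n 's/^[[:space:]]*default_runtime_name[[:space:]]*=[[:space:]]*"\(.*\)".*/\1/p' "$config" | head -n 1)
  echo "default runtime: ${default:-unset}"
}
```

If this reports a default runtime other than `nvidia`, the plugin pod may be starting under a runtime that does not expose the GPU. In that case, re-running `nvidia-ctk runtime configure --runtime=containerd` with `--set-as-default`, or running the plugin under a RuntimeClass that points at the nvidia runtime, may be worth trying.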

But the device plugin still cannot detect the GPU:

root@master:~/demo/nvidia-device-plugin# kubectl logs nvidia-device-plugin-daemonset-sdd6n -n kube-system
I1204 07:31:31.764138       1 main.go:235] "Starting NVIDIA Device Plugin" version=<
	3c378193
	commit: 3c378193fcebf6e955f0d65bd6f2aeed099ad8ea
 >
I1204 07:31:31.764189       1 main.go:238] Starting FS watcher for /var/lib/kubelet/device-plugins
I1204 07:31:31.764229       1 main.go:245] Starting OS watcher.
I1204 07:31:31.764515       1 main.go:260] Starting Plugins.
I1204 07:31:31.764549       1 main.go:317] Loading configuration.
I1204 07:31:31.765073       1 main.go:342] Updating config with default resource matching patterns.
I1204 07:31:31.765318       1 main.go:353] 
Running with config:
{
  "version": "v1",
  "flags": {
    "migStrategy": "none",
    "failOnInitError": false,
    "mpsRoot": "",
    "nvidiaDriverRoot": "/",
    "nvidiaDevRoot": "/",
    "gdsEnabled": false,
    "mofedEnabled": false,
    "useNodeFeatureAPI": null,
    "deviceDiscoveryStrategy": "auto",
    "plugin": {
      "passDeviceSpecs": false,
      "deviceListStrategy": [
        "envvar"
      ],
      "deviceIDStrategy": "uuid",
      "cdiAnnotationPrefix": "cdi.k8s.io/",
      "nvidiaCTKPath": "/usr/bin/nvidia-ctk",
      "containerDriverRoot": "/driver-root"
    }
  },
  "resources": {
    "gpus": [
      {
        "pattern": "*",
        "name": "nvidia.com/gpu"
      }
    ]
  },
  "sharing": {
    "timeSlicing": {}
  },
  "imex": {}
}
I1204 07:31:31.765334       1 main.go:356] Retrieving plugins.
E1204 07:31:31.765477       1 factory.go:112] Incompatible strategy detected auto
E1204 07:31:31.765489       1 factory.go:113] If this is a GPU node, did you configure the NVIDIA Container Toolkit?
E1204 07:31:31.765494       1 factory.go:114] You can check the prerequisites at: https://github.com/NVIDIA/k8s-device-plugin#prerequisites
E1204 07:31:31.765499       1 factory.go:115] You can learn how to set the runtime at: https://github.com/NVIDIA/k8s-device-plugin#quick-start
E1204 07:31:31.765504       1 factory.go:116] If this is not a GPU node, you should set up a toleration or nodeSelector to only deploy this plugin on GPU nodes
I1204 07:31:31.765516       1 main.go:381] No devices found. Waiting indefinitely.
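
The relevant lines in the log above are the ones whose severity prefix is `E`. As a small sketch for isolating them, the hypothetical `plugin_errors` filter below keeps only lines that start with `E` followed by a four-digit date, which is how the error lines in this log are formatted.

```shell
# Hypothetical helper (not from the original report): keep only error-level
# log lines, i.e. lines starting with "E" plus a 4-digit MMDD date, from stdin.
plugin_errors() {
  grep -E '^E[0-9]{4} '
}

# Example usage against the pod from this report (requires cluster access):
#   kubectl logs nvidia-device-plugin-daemonset-sdd6n -n kube-system | plugin_errors
```

On the log in this report, the filter would leave the five `factory.go` lines, starting with "Incompatible strategy detected auto".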
