SOS, please help!
version_info:
os : Ubuntu24.04
Kubernetes version : 1.28.15
Containerd version : 1.7.28
k8s-device-plugin : v0.17.1
I have already configured the container runtime:
nvidia-ctk runtime configure --runtime=containerd
systemctl daemon-reload
systemctl restart containerd
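To double-check that the configuration actually landed in the containerd config, a minimal sketch like the one below could be used. It assumes the config lives at /etc/containerd/config.toml and uses the containerd 1.7 key names (`default_runtime_name` and a `runtimes.nvidia` table); both function names are hypothetical helpers, not part of any NVIDIA or containerd tooling.

```shell
#!/bin/sh
# Hypothetical helper: print the default runtime name declared in a
# containerd config file. Prints nothing if the key is absent.
# $1: path to the config file (assumed: /etc/containerd/config.toml)
containerd_default_runtime() {
    sed -n 's/^[[:space:]]*default_runtime_name[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Hypothetical helper: succeed (exit 0) if the file declares a runtime
# table named "nvidia", i.e. a header ending in "runtimes.nvidia]".
containerd_has_nvidia_runtime() {
    grep -Eq 'runtimes\.nvidia\]' "$1"
}

# Example usage on the node (commented out so the sketch has no side effects):
#   containerd_has_nvidia_runtime /etc/containerd/config.toml && echo "nvidia runtime present"
#   containerd_default_runtime /etc/containerd/config.toml
```

If the nvidia runtime is present but the default runtime is still `runc`, the plugin pod is not necessarily started with the NVIDIA runtime, which may be relevant to what follows.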
But the device plugin still cannot detect the GPU:
root@master:~/demo/nvidia-device-plugin# kubectl logs nvidia-device-plugin-daemonset-sdd6n -n kube-system
I1204 07:31:31.764138 1 main.go:235] "Starting NVIDIA Device Plugin" version=<
3c378193
commit: 3c378193fcebf6e955f0d65bd6f2aeed099ad8ea
>
I1204 07:31:31.764189 1 main.go:238] Starting FS watcher for /var/lib/kubelet/device-plugins
I1204 07:31:31.764229 1 main.go:245] Starting OS watcher.
I1204 07:31:31.764515 1 main.go:260] Starting Plugins.
I1204 07:31:31.764549 1 main.go:317] Loading configuration.
I1204 07:31:31.765073 1 main.go:342] Updating config with default resource matching patterns.
I1204 07:31:31.765318 1 main.go:353]
Running with config:
{
  "version": "v1",
  "flags": {
    "migStrategy": "none",
    "failOnInitError": false,
    "mpsRoot": "",
    "nvidiaDriverRoot": "/",
    "nvidiaDevRoot": "/",
    "gdsEnabled": false,
    "mofedEnabled": false,
    "useNodeFeatureAPI": null,
    "deviceDiscoveryStrategy": "auto",
    "plugin": {
      "passDeviceSpecs": false,
      "deviceListStrategy": [
        "envvar"
      ],
      "deviceIDStrategy": "uuid",
      "cdiAnnotationPrefix": "cdi.k8s.io/",
      "nvidiaCTKPath": "/usr/bin/nvidia-ctk",
      "containerDriverRoot": "/driver-root"
    }
  },
  "resources": {
    "gpus": [
      {
        "pattern": "*",
        "name": "nvidia.com/gpu"
      }
    ]
  },
  "sharing": {
    "timeSlicing": {}
  },
  "imex": {}
}
I1204 07:31:31.765334 1 main.go:356] Retrieving plugins.
E1204 07:31:31.765477 1 factory.go:112] Incompatible strategy detected auto
E1204 07:31:31.765489 1 factory.go:113] If this is a GPU node, did you configure the NVIDIA Container Toolkit?
E1204 07:31:31.765494 1 factory.go:114] You can check the prerequisites at: https://github.com/NVIDIA/k8s-device-plugin#prerequisites
E1204 07:31:31.765499 1 factory.go:115] You can learn how to set the runtime at: https://github.com/NVIDIA/k8s-device-plugin#quick-start
E1204 07:31:31.765504 1 factory.go:116] If this is not a GPU node, you should set up a toleration or nodeSelector to only deploy this plugin on GPU nodes
I1204 07:31:31.765516 1 main.go:381] No devices found. Waiting indefinitely.
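When several plugin pods are running, scanning each log by eye gets tedious. The sketch below classifies log text by the messages seen above; `plugin_log_status` is a hypothetical helper, and it only matches the literal strings from this log, so "other" does not imply that GPUs were found.

```shell
#!/bin/sh
# Hypothetical helper: read device-plugin log text from stdin and print
# "no-devices" if it contains the "No devices found" message shown in
# the log above, or "other" for anything else.
plugin_log_status() {
    if grep -q 'No devices found'; then
        echo "no-devices"
    else
        echo "other"
    fi
}

# Example usage (commented out so the sketch has no side effects):
#   kubectl logs nvidia-device-plugin-daemonset-sdd6n -n kube-system | plugin_log_status
```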
