0/N nodes are available: N Insufficient nvidia.com/gpu
It can happen that you launch a pod requesting a GPU as documented here and, instead of running, the pod remains in the Pending state.
Describing the pod as follows:
$kubectl -n <namespace> describe pod <pod_name>
you may find the error:
0/N nodes are available: N Insufficient nvidia.com/gpu
where N is the number of nodes in your cluster.
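For reference, the scheduling failure appears in the Events section of the describe output; the node count, ages and exact formatting below are illustrative and depend on your cluster and kubectl version:
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  30s   default-scheduler  0/4 nodes are available: 4 Insufficient nvidia.com/gpu.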
In this case it is recommended to:
Check the output of the following command (see the example after this list):
$kubectl describe node <gpu_node_name>
Check the k8s-device-plugin container logs by logging into the GPU node and running the following command:
$docker logs <k8s-device-plugin container name>
Check the output of the following command on the GPU node:
$nvidia-smi -a
Check your docker configuration file (e.g. /etc/docker/daemon.json) on the GPU node and relaunch the docker daemon (see the Example section below).
Check the kubelet logs on the node:
$sudo journalctl -r -u kubelet
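As an example for the first check, the Capacity and Allocatable sections of the node description should list the nvidia.com/gpu resource; the values below are illustrative:
$kubectl describe node <gpu_node_name>
...
Capacity:
  cpu:             8
  memory:          32780212Ki
  nvidia.com/gpu:  1
Allocatable:
  cpu:             8
  memory:          32677812Ki
  nvidia.com/gpu:  1
...
If nvidia.com/gpu is missing or reported as 0, the device plugin is not registering the GPU with the kubelet, which is what the remaining checks help to diagnose.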
Example
Looking at the k8s-device-plugin container logs you may find the following output:
Loading NVML
Failed to initialize NVML: could not load NVML library.
If this is a GPU node, did you set the docker default runtime to `nvidia`?
You can check the prerequisites at: https://github.com/NVIDIA/k8s-device-plugin#prerequisites
You can learn how to set the runtime at: https://github.com/NVIDIA/k8s-device-plugin#quick-start
You can solve it by setting the docker default runtime to nvidia in the docker configuration file on the GPU node and relaunching the docker daemon.
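As a reference, a minimal /etc/docker/daemon.json that sets nvidia as the default runtime looks like the sketch below; it assumes the nvidia-container-runtime is already installed on the node, and the runtime path may differ depending on your installation:
$cat /etc/docker/daemon.json
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
$sudo systemctl restart docker
Note that restarting the docker daemon restarts the containers running on the node; afterwards the k8s-device-plugin container should be able to load NVML and the nvidia.com/gpu resource should appear in the node description.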