0/N nodes are available: N Insufficient nvidia.com/gpu
It can happen that you launch a pod requesting a GPU as documented here and, instead of running, the pod remains in the Pending state.
Describing the pod as follows:
$kubectl -n <namespace> describe pod <pod_name>
you may find the error:
0/N nodes are available: N Insufficient nvidia.com/gpu
where N is the number of nodes in your cluster.
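For reference, the scheduling failure appears in the Events section of the describe output; the node count, ages and exact formatting below are illustrative and depend on your cluster and kubectl version:
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  30s   default-scheduler  0/4 nodes are available: 4 Insufficient nvidia.com/gpu.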
In this case it is recommended to:
Check the output of the following command (see the example after this list):
$kubectl describe node <gpu_node_name>
Check the k8s-device-plugin container logs by logging into the GPU node and running the following command:
$docker logs <k8s-device-plugin container name>
Check the output of the following command on the GPU node:
$nvidia-smi -a
Check your docker configuration file (e.g. /etc/docker/daemon.json) on the GPU node and relaunch the docker daemon (see the Example section below).
Check the kubelet logs on the node:
$sudo journalctl -r -u kubelet
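As an example for the first check, the Capacity and Allocatable sections of the node description should list the nvidia.com/gpu resource; the values below are illustrative:
$kubectl describe node <gpu_node_name>
...
Capacity:
  cpu:             8
  memory:          32780212Ki
  nvidia.com/gpu:  1
Allocatable:
  cpu:             8
  memory:          32677812Ki
  nvidia.com/gpu:  1
...
If nvidia.com/gpu is missing or reported as 0, the device plugin is not registering the GPU with the kubelet, which is what the remaining checks help to diagnose.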
Example
Looking at the k8s-device-plugin container logs you may find the following output:
Loading NVML
Failed to initialize NVML: could not load NVML library.
If this is a GPU node, did you set the docker default runtime to `nvidia`?
You can check the prerequisites at: https://github.com/NVIDIA/k8s-device-plugin#prerequisites
You can learn how to set the runtime at: https://github.com/NVIDIA/k8s-device-plugin#quick-start
You can solve it by setting the docker default runtime to nvidia in the docker configuration file on the GPU node and relaunching the docker daemon.
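As a reference, a minimal /etc/docker/daemon.json that sets nvidia as the default runtime looks like the sketch below; it assumes the nvidia-container-runtime is already installed on the node, and the runtime path may differ depending on your installation:
$cat /etc/docker/daemon.json
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
$sudo systemctl restart docker
Note that restarting the docker daemon restarts the containers running on the node; afterwards the k8s-device-plugin container should be able to load NVML and the nvidia.com/gpu resource should appear in the node description.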