How to repair NotReady nodes

From time to time, one or more nodes may show up in NotReady status:

$ kubectl get nodes
  NAME           STATUS     ROLES    AGE    VERSION
  pa1-r1-s15     Ready      <none>   106d   v1.13.6
  pa1-r2-s01     Ready      <none>   106d   v1.13.6
  pa1-r3-gpu01   Ready      <none>   105d   v1.13.6
  pa1-r3-s14     NotReady   <none>   106d   v1.13.5
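
On a larger cluster the NotReady entries are easy to miss in the full listing. The following is a minimal sketch for printing only the affected node names. It assumes the default kubectl get nodes column layout (name first, status second); the function name not_ready_nodes is hypothetical and not part of kubectl.

```shell
# Print the names of nodes whose STATUS does not start with "Ready".
# Reads `kubectl get nodes` output (header line included) on stdin.
# Assumption: column 1 is NAME and column 2 is STATUS, as in the default output.
not_ready_nodes() {
  awk 'NR > 1 && $2 !~ /^Ready/ { print $1 }'
}

# Usage (requires cluster access):
#   kubectl get nodes | not_ready_nodes
```

A status such as Ready,SchedulingDisabled (a cordoned but healthy node) is deliberately not reported, since it still starts with Ready.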

For more detail, run kubectl describe node. In this example the conditions show that the kubelet has stopped posting node status:

$ kubectl describe nodes pa1-r3-s14
...
Conditions:
Type             Status    LastHeartbeatTime                 LastTransitionTime                Reason                   Message
----             ------    -----------------                 ------------------                ------                   -------
MemoryPressure   Unknown   Tue, 11 Jun 2019 00:02:58 +0200   Tue, 11 Jun 2019 00:03:42 +0200   NodeStatusUnknown        Kubelet stopped posting node status.
DiskPressure     Unknown   Tue, 11 Jun 2019 00:02:58 +0200   Tue, 11 Jun 2019 00:03:42 +0200   NodeStatusUnknown        Kubelet stopped posting node status.
...
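
To read only the Ready condition without scanning the full describe output, a jsonpath query can be used. This is a sketch: node_ready_status is a hypothetical helper name, and the query should print True, False, or Unknown depending on what the node last reported.

```shell
# Print the status of the Ready condition for one node (True, False or Unknown).
# node_ready_status is a hypothetical helper, not a kubectl subcommand.
node_ready_status() {
  kubectl get node "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
}

# Usage (requires cluster access):
#   node_ready_status pa1-r3-s14
```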

Here is a list of actions that usually fix the problem.

Reset kubelet service

SSH to the problematic node, then restart the kubelet service:

$ sudo snap restart kubelet.daemon
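
After the restart it is worth confirming that the service stayed up and that the node rejoined the cluster. The helpers below are a sketch under stated assumptions: the function names are hypothetical, the systemd unit is assumed to be snap.kubelet.daemon (the usual snap.<snap>.<app> naming), and the 120s default timeout is an arbitrary choice.

```shell
# Run on the node: show the snap service state and the latest kubelet log lines.
# Assumption: the snap service is backed by the systemd unit snap.kubelet.daemon.
kubelet_post_restart_check() {
  snap services kubelet
  sudo journalctl -u snap.kubelet.daemon -n 50 --no-pager
}

# Run from a machine with kubectl access: block until the node reports Ready,
# or exit with a non-zero status when the timeout (default 120s) expires.
wait_node_ready() {
  kubectl wait --for=condition=Ready "node/$1" --timeout="${2:-120s}"
}

# Usage:
#   kubelet_post_restart_check        # on the node
#   wait_node_ready pa1-r3-s14        # from the admin workstation
```

If wait_node_ready times out, the journal output from the first helper is the natural place to look for the reason the kubelet is not posting status.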