How to repair NotReady nodes
From time to time, some nodes may show a NotReady status:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
pa1-r1-s15 Ready <none> 106d v1.13.6
pa1-r2-s01 Ready <none> 106d v1.13.6
pa1-r3-gpu01 Ready <none> 105d v1.13.6
pa1-r3-s14 NotReady <none> 106d v1.13.5
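On a larger cluster it can help to print only the affected nodes. The following is a minimal sketch, not part of the original procedure: `not_ready_nodes` is a hypothetical helper name, and it assumes the column layout shown above, with STATUS in the second column.

```shell
# Hypothetical helper: print the names of nodes whose STATUS starts with "NotReady".
# Reads the output of `kubectl get nodes --no-headers` on stdin.
# Matching on the prefix also catches cordoned nodes ("NotReady,SchedulingDisabled").
not_ready_nodes() {
  awk '$2 ~ /^NotReady/ { print $1 }'
}

# Usage on a live cluster (requires kubectl access):
#   kubectl get nodes --no-headers | not_ready_nodes
```

Reading from stdin keeps the filter independent of the cluster, so it can be tried against saved output first.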
kubectl describe nodes gives more detail, in particular the node's conditions:
$ kubectl describe nodes pa1-r3-s14
...
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure Unknown Tue, 11 Jun 2019 00:02:58 +0200 Tue, 11 Jun 2019 00:03:42 +0200 NodeStatusUnknown Kubelet stopped posting node status.
DiskPressure Unknown Tue, 11 Jun 2019 00:02:58 +0200 Tue, 11 Jun 2019 00:03:42 +0200 NodeStatusUnknown Kubelet stopped posting node status.
...
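When only the conditions are of interest, a JSONPath query is a more compact alternative to reading the full describe output. This is a sketch under assumptions: `node_conditions` is a hypothetical helper name, and the output is the type, status and reason of each condition, separated by tabs.

```shell
# Hypothetical helper: print type, status and reason of every condition of a node.
# Argument: the node name as shown by `kubectl get nodes`.
node_conditions() {
  kubectl get node "$1" \
    -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{"\n"}{end}'
}

# Usage (requires kubectl access to the cluster):
#   node_conditions pa1-r3-s14
```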
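Unknown conditions with the message "Kubelet stopped posting node status." mean that the kubelet on the node is no longer reporting to the API server. Here is a list of actions that usually fix the problem.
Restart the kubelet service
SSH to the problematic node, then run: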
$ sudo snap restart kubelet.daemon
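After the restart, the node can take a short while to report again. The sketch below polls the node status from a machine with kubectl access until it reads Ready. The function name `wait_until_ready`, the default of 12 attempts and the 10 second delay are assumptions chosen for illustration, not values from the original procedure.

```shell
# Hypothetical helper: poll until the node's STATUS column starts with "Ready".
# Arguments: node name, max attempts (default 12), seconds between attempts (default 10).
wait_until_ready() {
  node=$1
  tries=${2:-12}
  delay=${3:-10}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # STATUS is the second column of `kubectl get node` output.
    status=$(kubectl get node "$node" --no-headers | awk '{ print $2 }')
    case "$status" in
      Ready*)
        echo "$node is Ready"
        return 0
        ;;
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  echo "$node is still not Ready after $tries checks" >&2
  return 1
}

# Usage (requires kubectl access to the cluster):
#   wait_until_ready pa1-r3-s14
```

A non-zero return value means the restart alone did not bring the node back within the polling window.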