Networking (Flannel) issues

Cross-node pod connectivity issues

Sometimes pods on different worker nodes cannot reach each other.

To check which node each pod is running on, run:

$ kubectl get pods -o wide

NAME                                   READY   STATUS    RESTARTS   AGE     IP              NODE           NOMINATED NODE   READINESS GATES
kubernetes-bootcamp-6c5cfd894b-mnq5f   1/1     Running   0          32m   pa1-r2-s01     <none>           <none>
kubernetes-bootcamp-6c5cfd894b-qd2tb   1/1     Running   0          32m   pa1-r3-gpu01   <none>           <none>
kubernetes-bootcamp-6c5cfd894b-t2g2n   1/1     Running   0          47m   pa1-r3-s14     <none>           <none>

Log in to one node and try to ping the pods running on the other nodes. Note that their IPs are on different subnets: each subnet belongs to a specific node.
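A minimal sketch of that check, meant to run on one of the nodes. It assumes you feed it one "NODE POD_IP" pair per line, for example generated with `kubectl get pods -o custom-columns=NODE:.spec.nodeName,IP:.status.podIP --no-headers`, and that the Kubernetes node name equals the machine's short hostname. `ping_once` and `check_cross_node_pods` are hypothetical helper names, not part of Juju or Kubernetes.

```shell
# Hypothetical helpers (not part of Juju or Kubernetes).
# Input on stdin: one "NODE POD_IP" pair per line, e.g. from
#   kubectl get pods -o custom-columns=NODE:.spec.nodeName,IP:.status.podIP --no-headers

# Send one ICMP echo with a 2-second timeout; succeeds if the IP answers.
ping_once() {
  ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# Ping every pod that is NOT on the local node and print OK/FAIL for each.
# $1: local node name (defaults to the short hostname, assumed to equal the node name).
check_cross_node_pods() {
  local local_node="${1:-$(hostname -s)}"
  local node ip
  while read -r node ip; do
    # Skip blank lines and pods that have no IP assigned yet.
    if [ -z "$ip" ] || [ "$ip" = "<none>" ]; then
      continue
    fi
    # Pods on the local node do not exercise the cross-node path.
    if [ "$node" = "$local_node" ]; then
      continue
    fi
    if ping_once "$ip"; then
      echo "OK   $ip ($node)"
    else
      echo "FAIL $ip ($node)"
    fi
  done
}
```

For example, save the kubectl output to pods.txt, copy it to pa1-r2-s01 and run `check_cross_node_pods pa1-r2-s01 < pods.txt`. If every pod on one particular node fails, flannel on that node (or on the node you are pinging from) is a likely suspect.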

Check flannel on all worker nodes and restart it if needed:

$ juju run --application kubernetes-worker "sudo service flannel status"
$ juju run --application kubernetes-worker "sudo service flannel restart"

This should be enough to solve the connectivity issues.
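If you prefer to restart flannel only where it is actually down, something like the following could be run on each worker. It is a sketch that assumes flannel runs as a systemd unit named `flannel` (the name used by `service flannel ...` above); `ensure_flannel_running` is a hypothetical helper name.

```shell
# Hypothetical helper: restart flannel on this worker only if systemd
# does not report the unit as active. Assumes the unit is named "flannel".
ensure_flannel_running() {
  if systemctl is-active --quiet flannel; then
    echo "flannel is active on $(hostname)"
  else
    echo "flannel is NOT active on $(hostname); restarting"
    sudo systemctl restart flannel
  fi
}
```

The same logic fits in a one-liner for all workers: `juju run --application kubernetes-worker "systemctl is-active --quiet flannel || sudo systemctl restart flannel"`. It does not help when flannel is running but misbehaving; in that case restart it unconditionally as above.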

Flannel plugin disappears from /opt/cni/bin

After we replaced the Juju relation between kubernetes-worker and flannel (with juju remove-relation followed by juju add-relation), the containers created on the worker failed with the following error:

network: failed to find plugin "flannel" in path [/opt/cni/bin]

The cause is probably a bug in the flannel charm: it removes the CNI flannel plugin when the relation is removed, but does not recreate it when the relation is added again.

We solved the issue by running:

$ juju upgrade-charm kubernetes-worker

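(rolling back to the previous charm revision afterwards if needed). This works because it is the kubernetes-worker charm, not the flannel charm, that installs the plugin during installation, so upgrading kubernetes-worker put it back for us.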
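To check whether a worker is affected (before the upgrade) or fixed (after it), it is enough to look for the plugin binary in the path from the error message. A minimal sketch; `check_cni_plugin` is a hypothetical helper, and its defaults are simply the plugin name and directory quoted in the error above.

```shell
# Hypothetical helper: report whether a CNI plugin binary exists and is executable.
# $1: plugin name (default "flannel"); $2: CNI bin dir (default /opt/cni/bin).
check_cni_plugin() {
  local name="${1:-flannel}" dir="${2:-/opt/cni/bin}"
  if [ -x "$dir/$name" ]; then
    echo "present: $dir/$name"
  else
    echo "missing: $dir/$name" >&2
    return 1
  fi
}
```

Across all workers the same test can be done inline: `juju run --application kubernetes-worker "test -x /opt/cni/bin/flannel && echo present || echo missing"`.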

Configure flannel through etcdctl

Example session:

$ juju ssh etcd/6     # choose the etcd leader

$ etcdctl ls /

$ etcdctl ls /

$ etcdctl get /
{"Network": "", "Backend": {"Type": "vxlan"}}

# specify explicitly that the subnets should be /24 with SubnetLen
$ etcdctl set / '{"Network": "", "SubnetLen": 24, "Backend": {"Type": "vxlan"}}'
{"Network": "", "SubnetLen": 24, "Backend": {"Type": "vxlan"}}

# reconfigure the subnet on a worker ( -> )
$ etcdctl get /

$ etcdctl set / '{"PublicIP":"","BackendType":"vxlan","BackendData":{"VtepMAC":"56:25:26:42:5b:00"}}'

$ etcdctl rm /
PrevNode.Value: {"PublicIP":"","BackendType":"vxlan","BackendData":{"VtepMAC":"56:25:26:42:5b:00"}}

# restart flannel on all nodes
$ juju run --application kubernetes-worker "sudo service flannel restart"

# check
$ juju run --application kubernetes-worker "sudo cat /var/run/flannel/subnet.env"
$ juju run --application kubernetes-worker "sudo ip -4 a | grep 111"
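After the restart, each worker's /var/run/flannel/subnet.env should show a lease with the new prefix length. A small sketch to check that automatically, assuming the file contains a `FLANNEL_SUBNET=<address>/<len>` line as written by flanneld; `check_flannel_subnet` is a hypothetical helper, and the default of 24 matches the `SubnetLen` set above.

```shell
# Hypothetical helper: check that flannel's lease on this node uses the expected
# prefix length. $1: expected SubnetLen (default 24);
# $2: env file written by flanneld (default /var/run/flannel/subnet.env).
check_flannel_subnet() {
  local expected_len="${1:-24}" env_file="${2:-/var/run/flannel/subnet.env}"
  local subnet
  # Extract the value of FLANNEL_SUBNET (e.g. "10.1.5.1/24") without sourcing the file.
  subnet=$(sed -n 's/^FLANNEL_SUBNET=//p' "$env_file")
  if [ -z "$subnet" ]; then
    echo "no FLANNEL_SUBNET in $env_file" >&2
    return 2
  fi
  # ${subnet##*/} keeps only the part after the last "/", i.e. the prefix length.
  if [ "${subnet##*/}" = "$expected_len" ]; then
    echo "ok: $subnet"
  else
    echo "unexpected prefix: $subnet (wanted /$expected_len)" >&2
    return 1
  fi
}
```

Paste it into a `juju ssh` session on a worker and run `check_flannel_subnet 24`; a non-zero exit status means the node still holds a lease with a different prefix length.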