What is it all about?
We recently encountered a bug that prevented us from deploying and testing services in our cluster.
This was triggered by containerd v1.4.6. The new pods were stuck in the Pending state with the error "Failed to create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded".
While troubleshooting, we came across a solution in the Weave documentation (https://www.weave.works/docs/net/latest/kubernetes/kube-addon/#-things-to-watch-out-for).
However, we also came across an alternative approach: increasing the CIDR ranges of the service and pod networks. The problem is that this is not supported once a cluster has been created; the CIDR entries on the control plane node are read-only. How it can be done anyway is explained in more detail below.
How does this work?
To get around the write protection of the control plane node, a second control plane node is joined and edited. Since its spec does not yet contain any CIDR entries, and these could still be set later on, they are not yet write-protected. The `podCIDR` item can therefore be added to the spec of this node. On this control plane node, the CIDR entries must then also be updated in the manifest files. In addition, the CIDR ranges and the address of the second control plane node must be updated in the ConfigMaps kube-proxy, kubeadm-config and cluster-info.
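For illustration, joining the additional control plane node could look roughly like this (token, CA hash and certificate key are placeholders and depend on the cluster):

kubeadm join <control-plane-endpoint>:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> \
    --control-plane --certificate-key <certificate-key>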
Example:
Pre-requisites
A cluster with two control plane nodes and multiple worker nodes is required. Kubernetes version 1.23.14 with the container runtime containerd and the CNI Weave (https://www.weave.works/docs/net/latest/kubernetes/kube-addon/#-installation) is used.
The cluster is created with --service-cidr="192.168.128.0/24" and --pod-network-cidr="192.168.129.0/24".
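Such a cluster could, for example, be initialized along the following lines (the control plane endpoint is a placeholder; `--upload-certs` is only needed so that further control plane nodes can join):

kubeadm init --control-plane-endpoint "<ip-or-dns>:6443" \
    --service-cidr "192.168.128.0/24" \
    --pod-network-cidr "192.168.129.0/24" \
    --upload-certs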
Testing
After the cluster is created, `kubectl edit` is used to edit the second control plane node.
The spec section is adapted as follows:
spec:
  podCIDR: 192.168.130.0/23
  podCIDRs:
  - 192.168.130.0/23
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
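Whether the change was accepted can be checked, for example, with the following command (the node name is a placeholder for the second control plane node):

kubectl get node <control-plane-node2> -o jsonpath='{.spec.podCIDRs}'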
Then, on the second control plane node, the manifest files are updated with the new CIDR ranges. The command `kubeadm init phase control-plane all --service-cidr "192.168.128.0/23" --pod-network-cidr "192.168.130.0/23"` can be used for this.
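After this step, the static pod manifests under /etc/kubernetes/manifests on that node should contain the new ranges; in rough outline (shortened excerpt), the relevant flags look like this:

# kube-apiserver.yaml
- --service-cluster-ip-range=192.168.128.0/23
# kube-controller-manager.yaml
- --cluster-cidr=192.168.130.0/23
- --service-cluster-ip-range=192.168.128.0/23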
In the cluster, the ConfigMaps for kube-proxy, kubeadm-config and cluster-info must now be configured.
First, the ConfigMap of kube-proxy is edited, using the command `kubectl edit cm kube-proxy -n kube-system`.
The `clusterCIDR` entry must be set to the new pod network CIDR, and the `server` entry must contain the IP address of the new control plane node.
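In the ConfigMap, this affects roughly the following places (shortened excerpt; the server address is a placeholder for the address of the second control plane node):

data:
  config.conf: |-
    ...
    clusterCIDR: 192.168.130.0/23
  kubeconfig.conf: |-
    ...
    clusters:
    - cluster:
        server: https://<ip-of-control-plane-node2>:6443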
Then the ConfigMap of kubeadm-config is edited with the command `kubectl edit cm kubeadm-config -n kube-system`.
The `podSubnet` entry must be set to the new pod network CIDR, the `serviceSubnet` entry to the new service CIDR, and the `controlPlaneEndpoint` entry must contain the IP address of the new control plane node.
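The relevant part of the ClusterConfiguration then looks roughly like this (shortened excerpt; the endpoint address is a placeholder):

data:
  ClusterConfiguration: |
    ...
    controlPlaneEndpoint: <ip-of-control-plane-node2>:6443
    networking:
      podSubnet: 192.168.130.0/23
      serviceSubnet: 192.168.128.0/23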
Finally, the ConfigMap cluster-info is edited with the command `kubectl edit cm cluster-info -n kube-public`.
The `server` entry must have the IP address of the new control plane node.
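In this ConfigMap, the address sits inside the embedded kubeconfig, roughly like this (shortened excerpt; the address is a placeholder):

data:
  kubeconfig: |
    apiVersion: v1
    clusters:
    - cluster:
        certificate-authority-data: <...>
        server: https://<ip-of-control-plane-node2>:6443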
Now the kubelet is restarted on the second control plane node. The command `systemctl restart kubelet` is used for this.
Then the first control plane node is removed from the cluster with the command `kubectl delete node Control-Plane-Node1`.
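Whether the cluster is healthy afterwards and new pods actually receive addresses from the new range can be checked, for example, like this (the test pod and the nginx image are only an illustration):

kubectl get nodes -o wide
kubectl run cidr-test --image=nginx --restart=Never
kubectl get pod cidr-test -o wide   # the pod IP should come from 192.168.130.0/23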
If an admin node is used, the ~/.kube/config file in the user's home directory must be modified; the `server` entry must contain the IP address of the new control plane node. With this, the CIDR range of the cluster has been updated.
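Instead of editing the kubeconfig file by hand, the `server` entry can also be adjusted with kubectl, for example (the cluster name and the address are placeholders and depend on the kubeconfig):

kubectl config set-cluster kubernetes --server=https://<ip-of-control-plane-node2>:6443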
Author: Tobias Altmann