Running your own Kubernetes cluster requires performing maintenance from time to time, whether to update software or to replace hardware. In my case, it is the latter: I want to replace old HDDs with new, faster SSDs.
My Kubernetes cluster consists of 6 nodes: 3 control plane nodes and 3 worker nodes. All of them are, of course, virtual machines.
❯ kubectl get nodes
NAME     STATUS   ROLES           AGE   VERSION
vm0101   Ready    control-plane   34d   v1.24.9
vm0102   Ready    <none>          34d   v1.24.9
vm0201   Ready    control-plane   33d   v1.24.9
vm0202   Ready    <none>          34d   v1.24.9
vm0301   Ready    control-plane   34d   v1.24.9
vm0302   Ready    <none>          34d   v1.24.9
To replace the disks in one of my computers, I have to remove vm0301 and vm0302 from the cluster. But first, I have to prepare the cluster for that.
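Before changing anything, it can help to see which Pods currently run on the two nodes. This is a minimal sketch, not part of the original procedure: the function name pods_on_nodes is hypothetical, and it relies only on the standard spec.nodeName field selector for Pods.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list every Pod scheduled on each node passed as an
# argument, so you know what a later drain is going to evict.
pods_on_nodes() {
  local node
  for node in "$@"; do
    echo "### Pods on ${node}"
    # spec.nodeName is a supported field selector for Pods
    kubectl get pods --all-namespaces -o wide \
      --field-selector "spec.nodeName=${node}"
  done
}
```

For example, pods_on_nodes vm0301 vm0302 prints one listing per node.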
Reducing OpenEBS cStor replicas
The cluster uses OpenEBS cStor for dynamic volume provisioning. This engine takes care of replicating data volumes across the cluster. Before removing nodes, we have to reduce the number of replicas so that no volume keeps a replica on the disk pools of the nodes being removed. You can read about it in the following post:
Graceful removal of the OpenEBS cStor disk pool from the cluster
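To check the result of the replica change, a sketch like the one below lists the cStor pool instances and volume replicas. It assumes the cStor operators run in the openebs namespace and that the CSI-based cStor CRDs register the short names cspi (CStorPoolInstance) and cvr (CStorVolumeReplica); cstor_inventory is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show cStor pool instances and volume replicas.
# Assumes the "openebs" namespace (overridable via $1) and the cspi/cvr
# short names registered by the CSI-based cStor operators.
cstor_inventory() {
  local ns="${1:-openebs}"
  echo "### cStor pool instances in ${ns}"
  kubectl get cspi -n "${ns}"
  echo "### cStor volume replicas in ${ns}"
  kubectl get cvr -n "${ns}"
}
```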
Draining and deleting nodes
Next, we drain the nodes so that their Pods are evicted and the scheduler can place them on the remaining nodes. Let's start with vm0302, the worker node.
❯ kubectl drain vm0302 --ignore-daemonsets --delete-local-data
Flag --delete-local-data has been deprecated, This option is deprecated and will be deleted. Use --delete-emptydir-data.
node/vm0302 cordoned
Warning: ignoring DaemonSet-managed Pods: calico-system/calico-node-6zjqd, calico-system/csi-node-driver-85kcd, kube-system/kube-proxy-79b9l, metallb/metallb-speaker-w65t7, monitoring/prometheus-prometheus-node-exporter-7vs4z, openebs/openebs-cstor-csi-node-6xm66, openebs/openebs-ndm-9xqvr
evicting pod openebs/openebs-localpv-provisioner-686b564b5d-v7x5d
evicting pod calico-system/calico-typha-8586df6596-prkwd
evicting pod openebs/openebs-cstor-admission-server-689d6687f-rhvp5
evicting pod openebs/openebs-cstor-cspc-operator-7dffb6f55-gzc7l
evicting pod openebs/openebs-cstor-csi-controller-0
evicting pod calico-apiserver/calico-apiserver-7cd9f48498-58s4g
evicting pod openebs/openebs-cstor-cvc-operator-7c545f6c94-nngjc
evicting pod cert-manager/cert-manager-cainjector-f6b49bddd-gqct2
evicting pod kube-system/coredns-57575c5f89-hsmnx
evicting pod ingress-nginx/ingress-nginx-controller-7444c75fcf-8hrb9
evicting pod mysql/mysql-innodbcluster-2
evicting pod mysql/mysql-operator-76b8467d9c-t5tnw
pod/calico-typha-8586df6596-prkwd evicted
I0217 18:06:02.724903 54873 request.go:682] Waited for 1.124592395s due to client-side throttling, not priority and fairness, request: GET:https://kubernetes.slys.dev:6443/api/v1/namespaces/openebs/pods/openebs-cstor-cvc-operator-7c545f6c94-nngjc
pod/openebs-localpv-provisioner-686b564b5d-v7x5d evicted
pod/openebs-cstor-cvc-operator-7c545f6c94-nngjc evicted
pod/cert-manager-cainjector-f6b49bddd-gqct2 evicted
pod/calico-apiserver-7cd9f48498-58s4g evicted
pod/openebs-cstor-cspc-operator-7dffb6f55-gzc7l evicted
pod/openebs-cstor-csi-controller-0 evicted
pod/openebs-cstor-admission-server-689d6687f-rhvp5 evicted
pod/coredns-57575c5f89-hsmnx evicted
pod/ingress-nginx-controller-7444c75fcf-8hrb9 evicted
pod/mysql-operator-76b8467d9c-t5tnw evicted
pod/mysql-innodbcluster-2 evicted
node/vm0302 drained
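The warning at the top of this output says that --delete-local-data is deprecated in favour of --delete-emptydir-data. A minimal sketch of the same drain with the current flag follows. drain_node is a hypothetical helper, and the 10-minute --timeout is an assumed value: by default kubectl drain waits indefinitely, so a Pod that cannot be evicted (for example, one blocked by a PodDisruptionBudget) would otherwise hang the command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drain one node with the non-deprecated flag.
# $1 = node name, $2 = optional timeout (assumed default of 10m).
drain_node() {
  local node="$1"
  local timeout="${2:-10m}"
  kubectl drain "${node}" \
    --ignore-daemonsets \
    --delete-emptydir-data \
    --timeout="${timeout}"
}
```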
Evicting Pods may take a while. Once the first node is drained, we can move on to the other one: the control plane node vm0301.
❯ kubectl drain vm0301 --ignore-daemonsets --delete-local-data
Flag --delete-local-data has been deprecated, This option is deprecated and will be deleted. Use --delete-emptydir-data.
node/vm0301 cordoned
Warning: ignoring DaemonSet-managed Pods: calico-system/calico-node-qhf2n, calico-system/csi-node-driver-wbmdp, kube-system/kube-proxy-mj85r, metallb/metallb-speaker-mbcv8, monitoring/prometheus-prometheus-node-exporter-qjdmf
node/vm0301 drained
Draining the control plane node is much quicker, as it runs almost no regular workload: only DaemonSet-managed Pods are listed and nothing has to be evicted. Now it's time to delete the nodes.
❯ kubectl delete node vm0302
node "vm0302" deleted
❯ kubectl delete node vm0301
node "vm0301" deleted
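Deleting a Node object that still runs workloads skips the orderly eviction that kubectl drain performs. As a hypothetical safety guard, the sketch below deletes a node only if it is cordoned (spec.unschedulable is true), which is the first thing kubectl drain does. Being cordoned is only a proxy for being drained, so treat this as a sketch rather than a guarantee; delete_cordoned_nodes is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete each given node only if it is cordoned.
# Stops at the first node that is not cordoned and returns non-zero.
delete_cordoned_nodes() {
  local node unschedulable
  for node in "$@"; do
    unschedulable="$(kubectl get node "${node}" -o jsonpath='{.spec.unschedulable}')"
    if [[ "${unschedulable}" != "true" ]]; then
      echo "refusing to delete ${node}: it is not cordoned" >&2
      return 1
    fi
    kubectl delete node "${node}"
  done
}
```

Usage mirrors the order used here, worker first: delete_cordoned_nodes vm0302 vm0301.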
Making ETCD healthy again
My Kubernetes cluster stores its internal state in ETCD, which runs as a static Pod on each control plane node. Let's switch to the kube-system namespace for a moment.
❯ kubens kube-system
Context "kubernetes-admin@kubernetes" modified.
Active namespace is "kube-system".
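kubens comes from the separate kubectx project. If it is not installed, plain kubectl can change the default namespace of the current context. The helper name use_namespace below is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make the given namespace the default for the
# current kubectl context, similar to `kubens <namespace>`.
use_namespace() {
  local ns="${1:?usage: use_namespace <namespace>}"
  kubectl config set-context --current --namespace="${ns}"
}
```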
Printing the ETCD logs shows that the cluster is in an unhealthy state.
❯ kubectl logs etcd-vm0101
{"level":"warn","ts":"2023-02-17T19:45:58.284Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"d2081a5648ccdb79","rtt":"18.802473ms","error":"dial tcp 192.168.111.98:2380: connect: no route to host"}
This is because deleting the Kubernetes node did not remove vm0301 from the ETCD member list, so the remaining members keep trying to reach its peer address, 192.168.111.98:2380.
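The remote-peer-id in that log line, d2081a5648ccdb79, is the ETCD member ID of the removed node, as the member list below confirms. As a sketch, assuming the JSON log format shown above, the unreachable peer IDs can be pulled out of the logs with grep and sed; unreachable_peers is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read ETCD JSON log lines on stdin and print the
# distinct remote-peer-id values reported as unhealthy by the prober.
unreachable_peers() {
  grep 'prober detected unhealthy status' \
    | sed -n 's/.*"remote-peer-id":"\([0-9a-f]*\)".*/\1/p' \
    | sort -u
}
```

For example: kubectl logs etcd-vm0101 | unreachable_peers.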
I do not have etcdctl installed locally, so I am going to get into the ETCD Pod.
❯ kubectl exec -it etcd-vm0101 -- sh
sh-5.1#
Now that we have a shell inside the Pod, let's list the members.
sh-5.1# ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key member list
21ca2dd1554d7b75, started, vm0101, https://192.168.111.97:2380, https://192.168.111.97:2379, false
d2081a5648ccdb79, started, vm0301, https://192.168.111.98:2380, https://192.168.111.98:2379, false
e613fe6e1ce9fe8f, started, vm0201, https://192.168.111.103:2380, https://192.168.111.103:2379, false
We can see that vm0301 still exists on the list. It's high time to remove it.
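Rather than copying the member ID by hand, it can be looked up by name. This hypothetical helper assumes the default comma-separated output of etcdctl member list shown above (ID, status, name, peer URLs, client URLs, is-learner). It uses awk, so run it wherever awk is available, for example locally on the output of kubectl exec.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `etcdctl member list` output on stdin and
# print the ID of the member whose name (third field) equals $1.
member_id_by_name() {
  local name="$1"
  awk -F', ' -v name="${name}" '$3 == name { print $1 }'
}
```

Piping the member list above into member_id_by_name vm0301 should print d2081a5648ccdb79.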
sh-5.1# ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key member remove d2081a5648ccdb79
The ETCD cluster is healthy again!
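To double-check the result from a workstation without an interactive shell, a sketch like the following runs etcdctl through kubectl exec. It assumes the Pod name and certificate paths used in this post, and etcdctl 3.4 or newer, where the v3 API is the default so ETCDCTL_API=3 can be omitted; check_etcd_health is a hypothetical name. The --cluster flag makes endpoint health probe every member from the member list rather than only 127.0.0.1.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the ETCD member list and the health of every
# member by running etcdctl inside the ETCD static Pod via kubectl exec.
# Assumes etcdctl >= 3.4 (v3 API by default) and the cert paths used above.
check_etcd_health() {
  local pod="${1:-etcd-vm0101}"
  local tls=(
    --endpoints=https://127.0.0.1:2379
    --cacert=/etc/kubernetes/pki/etcd/ca.crt
    --cert=/etc/kubernetes/pki/etcd/peer.crt
    --key=/etc/kubernetes/pki/etcd/peer.key
  )
  kubectl exec -n kube-system "${pod}" -- etcdctl "${tls[@]}" member list
  # --cluster: check all endpoints from the member list, not just 127.0.0.1
  kubectl exec -n kube-system "${pod}" -- etcdctl "${tls[@]}" endpoint health --cluster
}
```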
Conclusion
Removing nodes from a Kubernetes cluster is not a straightforward task. It has to be done with great caution, as doing it wrong may lead to data loss or even break the whole cluster. In my case, it involved reconfiguring the storage engine before draining the nodes and making the ETCD cluster healthy again afterward. Cleaning up ETCD is crucial, as a stale member left in the member list may prevent new control plane nodes from being added in the future.