Delete Worker-Node from a Kubernetes Cluster

This guide outlines the steps to delete worker nodes from a cluster, and in particular how to proceed with rook-ceph and other KubeOps Compliance applications.

Deleting a Node from a Kubernetes cluster

In rare cases, it may be necessary to remove nodes from a Kubernetes cluster. This how-to guide explains the prerequisites and the key considerations to keep in mind before starting the node removal process.

You can use the following steps to delete nodes from a Kubernetes cluster.

Prerequisites

  • In order to run rook-ceph stably over a longer period, your cluster needs at least 3 zones, each containing at least 1 worker node

  • To check which MON and OSD are running on the node you want to delete, you can use the command kubectl get po -n rook-ceph -o wide | grep worker02 | grep "mon\|osd" | grep -v "osd-prepare" | awk '{print $1}'. The output lists the MON and the OSD running on that node. If you get no output, you do not have to delete any resources and can skip ahead to the "Delete the node from the Kubernetes cluster" section
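The filter pipeline from the prerequisite above can be tried out against captured text; a minimal sketch in which the pod names, IPs, and column layout are made-up sample data and only the grep/awk pipeline is taken from the guide:

```shell
# Hypothetical sample of `kubectl get po -n rook-ceph -o wide` output;
# pod names and IPs are invented for illustration.
sample='rook-ceph-mon-c-7c9f8d-abcde           2/2   Running     0   5d   192.168.196.110   worker02   <none>
rook-ceph-osd-1-6b5d4c-fghij           2/2   Running     0   5d   192.168.196.111   worker02   <none>
rook-ceph-osd-prepare-worker02-klmno   0/1   Completed   0   5d   192.168.196.112   worker02   <none>
rook-ceph-mgr-a-75d8cb-pqrst           2/2   Running     0   5d   192.168.196.113   worker03   <none>'

# Same pipeline as in the prerequisite, applied to the sample text:
# keep pods on worker02, keep only mon/osd pods, drop osd-prepare jobs,
# and print the pod name column.
echo "$sample" | grep worker02 | grep "mon\|osd" | grep -v "osd-prepare" | awk '{print $1}'
```

With this sample the pipeline prints the mon-c pod and the osd-1 pod, and correctly drops the completed osd-prepare job.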


Worker

Important: Due to rook-ceph, a worker node must not be removed without following the steps below. In this example, worker01 (zone1) is removed from the cluster. Worker01 contains osd.0 and mon-c.

Scale down the rook-ceph-operator deployment to 0

This prevents new MONs or OSDs from being created.

kubectl scale deploy rook-ceph-operator -n rook-ceph --replicas=0

Check which hosts and OSDs belong to each zone

kubectl exec -it deploy/rook-ceph-tools -n rook-ceph -- ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME              STATUS  REWEIGHT  PRI-AFF
 -1         0.21478  root default
 -9         0.04880      zone zone1
 -7         0.04880          host worker01                              # worker01 is being removed
  0    ssd  0.04880              osd.0          up   1.00000  1.00000   # osd.0 is being removed
-15         0.04880          host worker04
  3    ssd  0.04880              osd.3          up   1.00000  1.00000
-11         0.10739      zone zone2
 -3         0.05859          host worker02
  1    ssd  0.05859              osd.1          up   0.95001  1.00000
-13         0.05859      zone zone3
 -5         0.05859          host worker03
  2    ssd  0.05859              osd.2          up   0.95001  1.00000

From this output you can see that osd.0 is part of worker01.
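The host-to-OSD mapping can also be extracted programmatically instead of read by eye; a sketch that runs over an abbreviated copy of the tree output above (the awk script and column positions are an assumption based on that layout):

```shell
# Abbreviated `ceph osd tree` output from the step above, captured as text.
tree='-9   0.04880   zone zone1
-7   0.04880   host worker01
 0   0.04880   osd.0 up
-15  0.04880   host worker04
 3   0.04880   osd.3 up'

# Track the current host bucket and print the OSDs that sit under worker01.
echo "$tree" | awk '$3 == "host" {h=$4; next} $3 ~ /^osd\./ && h == "worker01" {print $3}'
```

For the removal of worker01 this yields osd.0, matching what the tree shows.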

Scale down the OSD deployment

kubectl scale deploy -n rook-ceph rook-ceph-osd-<x> --replicas=0
# Example: kubectl scale deploy -n rook-ceph rook-ceph-osd-0 --replicas=0

Remove the OSD via ceph-tools

kubectl exec -it deploy/rook-ceph-tools -n rook-ceph -- bash
# show OSD tree
ceph osd tree
# mark the OSD out
ceph osd out <x>
# Example: ceph osd out 0
ceph osd purge <x> --yes-i-really-mean-it
# Example: ceph osd purge 0 --yes-i-really-mean-it
# remove the OSD's auth entry
ceph auth del osd.<x>
# Example: ceph auth del osd.0
# remove the host bucket from the CRUSH map
ceph osd crush remove <nodename>
# Example: ceph osd crush remove worker01
# exit from ceph-tools
exit
# show OSD tree (now without the deleted node)
kubectl exec -it deploy/rook-ceph-tools -n rook-ceph -- ceph osd tree

Delete OSD and MON deployments

kubectl delete deploy -n rook-ceph rook-ceph-osd-<x> rook-ceph-mon-<y>
# Example: kubectl delete deploy -n rook-ceph rook-ceph-osd-0 rook-ceph-mon-c

Remove the deleted MON via ceph-tools

kubectl exec -it deploy/rook-ceph-tools -n rook-ceph -- ceph mon dump
kubectl exec -it deploy/rook-ceph-tools -n rook-ceph -- ceph mon rm <y>
# verify
kubectl exec -it deploy/rook-ceph-tools -n rook-ceph -- ceph mon dump
Example
kubectl exec -it deploy/rook-ceph-tools -n rook-ceph -- ceph mon dump
kubectl exec -it deploy/rook-ceph-tools -n rook-ceph -- ceph mon rm c
# verify
kubectl exec -it deploy/rook-ceph-tools -n rook-ceph -- ceph mon dump

This is the dump before executing the remove:

0: [v2:192.168.231.184:3300/0,v1:192.168.231.184:6789/0] mon.a
1: [v2:192.168.185.9:3300/0,v1:192.168.185.9:6789/0] mon.b
2: [v2:192.168.196.110:3300/0,v1:192.168.196.110:6789/0] mon.c

This is the dump after executing the remove:

0: [v2:192.168.231.184:3300/0,v1:192.168.231.184:6789/0] mon.a
1: [v2:192.168.185.9:3300/0,v1:192.168.185.9:6789/0] mon.b

Delete the node from the kubernetes cluster

  • Prepare your cluster-values.yaml so that the node you want to delete is removed from it
  • Execute the command kubeopsctl apply --delete -f cluster-values.yaml
Example

The cluster-values.yaml without worker01 but with worker04

# file cluster-values.yaml
apiVersion: kubeops/kubeopsctl/cluster/beta/v1
imagePullRegistry: registry.kubeops.net/kubeops/kubeops
airgap: true
clusterName: myCluster
clusterUser: root
kubernetesVersion: 1.31.6      
kubeVipEnabled: false
virtualIP: 10.2.10.110
firewall: nftables
pluginNetwork: calico
containerRuntime: containerd
kubeOpsRoot: /home/myuser/kubeops
serviceSubnet: 192.168.128.0/17
podSubnet: 192.168.0.0/17
debug: true
systemCpu: 250m
systemMemory: 256Mi
packageRepository: local
changeCluster: true
zones:
- name: zone1
  nodes:
  - name: controlplane01
    iPAddress: 10.2.10.110
    type: controlplane
    kubeVersion: 1.31.6       
  - name: worker04
    iPAddress: 10.2.10.214
    type: worker
    kubeVersion: 1.31.6       
- name: zone2
  nodes:
  - name: controlplane02
    iPAddress: 10.2.10.120
    type: controlplane
    kubeVersion: 1.31.6       
  - name: worker02
    iPAddress: 10.2.10.220
    type: worker
    kubeVersion: 1.31.6       
- name: zone3
  nodes:
  - name: controlplane03
    iPAddress: 10.2.10.130
    type: controlplane
    kubeVersion: 1.31.6  
  - name: worker03
    iPAddress: 10.2.10.230
    type: worker
    kubeVersion: 1.31.6     

Afterwards, execute the command kubeopsctl apply --delete -f cluster-values.yaml
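Before applying, it is worth confirming that the removed node really is absent from the file; a minimal sketch over an inlined excerpt of the zones section (in practice, grep your actual cluster-values.yaml):

```shell
# Excerpt of the zones section, inlined here so the check is self-contained.
values='zones:
- name: zone1
  nodes:
  - name: controlplane01
    type: controlplane
  - name: worker04
    type: worker'

# Warn loudly if the removed node is still listed.
if echo "$values" | grep -q "name: worker01"; then
  echo "worker01 is still in cluster-values.yaml" >&2
else
  echo "worker01 removed - safe to apply"
fi
```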

Scale the rook-ceph-operator deployment back to 1

This allows the operator to create a replacement MON automatically on one of the remaining nodes.

kubectl scale deploy rook-ceph-operator -n rook-ceph --replicas=1

Timing and health checks

The total duration depends on cluster size and node performance. Before proceeding, verify Ceph health and placement groups are clean.

kubectl exec -it deploy/rook-ceph-tools -n rook-ceph -- ceph status
kubectl exec -it deploy/rook-ceph-tools -n rook-ceph -- ceph pg stat
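A simple way to gate the next step is to check the health line of the status output; a sketch over a hypothetical excerpt (cluster id is invented, only the HEALTH_OK check matters):

```shell
# Hypothetical excerpt of `ceph status` output; the id is made up.
status='  cluster:
    id:     a1b2c3d4
    health: HEALTH_OK'

# Proceed only when the cluster reports HEALTH_OK.
if echo "$status" | grep -q "HEALTH_OK"; then
  echo "cluster healthy - safe to proceed"
else
  echo "cluster not healthy yet - wait and re-check" >&2
fi
```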

Typical duration ranges from 15 to 120 minutes.

If you want to rejoin the same node later, reset it to a state prior to joining the cluster. Only then can you be sure that no leftovers from the deletion process remain.