Fix rook-ceph

Repair rook-ceph when worker nodes are down

If some worker nodes are down and your CephCluster uses the parameters useAllNodes and useAllDevices, you need to adjust the rook-ceph configuration. This guide is a temporary fix for rook-ceph, not a permanent solution.
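For context, these parameters live in the storage section of the CephCluster resource. A minimal sketch of the relevant fields (metadata names here are illustrative, not taken from your cluster):

```yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  storage:
    # With both flags true, Rook places OSDs on every node and device it
    # finds, so a downed worker leaves its OSDs behind in the cluster map.
    useAllNodes: true
    useAllDevices: true
```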

  1. Get the tools pod:
kubectl -n <rook-ceph namespace> get pod | grep tools
  2. Check the cluster status from inside the tools pod:
kubectl -n <rook-ceph namespace> exec -it <tools pod name> -- bash
ceph status
ceph osd status
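Picking the problem OSDs out of the ceph osd status output can be scripted. A sketch, assuming the output prints one whitespace-separated row per OSD with the numeric id in the first column and the state (e.g. exists,up) in the last; the output format varies between Ceph releases, so check it against your own cluster first. find_down_osds is a hypothetical helper name:

```shell
# find_down_osds reads `ceph osd status` output on stdin and prints the
# ids of OSDs whose state column is not "exists,up".
find_down_osds() {
  awk '$1 ~ /^[0-9]+$/ && $NF !~ /exists,up/ { print $1 }'
}
```

Usage: ceph osd status | find_down_osds prints the ids of the OSDs that need attention.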

If there are OSDs whose status is not exists,up, they need to be removed:

ceph osd out <id of osd>
ceph osd crush remove osd.<id of osd>
ceph auth del osd.<id of osd>
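The three commands above can be wrapped so the id is typed only once. A sketch to run inside the tools pod; remove_osd is a hypothetical helper name, and the underlying commands are exactly the ones listed above:

```shell
# remove_osd takes an OSD id and removes that OSD from the cluster:
# mark it out, drop it from the CRUSH map, then delete its auth key.
remove_osd() {
  id="$1"
  ceph osd out "$id"
  ceph osd crush remove "osd.$id"
  ceph auth del "osd.$id"
}
```

Usage: remove_osd 3 removes osd.3.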

You can now check the remaining OSDs with ceph osd status.

It could be that you also need to decrease the replication size:

ceph osd pool ls
ceph osd pool set <pool-name> size 2

The default pool name should be replicapool.
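If more than one pool is affected, the same setting can be applied across every pool from the listing above. A sketch assuming the ceph CLI is available in the tools pod; set_all_pool_sizes is a hypothetical helper name:

```shell
# set_all_pool_sizes applies the given replication size to every pool
# returned by `ceph osd pool ls`.
set_all_pool_sizes() {
  size="$1"
  for pool in $(ceph osd pool ls); do
    ceph osd pool set "$pool" size "$size"
  done
}
```

Usage: set_all_pool_sizes 2 sets size 2 on all pools at once.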

Then you can delete the deployments of the pods that are causing problems:

kubectl -n <rook-ceph namespace> delete deploy <deployment-name>