Fix Rook-Ceph
Repair Rook-Ceph when worker nodes are down.
If some worker nodes are down and your cluster uses the useAllNodes and useAllDevices parameters, you need to change the Rook-Ceph configuration. This guide is a temporary fix for Rook-Ceph, not a permanent solution.
- Get the tools pod:
kubectl -n <rook-ceph namespace> get pod | grep tools
- Get the status from inside the tools pod:
kubectl -n <rook-ceph namespace> exec -it <tools pod name> -- bash
ceph status
ceph osd status
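Instead of copying the pod name by hand from the grep output, the lookup can be scripted. A minimal sketch, assuming the toolbox pod name contains "tools" (the pod listing below is a hypothetical sample; on a live cluster replace the printf with `kubectl -n <rook-ceph namespace> get pod`):

```shell
# Print the first pod whose name contains "tools" from a `kubectl get pod`
# style listing. The sample listing below is hypothetical.
tools_pod() {
  awk '/tools/ { print $1; exit }'
}

printf '%s\n' \
  'NAME                          READY  STATUS   RESTARTS  AGE' \
  'rook-ceph-operator-6c9f8d     1/1    Running  0         3d' \
  'rook-ceph-tools-7f5b4c        1/1    Running  0         3d' \
  | tools_pod
# → rook-ceph-tools-7f5b4c
```

On a live cluster this allows, for example, `kubectl -n <rook-ceph namespace> exec -it "$(kubectl -n <rook-ceph namespace> get pod | tools_pod)" -- bash`.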
If there are OSDs without the status exists,up, they need to be removed:
ceph osd out <id of osd>
ceph osd crush remove osd.<id of osd>
ceph auth del osd.<id of osd>
You can now check the rest of the OSDs with ceph osd status.
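The removal steps above can be sketched as a dry run that prints the three commands for every OSD whose state is not exists,up. The status lines below are a hypothetical sample, and the column layout of `ceph osd status` varies between Ceph versions, so the field positions may need adjusting; on a live cluster pipe in the real output instead:

```shell
# Print the removal commands for every OSD whose state column is not
# "exists,up". Skips the header line; assumes the last column is the state.
print_removal_cmds() {
  awk 'NR > 1 && $NF !~ /exists,up/ {
    print "ceph osd out " $1
    print "ceph osd crush remove osd." $1
    print "ceph auth del osd." $1
  }'
}

printf '%s\n' \
  'ID  HOST    USED   AVAIL  WR_OPS  WR_DATA  RD_OPS  RD_DATA  STATE' \
  '0   node-a  1026M  9213M  0       0        0       0        exists,up' \
  '1   node-b  1026M  9213M  0       0        0       0        autoout,exists' \
  | print_removal_cmds
```

Printing the commands first, instead of running them directly, lets you double-check which OSDs would be removed before executing anything inside the tools pod.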
You may also need to decrease the replication size of the pools:
ceph osd pool ls
ceph osd pool set <pool-name> size 2
The default pool name should be replicapool.
Then you can delete the deployments of the pods that are causing problems (for example, the deployments of the removed OSDs):
kubectl -n <rook-ceph namespace> delete deploy <deployment-name>