Ceph-OSD Maintenance Procedure

When something goes wrong with a ceph-osd node, follow the procedure below to take the node down for maintenance safely.

Caveat

Never take down too many OSDs at the same time: doing so may cause the number of available copies of some data to fall below "osd pool default min size", in which case data consistency or integrity is no longer guaranteed.

Check, at least for the most critical pools, that the min_size parameter is set to '1', so that I/O remains possible as long as at least one copy of the data is available:

$ ceph osd pool get <pool_name> min_size
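
To inspect every pool in one pass, a small shell loop like the one below can be used; it simply iterates over the pool names returned by "ceph osd pool ls" (available on recent releases; older ones provide "ceph osd lspools" instead) and prints the min_size of each:

$ for pool in $(ceph osd pool ls); do echo -n "${pool}: "; ceph osd pool get "${pool}" min_size; done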

Solution

Because the node is only being taken down for maintenance, we want Ceph to keep regarding the node as part of the cluster, while at the same time preventing data from being directed to it.

First, set the "noout" flag on the cluster, so that Ceph does not immediately start rebalancing data across the remaining OSDs:

$ ceph osd set noout
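
To confirm the flag took effect, check the flags line of the OSD map; it should now include "noout" (the exact wording of the accompanying health warning varies between releases):

$ ceph osd dump | grep flags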

Once the noout flag is set, the OSDs on the node can be stopped with:

$ sudo service ceph-osd stop id={osd num}

Or stop all OSDs on the node at once, if it is the host itself that requires maintenance:

$ sudo service ceph-osd-all stop
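
The service commands above assume an upstart/sysvinit-style init. On systemd-based installations, the usual equivalents are the ceph-osd@ unit for a single OSD and ceph-osd.target for all OSDs on the host:

$ sudo systemctl stop ceph-osd@{osd num}
$ sudo systemctl stop ceph-osd.target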

These OSDs will be marked "in" and "down" in the ceph status output. Because they are still regarded as "in" the cluster, no rebalancing-driven data migration takes place. New data written during the maintenance window will not be backfilled to placement groups on these OSDs until they are restarted. Check the current state with:

$ ceph -s
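
To see exactly which OSDs are down and on which host they live, the OSD tree gives a per-host view of each OSD's up/down status and weight:

$ ceph osd tree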

After the maintenance work is done, start the OSDs again with:

$ sudo service ceph-osd start id={osd num}

Or start all OSDs with:

$ sudo service ceph-osd-all start
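
Again, on systemd-based installations the corresponding commands are:

$ sudo systemctl start ceph-osd@{osd num}
$ sudo systemctl start ceph-osd.target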

Finally, unset the noout flag for the cluster:

$ ceph osd unset noout
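
As a final check, run ceph -s once more and wait for the cluster to report HEALTH_OK, or at least for the noout warning to clear and for all OSDs to be reported up and in:

$ ceph -s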