Ceph-OSD: replacing a failed disk

You have just noticed that one of the OSDs has a problem, or is about to fail, and you decide to replace it.

Preparing for replacement (GARR-specific section)

At GARR, we are using FC storage to provide disks to Ceph. For completeness, we outline the procedure to be followed:

  • remove the failed disk from the set of disks presented to the CephServer HostGroup

  • select a new disk, respecting placement (same drawer) and naming convention (e.g. R1S1E0D2-02)

  • associate this disk with the CephServer HostGroup

  • in the HostMapping tab, change the LUN associated with the new disk. Note that the LUN should not be ‘0’ and should let you uniquely identify the disk on the host: for example, pick LUN numbers in the range 10-30 for Storage1 and in the range 40-60 for Storage2.

  • open a terminal on the storage server and rescan the FC bus so that multipath picks up the new LUN (in practice this is normally not done by hand, but via a proper Ansible role); a quick check is sketched after this list:

    $ echo "1" > /sys/class/fc_host/host#/issue_lip
    
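After issuing the LIP, it is worth checking that the new LUN is actually visible on the server. A minimal sketch, assuming the LUN number chosen above was 12 (the LUN number and device names are purely illustrative):

# list SCSI devices and filter on the chosen LUN (last field of [host:channel:target:lun])
$ lsscsi | grep ':12]'
# check that a multipath device has been created on top of the new paths
$ multipath -ll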

Remove the failed disk from Ceph

In the following, {osd-name} indicates the full name of the OSD, like osd.66.

From the Ceph administration node, or from any Ceph server:

  • if the noout flag is set, most likely the Ceph cluster will be in a warning state, showing PGs in an inconsistent/degraded state and possibly reporting unfound objects. This is normal; don’t panic.

  • unset the noout flag:

    $ ceph osd unset noout
    
  • set the disk’s weight to 0 and monitor rebalancing activity (running ceph -w) until completion:

    $ ceph osd crush reweight {osd-name} 0
    
  • at this point, you should not see any unfound objects. If you do, your cluster is at risk of losing data and I advise you not to proceed further.

  • take the disk out:

    $ ceph osd out {osd-name}
    
  • remove the OSD from the CRUSH map:

    $ ceph osd crush remove {osd-name}
    
  • remove the authorization key associated with the disk:

    $ ceph auth del {osd-name}
    
  • log in to the server owning the failed disk and make sure the ceph-osd daemon is stopped (if the disk has failed, this will likely already be the case). Note that the systemd unit takes the numeric OSD ID, indicated below as {osd-id} (e.g. 66 for osd.66), not the full OSD name:

    $ systemctl disable ceph-osd@{osd-id}
    $ systemctl stop ceph-osd@{osd-id}
    
  • finally, remove the OSD (a quick verification sketch follows this list):

    $ ceph osd rm {osd-name}
    
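Once the OSD has been removed, double-check that it no longer appears in the cluster and that Ceph is healthy before adding the replacement. A minimal sketch, assuming the failed OSD was osd.66 (the name is purely illustrative):

# the removed OSD should no longer show up in the CRUSH tree
$ ceph osd tree | grep osd.66
# overall cluster status; wait for it to go back to HEALTH_OK
$ ceph -s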

Add the new disk to Ceph

Add the new disk to Ceph, using your preferred Ceph management tool (ceph-deploy, ceph-ansible,…).
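
If you prefer not to go through a deployment tool, the new disk can also be added manually with ceph-volume on the OSD host. A minimal sketch, assuming a Ceph release that ships ceph-volume and that the new multipath device shows up as /dev/mapper/mpathX (the device name is purely illustrative; use the one corresponding to the LUN chosen above):

# prepare and activate a new OSD on the new device (run on the OSD host)
$ ceph-volume lvm create --data /dev/mapper/mpathX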

Final checks

Make sure the new disk appears in:

$ ceph osd tree

In the output of the above command, check that the weight associated with the disk is not 0 and that the disk appears under the correct bucket. If this is not the case, adjust the weight and placement by executing (either on the admin node or on any Ceph server):

$ ceph osd crush create-or-move {osd-name} {weight} {bucket-type}={bucket-name}

for example:

$ ceph osd crush create-or-move osd.66 1.0 storage=r3sto1-hdd
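
Finally, wait for backfilling onto the new disk to complete and make sure the cluster returns to a healthy state. A short check (these commands only read the cluster state):

# watch recovery/backfill progress
$ ceph -w
# once recovery has finished, the cluster should report HEALTH_OK
$ ceph health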