Extend RocksDB partition on OSD

Ok, so you have a nice Ceph cluster, based on Nautilus or later, and took advantage of SSD/NVMe disks to create a logical volume to host RocksDB/WAL for your spinning disks. You also followed the well-known suggestion to size such an LVM volume at ~40GB, as larger sizes would be useless unless you scale up to ~300GB.

But things evolve, and Nautilus introduced a better way of handling any extra space allotted to RocksDB/WAL. Indeed, ceph health detail shows a lot of spillover, so you decide to use all the available space on the fast disks and go from a 40GB RocksDB LVM size to, say, 80GB.

The trickiest part is identifying which LVM volume is associated with which OSD ID: the output of ceph-volume lvm list will provide you with this information. In my case I opted to upgrade all existing OSDs, so rather than performing the actions by hand for each OSD I used a bit of bash/awk/grep to build loops (a rough sketch is shown after the step list). In any case, for each OSD here are the steps:

  • extend LVM volume:

    lvextend -L80G <VGname>/<LVname>
    
  • IMPORTANT: I needed to reboot my server to make the OS aware of the change. Maybe the same result could be obtained in some other way, but I did not mind rebooting.
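
    If you want to check whether the OS already sees the new size before resorting to a reboot (a check I am only sketching here, since I simply rebooted), you can compare what LVM and the kernel report for the volume:

    $ lvs <VGname>/<LVname>
    $ blockdev --getsize64 /dev/<VGname>/<LVname>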

  • Stop the OSD, play some magic, check your magic, resume OSD:

    $ systemctl stop ceph-osd@<OSDnumber>
    # check that the following command really claims to be extending the partition
    $ ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-<OSDnumber>
    # in the output of the following command, check the size of the LVM volumes associated with <OSDnumber>
    $ ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-<OSDnumber>
    $ systemctl restart ceph-osd@<OSDnumber>
    
  • force an OSD compaction, to get rid of the spillover message:

    $ ceph daemon osd.<OSDnumber> compact
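
As mentioned above, I did not run these steps by hand for each OSD but looped over them. I am not reproducing my exact bash/awk/grep loops here; the sketch below shows one possible way to script the OSD-to-LV mapping step, using the JSON output of ceph-volume lvm list together with jq instead of awk/grep (the JSON field names are the ones I recall from my installation, so double-check them on your version, and keep the echo-only form until you are happy with the mapping):

    # list, for every OSD, the logical volume holding its RocksDB ("db") data
    ceph-volume lvm list --format json | \
      jq -r 'to_entries[] | .key as $osd | .value[] | select(.type == "db") | "\($osd) \(.lv_path)"' | \
    while read osd lvpath; do
        echo "OSD ${osd}: DB volume ${lvpath}"
        # lvextend -L80G "${lvpath}"    # uncomment only after verifying the mapping
    done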
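
Once all the involved OSDs have been expanded and compacted, the spillover warnings should be gone from:

    $ ceph health detail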