Ceph upgrade from Jewel to Luminous

Unlike with previous Ceph releases, for Luminous the official procedure requires upgrading the MON nodes first.

In the following we will proceed with:

  • upgrade the MONs
  • upgrade the OSDs (at this stage, OSDs will still be using FileStore)
  • add MGR nodes, co-located with the MON nodes
  • upgrade the OSDs to BlueStore

It is assumed the cluster is managed via ceph-ansible, although some commands and the overall procedure are valid in general.

Upgrade MON

Perform the following actions on each MON node, one at a time, so you can verify that after the upgrade each node manages to rejoin the cluster:

sed -i -e 's/jewel/luminous/' /etc/yum.repos.d/ceph_stable.repo
yum update
systemctl restart ceph-mon@<monID>

Verify the MON has rejoined the cluster (note that the output of ceph -s has changed in Luminous):

ceph -m <monIP> -s
ceph -m <monIP> mon versions

Upgrade OSD

Proceed as with the MON nodes, one node at a time: first update the package repository file, then perform the package upgrade.
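
On each OSD node this mirrors the MON steps, assuming the same ceph_stable.repo layout as on the MONs:

sed -i -e 's/jewel/luminous/' /etc/yum.repos.d/ceph_stable.repo
yum update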

Finally, restart all OSD daemons with:

systemctl restart ceph-osd.target

Check with:

ceph osd versions

Upgrade Admin node

If you have one, now it’s time to upgrade Ceph on your administration node (e.g., the one from which you run Ansible playbooks).

Re-run ceph-ansible

Update your ceph-ansible files. At the very least you should:

  • file cluster-primary/inventory: if you have not already done so, add a [clients] section that includes your administration node

  • file cluster-primary/inventory: add a [mgrs] section, co-locating the MGR hosts with the MON nodes

  • file group_vars/all.yml:

    ceph_stable_release: luminous
    # In the CONFIGURATION section:
    mon allow pool delete = true
    # Line below specific to our case, we have huge memory
    bluestore_cache_size_hdd: 2147483648   # 2 GiB
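
As a sketch, assuming the ceph.conf options above are passed through ceph-ansible's ceph_conf_overrides variable (the section names below are my guess), the relevant part of group_vars/all.yml could look like:

ceph_stable_release: luminous

ceph_conf_overrides:
  global:
    mon allow pool delete: true
  osd:
    # 2 GiB BlueStore cache per HDD OSD; size this to your available memory
    bluestore_cache_size_hdd: 2147483648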
    

Execute the ceph-ansible playbook site.yml.
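
The exact invocation depends on your setup; with the inventory file mentioned above it would be something like:

ansible-playbook -i cluster-primary/inventory site.yml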

Global cluster settings

After the upgrade, ceph -s will show HEALTH_WARN. To clear it you will have to set the following (note that significant rebalancing may happen):

ceph osd require-osd-release luminous
# Set this if you have reasonably up-to-date clients everywhere
ceph osd set-require-min-compat-client jewel
# this may cause some rebalancing
ceph osd crush tunables optimal

After this, ceph -s may still complain: with Luminous it is now mandatory to enable an application on each pool. This is done via the command:

ceph osd application enable <pool-name> <app-name>

Execute ceph health detail to find which pools need an application enabled and which app names are valid.
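
For example, for a hypothetical pool named images used by RBD clients:

ceph osd application enable images rbd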

Upgrade to BlueStore

I performed the upgrade one host at a time, removing all of the host's OSDs without prior reweighting and adding them back in. This of course causes rebalancing, so schedule the activity during off-peak hours.

You have replica 3, right? So your cluster should withstand losing one server for a while (24 hours, in my case). I think the alternative procedure:

  • reweight OSDs to some small fraction
  • remove OSDs
  • add them back in, which will reset their weight to default

would cause far more rebalancing, for a far longer time.

When using ceph-ansible you remove OSDs by running infrastructure-playbooks/shrink-osd.yml, which will also wipe the OSD disk partitions and leave them ready to be rediscovered by a subsequent run of the site.yml playbook. Note that the playbook only deals with OSDs under root=default, so OSDs below other roots should be moved to root=default first and then removed (unless you like doing things by hand).
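
As a sketch (the host name and OSD IDs are placeholders, and osd_to_kill is the variable the shrink playbook expected in the ceph-ansible version I used):

# Move a host living under another root back under root=default
ceph osd crush move <hostname> root=default
# Remove its OSDs via the shrink playbook
ansible-playbook -i cluster-primary/inventory infrastructure-playbooks/shrink-osd.yml -e osd_to_kill=10,11,12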

Note

Upgrading to Luminous and BlueStore when using more than one root should be safe without any special precaution, as Luminous introduces shadow roots, which should keep your special devices (ssd or big) always accessible. However, to be on the safe side I opted for upgrading 50% of my special OSD devices to BlueStore, then I modified the CRUSH ruleset as explained below, then I upgraded the rest of the devices.
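
To inspect the per-class shadow roots that Luminous creates, something like the following should work (the --show-shadow flag is, as far as I recall, available from Luminous onwards):

ceph osd crush tree --show-shadow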

Luminous introduced the concept of device classes for disks and is capable of guessing the common classes such as hdd, ssd and nvme. Newly discovered disks will have their class set automatically; however, if you need to change it, for example because you are introducing a custom class, you can do so with:

ceph osd crush rm-device-class osd.2 osd.3
ceph osd crush set-device-class ssd osd.2 osd.3
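
You can list the device classes currently defined with:

ceph osd crush class ls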

After each node has been upgraded, wait until the status goes back to HEALTH_OK.

Upgrade CRUSHmap

Download the CRUSHmap and edit it so that the rulesets match the device class you intend to use. For example, in my replicated_ruleset I changed the line:

step take default

to:

step take default class hdd

Compile the CRUSHmap and apply it.
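
The usual download/edit/compile/apply cycle looks roughly like this (file names are arbitrary):

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# edit crushmap.txt, e.g. change "step take default" to "step take default class hdd"
crushtool -c crushmap.txt -o crushmap-new.bin
ceph osd setcrushmap -i crushmap-new.bin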