Balance OSDs using the mgr balancer module

Luminous introduced a long-awaited feature which simplifies cluster rebalancing.

Due to the semi-randomness of the CRUSH algorithm, it is very common to have a cluster where OSD utilization ranges from 45% to 80%: the problem is that as soon as one OSD exceeds the “full ratio”, the whole cluster stops accepting writes (to protect your data). Before Luminous one had to play with ‘osd reweight’, but the result was often unpredictable.
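
To get a feeling for how uneven the distribution currently is, you can look at the per-OSD utilization and at the configured ratios (exact column names may vary slightly between releases):

# per-OSD utilization, see the %USE and VAR columns
$ ceph osd df tree
# currently configured nearfull / backfillfull / full ratios
$ ceph osd dump | grep ratio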

As described in the official documentation, from Luminous onward you can enable the ‘balancer’ module of the ‘mgr’:

$ ceph mgr module enable balancer
$ ceph balancer status
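
If in doubt, the module should also show up in the list of enabled mgr modules:

$ ceph mgr module ls | grep balancer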

Rebalancing works by adjusting the ‘weight-set’ values (in ‘upmap’ mode it records explicit PG mappings instead): note that these are a bit hidden and are not shown by commands such as “ceph osd df”.
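
If you are curious about what the balancer is actually changing, you can inspect the weight-sets and the upmap entries directly; which of the two is populated depends on the balancer mode in use:

# dump the CRUSH weight-sets, if any
$ ceph osd crush weight-set dump
# show the explicit PG mappings created in ‘upmap’ mode
$ ceph osd dump | grep upmap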

Configure the maximum fraction of PGs that may be misplaced at any time, and the balancer mode:

$ ceph config set mgr mgr/balancer/max_misplaced 0.01
$ ceph config set mgr mgr/balancer/mode upmap
# optionally restrict balancing to a time window (HHMM)
$ ceph config set mgr mgr/balancer/begin_time 2100
$ ceph config set mgr mgr/balancer/end_time   0700
# other parameters to play with, please refer to the docs and/or the mailing list
$ ceph config set mgr mgr/balancer/upmap_max_deviation  1
$ ceph config set mgr mgr/balancer/upmap_max_iterations 20

$ ceph config dump | grep balancer
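
Note that ‘upmap’ mode relies on a feature which pre-Luminous clients do not understand, so (as per the official documentation) the cluster must require Luminous or newer clients before upmap can be used:

# check which client releases are currently connected
$ ceph features
# require luminous (or newer) clients, needed by upmap
$ ceph osd set-require-min-compat-client luminous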

Enable automatic rebalancing:

$ ceph balancer on

Of course, you can stop automatic rebalancing with:

$ ceph balancer off

Automatic rebalancing runs iteratively, ensuring that at most a max_misplaced fraction of PGs is misplaced at any given time.
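
While the balancer is on you can follow its progress with the usual status commands; the fraction of misplaced objects reported by “ceph -s” should stay below the configured max_misplaced:

$ ceph -s
$ ceph balancer status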

Alternatively, you can run a one-shot optimization by executing a so-called plan. Note that a plan is an optimization relative to the current state of the cluster, so you should create it only when the cluster is HEALTH_OK and remove it right after triggering its execution:

# compute current cluster 'score', lower is better
$ ceph balancer eval
# create a plan called 'aplan'
$ ceph balancer optimize aplan
# compute expected score after optimization
$ ceph balancer eval aplan
# if you are curious, see what actions would be performed
$ ceph balancer show aplan
# execute plan
$ ceph balancer execute aplan
# immediately remove all custom plans
$ ceph balancer reset
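
After the data movement triggered by a plan has finished, it may be worth re-checking the score and making sure no stale plans are left around (a single plan, e.g. the ‘aplan’ created above, can also be removed by name):

# re-compute the cluster score, it should have improved
$ ceph balancer eval
# list any remaining plans
$ ceph balancer ls
# remove a single plan by name
$ ceph balancer rm aplan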