Recover lost PG

Today I ran into a problem where one PG was stuck in the state stale+active+clean:

2018-01-16 08:03:29.115454 mon.0 [INF] pgmap v350571: 464 pgs: 463 active+clean, 1 stale+active+clean; 2068 bytes data, 224 GB used, 76343 GB / 76722 GB avail

Luckily this was a brand-new test Ceph cluster, so I could simply have wiped and recreated it, but I wanted to find a cleaner approach.

At some point I had mistakenly activated some OSDs. Once I realized my mistake, I decided to remove those OSDs from the cluster. The problem was that I was too quick and removed all of them without giving the cluster time to rebalance. As a result, one PG, which had all three of its copies on OSDs that got removed, suddenly found itself without a home. My cluster only had OSDs 0 to 11, but Ceph was trying to store that PG on OSDs 23, 20 and 13:

$ ceph health detail
HEALTH_ERR 1 pgs are stuck inactive for more than 300 seconds; 1 pgs stale; 1 pgs stuck stale
pg 1.a3 is stuck stale for 61477.173256, current state stale+active+clean, last acting [23,20,13]
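As a sanity check, listing the CRUSH hierarchy shows which OSD IDs actually exist in the cluster:

$ ceph osd tree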

Meanwhile, the machines holding those OSDs had all been destroyed, so bringing them back up and letting the cluster rebalance was not a viable option.
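For future reference, the gentler way to retire an OSD is to mark it out first and let the cluster re-replicate its data before removing it. A rough sketch of that sequence, using osd.13 as an example ID:

$ ceph osd out 13
(wait until ceph -s reports all PGs active+clean again)
$ ceph osd crush remove osd.13
$ ceph auth del osd.13
$ ceph osd rm 13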

I re-created the PG with the following command:

$ for pg in $(ceph health detail | grep ^pg | grep stale | cut -d' ' -f2); do ceph pg force_create_pg $pg; done

which produced this output:

pg 1.a3 now creating, ok

The status of the cluster, however, had not improved much yet:

$ ceph -s
.....
2018-01-16 08:13:57.612721 mon.0 [INF] pgmap v351004: 464 pgs: 1 creating, 463 active+clean; 2068 bytes data, 12828 MB used, 73691 GB / 73704 GB avail
.....
$ ceph health detail
HEALTH_ERR 1 pgs are stuck inactive for more than 300 seconds; 1 pgs stuck inactive; 1 pgs stuck unclean
pg 1.a3 is stuck inactive since forever, current state creating, last acting []
pg 1.a3 is stuck unclean since forever, current state creating, last acting []
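With an empty acting set there is no primary OSD to ask, but the monitors can still list which PGs are stuck and in what state, which is a handy way to keep an eye on progress:

$ ceph pg dump_stuck inactive
$ ceph pg dump_stuck stale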

Let’s find out which OSD owns the PG:

$ ceph pg map 1.a3
osdmap e225 pg 1.a3 (1.a3) -> up [11,0,16] acting [6]
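So the acting OSD was osd.6, which actually existed in the cluster, and restarting its daemon looked like a cheap thing to try. On a systemd-based node that would be something along these lines (the exact unit name depends on the Ceph release and on how the daemons are managed):

$ sudo systemctl restart ceph-osd@6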

In the end, restarting the daemon for osd.6 did the trick:

2018-01-16 08:29:57.619319 mon.0 [INF] pgmap v351249: 464 pgs: 464 active+clean; 2068 bytes data, 12835 MB used, 73691 GB / 73704 GB avail
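
To be thorough, the same checks as before can be re-run to verify that the PG now maps onto OSDs that actually exist:

$ ceph health detail
$ ceph pg map 1.a3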