Recover a boot volume in case of fsck errors
Sometimes, for example after a sudden shutdown, the boot volume of an instance may show errors like:
```
.....
Failure: File system check of the root filesystem failed
The root filesystem on /dev/vda1 requires a manual fsck
.....
```
To fix the error you would normally detach the volume, attach and mount it on some other server, run fsck, and finally re-attach it to the original instance. The problem is that when the affected volume is the boot one, you will not be able to detach it from the instance.
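For a regular (non-boot) volume the flow would look roughly like the sketch below, using the openstack CLI; server and volume names are placeholders and the device name on the auxiliary server may differ:

```
# detach the volume from the affected instance
openstack server remove volume <instance> <volume>
# attach it to a healthy auxiliary server
openstack server add volume <aux-server> <volume>
# on the auxiliary server, repair the filesystem (device name may differ)
fsck /dev/vdb1
# detach from the auxiliary server and re-attach to the original instance
openstack server remove volume <aux-server> <volume>
openstack server add volume <instance> <volume>
```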
So, in this case, you would need to contact the Cloud Admins, who will:
Copy the Ceph volume:

```
rbd -p cinder-ceph-ct1-cl1 cp volume-<UUID> cinder-ceph-ct1-cl1/backup_<some_meaningful_name>
```
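If the volume UUID is not at hand, it can be recovered from OpenStack first; a small sketch (the backing Ceph image is named volume-<UUID>, as used above):

```
# look up the volume UUID from OpenStack (volume name is a placeholder)
openstack volume show <volume_name> -f value -c id
# confirm the corresponding image exists in the Ceph pool
rbd -p cinder-ceph-ct1-cl1 ls | grep <UUID>
```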
Fix the new volume: source the appropriate environment variables for the tenant, then execute:

```
cinder manage --name=<some_meaningful_name>_systemvol cinder-ct1-cl1@cinder-ceph-ct1-cl1#cinder-ceph-ct1-cl1 backup_<some_meaningful_name>
```
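The positional arguments are the backend in <host>@<backend>#<pool> form and the name of the existing Ceph image to adopt. To confirm the new volume is now known to Cinder, something like this should work (a sketch, not verified on this deployment):

```
# the new volume should now appear in Cinder (name as chosen above)
cinder show <some_meaningful_name>_systemvol
```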
From the OpenStack GUI, go to the “Volumes” tab and attach the volume called “<some_meaningful_name>_systemvol” to an auxiliary server.
On the auxiliary server:

```
mkdir /tmp/disk
# a mount/umount cycle is useful in some cases to clear internal logs
# (nouuid is an xfs mount option)
mount -o nouuid /dev/vdb1 /tmp/disk
# identify the filesystem type
cat /proc/mounts | grep vdb1
# unmount before repairing
umount /tmp/disk
# for ext4 execute:
fsck /dev/vdb1
# for xfs execute:
xfs_repair /dev/vdb1
```
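If xfs_repair refuses to run because of a dirty log and a mount/umount cycle does not clear it, the log can be zeroed forcibly; this may discard recently logged metadata changes, so treat it as a last resort:

```
# force-zero the xfs log (may discard recent metadata changes)
xfs_repair -L /dev/vdb1
```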
From the OpenStack GUI, remove the attachment for the volume “<some_meaningful_name>_systemvol”.
If “<some_meaningful_name>_systemvol” is a system volume, make sure the “bootable” flag is set.
Inform the user that she should create a new instance based on the new volume called “<some_meaningful_name>_systemvol”, and then move any other attachments from the old instance to the new one.
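For reference, the GUI attach/detach and bootable-flag operations above can also be performed with the openstack CLI; a rough sketch (server name is a placeholder):

```
# attach the repaired volume to an auxiliary server
openstack server add volume <aux-server> <some_meaningful_name>_systemvol
# ...run the checks described above, then detach it
openstack server remove volume <aux-server> <some_meaningful_name>_systemvol
# make sure the volume is flagged as bootable
openstack volume set --bootable <some_meaningful_name>_systemvol
```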
Alternative procedure
An alternative procedure to manipulate disk volumes, provided they are not encrypted, is the following. With this procedure only Ceph admin privileges are needed; no OpenStack credentials are required, as we perform our magic under the hood. The machine needs to be powered off.
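Before touching anything, it is worth confirming that the backing image for the affected volume actually exists in the expected pool, for example:

```
# inspect the Ceph image backing the Cinder volume
rbd -p cinder-pool info volume-12345678-abcd-1234-ab12-90abcdef1234
```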
Copy the Ceph volume and move the original volume aside, just in case something goes wrong:

```
rbd -p cinder-pool cp volume-12345678-abcd-1234-ab12-90abcdef1234 cinder-pool/copy-12345678-abcd-1234-ab12-90abcdef1234
rbd -p cinder-pool rename volume-12345678-abcd-1234-ab12-90abcdef1234 cinder-pool/volume-12345678-abcd-1234-ab12-90abcdef1234-original
```
Disable some Ceph features (we will re-enable them later, with the only exception of deep-flatten, which can only be set at creation time):

```
rbd -p cinder-pool feature disable copy-12345678-abcd-1234-ab12-90abcdef1234 object-map fast-diff deep-flatten
```
Map the new volume. This creates entries under /dev/rbd/<pool_name>/<volume_name>, plus one for each existing partition, which you can later mount on some local path. For example:

```
rbd -p cinder-pool map copy-12345678-abcd-1234-ab12-90abcdef1234 --name client.admin
```
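rbd map prints the mapped device (for example /dev/rbd0). The entries it creates can be inspected like this; a sketch, the device number may differ:

```
# list the device entries created by the mapping
ls -l /dev/rbd/cinder-pool/
# show the partitions of the mapped device (number may differ)
lsblk /dev/rbd0
```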
Now you can either mount and inspect a partition, or repair it (the partition needs to be unmounted):

```
mkdir /tmp/test
mount /dev/rbd/cinder-pool/copy-12345678-abcd-1234-ab12-90abcdef1234-part1 /tmp/test/
umount /tmp/test
fsck.ext4 /dev/rbd/cinder-pool/copy-12345678-abcd-1234-ab12-90abcdef1234-part1
```
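If the partition holds an xfs filesystem instead of ext4, use xfs_repair, as in the first procedure:

```
# repair an xfs partition (must be unmounted)
xfs_repair /dev/rbd/cinder-pool/copy-12345678-abcd-1234-ab12-90abcdef1234-part1
```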
While working on the volume you may also want to reset the root password, so you can later log in from the GUI. Mount the partition holding /, edit the file /etc/shadow and wipe the second field of the root line (REMEMBER to set a random, difficult user password as soon as possible!) so the entry looks more or less like:

```
...
root::17947:0:99999:7:::
...
```
Once you are done, unmount any mounted partition and unmap the volume:

```
rbd unmap cinder-pool/copy-12345678-abcd-1234-ab12-90abcdef1234
```
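To double-check that nothing is still mapped on the host:

```
# list rbd images currently mapped on this host
rbd showmapped
```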
Add back some Ceph features, rebuild the object map, verify everything is OK, and finally rename the new volume so it gets the name associated with the original volume:

```
rbd -p cinder-pool feature enable copy-12345678-abcd-1234-ab12-90abcdef1234 object-map fast-diff
rbd -p cinder-pool object-map rebuild copy-12345678-abcd-1234-ab12-90abcdef1234
rbd -p cinder-pool info copy-12345678-abcd-1234-ab12-90abcdef1234
rbd -p cinder-pool rename copy-12345678-abcd-1234-ab12-90abcdef1234 cinder-pool/volume-12345678-abcd-1234-ab12-90abcdef1234
```
Go to the OpenStack GUI and start your instance. Remember to set the root password, if you reset it earlier.
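Once the instance boots cleanly and everything has been verified, the preserved original image can be removed; a cleanup sketch (keep it around until you are sure):

```
# remove the preserved original only after verifying the new volume
rbd -p cinder-pool rm volume-12345678-abcd-1234-ab12-90abcdef1234-original
```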