I was testing OpenBacchus, a tool for backing up all the VMs in our oVirt 4.1 cluster. It's a nice piece of software built on the Python oVirt API.
One of OpenBacchus's features is the ability to configure cron jobs that automate the backups.
I screwed up and configured a cron job to make a full backup of all the VMs EVERY TWO HOURS.
The system crashed: each new cron job started a new backup process before the previous one had finished, leaving many corrupted snapshots and failed backups.
Some of these snapshots were stuck in a permanent “LOCKED” state and could not be deleted. There are two ways to fix this:
If you have backups of the VMs with corrupted snapshots, you can restore them and delete the original VMs.
If you don’t have backups, you can modify the engine database and then try to delete the snapshot from the engine’s web GUI:
1 – SSH to the Hosted Engine machine and switch to the postgres user:
su - postgres
2 – Connect to the engine database using psql:
psql -d engine
3 – Get the snapshot_id of the locked snapshot so you can change its status from LOCKED to OK (put the name of the affected VM between the quotes in vm_name=''):
select snapshot_id,description,status from snapshots where status = 'LOCKED' and vm_id in (select vm_guid from vm_static where vm_name='');
4 – Set the snapshot’s status to OK (put the snapshot_id from step 3 between the quotes):
update snapshots set status = 'OK' where snapshot_id = '';
5 – Also check whether any images belonging to this snapshot are still locked (an imagestatus of 1 means OK):
select image_guid,imagestatus from images where vm_snapshot_id = '' and imagestatus != 1;
6 – If any images are returned, unlock them:
update images set imagestatus = 1 where image_guid = '';
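Steps 3 to 6 can also be sketched as a single transaction, so that a snapshot and its images are either all unlocked or left untouched. The Python example below is only an illustration under stated assumptions: it uses the standard-library sqlite3 module and a hypothetical, heavily trimmed mock of the vm_static, snapshots and images tables (just the columns the queries above use), so the logic can be tried safely away from the engine. The function name unlock_vm_snapshots is made up for this example. On a real engine you would run the equivalent statements in psql, wrapped in BEGIN; and COMMIT;.

```python
import sqlite3

# Hypothetical, heavily trimmed mock of three engine tables: only the columns
# used by the queries in steps 3 to 6. The real oVirt schema has many more.
SCHEMA = """
CREATE TABLE vm_static (vm_guid TEXT PRIMARY KEY, vm_name TEXT);
CREATE TABLE snapshots (snapshot_id TEXT PRIMARY KEY, vm_id TEXT,
                        description TEXT, status TEXT);
CREATE TABLE images (image_guid TEXT PRIMARY KEY, vm_snapshot_id TEXT,
                     imagestatus INTEGER);
"""

IMAGE_OK = 1  # imagestatus value the steps above use for an unlocked image


def unlock_vm_snapshots(conn, vm_name):
    """Set every LOCKED snapshot of vm_name to OK and unlock its images.

    All updates run in one transaction; returns the snapshot ids changed.
    """
    with conn:  # commits on success, rolls back if any statement fails
        # Step 3: find the locked snapshots of the named VM.
        snapshot_ids = [row[0] for row in conn.execute(
            "SELECT snapshot_id FROM snapshots WHERE status = 'LOCKED' "
            "AND vm_id IN (SELECT vm_guid FROM vm_static WHERE vm_name = ?)",
            (vm_name,),
        )]
        for sid in snapshot_ids:
            # Step 4: mark the snapshot as OK.
            conn.execute(
                "UPDATE snapshots SET status = 'OK' WHERE snapshot_id = ?",
                (sid,))
            # Steps 5 and 6: unlock any of its images that are not OK.
            conn.execute(
                "UPDATE images SET imagestatus = ? "
                "WHERE vm_snapshot_id = ? AND imagestatus != ?",
                (IMAGE_OK, sid, IMAGE_OK))
    return snapshot_ids


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    demo.executescript(SCHEMA)
    demo.execute("INSERT INTO vm_static VALUES ('vm-1', 'web01')")
    demo.execute("INSERT INTO snapshots VALUES ('snap-1', 'vm-1', 'backup', 'LOCKED')")
    demo.execute("INSERT INTO images VALUES ('img-1', 'snap-1', 2)")
    demo.commit()
    print(unlock_vm_snapshots(demo, "web01"))  # ['snap-1']
```

Parameterized queries (the ? placeholders) replace the empty quotes you fill in by hand in steps 3 to 6, which avoids quoting mistakes when pasting UUIDs.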