HDD diagnosis and recovery tools
Table of Contents
- 1. Health of the drive (S.M.A.R.T.)
- 2. Good drive, Bad data
- 3. Bad drive, Good data
- 4. Dead drive, Unknown data
Linux offers, for free, all the tools needed to diagnose a failing hard drive and recover data from it, as long as the drive can still be read. In this post I'm just going to cover, in a general way, some tooling options to repair or extract data from a problematic hard drive.
For those who cannot be bothered to roll out their own custom recovery image, System rescue CD has most of the needed tools within. Note that there are other equally capable custom distro images available for this very purpose. A spare USB key with enough storage to hold the distro's image and a quick use of dd is all that is required to get going.
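As a rough sketch of that dd step: the helper below wraps the copy in a small function. The function name, the image name and /dev/sdX are placeholders, and the bs, conv and status options assume GNU dd. Check the device name with lsblk first, since dd overwrites its target without asking.

```shell
# write_image SRC DST: copy an image file byte-for-byte onto a target.
# SRC and DST are placeholders; for a USB key, DST would be the whole
# device (e.g. /dev/sdX, hypothetical), not one of its partitions.
write_image() {
    # bs=4M speeds up the copy; conv=fsync flushes the data to the
    # target before dd exits; status=none silences the summary.
    dd if="$1" of="$2" bs=4M conv=fsync status=none
}

# Example (run as root, double-check the device name first):
#   write_image systemrescue.iso /dev/sdX
```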
1. Health of the drive (S.M.A.R.T.)
The first thing to do on a seemingly failing hard drive is to check if it actually is. Thankfully, on modern-ish drives, it is now possible to do just that by looking up the S.M.A.R.T. info of the device which indicates its general health and usage stats.
smartctl (part of the smartmontools package) can print all that information for you:
smartctl -a /dev/device
The resulting output should give a clue as to what the problem is. If the device's health looks good then the problems will likely stem from data corruption (see next section).
If a high number of reallocated sectors is reported, it would be wise to replace the drive preemptively. It's usually a sign that the drive is on its way out.
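To pull just the reallocated sector count out of the report, something like the following might do. It is a sketch that assumes an ATA drive exposing the attribute under the name Reallocated_Sector_Ct in the `smartctl -A` table, with the raw value in the 10th column. The function name is made up for illustration.

```shell
# reallocated_sectors: read `smartctl -A` output on stdin and print the
# raw value (10th column) of the Reallocated_Sector_Ct attribute.
# Prints nothing if the drive does not report that attribute.
reallocated_sectors() {
    awk '$2 == "Reallocated_Sector_Ct" { print $10 }'
}

# Example (run as root; /dev/device is a placeholder):
#   smartctl -A /dev/device | reallocated_sectors
```

A value that keeps climbing between checks is the thing to watch for.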
2. Good drive, Bad data
2.1 File system checker
The fsck (file system consistency check) utility can identify and correct integrity errors in Unix/Linux file systems. Just make sure the target partition is NOT mounted when subjecting it to fsck.
If the drive/partition is encrypted, it will need to be decrypted first (see section 2.4.1).
fsck -CVr /dev/device
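A small guard can help avoid running fsck on a mounted partition. This is only a sketch: the function name is hypothetical and it compares the device name literally against the first column of /proc/mounts, so symlinked names are not resolved.

```shell
# is_mounted DEV: succeed if DEV appears as a source in /proc/mounts.
# Names are compared literally, so pass the same path the system used
# when mounting (symlinks like /dev/disk/by-uuid/... are not resolved).
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' /proc/mounts
}

# Example (/dev/device is a placeholder):
#   if is_mounted /dev/device; then
#       echo "unmount /dev/device first"
#   else
#       fsck -CVr /dev/device
#   fi
```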
2.2 Recovery from a formatted hard drive
So you formatted your drive by mistake or something obliterated the partition table... The good news is that it can be recovered from in most cases (unless data has since been written on top). TestDisk is the tool for this job. It can:
- Fix partition table, recover deleted partition,
- Recover FAT32 boot sector from its backup,
- Rebuild FAT12/FAT16/FAT32 boot sector,
- Fix FAT tables,
- Rebuild NTFS boot sector,
- Recover NTFS boot sector from its backup,
- Fix MFT using MFT mirror,
- Locate ext2/ext3/ext4 Backup SuperBlock,
- Undelete files from FAT, exFAT, NTFS and ext2 filesystem,
- Copy files from deleted FAT, exFAT, NTFS and ext2/ext3/ext4 partitions.
2.3 Data recovery tools
A file header/footer is just a set of bytes identifying the file as being of a certain type. This information is useful when trying to find files in a raw data source that has no data describing the structure of its content. E.g.: 'jpg' files start with ff d8 ff e0 and end with ff d9.
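The signature idea can be sketched in a few lines of shell. The function below (its name is made up for illustration) checks whether a file carries the JPEG start and end markers. It only compares the first three bytes because the fourth differs between JFIF (e0) and Exif (e1) files.

```shell
# has_jpeg_markers FILE: succeed if FILE starts with ff d8 ff and ends
# with ff d9 (the JPEG start-of-image and end-of-image markers).
has_jpeg_markers() {
    # od prints the bytes as hex; tr strips the spacing and newlines.
    first=$(od -An -tx1 -N3 "$1" | tr -d ' \n')
    last=$(tail -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$first" = "ffd8ff" ] && [ "$last" = "ffd9" ]
}

# Example:
#   has_jpeg_markers photo.jpg && echo "looks like a JPEG"
```

Carving tools do the same kind of matching, only across a whole device or image rather than a single file.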
scalpel, a Linux and Mac file system file recovery utility originally based on foremost, offers an alternative with its own set of advanced features.
If one of the tools doesn't return much of the lost data from a device/image it is worth trying the other ones as well as they may be more successful or help complete the recovered set.
2.4 Accessing an encrypted volume
2.4.1 Decrypting a partition/drive
If the partition or the entire drive to be checked is encrypted (let's assume with LUKS since this is for Linux) it will need to be unlocked first.
cryptsetup luksOpen /dev/device luks_volume
2.4.2 Mounting to access content
If the partition/drive is set up with LVM, the volumes within the encrypted partition/drive will need to be mapped. To find out if this is the case just use the vgscan command and, to mount the volume(s), activate the volume group first (typically with vgchange -ay).
Otherwise, if LVM is *not* used, just mount the partitions found in /dev/mapper/luks_volume as normal using the mount command.
3. Bad drive, Good data
3.1 Checking for bad blocks
On modern drives, S.M.A.R.T. can passively identify bad blocks on the device, but only when an error is actually reported to it. To actively check for bad blocks, the badblocks utility can be used in either a non-destructive or a destructive mode.
The non-destructive method tests each block by writing to it but, on each iteration, backs up the block's data first. This way the block's data can be put back once its test is completed:
badblocks -nsv /dev/device
To create a list of all the blocks identified as bad, just append -o /save/path/to/badblocks.txt to the options in the arguments.
The destructive way will effectively wipe all data on the drive as it tests each block. This is great for testing new drives but not so much for ones in use. When taking this option, a full backup is required to restore everything to its original state once the test has completed.
badblocks -wsv /dev/device
3.2 Backing up the data
So the device looks to be failing hard. What now? If a backup has not been done very recently it might be a good idea to try doing one if the content can still be accessed... This is where that spare drive mentioned earlier comes into play.
A spare drive with at least as much free space as the size of the dying drive will be required.
dd could be used but it is not really the best tool for the job when the drive might just die on you mid-transfer and/or the data therein potentially has errors. A much better option for the purpose at hand is ddrescue.
ddrescue copies the data but also attempts to rescue it when read errors occur. Specifying a map file will enable the process to be resumed if interrupted and further read attempts to be made on problematic areas of the failing device.
When dealing with a failing device, the aim should be to get as much of the good/uncorrupted data as possible in one go. Once all that data is copied safely, the attention can be focused on the parts of the device that caused read errors on the first pass.
3.2.1 Cloning from failing device to new device
Make sure neither the failing device nor the new device is mounted and that the target device for the cloning is at least the same size as the failing one.
Direct device-to-device cloning should be considered when the failing device is to be replaced with an identical copy of its data, and both the data and the failing device are in good enough condition for a full 1:1 copy without too many crippling errors.
ddrescue -df /dev/failing-device /dev/new-device /mnt/ext-usb/failing-device-map.log
3.2.2 Imaging failing device to a recovery file
Make sure that the file system where the image file is to be saved has more free space available than the size of the failing device.
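That free-space check can be scripted rather than eyeballed. The two helpers below are a sketch with hypothetical names. They rely on the POSIX output format of df, and the usage line assumes blockdev from util-linux to read the device size.

```shell
# free_bytes DIR: bytes available on the file system holding DIR.
free_bytes() {
    # -P forces the one-line POSIX format, -k reports 1024-byte blocks.
    df -Pk "$1" | awk 'NR == 2 { printf "%.0f\n", $4 * 1024 }'
}

# has_room NEEDED DIR: succeed if DIR has more than NEEDED bytes free.
has_room() {
    [ "$(free_bytes "$2")" -gt "$1" ]
}

# Example (run as root; both paths are placeholders):
#   has_room "$(blockdev --getsize64 /dev/device)" /mnt/ext-drive \
#       || echo "not enough room for the image"
```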
Cloning to an image is used for recovery purposes rather than just replacing a failing device with a fresh one. If a lot of errors are reported by
ddrescue it may not be worth cloning the image into a new device. Targeted individual file/folder recovery from the image would then be a better option.
Pass #1: recovery of the readable data:
ddrescue -d /dev/device /mnt/ext-drive/hdd.img /mnt/ext-drive/hdd-map.log
Pass #2: bad data recovery (3 attempts for each bad sector):
ddrescue -d -r3 /dev/device /mnt/ext-drive/hdd.img /mnt/ext-drive/hdd-map.log
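Between runs, the map file gives an idea of how much is still unreadable. The sketch below sums the sizes of the blocks ddrescue has marked with the '-' (bad sector) status. It assumes the map file layout used by GNU ddrescue (comment lines starting with '#', then hexadecimal "pos size status" lines), and the function name is made up.

```shell
# bad_bytes: read a ddrescue map file on stdin and print the total size,
# in bytes, of the blocks marked '-' (bad sector).
# The body runs in a subshell so its variables stay local.
bad_bytes() (
    total=0
    while read -r pos size status; do
        case $pos in
            '#'*|'') continue ;;    # skip comments and blank lines
        esac
        # Block lines are "pos size status". The current-position line
        # holds its status in the 2nd field, so it never matches here.
        if [ "$status" = "-" ]; then
            total=$(( total + $size ))  # $size is a hex constant (0x...)
        fi
    done
    echo "$total"
)

# Example:
#   bad_bytes < /mnt/ext-drive/hdd-map.log
```

If the number stops shrinking after a few retry passes, further attempts are unlikely to help much.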
3.2.3 Mounting a recovery image
To mount a recovery image into your system (e.g.: the
/mnt/recovery-img directory) there are different approaches depending on what the image is.
Mounting an image of a partition
That's the easiest. Note that the
ro read-only option is passed so that the image does not get modified by mistake.
mount -o loop,ro /mnt/ext-drive/hdd.img /mnt/recovery-img
Mounting an image of a drive
When there are multiple partitions in the image of the failing drive, the offset of each must be calculated before any of them can be mounted. This is where the parted utility becomes useful: open the image with it (e.g. parted /mnt/ext-drive/hdd.img).
Then type the following in the interactive prompt to get the start/end/size of each partition in the image:
unit B print
With this information a partition can be mounted by passing its 'start' as the
$OFFSET value and its 'size' as its
$SIZELIMIT value. If the partition is the last on the imaged device then the
sizelimit argument can be omitted.
mount -o loop,ro,offset=$OFFSET,sizelimit=$SIZELIMIT /mnt/ext-drive/hdd.img /mnt/recovery-partX
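Since parted prints byte values with a trailing "B" (e.g. 1048576B), a tiny helper can turn the 'Start' and 'Size' columns into the option string. This is a sketch. The function name and the numbers in the example are made up.

```shell
# loop_opts START SIZE: build the mount options for one partition of a
# drive image from the 'Start' and 'Size' values printed by parted.
# A trailing "B" on either value is stripped.
loop_opts() {
    printf 'loop,ro,offset=%s,sizelimit=%s\n' "${1%B}" "${2%B}"
}

# Example with made-up values (run as root):
#   mount -o "$(loop_opts 1048576B 524288000B)" \
#       /mnt/ext-drive/hdd.img /mnt/recovery-partX
```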
4. Dead drive, Unknown data
The only recourse left on a drive that is not recognised is to send it off to your nearest friendly data recovery place that has a "clean room". If it's a mechanical drive you may still have a chance but with a solid state, it's another matter...
When the controller on an SSD goes you might as well just bin the thing or, as a last-ditch attempt, try the power cycle method (I've had very limited success with it but your experience may vary). If that somehow works through some miracle, it is advisable to use that opportunity to do a full backup before swapping the device with a new one.