Welcome to TwiceTwo. Whenever I make something, if it's not monumentally embarrassing, I put it here. Unless I'm lazy or forget.
What happens when you need to recover data off a six-year-old, broken, RAID-5 array? Adventure happens!
At work I had a request to find some files that had maybe been archived on a retired MaxAttach NAS. The NAS in question runs Windows 2000 and has four physical drives. I wasn't even sure of the layout of the drives (mirrored? spanned? mirrored and spanned? hardware/software RAID?) at this point. Time to take a look at the Windows LDM layout.
(Actually, first I had to set up a machine with all four drives connected, booting to a live Linux USB drive. Four IDE drives + no SATA = no other way to boot.)
For dynamic disks, Windows stores a copy of the LDM database at the end of every drive in a disk group. That's helpful, as it meant that I didn't have to know the order of the drives in the group in order to find the LDM DB. But unfortunately, either the Linux tools out there for working with LDM don't work, or I was using them wrong. In either case, I only had access to the contents of the normal MBR partition data, not the higher-level LDM data. So my first step was to extract and parse the LDM table, so I could determine the order and layout of the LDM volumes.
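As a first step toward parsing, you can locate the database by scanning the tail of the drive for the LDM magic strings ("PRIVHEAD" marks the private header, "VBLK" marks individual database records). This is a minimal sketch of that scan, not a full parser; `findMagic` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Scan a buffer (e.g. the last 1 MiB of a disk) for every occurrence of an
// LDM magic string such as "PRIVHEAD" or "VBLK". Finding the magic is only
// the first step; the records around it still need to be decoded.
std::vector<std::size_t> findMagic(const std::vector<char>& buf,
                                   const char* magic) {
    std::vector<std::size_t> hits;
    std::size_t mlen = std::strlen(magic);
    if (buf.size() < mlen) return hits;
    for (std::size_t i = 0; i + mlen <= buf.size(); ++i)
        if (std::memcmp(buf.data() + i, magic, mlen) == 0)
            hits.push_back(i);
    return hits;
}
```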
Most of my info on the structure of the LDM table and its entries is drawn from the write-ups on the LDM on-disk format that are floating around online.
A couple of hours and a bit of C++ gave me a nice listing of all the volumes, "components", partitions, and disks in the system, along with their type (mirror, RAID, etc.) and order. While the system partition was mirrored across all four drives, the data partition was set up using software RAID-5, again across all the drives. So the next step was to try to mount this volume in Linux.
As it turns out, you can't; at least, not directly. Although it's possible to set up a Windows RAID-5 volume so that it can be mounted and accessed from Linux, you have to do this from the get-go. (mdadm wants to write its superblock to the end of the partitions, so you have to leave room for that when you set them up.) You can't take an existing RAID-5 volume and mount it, at least, not without risking modification.
So now we have four independent drives, each with a portion of the logical volume I'm looking for. Time to get our hands dirty with the Windows RAID-5 layout to see if we can't reconstruct the entire logical volume. Windows uses an "inverse" parity stripe layout, but orders data blocks differently from a typical RAID setup. In order to reconstruct the volume, we need to look at each logical block i and figure out which physical drive it lives on, and at what offset within that drive's partition.
(This info is derived from http://www.z-a-recovery.com/art-raid5-variations.htm.) Since there are four drives in the array, the "current" drive simply rotates over all the drives. However, in each stripe there are only three data blocks (the fourth is the parity block), so we divide the block index by (4 - 1) = 3 to get the stripe index. The physical location of the block on the drive (partition) is simply s * blocksize (64k is the default on Windows).
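Under the rotation just described (and assuming it holds for every stripe, which I haven't verified against every Windows layout variant), the per-block mapping can be sketched like this; `locate` and `parityDrive` are names I've made up for illustration:

```cpp
// Map logical block i to (drive, byte offset) in an n-drive array: data
// blocks cycle over all n drives, each stripe holds n-1 data blocks plus
// one parity block, and the parity block walks backward from the last
// drive. blockSize is the stripe unit (64 KiB by default on Windows).
struct Location { int drive; long long offset; };

Location locate(long long i, int n, long long blockSize) {
    Location loc;
    loc.drive  = static_cast<int>(i % n);   // the "current" drive rotates
    long long s = i / (n - 1);              // integer-divide: stripe index
    loc.offset = s * blockSize;             // position within the partition
    return loc;
}

// Parity drive for stripe s: the one drive the data blocks skip over.
int parityDrive(long long s, int n) {
    return static_cast<int>((n - 1) - (s % n));
}
```

For four drives this puts blocks 0, 1, 2 on drives 0, 1, 2 with parity on drive 3, then blocks 3, 4, 5 on drives 3, 0, 1 with parity on drive 2, and so on.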
Pulling in blocks on an as-needed basis gives us an added advantage: it can deal with drive errors better. If, when reading a block, we get an error, we simply fill the block with 0xDEAD or similar, clear() the input stream, and then write the block out as usual. The next block will be seek()'d independently, and thus an error in one block won't affect processing of other blocks, even on the same disk.
Again, a bit of C++ and about 10 hours of runtime gave me a fully reconstituted 360GB logical volume image. As it turned out, the files we were looking for weren't on it.
Yay! Twicetwo.com is back. Old content will return shortly.