Wednesday, January 13, 2010

Lessons Learned--Part 1--RAID 0+1

I'm something of an experiential learner. Herewith the first things from the school of "dear God that was a stupid thing to do"--and a few that I actually got right the first time.

Never, ever use RAID 0+1 with an Intel ICHxR chip

I like to build computers. Like many a PC do-it-yourselfer, I sometimes get too ambitious. Yes, I could build simple computers, but where's the fun in that? My first project was putting my Dell Dimension 8200 (or something like that) into a new (flashy, naturally) case. Of course, Dell used a proprietary motherboard format that didn't fit the new case. I got out my trusty drill and drilled new holes in the case backboard to accommodate standoffs. Of course, I decided to keep the motherboard in place to line up the holes. Not a smart idea, that. The case kept going, the motherboard didn't.

I eventually combined bling (the case of course had a clear window) in the form of blacklight CCFLs, glow-in-the-dark sleeves on the power cables and light-up round IDE cables with overengineering: I managed to squeeze a Thermalright SI-120 onto my new and undrilled ASRock 939Dual-SATA2 (AMD 3800X2), along with a fan for my RAM (with flashing lights!), a fan clamped on top of the Northbridge heatsink, some massive Panasonic fans throwing about 100 CFM, a Radeon 9700 All-in-Wonder with aftermarket heatsink fan (later joined by an Nvidia 7600 which I somehow managed to keep myself from sticking extra fans on--the Dual-SATA2 supported both AGP and PCI-E, so who was I to pike out?), and two (count 'em, two!) WD Raptor 72GB drives in RAID 0, plus two more data hard drives. So my first computer (still going all these years later despite the odds), which boasted 10 fans (2 case, 1 CPU, 2 RAM, 1 Northbridge, 1 PSU, 2 video) and four hard drives, all wrapped in an aluminium case that practically acted as an amplifier, was a tad noisy.

A couple of years later, the kind people I work for let me build a new computer for myself. I did my homework this time.
I got a normal motherboard (Gigabyte GA-965-DS3R), a sensible case (a big steel Antec case with a modular 500W PSU), a nice Core 2 Duo E6850 (plus a huge whomping Thermalright Ultra-120 with lovingly lapped copper base plus my fastest Panasonic air pusher), four WD K-series 250GB drives with a RAID 0+1 partition for data and an 8GB RAID 0 partition for the swap file, 4GB of DDR2-800 RAM (later upped to 8GB), and Win XP 64-bit. (I ran XP 64-bit on my previous workstation, a Dell Precision 380, in the days before 64-bit printer drivers.)

RAID 0+1, I thought, was the pièce de résistance. It would be safe, unlike my RAID 0 at home, and speedy, unlike the RAID 5 in the Precision 380. It merrily dispatched Stata analyses with blistering speed until I got back from winter vacation to discover that it wasn't running and was reporting that one of the disks had failed. Fine, I thought. It used to crash on occasion, recovering with RAID errors, and this would finally let me identify the failing drive. I jotted down the failed drive's serial, pulled it, got a new one from Microcenter, installed it, and...and...it wouldn't recover. I'd get to the RAID BIOS, tell it to rebuild (all the information was there, thanks to the +1 part of the RAID), and it wouldn't boot, going straight to asking to boot from CD. I consulted the Intel Matrix RAID web pages (the board used an ICH9R southbridge) to discover that, yes, it would recover once it booted into Windows. What one would do if one couldn't boot into Windows in the first place wasn't mentioned.

Which brings me to the lesson learned. As I found out subsequently, the boot sectors of the first two disks of a RAID 0+1 array are necessary for it to boot properly. Naturally, I experienced failure in disk 1. Intel's Matrix RAID, while in just about every other way a truly admirable system, is a partially software RAID controller: it needs a running OS to rebuild, and that doesn't work with a corrupt boot sector.

Next time, RAID 1 or a proper RAID card!

The news wasn't completely awful, though. I did have data redundancy, and Vincent from PC-Maker in Waltham, after several days of pain, managed to pull out all my (and my employer's) data for far less than the $2,000-plus that data recovery companies charge. As it happened, my automated backup at work, other than being almost impossibly large and bringing down the repeated wrath of the network admins on me as I tried to recover from it, had stopped backing up my hard drive. I have a sensible new computer (a Dell OptiPlex) on the way and the sad carcass of my second DIY computer awaiting cannibalization.