Notes from Storage Purgatory
Sunday, April 24th, 2005
I just wanted to post a few updates to my excessively upbeat pair of articles on my mass storage setup:
- I have had to temporarily drop udev. It worked fine initially, but then started causing my system to restart during the boot sequence. I have no idea why, and it might be some stupidity on my part, but nevertheless, I can’t boot my server unless I have devfs enabled.
- That said, it appears to be possible to automatically discover both RAID sets and LVM physical volumes without knowing the precise device names. I haven’t experimented with the scan feature in mdadm to discover RAID partitions, but pvscan works great for finding LVM physical volumes.
- The firewire subsystem in Linux is still a little cranky. I find that under heavy load, even with the serialize_io=1 option enabled in the sbp2 module, the I/O system will just stall for several seconds, then resume transmissions. In the worst case, you’ll actually see kernel messages about failed SCSI commands and retries. However, even under moderate load, there can be some unpleasant effects. The I/O scheduling seems to be such that if process A is writing a lot of data to the firewire disk (and is capable of producing data faster than the available bandwidth), it will fill the in-memory buffers and monopolize access to the device until the buffers clear. Process B, which reads the disk at a moderate but consistent rate, will find itself stalled periodically for up to 10 seconds while a large amount of buffered data from process A is flushed to disk. The stall is so complete that, from the perspective of the client nodes, it can appear as if contact with the NFS server has temporarily been lost. In general, I seem to get the best throughput when processes write data at a rate that comes close to, but does not significantly exceed, the firewire bandwidth.
- “Hotplug” clearly means different things to different people. As mentioned in an excellent comment on a previous article, the external rack is indeed “hotplug” in the most basic sense: you can individually power off bays with the keyswitch to add or remove hard drives safely without powering down the entire array. However, the firewire/IDE bridges do not individually power off, and generally do not seem to react to newly attached drives without a complete power cycle of the whole rack. Perhaps this is a limitation of IDE, but nevertheless it is annoying. Moreover, I’ve found the Linux firewire subsystem does not seem to detect the new drives (though it will redetect the old drives) even if the power to the whole firewire rack is cycled after HD installation. I actually have to restart the entire server!
Thankfully, I’m basically a single-user operation, so this only annoys me, but it is pretty sad given how close all of this is to true hotplug support. A few missing details, and the whole thing fails to deliver. I can only hope that future SATA/firewire racks address the hardware limitations. I’m not sure whether any of the blame lies with the software as well.
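As a rough illustration of the write-rate observation in the firewire notes above, here is a minimal Python sketch of a writer that paces itself so its average rate stays at or below a chosen limit. The function name `throttled_copy`, its parameters, and any rate you pass in are hypothetical choices for illustration, not something measured on this setup. Note also that this only paces the writes handed to the kernel; it makes no claim about how the kernel buffers or flushes them.

```python
import time


def throttled_copy(src, dst, max_bytes_per_sec, chunk_size=1 << 20,
                   clock=time.monotonic, sleep=time.sleep):
    """Copy src to dst, keeping the average rate at or below max_bytes_per_sec.

    src and dst are binary file-like objects. clock and sleep are injectable
    so the pacing logic can be exercised without real delays.
    """
    start = clock()
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
        # Earliest time (relative to start) at which `total` bytes are
        # allowed to have been written at the target rate.
        allowed_at = total / max_bytes_per_sec
        elapsed = clock() - start
        if allowed_at > elapsed:
            # We are ahead of schedule, so wait for the schedule to catch up.
            sleep(allowed_at - elapsed)
    return total
```

A caller would open the source and destination in binary mode and pick a limit somewhat below whatever the disk is observed to sustain; the right number has to be found by experiment.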
One positive thing I do have to say is that LVM2 and software RAID (with the help of the mdadm tool) have been positively bulletproof for me. In all of the monkeying with firewire, I lost communication with drives, which made RAID sets appear to fail. This even occurred while I was moving data off of an LVM2 physical volume (using pvmove) onto a new RAID1 set that had not finished its initial synchronization. (Okay, maybe that was a little foolhardy…) Not only did I have no data loss at all, but I was even able to restart the move (after rebooting everything) from where it left off! I was seriously concerned, but the failures were handled gracefully. LVM2 and software RAID still get two thumbs up, even if the firewire hardware has left me slightly disappointed.
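For anyone less inclined to be foolhardy, one way to tell whether a software RAID set is still synchronizing is to look at /proc/mdstat before starting something like a pvmove. The sketch below is a minimal Python parser for that file's text. The helper name `syncing_arrays` and the `SAMPLE` text are made up for illustration, and the sketch assumes arrays are named like md0 and that progress lines have the usual "resync = 12.6%" or "recovery = 12.6%" form; a delayed resync with no percentage would not be reported.

```python
import re

# Hypothetical sample in the general shape of /proc/mdstat output.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]
      [==>..................]  resync = 12.6% (13184/104320) finish=0.1min speed=13184K/sec

md1 : active raid1 sdd1[1] sdc1[0]
      104320 blocks [2/2] [UU]

unused devices: <none>
"""

_ARRAY = re.compile(r"^(md\d+)\s*:")
_SYNC = re.compile(r"\b(resync|recovery)\s*=\s*([\d.]+)%")


def syncing_arrays(mdstat_text):
    """Return {array_name: percent_done} for arrays that report sync progress."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        header = _ARRAY.match(line)
        if header:
            # Start of a new array stanza; remember which array we are in.
            current = header.group(1)
            continue
        progress = _SYNC.search(line)
        if progress and current:
            result[current] = float(progress.group(2))
    return result
```

On a real system the text would come from reading /proc/mdstat; an empty result suggests no array is reporting resync or recovery progress at that moment.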
