Archive for April, 2005

Notes from Storage Purgatory

Sunday, April 24th, 2005

I just wanted to post a few updates to my excessively upbeat pair of articles on my mass storage setup:

  • I have had to temporarily drop udev. It worked fine initially, but then started causing my system to restart during the boot sequence. I have no idea why, and it might be some stupidity on my part, but nevertheless, I can’t boot my server unless I have devfs enabled.
  • That said, it appears to be possible to automatically discover both RAID sets and LVM physical volumes without knowing the precise device names. I haven’t experimented with the scan feature in mdadm to discover RAID partitions, but pvscan works great for finding LVM physical volumes.
  • The firewire subsystem in Linux is still a little cranky. I find that under heavy load, even with the serialize_io=1 option enabled in the sbp2 module, the I/O system will stall for several seconds and then resume transmission. In the worst case, you’ll actually see kernel messages about failed SCSI commands and retries. Even under moderate load, though, there can be some unpleasant effects. The I/O scheduling seems to be such that if process A is writing a lot of data to the firewire disk (and can produce data faster than the available bandwidth), it will fill the in-memory buffers and monopolize access to the device until the buffers clear. Process B, which reads the disk at a moderate but consistent rate, will find itself periodically stalled for up to 10 seconds while a large amount of buffered data from process A is flushed to disk. The stall is so complete that, from the perspective of the client nodes, it can appear as if contact with the NFS server has temporarily been lost. In general, I seem to get the best throughput when processes write data at a rate that comes close to, but does not significantly exceed, the firewire bandwidth.
  • “Hotplug” clearly means different things to different people. As an excellent comment on a previous article pointed out, the external rack is indeed “hotplug” in the most basic sense: you can power off individual bays with the keyswitch to add or remove hard drives safely without powering down the entire array. However, the firewire/IDE bridges do not power off individually, and they generally do not seem to react to newly attached drives without a complete power cycle of the whole rack. Perhaps this is a limitation of IDE, but it is annoying nevertheless. Moreover, I’ve found that the Linux firewire subsystem does not seem to detect the new drives (though it will redetect the old ones) even if the power to the whole firewire rack is cycled after the drives are installed. I actually have to restart the entire server!

Thankfully, I’m basically a single-user operation, so this only annoys me, but it is pretty sad given how close all of this is to true hotplug support. A few details are missing, and the whole thing fails to deliver. I can only hope that future SATA/firewire racks address the hardware limitations. I’m not sure whether some of the blame belongs to the software as well.

One positive thing I do have to say, though, is that LVM2 and software RAID (with the help of the mdadm tool) have been positively bullet-proof for me. During all of my monkeying with firewire, I lost communication with drives, which made RAID sets appear to fail. This even happened while I was moving data off an LVM2 physical volume (using pvmove) onto a new RAID1 set that had not finished its initial synchronization. (Okay, maybe that was a little foolhardy…) Not only did I lose no data at all, but I was even able to restart the move (after rebooting everything) from where it left off! I was seriously concerned, but the failures were handled gracefully. LVM2 and software RAID still get two thumbs up, even if the firewire hardware has left me slightly disappointed.
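Returning to the firewire stalls noted earlier, the observation that throughput is best when a writer stays just under the bus bandwidth can be approximated at the application level by pacing writes. What follows is a minimal Python sketch, not something from this setup: the function `throttled_copy` and its parameters are hypothetical, and it only paces what the program hands to the operating system, so the kernel's buffers can still batch the data before it reaches the disk.

```python
import io
import time


def throttled_copy(src, dst, rate_bytes_per_s, chunk_size=64 * 1024,
                   clock=time.monotonic, sleep=time.sleep):
    """Copy src to dst, sleeping as needed so the average write rate
    does not exceed rate_bytes_per_s. Returns the number of bytes copied.

    clock and sleep are injectable so the pacing can be tested without
    real delays. Note: this paces writes handed to the OS; the kernel
    may still buffer them before they reach the device.
    """
    if rate_bytes_per_s <= 0:
        raise ValueError("rate_bytes_per_s must be positive")
    start = clock()
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
        # Earliest time by which `total` bytes are allowed at the target rate.
        due = start + total / rate_bytes_per_s
        delay = due - clock()
        if delay > 0:
            sleep(delay)
    return total


if __name__ == "__main__":
    # Hypothetical demo: copy 100,000 bytes under a 1,000,000 byte/s cap,
    # which should take roughly 0.1 s.
    src = io.BytesIO(b"\0" * 100_000)
    dst = io.BytesIO()
    t0 = time.monotonic()
    copied = throttled_copy(src, dst, 1_000_000)
    print(copied, "bytes in", round(time.monotonic() - t0, 3), "s")
```

For real files you would pass file objects opened in binary mode and choose a rate slightly below the bandwidth you actually measure for the drive. A fixed-rate cap is only one simple way to keep a fast producer from filling the buffers ahead of a slow device.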

Netboot for Stupid Computers

Saturday, April 9th, 2005

The hard drive in my desktop machine recently developed 20 bad sectors in a critical part of the root partition, so the partition is no longer mountable. While the failure is concerning, the drive held mostly backups of my home directory (which I usually mount via NFS from the server).

Until I replace the drive, I wanted to netboot the desktop and just run it entirely over NFS. I’ve done this several times with my cluster nodes, so I expected it to be easy. However, the BIOS on this computer (a Shuttle XPC SK43G) appears to be incapable of booting from the PXE BIOS on the Intel PRO/1000 MT card I have. The main BIOS does initialize the NIC BIOS (I see the little Intel PXE setup message for 3 seconds), but it never actually lets the card do anything during the boot sequence. Very lame.

So I wanted to find a way to force a netboot from another device, like a USB flash drive. After much Google-work plus trial and error, I finally figured out how to do it.

  1. Find a USB flash drive that you can reformat.
  2. Many BIOSes (including mine) are very dumb and assume a particular geometry when booting a USB flash drive. The flash drive MUST BE set up with 32 sectors per track. (It’s sort of sad that we have to screw around with heads and tracks on a device that has no moving parts at all, but such is PC architecture.) Go check out this guy’s page to learn how to rewrite the partition table on the flash drive to have the appropriate geometry and add a FAT16 partition. It doesn’t need to be much more than a couple of megabytes, since we aren’t even putting the kernel on it.
  3. Use mkfs.vfat or mkdosfs to format the FAT16 partition.
  4. Install SYSLINUX onto the FAT16 partition. syslinux /dev/[insert partition name here] should be enough.
  5. Go to the Etherboot ROM-O-Matic, pick the latest production release, and generate a “ROM” with the LILO/GRUB/SYSLINUX loadable kernel format and support for your family of network card. (Not supported? Looks like you’re in trouble.)
  6. Copy the .zlilo image to the FAT16 partition and name it LINUX so that SYSLINUX will automatically read that image and boot it.
  7. Go read this article on setting up your server to handle Etherboot workstations. It’s actually very similar to the PXE case, but you have to process your kernel image with mkelf-linux or mknbi-linux. For diskless workstations, you will need to pass the kernel several options, which you can embed in the image using the --append option of mkelf-linux/mknbi-linux. (The Etherboot authors recommend mkelf-linux, but I was only able to get the kernel to boot with mknbi-linux. YMMV.)
  8. Try to figure out how to set up your BIOS to boot from USB. Good luck here.
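The geometry requirement in step 2 is easier to see with the arithmetic written out. Below is a minimal Python sketch of the standard LBA-to-CHS mapping; the function names are hypothetical, and the 64-head value is an assumption (a common pairing with 32 sectors per track because it makes each cylinder exactly 1 MiB), not something specified in the steps above.

```python
SECTOR_BYTES = 512


def lba_to_chs(lba, heads, sectors_per_track):
    """Map a logical block address to (cylinder, head, sector).

    In CHS addressing, sectors are numbered from 1; cylinders and heads from 0.
    """
    if lba < 0:
        raise ValueError("lba must be non-negative")
    cylinder = lba // (heads * sectors_per_track)
    head = (lba // sectors_per_track) % heads
    sector = lba % sectors_per_track + 1
    return cylinder, head, sector


def chs_to_lba(cylinder, head, sector, heads, sectors_per_track):
    """Inverse of lba_to_chs for the same geometry."""
    return (cylinder * heads + head) * sectors_per_track + (sector - 1)


def cylinders_for(size_bytes, heads, sectors_per_track):
    """Number of whole cylinders that fit in a device of size_bytes."""
    return size_bytes // (heads * sectors_per_track * SECTOR_BYTES)


if __name__ == "__main__":
    HEADS, SPT = 64, 32  # assumed geometry: 64 * 32 * 512 bytes = 1 MiB per cylinder
    print(HEADS * SPT * SECTOR_BYTES)                   # → 1048576
    print(cylinders_for(64 * 1024 * 1024, HEADS, SPT))  # → 64 (a 64 MiB stick)
    print(lba_to_chs(2048, HEADS, SPT))                 # → (1, 0, 1)
```

One plausible reason a mismatch breaks booting: if the partition table’s CHS fields were written for 63 sectors per track but the BIOS assumes 32, the same (cylinder, head, sector) triple resolves to a different LBA, so `chs_to_lba(0, 1, 1, 64, 32)` is 32 while the 63-sector interpretation gives 63.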

If all goes well, you should see the Etherboot image start, acquire an IP address with DHCP, fetch the kernel image from the TFTP server, boot it, mount the root filesystem over NFS, and away you go. (Assuming, of course, that you have a root filesystem for it to mount. But that’s a topic for another article.)
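The kernel options mentioned in step 7 are, for an NFS root, the standard parameters documented with the Linux kernel: `root=/dev/nfs`, `nfsroot=<server-ip>:<root-dir>[,<nfs-options>]`, and `ip=` for kernel-level address autoconfiguration. As a minimal sketch, the hypothetical helper below assembles such a string for the `--append` option; the server address and export path in the demo are placeholders, not values from this setup.

```python
def nfsroot_append(server_ip, root_dir, nfs_options=(), ip="dhcp"):
    """Build a kernel command line for mounting the root filesystem over NFS.

    Parameters used (from the kernel's NFS-root documentation):
      root=/dev/nfs                             root filesystem is on NFS
      nfsroot=<server-ip>:<root-dir>[,<opts>]   where to mount it from
      ip=<autoconf>                             e.g. "dhcp"
    The kernel must be built with NFS-root and IP autoconfiguration support.
    """
    if not root_dir.startswith("/"):
        raise ValueError("root_dir must be an absolute path on the server")
    nfsroot = f"{server_ip}:{root_dir}"
    if nfs_options:
        nfsroot += "," + ",".join(nfs_options)
    return f"root=/dev/nfs nfsroot={nfsroot} ip={ip}"


if __name__ == "__main__":
    # Placeholder server address and export path; substitute your own.
    print(nfsroot_append("192.168.0.1", "/exports/desktop",
                         ("rsize=8192", "wsize=8192")))
```

The printed string is what would be embedded in the image via `--append`; check the mknbi-linux/mkelf-linux documentation for how your shell and the tool expect it to be quoted.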
