Archive for October, 2006

ZFS on Linux (not quite)

Monday, October 16th, 2006

The more I read about ZFS, the more I become convinced it is the future of filesystems. Not surprisingly, I very much want to try it out on my 750 GB of storage spread over 3 pairs of disks. Progress on zfs-fuse for Linux is not moving as fast as my zealotry, though. We will probably see a testable filesystem on the order of months, not weeks. After the initial announcement, I haven't seen any further information about ZFS on FreeBSD, either. And Apple, aside from reserving an identifier for ZFS in the Leopard preview, has said nothing about ZFS on OS X.

Thus, for the time being, the only way to get ZFS is to use the Solaris kernel. That basically means Solaris 10 6/06, the OpenSolaris preview release (called "Nevada"), or Nexenta, a fusion of OpenSolaris and Ubuntu. I tried out Nexenta via Parallels on my Mac. I was impressed by how good it looked (thanks to the Ubuntu) and by how totally frustrating it was to administer for someone who doesn't know Solaris. Don't be fooled by the pretty GUI! This is still Solaris under the hood, so most of your Linux admin experience is no help. The dmesg command is mostly useless, ps is different, and you ain't gonna find anything in /proc but processes. There was no way I was going to replace my nicely working Gentoo install on the server with Solaris just for ZFS, as great as it looked.

But these days, with enough RAM and disk space, you don’t need to choose between operating systems. You can use several at once!

Failure to reach Xen

My first thought when I decided to go hybrid on my server was to use Xen. The lightweight hypervisor imposes a minimal performance penalty, and (as I found later) Xen is much better at giving guest operating systems (domU in their terminology) access to physical disks. Xen support in Linux is a no-brainer, and there are domU images of OpenSolaris available. Unfortunately, I was never able to boot the Xen hypervisor on my Opteron. The most likely culprit is my dodgy motherboard, which generates machine check exceptions (thus the inspiration for the blog name) if I boot Linux without the "nomce" option. So, until I care enough to replace my motherboard, Xen is out.

VMware Server

I've been a longtime user of VMware Workstation and have been very happy with it. VMware Server is now free, so I figured I could use it to run a Solaris-based OS and give the virtual machine access to a few of my hard drives. Getting VMware Server going was pretty easy, though there was a little confusion because I needed to clear out the VMware Workstation kernel modules first. Installing Nexenta on a small virtual disk wasn't too bad.

But the next step was quite annoying. VMware gives you the option of using physical disks or virtual disks. Virtual disks are just large file(s) on the host filesystem, and are the normal way to give the guest access to disk. Physical disks are actual block devices, but VMware's support for physical disks basically sucks. They have hard-coded the software to accept only /dev/hd* and /dev/sd* devices (the * can include a slash, thankfully), and the devices must be IDE or SCSI block devices. You cannot use a logical volume from LVM2 as a disk, for example. I had planned to do exactly that, with one logical volume filling each physical disk, so that the device name would be independent of the /dev/sd[a-f] name. That name depends on the order of your disks on the SATA controller and is very likely to change if you modify your system at all. But VMware will not accept a logical volume, even if you symlink it to a /dev/sd* name, because the volume does not support certain ioctls. There is an LD_PRELOAD hack out there called vmware-bdwrapper that is supposed to fake out VMware, but I was never able to get it to work.
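To make the naming restriction concrete, here is a rough Python model of the rule as described above. It is a hypothetical illustration, not VMware's actual code. The function name and the use of shell-style patterns are my own assumptions. The sketch models only the device-name check, not the ioctl support that actually sank the LVM symlink trick.

```python
from fnmatch import fnmatchcase

# Hypothetical model of the restriction described above: only paths that
# match /dev/hd* or /dev/sd* count as physical disks. In fnmatch, "*" also
# matches "/", mirroring the note that the * can include a slash.
ALLOWED_PATTERNS = ("/dev/hd*", "/dev/sd*")


def looks_like_accepted_disk(path: str) -> bool:
    """Return True if the path passes the (assumed) name-based filter."""
    return any(fnmatchcase(path, pattern) for pattern in ALLOWED_PATTERNS)


for candidate in ("/dev/sdb", "/dev/hda", "/dev/mapper/vg0-zfs1", "/dev/vg0/zfs1"):
    print(candidate, looks_like_accepted_disk(candidate))
```

A symlink such as /dev/sdz that points at a logical volume would pass this name check. That fits what happened: the symlink gets past the naming rule, and the rejection comes later from the missing ioctls.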

Having been defeated in my attempt to use LVM2, I caved in and used the disk directly. This pretty much worked (once I figured out Solaris disk numbering) until I stopped the virtual machine and started it again. Then VMware complained that the physical disk geometry had changed and that I would have to re-add the device. Of course, when I re-added the device and rebooted Nexenta, the old zpool on the disk was unreadable. I recreated it, and when I quit and restarted, VMware gave me the same error again. So pretty much no matter what, my data was guaranteed to be scrambled if I ever stopped and restarted the virtual machine.

So finally, I did the dumbest setup possible: I formatted the disk with JFS (which has pretty good performance on very large files with low CPU overhead), and told VMware to make a 280 GB virtual disk on it as one ENORMOUS file. Yes, this is insane, but it worked. I’ve moved one of my 6 disks over to ZFS and am now populating it with a copy of my files. There is no way I’m going to put all my archived stuff on just ZFS with this crazy setup, but I will host my backup copy there. I want to at least get familiar with how to administer ZFS so someday when I get Xen working I will be comfortable with the tools.

The next question is how I access my data from the Linux side, which I’ll say more about in another post.

Fix Bad Cursor Colors in X11.app

Monday, October 9th, 2006

One thing that has been driving me nuts since the MacBook purchase is that NX Client changes the X11 cursor colors to yellow when I remotely log in to our analysis server. This makes the cursor very hard to find. The problem, it seems, is an endianness bug in Xquartz. That link also leads you to an article with a patched version of Xquartz, which solves the problem for me. Now NX is that much better!

MacBook: Honeymoon is over

Monday, October 9th, 2006

Now that I’ve had my MacBook for 3 months, some of the warts are starting to show. (According to CoconutBattery, the computer is actually 4 months old. This makes sense since it was sold as refurbished.)

Heat-induced shutdowns

Within days of the HD failure, I started experiencing an instant-shutdown problem. This problem has been reported in various places on the Internet. The symptoms are that the MacBook appears perfectly normal, but after using it for a while, such that the interior has warmed up, the computer will instantly power off. There is no warning, just ‘BOOM’ like I ripped the battery out. If you try to start the computer again immediately, it won’t respond. However, if you wait about 10 or 15 minutes, the system will start normally again.

There are numerous theories for why this happens, and not surprisingly Apple hasn’t said anything. One guy claims, after disassembling his MacBook, that there is a wire connected to a temperature sensor on the CPU which feeds through a heatsink. The wire is pulled very tight, so the thermal expansion of the interior (especially the heatsink) is sufficient to put pressure on the wire and cause intermittent contact. This causes the computer to do an emergency power-off in order to save the CPU from what appears to be an overtemp situation. After 10 minutes, the computer has cooled (and shrunk) back down, and all behaves normally again.

This theory sounds plausible, but I’m surprised anyone would install a wire with no slack. So perhaps there is just a grain of truth in there, since the behavior definitely seems to be correlated with heat.

Within a few days, the shutdown problem became frequent enough that I could no longer pretend it was a fluke, and I dropped my laptop off at the campus computer store for warranty repair. Within a week I had it back, and all is better now. The receipt makes no mention of what the problem was or how it was fixed, so the real cause will remain a mystery.

Case discoloration

Color changes in the light grey plastic on the MacBook case have been widely reported, although I was skeptical at first. The Internet is not a statistical sample, especially when it comes to the hyperbolic discussion (both good and bad) of Apple products. Nevertheless, I’m now starting to see the first signs of discoloration on the surfaces where my palms rest. At this point there is a slight darkening toward the orange part of the spectrum, and it is generally hard to see unless you put your head at an angle. Water, light scrubbing, and one of those big white erasers have not had any impact, and I’m afraid to damage the plastic with something harsher.

Much to my surprise, I learned that Apple is also now dealing with this problem. They have acknowledged it is a manufacturing defect and will replace that part of the case. After a week without my laptop, I’m not interested in another week in the shop. I think I’ll wait until this gets worse before I finally send it in. It’s good to know it will be taken care of, though.
