Archive for April, 2006

Getting rid of magic

Thursday, April 20th, 2006

The Djangoistas have declared the magic-removal branch open for beta testing. This is very exciting as the model magic in Django was a bit of a wart. I can understand why it was initially implemented, but it frequently surprised people by making modules containing your models behave in unexpected ways. (Just try importing something in your model or splitting your models into separate files, and you'll see what I mean.)

The physics experiment for which I created my Document Management System in Django some months back was (effectively) cancelled last week, before I even got to deploy the system to the rest of the group. That sucks greatly, but the upside is now I have no reason not to dig in and make some larger changes to the DMS. Top of the list will be porting to the magic-removal branch, followed by fixing a race condition using the new database transaction support, and also replacing my crappy home-grown indexer with a real one, like Hype. Hype was pretty impressive in my brief testing, and should be MUCH faster than my indexer/search algorithm.

Then I can face the scary prospect of how to make this code easy to deploy. This is one thing that I don’t understand how to do very well for Django apps. Right now, there’s too much that needs to be changed for each new installation. Once I can fix that, then I will be ready to post the DMS on the Internet and hope that it turns out to be useful for someone.

Creepy phone number: (866) 383-0986

Wednesday, April 12th, 2006

This afternoon I got a call from (866) 383-0986, which I didn’t answer because I didn’t recognize the number or the area code, and I was in deep-hack mode while tracking down a floating point bug. When I got around to checking my phone, I tried looking up the number in a reverse directory to see who it was. No hits, so for the hell of it, I typed it into Google. Wow! This number has quite a history, it seems.

Just read the comments on the top ranked blog post, and you find that this number seems to be correlated with scams targeted at new domain owners. I did recently purchase a domain name (you can now also visit this blog at http://quantize.org/blog/), so they must have found my number in the WHOIS directory.

Perhaps this is why GoDaddy pushes their “registration privacy” feature for the extra $2…

SATA tower, Round 2: Fix it with a sledgehammer!

Monday, April 10th, 2006

Much like my fun with Firewire, I was fooled into thinking I had solved my disk lockup problem. A random fluctuation in the time-to-failure convinced me that swapping around drives (or maybe exercising the contacts more) had fixed a flaky connection. In fact, it had not, and several hours later, my disks were freezing with a vengeance, sometimes after only 20 minutes of heavy I/O. The symptom was a disk access light stuck on, and a lot of messages like this in the kernel log:

Apr  9 18:04:25 lurch ata5: command 0x25 timeout, stat 0xd1 host_stat 0x61
Apr  9 18:04:25 lurch ata5: translated ATA stat/err 0xd1/00 to SCSI SK/ASC/ASCQ 0xb/47/00
Apr  9 18:04:25 lurch ata5: status=0xd1 { Busy }
Apr  9 18:04:25 lurch sd 6:0:0:0: SCSI error: return code = 0x8000002
Apr  9 18:04:25 lurch sdd: Current: sense key=0xb
Apr  9 18:04:25 lurch ASC=0x47 ASCQ=0x0
Apr  9 18:04:25 lurch end_request: I/O error, dev sdd, sector 63890855
Apr  9 18:04:25 lurch ATA: abnormal status 0xD1 on port 0xFFFFC2000033A487
Apr  9 18:04:25 lurch ATA: abnormal status 0xD1 on port 0xFFFFC2000033A487
Apr  9 18:04:25 lurch ATA: abnormal status 0xD1 on port 0xFFFFC2000033A487

The trigger seemed to be an actual read/write error, which then degenerated into an endless sequence of “Busy” errors as the system retried the I/O operation over and over. The drive simply would not talk to the controller after the initial problem. It sounded very similar to my Firewire problems where a bridgeboard would just fall off the bus under heavy load.
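For anyone trying to read those log lines, here is a minimal Python sketch that decodes the status byte (0xd1) into the bit names of the standard ATA status register. The function names are my own, hypothetical ones, and this is not how libata itself reports status. When the busy bit is set, the other bits are not supposed to be meaningful, which is presumably why the kernel prints only “Busy”.

```python
# Bit masks for the ATA status register, from bit 7 down to bit 0.
ATA_STATUS_BITS = [
    (0x80, "BSY"),   # device busy
    (0x40, "DRDY"),  # device ready
    (0x20, "DF"),    # device fault
    (0x10, "DSC"),   # seek complete (legacy meaning)
    (0x08, "DRQ"),   # data request
    (0x04, "CORR"),  # corrected data (obsolete)
    (0x02, "IDX"),   # index (obsolete)
    (0x01, "ERR"),   # error
]


def decode_ata_status(stat):
    """Return the names of all bits set in an ATA status byte."""
    return [name for mask, name in ATA_STATUS_BITS if stat & mask]


def is_busy(stat):
    """True if the BSY bit is set; the other bits are then not valid."""
    return bool(stat & 0x80)


if __name__ == "__main__":
    stat = 0xD1  # value taken from the kernel log above
    print(hex(stat), decode_ata_status(stat), "busy" if is_busy(stat) else "idle")
```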

The new piece of data (and different from the Firewire case) was that the bug could be cleared by resetting the computer. I did not have to power cycle the drive (which is in a separate tower and has an independent power supply). This was very curious, and pointed to a problem with my SATA host controller, or the kernel error handling procedure, rather than just a crappy IDE to SATA converter board. If I could only reset the SATA bus when this error happened, the problem would go away…

I spent some quality time with Google, the linux-ide archives, and the kernel changelogs. I learned that in general, the libata code has extremely simple error handling, which is easy to understand and also ineffective at dealing with anything but the simplest of problems. This means it won’t do anything crazy, but it also won’t reset the bus when things are very broken. I also discovered that the SiI 3114 (sata_sil module) has a bit of a propensity for getting itself into bad states, which the error handling can’t recover from.

This is when I discovered the “sledgehammer,” as Jeff Garzik called it in the changelog. There is a known erratum for the SiI 3112/3114 which causes them to interact badly with certain Seagate drives. The fix is to detect when such a drive is connected, and then set a flag (“mod15write”) to clamp all ATA commands to no more than 15 sectors at a time. After the fix was put in, people started trying it out on other drives that were having problems, and then when the mod15write flag fixed their problems, they assumed their drive also was afflicted by the SiI 3114 erratum. In fact, they had some other issue, and the mod15write flag just covered it up.
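To make the 15-sector clamp concrete, here is a rough Python sketch of what it means for one large request. This only illustrates the idea and is not the sata_sil driver code; the function name and the 512-byte sector size are my assumptions. At 512 bytes per sector, each command moves at most 7680 bytes, so a big transfer turns into a lot of small commands.

```python
MAX_SECTORS = 15    # the mod15write limit described above
SECTOR_BYTES = 512  # assumed sector size


def clamp_request(start_sector, sector_count, max_sectors=MAX_SECTORS):
    """Split one request into (start, count) pieces of at most max_sectors."""
    pieces = []
    while sector_count > 0:
        n = min(sector_count, max_sectors)
        pieces.append((start_sector, n))
        start_sector += n
        sector_count -= n
    return pieces


if __name__ == "__main__":
    # A 40-sector request becomes three commands: 15 + 15 + 10 sectors.
    for start, count in clamp_request(1000, 40):
        print(start, count, count * SECTOR_BYTES, "bytes")
```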

Rather than clutter up the mod15write detection table with bogus model numbers, Jeff Garzik added an explicit parameter, “slow_down”, which when set to 1, will enable mod15write on all drives connected to the controller. I don’t have one of the problematic Seagate drives, but I do have similar symptoms. So, I enabled the bug fix, and it did reduce the failure rate quite a bit. The performance hit was huge, though. Reads are now 25 MB/sec, vs. the 50+ MB/sec I was able to get before. (Note this parameter is only available as of kernel 2.6.16.)

And, the fix was not perfect. It was still possible, though much harder, to lock a drive when doing copies between two disks on the SiI 3114 controller. However, I made another discovery: a patch which fixes the port enumeration order for my much nicer Promise SATAII150 TX4 card (that runs the main internal server HD). This is a moderately annoying bug which means that the BIOS and Linux enumerate the 4 ports on the card in different orders, so the first drive for the BIOS (which gets booted) is actually /dev/sdc once the kernel loads. The patch makes it much less annoying to put multiple drives on the Promise TX4 card, so I moved the cables for 2 of the external drives over to the TX4, lightening the load on the SiI 3114 card. Now I’ve gone almost 12 hours with no I/O jams, and will continue to test things.

For now, things are stable. I think the long-term solution here is to get another TX4 card and not use the Silicon Image 3114 card anymore. The card was $20, so I guess I should not be too surprised. Silicon Image, however, has been very good about getting documentation to Jeff Garzik on their hardware, so perhaps support will improve in the future. I also saw that patches for more sophisticated error handling (needed to support NCQ among other things) are in the pipeline. It wasn’t clear if they would squeak through the deadline for 2.6.17, but certainly by 2.6.18 libata might be better at dealing with non-fatal errors.

SATA tower, Round 1

Saturday, April 8th, 2006

I took the plunge and converted half of my FW/IDE disk tower to SATA yesterday. The final parts list was:

The system is up and running, and getting some exercise now before I declare it a success. So far, the problems have been:

  • Hotplug unfriendly hardware. The RC-204 adapters explain in rather cryptic English that they do not support hotplug, and in fact they recommend you do not use them with removable IDE drive trays. The reason they give is that it is bad to power on the RC-204 (you have to plug a floppy power connector into the board) with no IDE drive attached. Maybe the signal lines need termination or something. Hotplug never really worked consistently with Firewire, but now it is out of the question. A pure SATA system would not have this problem, of course.
  • SATA in the Linux kernel does not support SATA device hotplug. I checked it by yanking the multilane cable, and saw no messages in the kernel. I didn’t try to use the drives after reconnecting without a reboot, so I don’t know if momentary interruption is allowed. As of the latest software status update from Jeff Garzik, a hotplug patch has been posted, so hopefully we’ll see that soon.
  • Mysterious timeouts initially. During my first round of testing, I saw one of the four drives either run very slowly (1/4 normal speed) or just stall occasionally. At one point, the drive just stopped responding. It was very reminiscent of my Firewire problems, and got me worried. I almost thought either one of the RC-204 converters or even one of the IDE drives was malfunctioning. A day later, the problem seems to have mostly gone away after moving some things around. I wonder if there was a poor connection somewhere. Again, pure SATA would be nice so that I wouldn’t be fussing over huge 40-pin connectors.
  • When hitting multiple drives, the throughput per drive goes down, despite each having an independent connection. This is because I’m using a cheap $20 SATA card, of course, but just thought I would mention it.

But there have been lots of good things:

  • The SiI3114 chipset is supported in the Linux kernel, so no need to download extra drivers.
  • So far as I can tell, the RC-204 IDE-to-SATA adapters work and are not a huge performance hit. I was able to get 56 MB/sec read and 47 MB/sec write to a Seagate 7200.8 Barracuda IDE drive.
  • SATA is hella-faster than Firewire 800. I was lucky to hit 25 MB/sec over the Firewire 800 bus, so getting 56 MB/sec is amazing.
  • RAID (either with mdadm or LVM2 striping) appears to work again, and improve the performance a bit. Due to performance issues with the SATA card I bought, I only got 65 MB/sec read and 57 MB/sec write when I put the drives into a striped configuration. Not bad, though I decided not to keep it that way due to a desire not to push my luck. :)
  • Having all the drives on separate channels eliminates some of the possible problems you can have with drives not wanting to be on the same bus. Even in this era of “advanced” computing, I’ve had IDE drives that behaved strangely when they were on the same IDE cable.
  • Aside from some commissioning lockups mentioned above, so far it has been much more stable than the Firewire system. Right now I’m torture testing it with two programs copying files around at the same time. I do notice the transfer rates going up and down between 10 MB/sec and 30 MB/sec, which worries me a bit, but no kernel errors or stuck drives so far.
  • As of 2.6.15, you can query the SMART information using smartmontools and be warned about pending disk failures.
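
If you want to get rough MB/sec numbers like the ones quoted above for your own setup, a small Python sketch along these lines would do. It is not necessarily how the figures in this post were obtained, and the function name is hypothetical. It times a sequential read of one file. Keep in mind that the page cache will inflate the result unless the file is much larger than RAM or has not been read recently.

```python
import time


def read_throughput_mb_s(path, block_size=1024 * 1024):
    """Read a file sequentially; return (bytes_read, rate in MB/sec)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            total += len(block)
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on very small files.
    rate = total / elapsed / 1e6 if elapsed > 0 else float("inf")
    return total, rate


if __name__ == "__main__":
    import sys
    nbytes, rate = read_throughput_mb_s(sys.argv[1])
    print("%d bytes at %.1f MB/sec" % (nbytes, rate))
```

Run it with the path of a large file on the drive under test as the only argument.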

Overall, I think this has been a large improvement. I only wish this were all SATA instead of this hybrid IDE/SATA setup, but it is a decent interim measure. Eventually, I will need to retire these drives, and then I can migrate to a SATA-only system.
