Archive for the ‘Apple’ Category

64-bit Python on Macs

Saturday, June 7th, 2008

There was a question recently on the ROOT mailing list where someone was having a problem using the python executable that comes with Mac OS X 10.5 together with 64-bit libraries.  I went digging around, and noticed a strange discrepancy.  The compiled python libraries that ship with Leopard are four-architecture universal binaries:

stan@Rover:/usr/lib/python2.5/config$ file libpython2.5.a 
libpython2.5.a: Mach-O universal binary with 4 architectures
libpython2.5.a (for architecture ppc7400):
    Mach-O dynamically linked shared library ppc
libpython2.5.a (for architecture ppc64):
    Mach-O 64-bit dynamically linked shared library ppc64
libpython2.5.a (for architecture i386):
    Mach-O dynamically linked shared library i386
libpython2.5.a (for architecture x86_64):
    Mach-O 64-bit dynamically linked shared library x86_64

(Reformatted to avoid spilling into my sidebar…)

However, the python executable is not compiled for 64-bit architectures:

stan@Rover:/usr/bin$ file python2.5
python2.5: Mach-O universal binary with 2 architectures
python2.5 (for architecture ppc7400):
    Mach-O executable ppc
python2.5 (for architecture i386):
    Mach-O executable i386

I hadn’t noticed this since my MacBook is the early Core Duo model, rather than Core 2 Duo, so the hardware does not support x86_64.  Apple may have good reasons to force all python scripts to run as 32-bit applications even on 64-bit systems, but I don’t know what they are.

If you find yourself wanting 64-bit python, it’s very easy to make your own, since all the Python libraries on Leopard are already 32/64-bit universal.  Just go grab the very short python64.c from the ROOT svn repository, and compile it like this:

gcc -arch ppc64 -arch x86_64 -arch i386 -arch ppc  \
    -o python64 -I/usr/include/Python2.5 -lpython2.5 python64.c

(Note that this has nothing to do with the ROOT libraries. If you have no idea what ROOT is, the above will still work.)

Now you can check the python64 executable:

Rover:tmp stan$ file python64
python64: Mach-O universal binary with 4 architectures
python64 (for architecture ppc64):
    Mach-O 64-bit executable ppc64
python64 (for architecture x86_64):
    Mach-O 64-bit executable x86_64
python64 (for architecture i386):
    Mach-O executable i386
python64 (for architecture ppc7400):
    Mach-O executable ppc

All four architectures are now present. I haven’t got a 64-bit Mac to try this out on, so I don’t know if it actually runs correctly there. Being universal, this binary works just fine on my 32-bit Mac, of course.
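If you want to confirm which slice of the universal binary actually ran, one quick check is to ask the interpreter for its pointer size. This is only a minimal sketch using the standard library; the helper name pointer_bits is my own invention and has nothing to do with python64.c:

```python
import struct
import sys


def pointer_bits():
    """Return the width of a C pointer in this interpreter, in bits."""
    # The struct format "P" is a native void*: 4 bytes on a 32-bit
    # build, 8 bytes on a 64-bit build.
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("%d-bit Python %s" % (pointer_bits(), sys.version.split()[0]))
```

Running the same script with the stock python and with python64 should show whether the two differ on a given machine.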

OS X Leopard Roundup

Tuesday, November 6th, 2007

After spending 4 days with the new 10.5 release of Mac OS X, I’ve been pretty impressed. Visually, things have improved, except for the obvious problems with the dock and the menu bar. I had the same initial negative reaction to the translucent 3D dock that most other people did, but it has grown on me slowly. The translucent menu bar, however, is simply atrocious if you do not have a very color-neutral (black, white or grey) background. The most common way to correct this is to add a white stripe to the top of your background image. OpaqueMenuBar does this for you automatically whenever your background changes.

Looks aside, I think Leopard’s biggest advance is the amount of attention Apple has shown to developers. (Clearly, OS X is stealing the love from the iPhone…)

X11.app 2.0 is generally a huge improvement. Now based on the X.org 7.2 code base, it draws on the XDarwin code base for X/OS X integration, rather than the source of the old Panther/Tiger X11.app, which was based on XFree86. As a result, there have been a number of regressions, but it sounds like the future of X11.app will be much better. In particular, the source is now being hosted in the X.org git repository, and the main developer is committed to engaging with the user community.

There are a number of glitches, though. Using a fullscreen X desktop (which I suspect is not terribly common) is broken, as is dragging an X window to another display if you have two monitors on your computer. Most annoying, the patch to fix the yellow cursor bug was dropped on the floor, and didn’t make it into X11.app 2.0. The author has since fixed this in an alpha release on the XDarwin wiki page. The Xquartz binary he posts there works great for me, so I’m happy for now.

The launchd program, which is like init/rc/cron/at/inetd all rolled together, is used to pull off two neat tricks in Leopard. First, the $DISPLAY variable is set to a socket that launchd monitors, so the X server now starts automatically on demand. This means you can start up Terminal, do some work, and as soon as you start an X application, you’ll see X11.app appear. The second trick is that an ssh-agent is now started on demand when you use SSH. Apple’s ssh-agent can fetch passphrases for your keys from the OS X Keychain as well. You don’t need to use SSHKeychain any more (which is good since it had a major memory leak on my system). The only downside to the ssh-agent Keychain support is that there is no obvious way to expire the ssh keys in the agent when the Keychain locks. Once those keys are decrypted into the ssh-agent memory, they stay valid even after you lock your Keychain.

Python has been updated to 2.5.1, which is great because it solves a linker problem I had with the Python bindings for ROOT. The Leopard install of Python includes easy_install, numpy, twisted, and some other handy stuff. In addition, there are new Objective-C/Cocoa bindings, and it comes along with py2app, for generating proper-looking Mac applications entirely written in Python!

The Cocoa programmers are probably excited about Objective-C 2.0, which adds garbage collection and some other improvements, like a compact syntax for looping over an iterator. I’ve been reading up on Objective-C, and the message passing style of object-orientation reminds me greatly of Python’s duck-typing. I find the syntax unspeakably ugly looking, but that’s really just a matter of taste. You can get used to anything, really. :)

How Much Is Your Data Worth?

Saturday, October 27th, 2007

Backups are one of those things you only take seriously after you experience serious data loss and realize the cost, monetary and otherwise, of losing files. In my case, I started thinking very hard about backups last year when a new hard drive in my MacBook died after a few weeks. I even had a backup, but it was incomplete and a month out of date. It was then I realized that my backup approach was haphazard, and not indicative of how important my data was to me.

Now I’ve decided that a good way to approach the problem is to imagine that one morning, you boot your computer and all of your files are gone. How much would you pay to get your data back? $100? $500? $1000? Without a backup strategy, you might find that when a thief or a malfunctioning disk head separates you from your data, no amount of money will bring it back. Only with some amount of planning can you have any control over how much data recovery will cost.

The types of failure modes you may have to deal with include:

  • Catastrophic hardware or software failure: Your hard drive, computer, or software suddenly and without (much) warning destroys some or all of your data. Moreover, it is obvious when this failure happens, so you can take immediate action. This is probably the most common failure, and is the first thing to be addressed by a backup system.
  • Theft: Someone steals your laptop, or breaks into your house and steals your computer.
  • Physical Disaster: Fire, flood, dropping your laptop on the pavement, backing over it with a car, etc.
  • Silent corruption: Malfunctioning software or hardware might corrupt data too slowly for you to notice immediately.
  • User Failure: This is when you accidentally delete a directory, overwrite an important document, or otherwise make some kind of localized, preventable mistake.

Combating these problems requires balancing a number of backup tradeoffs:

  • Frequency: How often you back up determines the amount of recent data you will lose when a disk fails.
  • History: The number of backup revisions you save determines how quickly you need to discover the data loss. Disk failure and natural disasters are immediately obvious, but silent corruption, and even user failure, might take a while to identify.
  • Distance: The further away your backup media is from your computer, the less correlated backup failure is with the data loss event. Hard drive failure is very localized, but a thief will steal your entire laptop bag, including the backup drive in the side pocket. Fire can potentially destroy all devices (backup and computer) in your home, or even a larger area.
  • Convenience: You are more likely to back up if it is fast and easy to do. You also want to be able to restore your files quickly and get back to work.

It is interesting to note that there is interplay between these factors. High frequency backups need to be paired with deep history, or you will not be able to recover from silent corruption and some kinds of user failure. Distance and convenience are usually inversely related. Online backups put the storage media very far away, but can be less convenient to restore due to limited network bandwidth.

After balancing these factors, I have some suggestions for people with Mac laptops or desktops. You should consider these stages, stopping whenever you hit the value limit of your data. That is, stage 1 is the most important, then stage 2, and finally stage 3.

Stage 1: Bootable External Hard Disk (~$150)

Buy an external 3.5″ Firewire hard disk that is at least as big as the hard disk inside your computer. For most people, this should cost no more than $100-$150. I suggest Firewire since all Macs in the last 5 years can boot from an external Firewire disk. Intel Macs can now boot from USB 2.0 disks as well, but Firewire in my experience still performs better. Don’t skimp on the size either! Disks are cheap these days, so there is no excuse for not backing up your entire computer.

Purchase SuperDuper! for $28, or download Carbon Copy Cloner. CCC is free, but I haven’t tried out version 3.0, so I can’t comment on whether it has solved the usability problems with 2.0. I know SuperDuper! works, so that’s why I still recommend it.

Now use SuperDuper! (or CCC) to make bootable, full disk backups of your computer. Both programs have backup modes which quickly refresh the backup by only copying changed files. After your first backup, later backups will probably only require 20-30 minutes to complete. Most importantly, if your disk fails, you can boot your backup and keep working while you replace the hardware. If the whole computer is shot, you can boot your backup disk on another Mac and still keep working. This is also a great thing to have when you perform major software upgrades.

A bootable, full disk backup is easy to do, and covers probably 80% of possible problems. You should keep the disk close to your computer desk, but only plug it into the computer during the backup. This will isolate the backup media from transient software problems, or other bugs, that might affect disks connected to the computer.

Stage 2: Online Backup ($60 + friends, hopefully free)

The two major problems with the bootable disk backup are a lack of history, and a lack of distance. Without history, you can only recover files damaged since your last backup. That is sufficient in the case of sudden disk failure, but not so good when you realize you corrupted your photo database a week ago. And, if you keep your backup disk nearby for quick and easy backups, disaster may strike both your computer and backup disk at the same time.

To mitigate both of these risks, I’ve concluded that online backups provide a sensible tradeoff. In particular, CrashPlan has impressed me with an attractive, simple, cross-platform program that does almost exactly what I want in a backup utility. Unlike some other online backup utilities, CrashPlan lets you save your backup data (in compressed and encrypted form) on their servers and/or your friends’ computers. Your friends don’t even have to pay for the program if they just store backups for others. You only buy licenses for computers that you actively back up. Note that only the $60 version of the program supports any kind of version history, which I consider essential in this case.

You should check out the feature details. Perhaps the smartest feature is the emphasis on diverse backup destinations. If you save your data on several friends’ computers, you don’t have to worry so much if one of them happens to be offline when you need a backup. Additionally, when restoring, the software can stream your data from several sources at once, so if you have lots of friends, your restoration will go faster. Of course, if you want at least one stable, always available backup destination, you can store data on the CrashPlan server for $0.10/GB/month, with a $5 minimum.
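To get a feel for what the hosted option costs, the pricing above can be turned into a few lines of arithmetic. This is a rough sketch, and it assumes the $5 minimum is a monthly floor on the per-GB charge, which is my reading rather than something I have confirmed; the function name is made up for illustration:

```python
def monthly_server_cost(gigabytes, rate_per_gb=0.10, minimum=5.00):
    """Estimate the monthly hosted-storage bill in dollars."""
    # Assumption: the $5 minimum acts as a floor on each month's charge.
    return max(minimum, gigabytes * rate_per_gb)


if __name__ == "__main__":
    for size in (20, 50, 200):
        print("%4d GB -> $%.2f/month" % (size, monthly_server_cost(size)))
```

Under that assumption, anything up to 50 GB costs the same as the minimum, so small document and photo collections sit at the floor.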

So in stage 2, the recovery strategy is: back up your entire computer to the bootable external disk, and continuously back up your irreplaceable files (documents, photos, etc) with CrashPlan to your friends’ computers. Then, if your hard disk dies, you first go to the backup disk, and supplement it with the more recent files saved online. If your external backup disk is stolen/destroyed/lost, then at least you can recover your irreplaceable files, even if it means you have to spend a week downloading them.

(Aside: I haven’t yet decided how to fit Time Machine in 10.5 into this strategy. Time Machine provides revision history, but requires an external drive plugged into your computer. That doesn’t provide any backup distance, and it isn’t clear how this will work with a laptop, where I don’t want to have any disks plugged in most of the time.)

Stage 3: Offsite External Backup Disk ($100)

This extension is pretty simple: Buy a second external disk, and do a full, bootable backup to it once a month. Store the disk somewhere away from your computer and home, like at school or work. Then, if your main backup disk is destroyed or stolen, you can still retrieve the offsite backup disk, and then supplement it with the last month’s worth of files from the online backup.

Conclusion

After reaching stage 3, I decided my paranoia had been satisfied. There is a clear recovery plan for all likely failure scenarios, and the cost is very reasonable. Nominally it only requires $300 for this kind of peace of mind, but it can even be cheaper if you have some spare disks lying around (as I did) that you can put into external Firewire enclosures. Considering how much of my work (and leisure) involves my laptop, I consider $300 a pretty reasonable price for my data.

Mysterious Disappearing Attachments in Mail.app

Sunday, June 10th, 2007

For more than a year now, I’ve been annoyed by a persistent, and seemingly random bug in Mail.app. Sometimes people would send me attachments which would not appear at the bottom of the email in Mail.app. Switching to the “Raw Source” view showed that the MIME part containing the attachment was still there, with all the base64 encoded data. In fact, dumping the email source to disk and manually decoding the base64 worked just fine, so it was not as if the data was corrupted somehow. Webmail interfaces to the same IMAP mailbox also would show the attachment. My workaround since this bug appeared has been to watch the Size column in my Inbox view and go to webmail if I see a large email with no visible attachment.

Finally a few weeks ago, I identified the cause of the bug after it really started to annoy me when working with PGP signatures included as MIME attachments (rather than “ASCII armor”, which is incredibly ugly). The bug appears to be caused by a truncated newline symbol in the email as returned by Mailsnare, my IMAP provider. The IMAP standard specifies that the newline in emails as they are sent over the network should be (in hex) 0x0D 0x0A. This is often known as the “MSDOS” newline symbol used by DOS and Windows operating systems. UNIX systems use just 0x0A, and older Mac systems use 0x0D, although with OS X it seems that 0x0A is preferred now.

Something in the Mailsnare email processing pipeline truncates the final 0x0A in every email, so there is only a 0x0D at the end. Every previous newline is properly formed, though. When Apple Mail receives such an email, it seems that the MIME parser sometimes cannot cope with the damaged newline, which would come right after the final MIME boundary. This confuses the parser enough that it fails to display the MIME part right before the error, which is the file attachment.

I have checked this hypothesis by copying an email with the damaged newline to a local mail folder, quitting Mail.app, and manually editing the mail file on disk with a hex editor. If I repair the damaged newline and reload Mail.app, it magically sees the attachment that was invisible before.
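The repair itself is tiny. Here is a minimal sketch of the same fix applied to the raw bytes of a message; it is not what I ran (I used a hex editor), the function name is hypothetical, and it deliberately knows nothing about how Mail.app lays out its files on disk:

```python
def repair_final_newline(raw):
    """Return the message bytes with a trailing bare CR completed to CRLF."""
    # A message that already ends in CRLF does not end in b"\r",
    # so it passes through unchanged.
    if raw.endswith(b"\r"):
        return raw + b"\n"
    return raw


if __name__ == "__main__":
    damaged = b"--boundary--\r"
    print(repr(repair_final_newline(damaged)))
```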

The missing attachment problem appeared intermittently, in part, due to variations in how email clients terminate the email body. Some clients only put one newline at the end, but others put two newlines. The second-to-last newline is not damaged, and properly ends the MIME boundary, while the last newline is damaged, but no longer vital to correct parsing. However, some emails with only one damaged newline at the end still parse correctly, for reasons I do not understand.

The bug has been reported to Mailsnare tech support, but I have not been made aware of any progress on the matter.

Appendix

An easy way to test for this problem on an IMAP server (assuming your server uses SSL and you have a copy of python compiled with SSL support) is to start python and type:

>>> import getpass, imaplib
>>> M = imaplib.IMAP4_SSL('your_imap.server.net')
>>> M.login('your_username', getpass.getpass())
Password: [type your password here, nothing is echoed]
>>> # FETCH only works once a mailbox is selected (INBOX by default)
>>> M.select()
>>> code, data = M.fetch('1', '(RFC822)')
>>> print(data)

Then look at the last part of the email body and see if it ends in \r\n or just \r. If it is just \r, then the IMAP server (or something else in the email processing chain) has chopped the \n character.
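If you would rather not eyeball the output, the check can be automated. The sketch below assumes the usual shape of what imaplib's fetch returns, a list in which each message is an (envelope, body) tuple and the remaining items are plain byte strings; the function name is my own:

```python
def message_ending(fetch_data):
    """Classify the first fetched message's ending.

    Returns 'CRLF', 'bare CR', 'other', or 'no message'.
    """
    for part in fetch_data:
        # Message parts arrive as (envelope, body) tuples; items such as
        # the closing b')' are bare byte strings and are skipped.
        if isinstance(part, tuple):
            body = part[1]
            if body.endswith(b"\r\n"):
                return "CRLF"
            if body.endswith(b"\r"):
                return "bare CR"
            return "other"
    return "no message"
```

Calling message_ending(data) on the result of a fetch and getting 'bare CR' back means the final \n has been chopped somewhere along the way.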

Why ZFS Matters to Mac Users

Wednesday, December 20th, 2006

This morning I read a summary at Think Secret of the leaked news that ZFS will be a supported filesystem in OS X 10.5:

New to build 9A321 is support for Sun’s ZFS file system, a 128-bit open source file system introduced with Solaris 10 that offers support for vastly larger drives and arrays than 64-bit file systems. ZFS also delivers additional options for administrators.

This description totally misses the point of ZFS, focusing on a number that means less than megahertz for CPUs. I imagine that most Mac users who haven’t been following ZFS development incorrectly assume that ZFS probably doesn’t matter to them unless they are running some sort of Xserve server farm. Nothing could be further from the truth.

Why ZFS Does Not Matter

Let me begin with why you should not care about ZFS. ZFS is often described as a “128 bit filesystem.” This is mostly true, but in day-to-day use, completely irrelevant. The upper bound for most personal and small business filesystems is on the terabyte scale, which current filesystems can contain with their puny 64-bit block pointers. In 5-10 years, we might care about petabytes or exabytes, so the ZFS developers were smart to future-proof the filesystem format and avoid the growing pains that systems like ext2/3 have had to endure. The ZFS team lead, Jeff Bonwick, famously noted that populating a storage pool with 2^128 blocks of information would require as much energy as would be needed to boil the world’s oceans.

With that bit of vivid imagery, Bonwick is basically saying that the capacity problem is solved for the long term. Overcoming capacity limits is really not all that interesting though, since the solution is obvious: use bigger numbers. The real genius of ZFS is in all of its other design decisions.

Why ZFS Matters to Laptop/Desktop Users

People with iBooks, MacBooks, Powerbooks, Mac Minis, and iMacs all have generally the same storage setup: a single hard disk with capacity ranging from 40-500 GB. A lot of the magic of ZFS does not become manifest until you have several disks, but even with one, you can benefit in several ways:

Filesystems can be compressed. Unlike a compressed disk image, a compressed ZFS filesystem is read/write. Moreover, the compression flag can be turned on and off on the fly. New data will be compressed (or not) as per the flag, and old data will be left as is. Compressed filesystems are great for data that you don’t access very often, or data that compresses very well.

Filesystems are nested and making them is as easy as making a directory. This in itself is not very interesting for laptop/desktop users, but combined with compression, this means that you can effectively turn on compression for just a subfolder on your drive.

Every block of data on the disk is checksummed so errors can be detected during read operations. Many common hard drive failures are catastrophic, and painfully obvious when they happen. But it is possible for your data to be corrupted on disk in ways that you, and the hard disk, will never notice. While checksumming will not allow you to recover your data, it will let you know when you should go retrieve a file from your backup. (You are backing up, right? Go buy an external Firewire disk and SuperDuper!, and start doing it right now. It is easy, fast, and you’ll thank me later.)

Space-efficient and fast snapshots. A snapshot allows you to see your filesystem as it was some time in the past. ZFS is designed to snapshot a filesystem in constant time, no matter how much data you have, or how frequently you snapshot it. Moreover, the snapshot is very space efficient. Identical blocks are shared between snapshots and the live filesystem until they are written to. The space required for snapshots is therefore mostly a function of how quickly your files change, and not so much how often you make a snapshot. It’s like version control for your entire computer!
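The block sharing idea is easier to see in a toy model. The sketch below is an illustration only and does not resemble the real ZFS on-disk structures (which use a tree of block pointers): the class and method names are invented, and a plain dictionary stands in for the disk.

```python
from types import MappingProxyType


class ToyFS:
    """Toy block store showing how a snapshot shares unchanged blocks."""

    def __init__(self):
        self._blocks = {}      # block number -> bytes
        self._frozen = False   # True while a snapshot refers to _blocks

    def write(self, number, data):
        if self._frozen:
            # Copy only the map of references; block data stays shared.
            self._blocks = dict(self._blocks)
            self._frozen = False
        self._blocks[number] = data

    def read(self, number):
        return self._blocks[number]

    def snapshot(self):
        # Taking the snapshot copies nothing: it is a read-only view
        # of the current block map.
        self._frozen = True
        return MappingProxyType(self._blocks)


if __name__ == "__main__":
    fs = ToyFS()
    fs.write(0, b"photos")
    fs.write(1, b"draft v1")
    snap = fs.snapshot()
    fs.write(1, b"draft v2")
    print(snap[1], fs.read(1))      # old and new contents of block 1
    print(snap[0] is fs.read(0))    # the untouched block is shared
```

Only block 1 exists in two versions after the write; block 0 is the same object in both the snapshot and the live filesystem, which is why space use tracks how fast files change rather than how often snapshots are taken.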

Apple’s much discussed Time Machine feature in OS X 10.5 is a great example of the interface possibilities when you have snapshots available. However, Time Machine does not appear to require ZFS, which means that Apple had to bolt snapshots onto HFS+, a complex and awkward task. Snapshots in ZFS are cheap and easy.

Why ZFS Matters to Workstation Users

With that list of features, ZFS already beats most other filesystems out there. But with a workstation like the Mac Pro, you can have up to 4 internal drives (8 if you get creative) and start to explore the multi-drive capabilities of ZFS. Traditionally, there has been a hard separation between the volume manager and the filesystem layer. The volume manager takes your many disks, and makes them look like one disk (with mirroring or striping or whatever) to the filesystem layer. The separation of duties ensures that the volume manager knows nothing about files, and the filesystem knows nothing about disks. ZFS, on the other hand, breaks down the barriers between filesystems and volume managers with some amazing results:

Automatically growing filesystems. Once you add your disks to the storage pool, all of their space is available to all of the filesystems you have. You can reserve space for a filesystem, to guarantee a minimum amount is available when you need it, and you can also set quotas. But these are just flags which are easy to change on the fly. The default for every filesystem is automatically expanding capacity up to the limit of your storage pool. There are no manual volume or filesystem resizing operations, ever.

Dynamic striping of file blocks over all drives in the storage pool. If you throw 2 drives in your storage pool, then files are automatically distributed over both disks, making large reads and writes faster. The disks do not have to be the same size (unlike usual striping configurations) and you can expand the pool whenever you want by installing a new disk. New files will stripe over old and new disks, and the old files will stay where they are. But, when you modify old files, the changed blocks are spread over all the available disks again. After adding a new disk, ZFS will get faster as you use the filesystem!

Software mirroring with automatic error detection and self-healing. ZFS also incorporates features traditionally left to software RAID drivers. You can arrange your disks into mirrored pairs (or triples, etc), which speeds up data reads, and also protects against single disk failure. Moreover, since ZFS checksums all data blocks, if one disk returns bad data, ZFS knows without having to query the other disk every time. Having identified the problem, it can then access the failed block from the other disk(s) in the mirror set and return to you correct data. ZFS then writes the correct data back to the original disk which failed the checksum. If the data error was a fluke due to some correctable problem, perhaps a bad sector (which modern drives can reassign to a new physical location) or just a bad write, then this will solve the problem. If the disk is really dead, then ZFS will take it offline and wait for you to replace it.
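Here is a toy version of that read path, to make the sequence concrete. It is a sketch under heavy assumptions: dictionaries stand in for disks, SHA-256 from hashlib stands in for whatever checksum the pool uses, the checksums live in a separate table rather than wherever ZFS really keeps them, and every name is made up.

```python
import hashlib


def checksum(data):
    """Stand-in block checksum (SHA-256 hex digest)."""
    return hashlib.sha256(data).hexdigest()


class ToyMirror:
    """Two-way mirror that verifies reads and repairs a bad copy."""

    def __init__(self):
        self.disks = [{}, {}]   # each disk: block number -> bytes
        self.sums = {}          # block number -> expected checksum

    def write(self, number, data):
        self.sums[number] = checksum(data)
        for disk in self.disks:
            disk[number] = data

    def read(self, number):
        expected = self.sums[number]
        failed = []
        for disk in self.disks:
            data = disk.get(number)
            if data is not None and checksum(data) == expected:
                # Self-heal: rewrite only the copies that failed.
                for bad in failed:
                    bad[number] = data
                return data
            failed.append(disk)
        raise IOError("block %d is bad on every disk" % number)


if __name__ == "__main__":
    mirror = ToyMirror()
    mirror.write(7, b"important data")
    mirror.disks[0][7] = b"important dxta"   # simulate silent corruption
    print(mirror.read(7))                    # good copy from disk 1
    print(mirror.disks[0][7])                # disk 0 has been repaired
```

Note that when the first disk returns good data, the second disk is never consulted, which is the point about not having to query the other disk every time.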

Fast resync of mirrors. In the unfortunate circumstance where a drive does die and you replace it, the resync process is faster with ZFS. This is because, unlike many other RAID systems, ZFS knows which blocks on the disk were used, and which blocks were not used. During resynchronization, ZFS only copies blocks with actual filesystem data on them to the new disk. So, if your disk pair was only half-full, then you are back in business twice as fast.

Software parity RAID that actually works. The most popular parity RAID system is by far RAID-5, where for every N-1 data blocks, there is one parity block. The parity block allows you to recover all your data if any one disk fails, much like mirroring, but without as much space penalty. There is a seldom discussed problem with RAID-5, known as the “RAID-5 write hole.” When modifying a single block, you have to rewrite all N blocks (including the parity block). If a power or hardware failure happens in the middle of rewriting these N blocks, then you effectively lose all N blocks of data, with no way to recover them. (Update: As pointed out in the comments, I have incorrectly stated how writes happen in RAID-5. Only the changed block and the parity block need to be updated, rather than all N blocks. Nevertheless, there is still a write hole if a hardware failure happens between the two writes.) You can fix this in hardware with battery backup systems, or RAID controllers with non-volatile write caches. The structure of ZFS is such that you can also solve the problem in software using a variant of the RAID-5 algorithm called RAID-Z. RAID-Z behaves much like RAID-5, but has no write hole. Recent ZFS releases have also added a double parity version of RAID-Z, which allows you to withstand 2 disk failures at once.
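For the curious, the parity trick and the write hole both fit in a few lines. This sketch models plain XOR parity as used by RAID-5, not RAID-Z, and the block contents and function name are invented for the example:

```python
def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


if __name__ == "__main__":
    stripe = [b"AAAA", b"BBBB", b"CCCC"]   # three data blocks
    parity = xor_blocks(stripe)            # one parity block

    # Lose the disk holding block 1: XOR of the survivors and the
    # parity block rebuilds it.
    rebuilt = xor_blocks([stripe[0], stripe[2], parity])
    print(rebuilt == stripe[1])

    # Write hole: block 0 is rewritten, but power fails before the
    # parity block is updated. A later rebuild of block 1 is garbage.
    stripe[0] = b"ZZZZ"
    rebuilt = xor_blocks([stripe[0], stripe[2], parity])
    print(rebuilt == stripe[1])
```

The first comparison succeeds and the second fails, and nothing in the stale stripe tells the RAID layer that the parity no longer matches the data.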

Why ZFS Matters to Server Admins

By now, I’ve hit on nearly all of the neat features of ZFS, but there are a few left that might be of interest to people with Xserve/Xsan clusters:

Easy command line interface. I have no idea how Apple will choose to present ZFS to users, but regardless, they have to include the fantastic zpool and zfs commands. These two commands make it very easy to manage lots of disks and filesystems.

A stream format which allows you to copy snapshots to other systems. This feature is a little hard to explain, but it basically allows you to dump a ZFS filesystem, preserving the snapshot history, and reload it on another system. This could be used for maintaining a backup server, or loading a filesystem into another storage pool.

Highly SMP-friendly design. ZFS is designed to efficiently support many, many processes all accessing a filesystem at the same time.

Nearly unlimited capacity and scalability. We come full circle back to the capacity issue. For servers which need to manage a large number of disks, ZFS scales pretty well up from the single-disk scenario we started with. Sun certainly pushes ZFS on their 48 disk monster, the Sun Fire X4500.

Waiting for Leopard

Hopefully, I’ve got you excited about ZFS coming to Mac OS X. So far, all we’ve seen is a leaked screenshot showing ZFS in the disk image creator. It’s not clear yet how much Apple wants to promote ZFS, via GUI interface tools, or integration with Time Machine, or just marketing. We’ll certainly learn more at Macworld 2007. Until then, take a look at this presentation on ZFS to learn more about it.
