Archive for May, 2007

Disabling Spotlight on External Drives

Monday, May 28th, 2007

I’ve become a huge fan of Google Desktop for Mac for desktop search. The search results are much better ordered, and presented with enough context to be extremely useful. It really is what I expected from Spotlight when it was first announced.

An especially nice aspect of Google Desktop is its integration with Spotlight preferences. Rather than make you specify which directories to ignore in its own GUI, it just uses whatever you have set in the Spotlight privacy settings. Similarly, Google uses the Spotlight import plugins (via calls to mdimport) rather than reinventing the wheel.

One problem with the Spotlight privacy settings panel is that if you add an external disk and later disconnect it, the panel forgets your setting the next time you reconnect. This is a big annoyance with my external FireWire backup disk, which I deliberately keep disconnected from my computer except when backing up. If I start a backup task and then go to sleep, I find when I get up in the morning that Spotlight (and Google Desktop) have indexed both my main disk and the backup. Search results get cluttered with duplicates that I don’t want to see.

Forcing a disk to be permanently unindexed seems to require either a shell prompt, or a $9.95 shareware tool (which probably just runs the command for you). The magic command line utility is mdutil and the two commands are:

sudo mdutil -i off /Volumes/ExternalRover/
sudo mdutil -E /Volumes/ExternalRover/

The first command disables indexing on my external drive (called “ExternalRover”), and the second command deletes the Spotlight index associated with the disk. Note that you can use the second command (-E) on / to force your Spotlight index to be regenerated if Spotlight seems to be acting strangely. Unfortunately, forcing Google Desktop to reindex seems to require uninstalling and reinstalling it.
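If you have several backup volumes, the two commands are easy to script. Here is a minimal Python sketch, assuming the same `-i off` and `-E` invocations shown above; the function names `mdutil_commands` and `disable_spotlight` are my own, and the default is a dry run that only prints what would be executed.

```python
import subprocess


def mdutil_commands(volume):
    """Return the two mdutil invocations from the post for one volume path."""
    return [
        ["sudo", "mdutil", "-i", "off", volume],  # turn indexing off
        ["sudo", "mdutil", "-E", volume],         # erase the existing index
    ]


def disable_spotlight(volume, dry_run=True):
    """Print the commands (dry_run=True) or run them in order.

    With dry_run=False, the first failing command raises CalledProcessError.
    """
    for cmd in mdutil_commands(volume):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "ExternalRover" is the volume name used in the post.
    disable_spotlight("/Volumes/ExternalRover/")
```

Pass `dry_run=False` only once the printed commands look right, since the second one throws away the existing index.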

(Tip thanks to this discussion over at macosxhints.com)

The Great Hash Table in the Sky

Tuesday, May 1st, 2007

Over the weekend, I rediscovered Amazon’s Simple Storage Service (S3). As with a lot of good ideas, I didn’t pay much attention to this service when it was first announced because I didn’t know how to classify it in my head. When I stumbled on it again, the model was obvious: S3 is a giant hash table connected to the Internet.

To use Python dictionaries as an analogy, this is the essence of S3:

# A bucket is the Amazon term for a collection of keys
bucket = { }

# Upload data to the bucket and associate it with a string key
bucket['my_key'] = 'some data'

# Retrieve data by key
my_data = bucket['my_key']

That’s basically it. However, instead of using Python, you communicate using HTTP GET and PUT requests. Of course, that is easy to wrap in a Python (or Ruby or Perl) interface, but fundamentally this is a hash table that speaks Web.
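To make the "hash table that speaks Web" idea concrete, here is a rough sketch of a dictionary-style wrapper over GET and PUT. It is not the real S3 API: actual S3 requests have to be authenticated and signed, which this skips entirely. The class name `WebBucket`, the helper `http_send`, and the endpoint in the usage note are all hypothetical.

```python
import urllib.parse
import urllib.request


def http_send(method, url, body=None):
    """Issue one HTTP request and return the response body as bytes."""
    req = urllib.request.Request(url, data=body, method=method)
    with urllib.request.urlopen(req) as resp:
        return resp.read()


class WebBucket:
    """Dictionary-style view of a key/value store reached over HTTP.

    Assumption: base_url accepts plain, unauthenticated GET and PUT.
    Real S3 requires signed requests, which are not modeled here.
    """

    def __init__(self, base_url, send=http_send):
        self.base_url = base_url.rstrip("/")
        self.send = send  # injectable so the class can be tried offline

    def _url(self, key):
        return self.base_url + "/" + urllib.parse.quote(key)

    def __setitem__(self, key, data):
        # bucket[key] = data  becomes  PUT <base_url>/<key>
        self.send("PUT", self._url(key), data)

    def __getitem__(self, key):
        # bucket[key]  becomes  GET <base_url>/<key>
        return self.send("GET", self._url(key))
```

With a server that accepts such requests, `bucket = WebBucket("http://storage.example/my_bucket")` followed by `bucket['my_key'] = b'some data'` reads just like the dictionary version above, except that the values are bytes.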

Everyone already has a program to retrieve objects from S3: your web browser. That makes S3 ideal for storing static content associated with a website, as Adrian Holovaty did for his chicagocrime.org website. Just return a URL to the S3 object, and all of your users will download the file from Amazon. If it is a large file, you can even instantly create a torrent by appending ?torrent to the URL. Amazon will play the role of the BitTorrent tracker and seeder.
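Building those URLs is a one-liner. The sketch below assumes the path-style address `http://s3.amazonaws.com/<bucket>/<key>` and a publicly readable object; the function name `s3_url` and the bucket and key in the usage note are made up for illustration.

```python
from urllib.parse import quote


def s3_url(bucket, key, torrent=False):
    """Build a browser-friendly URL for a public object.

    Assumption: path-style addressing on s3.amazonaws.com.
    """
    url = "http://s3.amazonaws.com/{}/{}".format(quote(bucket), quote(key))
    # Appending ?torrent asks for a .torrent file instead of the object itself.
    return url + "?torrent" if torrent else url
```

For example, `s3_url("my_bucket", "videos/big file.mov", torrent=True)` gives `http://s3.amazonaws.com/my_bucket/videos/big%20file.mov?torrent`.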

Which brings me to the pressing question: How does Amazon make money? They charge for the service, but unlike most web hosting providers, there is no flat rate or setup fee. You pay only for your data usage with no base price. Originally, this was very simple: 15 cents per GB stored per month, and 20 cents per GB transferred (upload or download). So if you uploaded a 100 MB file and served 10,000 copies of it in a month, that would cost you: 0.1 GB x $0.20 upload + 0.1 GB x $0.15 storage for a month + 0.1 GB x 10,000 x $0.20 download = $200.04. Moreover, if your website suddenly becomes very popular, you can scale up to handle the traffic with no effort and no capital investment. (It will cost you at the end of the month, but presumably you’re making money with all this traffic.)
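The original pricing is simple enough to check in a few lines of Python. This sketch uses only the two rates quoted above; the function name and the one-upload-per-month assumption are mine.

```python
STORAGE_PER_GB_MONTH = 0.15  # dollars per GB stored per month
TRANSFER_PER_GB = 0.20       # dollars per GB uploaded or downloaded


def original_monthly_cost(file_gb, downloads):
    """Cost of uploading one file once and serving it `downloads` times."""
    upload = file_gb * TRANSFER_PER_GB
    storage = file_gb * STORAGE_PER_GB_MONTH
    download = file_gb * downloads * TRANSFER_PER_GB
    return upload + storage + download


total = original_monthly_cost(0.1, 10000)
print("${:.3f}".format(total))
```

The three terms are $0.02, $0.015 and $200, so the total is $200.035 before rounding to the nearest cent. Nearly all of it is download bandwidth.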

This morning, Amazon announced a new pricing structure which is a little more complicated. Storage stays the same, upload costs are cut in half, and download costs are broken into tiers, with the per GB cost dropping as your bandwidth goes up. In addition, Amazon now has a per request charge of $0.01 per 1,000 PUTs and $0.01 per 10,000 GETs. As of June 1, our example of 100 MB served 10,000 times in a month would cost: 0.1 GB x $0.10 upload + 0.1 GB x $0.15 storage + 0.1 GB x 10,000 x $0.18 download = $180.03, plus about a cent for the 10,000 GET requests. A slight drop in price! On the other hand, chicagocrime.org’s bill is likely to go up due to the per request charge, since he is only serving 925 kB per visitor, but dealing with lots and lots of visitors.
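Here is the same calculation under the new pricing, as a rough sketch. It uses only the $0.18 per GB download rate from the example above and does not model the cheaper tiers for higher volumes. It also assumes one PUT for the upload and one GET per download, which is my simplification.

```python
STORAGE_PER_GB_MONTH = 0.15  # unchanged
UPLOAD_PER_GB = 0.10         # half the old transfer rate
DOWNLOAD_PER_GB = 0.18       # rate used in the example; other tiers not modeled
PER_PUT = 0.01 / 1000        # $0.01 per 1,000 PUTs
PER_GET = 0.01 / 10000       # $0.01 per 10,000 GETs


def new_monthly_cost(file_gb, downloads):
    """Return (data_cost, request_cost) for one upload served `downloads` times."""
    data_cost = (
        file_gb * UPLOAD_PER_GB
        + file_gb * STORAGE_PER_GB_MONTH
        + file_gb * downloads * DOWNLOAD_PER_GB
    )
    # Assumption: a single PUT to upload, one GET per download.
    request_cost = 1 * PER_PUT + downloads * PER_GET
    return data_cost, request_cost


data_cost, request_cost = new_monthly_cost(0.1, 10000)
print("data: ${:.3f}  requests: ${:.5f}".format(data_cost, request_cost))
```

For a big file the request charge is noise: roughly one cent on top of about $180. The balance shifts for a site serving many small objects, because each GET costs the same no matter how few bytes it returns.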

I don’t have a use for this service yet, but I will definitely keep it in mind for future projects.
