Archive for the ‘General’ Category

Jonathan Coulton

Sunday, October 21st, 2007

The catchy credits song (which in a strange way could be considered a spoiler for the game ending) in Portal introduced me to Jonathan Coulton, a very talented musician. For lack of a better term, you could call his music “nerdfolk.” Think of They Might Be Giants, but with less abstract lyrics. Like TMBG, Coulton’s songs are about topics you might consider non-musical, but unlike TMBG, you can always figure out what the song is about. (Oh, and Coulton never uses an accordion, which, depending on your musical tastes, could be a plus or minus.)

He’s got a nice intro page here. Lots of good stuff there, but Code Monkey is my current favorite. Coulton portrays the most endearing programmer/primate ever, and wraps it up in a catchy tune. The Code Monkey doesn’t get the girl by the end of the song, but he still thinks he might, and that leaves you smiling. There’s heart and smirk in the song, unlike a lot of nerdfolk artists, who only remember the smirk.

Other good songs include Chiron Beta Prime (where I swear Coulton is channeling John Linnell), Re: Your Brains and Mandelbrot Set.

LaTeX on the Web

Monday, June 18th, 2007

Quick test of the jsMath javascript I just enabled:
\[
\hat{P_1}(\vec{x}) = \frac{1}{n}\sum_{i=1}^n \prod_{j=1}^d
\frac{1}{h_{j}(\vec{t_i})} K\left(\frac{x_j -
t_{ij}}{h_{j}(\vec{t_i})}\right)
\]

And some inline $$\nu_e$$ and $$\alpha^2$$ Greek fun.

Disabling Spotlight on External Drives

Monday, May 28th, 2007

I’ve become a huge fan of Google Desktop for Mac for desktop search. The search results are much better ordered, and presented with enough context to be extremely useful. It really is what I expected from Spotlight when it was first announced.

An especially nice aspect of Google Desktop is its integration with Spotlight preferences. Rather than make you specify which directories to ignore in their own GUI, they just use whatever you have set for Spotlight privacy settings. Similarly, Google uses the Spotlight import plugins (via calls to mdimport) rather than reinvent the wheel.

One problem with the Spotlight privacy settings panel is that if you add an external disk, and then later disconnect it, it forgets your settings the next time you reconnect. This is a big annoyance with my external FireWire backup disk, which I deliberately keep disconnected from my computer except when backing up. If I start a backup task and then go to sleep, when I get up in the morning, I find that Spotlight (and Google Desktop) have indexed both my main disk and the backup. Search results get cluttered up with duplicates that I don’t want to see.

Forcing a disk to be permanently unindexed seems to require either a shell prompt, or a $9.95 shareware tool (which probably just runs the command for you). The magic command line utility is mdutil and the two commands are:

sudo mdutil -i off /Volumes/ExternalRover/
sudo mdutil -E /Volumes/ExternalRover/

The first command disables indexing on my external drive (called “ExternalRover”), and the second command deletes the Spotlight index associated with the disk. Note that you can use the second command (-E) on / to force your Spotlight index to be regenerated if Spotlight seems to be acting strangely. Unfortunately, forcing Google Desktop to reindex seems to require uninstalling and reinstalling it.
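If you reconnect drives often, the two commands are easy to wrap in a small script. Here is a minimal Python sketch; the function names and the dry-run default are my own inventions for illustration, and only the two mdutil invocations come from above.

```python
import subprocess


def spotlight_disable_commands(volume):
    """Return the two mdutil invocations from the post as argv lists."""
    return [
        ["sudo", "mdutil", "-i", "off", volume],  # turn indexing off
        ["sudo", "mdutil", "-E", volume],         # erase the existing index
    ]


def disable_spotlight(volume, dry_run=True):
    """Print the commands (dry run) or actually run them on a Mac."""
    for argv in spotlight_disable_commands(volume):
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


# Dry run by default, so nothing is changed until you pass dry_run=False.
disable_spotlight("/Volumes/ExternalRover/")
```

Leaving it in dry-run mode lets you check the volume path before sudo gets involved.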

(Tip thanks to this discussion over at macosxhints.com)

The Great Hash Table in the Sky

Tuesday, May 1st, 2007

Over the weekend, I rediscovered Amazon’s Simple Storage Service (S3). As with a lot of good ideas, I didn’t pay much attention to this service when it was first announced because I didn’t know how to classify it in my head. When I stumbled on it again, the model was obvious: S3 is a giant hash table connected to the Internet.

To use Python dictionaries as an analogy, this is the essence of S3:

# A bucket is the Amazon term for a collection of keys
bucket = { }

# Upload data to the bucket and associate it with a string key
bucket['my_key'] = 'some data'

# Retrieve data by key
my_data = bucket['my_key']

That’s basically it. However, instead of using Python, you communicate using HTTP GET and PUT requests. Of course, that is easy to wrap in a Python (or Ruby or Perl) interface, but fundamentally this is a hash table that speaks Web.
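To make the “hash table that speaks Web” idea concrete, here is a rough sketch of a dictionary-style wrapper around plain GET and PUT requests. The class name HttpBucket and the base URL are hypothetical, and I’ve left out the request signing that the real S3 requires for uploads, so treat this as an illustration of the model rather than a working S3 client.

```python
import urllib.request
from urllib.parse import quote


class HttpBucket:
    """Dict-like view of a key/value store that speaks HTTP GET and PUT.

    Assumption: the server accepts unauthenticated requests. Real S3
    needs signed requests, which this sketch does not implement.
    """

    def __init__(self, base_url, urlopen=urllib.request.urlopen):
        self.base_url = base_url.rstrip("/")
        self._urlopen = urlopen  # swappable so the class can be tested offline

    def _url(self, key):
        # Each key maps to one URL under the bucket.
        return f"{self.base_url}/{quote(key)}"

    def __setitem__(self, key, data):
        # bucket[key] = data  ->  HTTP PUT with the bytes as the body
        req = urllib.request.Request(self._url(key), data=data, method="PUT")
        with self._urlopen(req) as resp:
            resp.read()

    def __getitem__(self, key):
        # bucket[key]  ->  HTTP GET, returning the body as bytes
        req = urllib.request.Request(self._url(key), method="GET")
        with self._urlopen(req) as resp:
            return resp.read()
```

With something like this, `bucket['my_key'] = b'some data'` turns into a PUT and `bucket['my_key']` into a GET, which is the whole analogy in two lines.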

Everyone already has a program to retrieve objects from S3: your web browser. That makes S3 ideal for storing static content associated with a website, as Adrian Holovaty did for his chicagocrime.org website. Just return a URL to the S3 object, and all of your users will download the file from Amazon. If it is a large file, you can even instantly create a torrent by appending ?torrent to the URL. Amazon will play the role of the BitTorrent tracker and seeder.

Which brings me to the pressing question: How does Amazon make money? They charge for the service, but unlike most web hosting providers, there is no flat rate or setup fee. You pay only for your data usage with no base price. Originally, this was very simple: 15 cents per GB stored per month, and 20 cents per GB transferred (upload or download). So if you uploaded a 100 MB file, and served 10,000 copies of it in a month, that would cost you: 0.1 GB x $0.20 upload + 0.1 GB x $0.15 storage for a month + 0.1 GB x 10,000 x $0.20 download = $200.04. Moreover, if your website suddenly becomes very popular, you can scale up to handle the traffic with no effort and no capital investment. (It will cost you at the end of the month, but presumably you’re making money with all this traffic.)
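If you want to plug in your own numbers, the original pricing is simple enough to sketch in a few lines of Python. The function name is made up for this post, the rates are the ones quoted above, and I’m using Decimal so the rounding to cents is predictable.

```python
from decimal import Decimal, ROUND_HALF_UP

STORAGE_RATE = Decimal("0.15")   # dollars per GB stored per month
TRANSFER_RATE = Decimal("0.20")  # dollars per GB uploaded or downloaded


def original_s3_cost(file_gb, downloads):
    """Monthly cost of uploading one file once and serving it `downloads` times."""
    size = Decimal(str(file_gb))
    upload = size * TRANSFER_RATE
    storage = size * STORAGE_RATE
    download = size * downloads * TRANSFER_RATE
    total = upload + storage + download
    # Round to whole cents, halves rounding up.
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(original_s3_cost("0.1", 10_000))  # the 100 MB example from above
```

Almost all of the bill is the download term, so doubling your traffic very nearly doubles the cost.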

This morning, Amazon announced a new pricing structure which is a little more complicated. Storage stays the same, upload costs were cut in half, and download costs were broken into tiers, with the per GB cost dropping as your bandwidth goes up. In addition, Amazon now has a per request charge of $0.01 per 1,000 PUTs and $0.01 per 10,000 GETs. As of June 1, our example of 100 MB served 10,000 times in a month would cost: 0.1 GB x $0.10 upload + 0.1 GB x $0.15 storage + 0.1 GB x 10,000 x $0.18 download = $180.03, plus about a cent in request charges for the 10,000 GETs. A slight drop in price! On the other hand, chicagocrime.org’s bill is likely to go up due to the per request charge, since he is only serving 925 kB per visitor, but dealing with lots and lots of visitors.
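Here is a rough Python sketch of the new pricing applied to the same example. It assumes the whole month’s downloads fall in the $0.18 per GB tier used above (I haven’t listed the other tiers, so they aren’t modeled), and the function name and return format are just for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

UPLOAD_RATE = Decimal("0.10")    # dollars per GB uploaded
STORAGE_RATE = Decimal("0.15")   # dollars per GB stored per month
DOWNLOAD_RATE = Decimal("0.18")  # dollars per GB, assumed single tier
PUT_RATE = Decimal("0.01") / 1000    # dollars per PUT request
GET_RATE = Decimal("0.01") / 10000   # dollars per GET request

CENT = Decimal("0.01")


def new_s3_cost(file_gb, downloads):
    """Return (data charges, data plus request charges), rounded to cents."""
    size = Decimal(str(file_gb))
    data = size * UPLOAD_RATE + size * STORAGE_RATE + size * downloads * DOWNLOAD_RATE
    # One PUT to upload the file, one GET per download.
    requests = PUT_RATE * 1 + GET_RATE * downloads
    return (
        data.quantize(CENT, rounding=ROUND_HALF_UP),
        (data + requests).quantize(CENT, rounding=ROUND_HALF_UP),
    )


data_only, with_requests = new_s3_cost("0.1", 10_000)
print(data_only, with_requests)
```

For one big file the request charges barely register. They start to matter when you serve many small objects, because the GET count grows while the GB count stays small.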

I don’t have a use for this service yet, but I will definitely keep it in mind for future projects.

Economics of Dreamhost

Tuesday, November 7th, 2006

When I was researching Dreamhost, it became clear that there is a very distinct hierarchy of costs for them:

  • Bandwidth (super cheap!) - With quotas measured in TB/month, Dreamhost offers a factor of 1000 more than Textdrive. Clearly fiber is cheap compared to other things.
  • Disk (cheap) - Available in 100 GB increments, disk is also pretty inexpensive for them. According to this post on their blog, Dreamhost pays $10 per useable GB on their disk arrays. That is not nearly as cheap as I would have expected given a 200 GB quota. (I could be using $2000 worth of disk!?!)
  • CPU (expensive) - From reading the comments from other customers, this seems to be where Dreamhost actively tries to conserve resources. Each server monitors how many CPU seconds every process uses, and alerts a sysadmin if one user is taking an “excessive” amount of CPU time. DH uses FastCGI to run PHP and Python scripts with your user privileges, so dynamically generated pages are included in your CPU usage. It is never explicitly stated what “excessive” is precisely, but once that threshold is crossed, Dreamhost tech support will ask you to reduce your usage. If that doesn’t appear to be possible, they might manually move you to a less-loaded server, or they might just require you to turn some stuff off. Customers have frequently requested that “excessive” be clarified, and that DH establish a plan system for people to purchase more CPU time.
  • Telephone support (very expensive) - Clearly people time is the most expensive of all factors in the hosting equation. Here Dreamhost is very conservative, offering no telephone support for the cheapest plan, and rationing out support in the higher plans in units of 1, 3 or 5 phone callbacks per month. Notice that is callbacks! You can’t even initiate the call. Phone time must be very expensive indeed.