Over the weekend, I rediscovered Amazon’s Simple Storage Service (S3). As with a lot of good ideas, I didn’t pay much attention to this service when it was first announced because I didn’t know how to classify it in my head. When I stumbled on it again, the model was obvious: S3 is a giant hash table connected to the Internet.
To use Python dictionaries as an analogy, this is the essence of S3:
# A bucket is the Amazon term for a collection of keys
bucket = { }
# Upload data to the bucket and associate it with a string key
bucket['my_key'] = 'some data'
# Retrieve data by key
my_data = bucket['my_key']
That’s basically it. However, instead of using Python, you communicate using HTTP GET and PUT requests. Of course, that is easy to wrap in a Python (or Ruby or Perl) interface, but fundamentally this is a hash table that speaks Web.
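To make the "hash table that speaks Web" idea concrete, here is a minimal sketch of such a wrapper. It is not Amazon's client library: the `Bucket` class, the `Transport` callable, and the in-memory fake transport are all hypothetical names made up for illustration, and the sketch leaves out request signing, which real S3 writes require.

```python
from typing import Callable, Optional

# Hypothetical transport: (method, url, body) -> response body.
# A real one would send an HTTP request signed with your AWS credentials.
Transport = Callable[[str, str, Optional[bytes]], bytes]


class Bucket:
    """Dictionary-style view of a bucket; every access becomes one HTTP call."""

    def __init__(self, name: str, transport: Transport,
                 endpoint: str = "http://s3.amazonaws.com") -> None:
        self.name = name
        self.transport = transport
        self.endpoint = endpoint

    def _url(self, key: str) -> str:
        # Path-style address: endpoint/bucket/key
        return f"{self.endpoint}/{self.name}/{key}"

    def __setitem__(self, key: str, data: bytes) -> None:
        # bucket[key] = data  becomes  PUT /bucket/key
        self.transport("PUT", self._url(key), data)

    def __getitem__(self, key: str) -> bytes:
        # bucket[key]  becomes  GET /bucket/key
        return self.transport("GET", self._url(key), None)


def make_fake_transport() -> Transport:
    """In-memory stand-in for the network, so the sketch runs offline."""
    store = {}

    def transport(method: str, url: str, body: Optional[bytes]) -> bytes:
        if method == "PUT":
            store[url] = body or b""
            return b""
        return store[url]

    return transport


bucket = Bucket("my_bucket", make_fake_transport())
bucket["my_key"] = b"some data"
my_data = bucket["my_key"]
```

Swapping the fake transport for one that performs real, authenticated HTTP requests would leave the dictionary-style interface unchanged, which is the point of the analogy.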
Everyone already has a program to retrieve objects from S3: your web browser. That makes S3 ideal for storing static content associated with a website, as Adrian Holovaty did for his chicagocrime.org website. Just return a URL to the S3 object, and all of your users will download the file from Amazon. If it is a large file, you can even instantly create a torrent by appending ?torrent to the URL. Amazon will play the role of the BitTorrent tracker and seeder.
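As a rough illustration, building those links is just string assembly. The helper name `object_url`, the bucket and key names, and the path-style address are assumptions for this sketch, and it presumes the object has been made publicly readable.

```python
from urllib.parse import quote


def object_url(bucket: str, key: str, torrent: bool = False) -> str:
    """Build a browser-friendly link to an object (hypothetical helper)."""
    # quote() escapes characters that are unsafe in a URL path but keeps "/"
    url = f"http://s3.amazonaws.com/{quote(bucket)}/{quote(key)}"
    # Appending ?torrent asks for a .torrent file instead of the object itself
    return url + "?torrent" if torrent else url


direct_link = object_url("my_bucket", "videos/demo.avi")
torrent_link = object_url("my_bucket", "videos/demo.avi", torrent=True)
print(direct_link)
print(torrent_link)
```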
Which brings me to the pressing question: How does Amazon make money? They charge for the service, but unlike most web hosting providers, there is no flat rate or setup fee. You pay only for your data usage with no base price. Originally, this was very simple: 15 cents per GB stored per month, and 20 cents per GB transferred (upload or download). So if you uploaded a 100 MB file and served 10,000 copies of it in a month, that would cost you: 0.1 GB x $0.20 upload + 0.1 GB x $0.15 storage for a month + 0.1 GB x 10,000 x $0.20 download = $200.04. Moreover, if your website suddenly becomes very popular, you can scale up to handle the traffic with no effort and no capital investment. (It will cost you at the end of the month, but presumably you’re making money with all this traffic.)
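The arithmetic for the original flat pricing fits in a few lines. This is only a sketch using the rates quoted above; the function name and parameters are made up for illustration.

```python
def monthly_cost_original(file_gb, downloads,
                          storage_rate=0.15, transfer_rate=0.20):
    """Monthly bill in dollars under the original flat pricing."""
    upload = file_gb * transfer_rate                # one upload of the file
    storage = file_gb * storage_rate                # stored for one month
    download = file_gb * downloads * transfer_rate  # every copy served
    return upload + storage + download


# A 100 MB (0.1 GB) file served 10,000 times in a month
cost = monthly_cost_original(0.1, 10_000)
print(f"${cost:.3f}")
```

The result rounds to the roughly $200.04 quoted above, and almost all of it is download bandwidth.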
This morning, Amazon announced a new pricing structure that is a little more complicated. Storage stays the same, upload costs are cut in half, and download costs are broken into tiers, with the per-GB cost dropping as your bandwidth goes up. In addition, Amazon now has a per-request charge of $0.01 per 1,000 PUTs and $0.01 per 10,000 GETs. As of June 1, our example of 100 MB served 10,000 times in a month would cost: 0.1 GB x $0.10 upload + 0.1 GB x $0.15 storage + 0.1 GB x 10,000 x $0.18 download = $180.03, plus about a cent in request charges. A slight drop in price! On the other hand, chicagocrime.org’s bill is likely to go up due to the per-request charge, since the site serves only 925 kB per visitor but deals with lots and lots of visitors.
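Here is a sketch of the same bill under the new structure, including the per-request charges. The function and parameter names are hypothetical. Because the tier boundaries are not listed here, the download rate is a plain parameter that defaults to the $0.18 per GB used in the example.

```python
def monthly_cost_new(file_gb, puts, gets,
                     upload_rate=0.10,    # half of the old $0.20 per GB
                     storage_rate=0.15,   # unchanged
                     download_rate=0.18,  # assumed tier; pass your own rate
                     put_rate=0.01 / 1_000,
                     get_rate=0.01 / 10_000):
    """Return a line-by-line breakdown of the monthly bill in dollars."""
    bill = {
        "upload": file_gb * puts * upload_rate,
        "storage": file_gb * storage_rate,
        "download": file_gb * gets * download_rate,
        "requests": puts * put_rate + gets * get_rate,
    }
    bill["total"] = sum(bill.values())
    return bill


# One 100 MB upload (one PUT), served 10,000 times (10,000 GETs)
bill = monthly_cost_new(0.1, puts=1, gets=10_000)
for item, dollars in bill.items():
    print(f"{item:>8}: ${dollars:.5f}")
```

For one large file, the request line is about a cent against roughly $180 of bandwidth. The request charge only starts to matter when each visitor triggers many small GETs.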
I don’t have a use for this service yet, but I will definitely keep it in mind for future projects.