Monday, April 25, 2016

Capacity units used by Amazon DynamoDB's Scan operation

Amazon DynamoDB's pricing model is based on provisioning a table with read and write capacity units per second. For basic operations, this is simple enough: a put takes 1 write capacity unit per kilobyte of data, rounded up; a get takes 0.5 read capacity units per 4 kilobytes of data, rounded up, or 1 unit per 4 kilobytes if a strongly consistent get is required. It's made less clear by the varying block size per operation, and by the way that prices for all this are quoted ($0.0065 per hour for 10 units of write capacity, $0.0065 per hour for 50 units of read capacity...), but it's at least explicit.

So, reading a 10 byte record costs 0.5 read capacity units, reading a 3999 byte record costs 0.5 read capacity units, and reading a 4001 byte record costs 1 read capacity unit.
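The get pricing rule can be written down as a small function. This is only a sketch of the arithmetic described above, not anything from the AWS SDK: the name `get_read_units` is made up for illustration, and the 4 kB block is assumed to be 4000 bytes so that it matches the examples in this post (change `BLOCK_BYTES` if the service actually counts 4096).

```python
import math

# Assumption: a 4 kB read block is 4000 bytes, matching the examples in this post.
BLOCK_BYTES = 4000


def get_read_units(item_bytes, strongly_consistent=False):
    """Read capacity units charged for a single get of one item."""
    # Item size is rounded up to a whole number of blocks, with a minimum of one.
    blocks = max(1, math.ceil(item_bytes / BLOCK_BYTES))
    # 0.5 units per block, or 1 unit per block for a strongly consistent get.
    return blocks * (1.0 if strongly_consistent else 0.5)


for size in (10, 3999, 4001):
    print(size, "bytes ->", get_read_units(size), "read capacity units")
```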

Things get a bit less clear when it comes to scan, which reads through the table in bulk. The documentation isn't at all clear on whether scan is costed per record (with the rounding up behaviour seen in get) or per byte. The Internet doesn't seem to be too sure, either; there are Stack Overflow posts etc. supporting both positions. It would make a huge difference to costs for applications with small records (and it's worth noting that the export data pipeline that Amazon recommends people use for backups uses scan...). If you have 1000 records of 10 bytes each, then charging per record comes to 500 units, while charging per byte comes to one-and-a-bit units (10,000 bytes is under three 4 kB blocks). So I thought I'd find out for myself.

Here's a little test app. (As an aside, have you ever seen a more obfuscated interface than the DynamoDB BatchGetItem one? It's not documented either, as far as I can see...)


Finished populating
For batchGet of 100 count of 10 byte records (1000 bytes), used [{TableName: ScanTestTable,CapacityUnits: 50.0,}] units
For scan of 100 count of 10 byte records (1000 bytes), used 0.5 units
Finished populating
For scan of 100 count of 100 byte records (10000 bytes), used 1.5 units
Finished populating
For scan of 100 count of 1000 byte records (100000 bytes), used 12.5 units

As expected, the batchGet (just a bunch of gets bundled into one request, in effect) behaves just like get, and costs half a capacity unit per record: 50 units for 100 records. The scan, however, is charged on the total byte size of the operation (rounded up to the next 4 kB), ignoring the number of records involved.
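As a cross-check, the two candidate charging models can be run against the scan figures in the output above. This is a rough sketch: the function names are made up for illustration, the 4 kB block is assumed to be 4000 bytes, and record sizes are taken at face value (any per-item overhead such as attribute names is ignored).

```python
import math

BLOCK_BYTES = 4000  # assumption: 4 kB counted as 4000 bytes

# (record count, record size in bytes, units reported for the scan), from the output above
measured_scans = [(100, 10, 0.5), (100, 100, 1.5), (100, 1000, 12.5)]


def per_byte_units(count, record_bytes):
    """Total bytes rounded up to whole 4 kB blocks, 0.5 units per block."""
    return max(1, math.ceil(count * record_bytes / BLOCK_BYTES)) * 0.5


def per_record_units(count, record_bytes):
    """Each record rounded up to whole 4 kB blocks, 0.5 units per block."""
    return count * max(1, math.ceil(record_bytes / BLOCK_BYTES)) * 0.5


for count, size, reported in measured_scans:
    print(size, "byte records: reported", reported,
          "| per-byte model", per_byte_units(count, size),
          "| per-record model", per_record_units(count, size))
```

Only the per-byte model reproduces all three scan figures; the per-record model gives 50 units in every case, which is what the batchGet was charged.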

So, there you have it; scanning costs 0.5 read capacity units per 4 kB. This means that provisioning a table to scan at 1 MB/sec requires 125 capacity units (250 blocks of 4 kB at 0.5 units each), which at $0.0065 per 50 units costs $0.01625 per hour. Not too bad.
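The same sum works for other scan rates. The sketch below assumes 1 MB is 1,000,000 bytes and a 4 kB block is 4000 bytes, and uses the read price quoted in this post, which may well have changed since; `scan_provisioning` is a made-up helper name, not part of any AWS tooling.

```python
import math

BLOCK_BYTES = 4000                # assumption: 4 kB counted as 4000 bytes
UNITS_PER_BLOCK = 0.5             # scan cost per block, as measured above
PRICE_PER_50_READ_UNITS = 0.0065  # dollars per hour, the price quoted in this post


def scan_provisioning(bytes_per_second):
    """Return (read capacity units, dollars per hour) needed to sustain a scan rate."""
    units = math.ceil(bytes_per_second / BLOCK_BYTES) * UNITS_PER_BLOCK
    dollars_per_hour = units / 50 * PRICE_PER_50_READ_UNITS
    return units, dollars_per_hour


units, cost = scan_provisioning(1_000_000)  # 1 MB/sec
print(f"{units} units, ${cost:.5f} per hour")
```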

Thursday, February 11, 2016

An important message from Dear Leader Creighton

I have just been honoured with a glossy campaign leaflet from Lucinda Creighton, the glorious leader of Renua, in my letterbox. What has she to say?

"Hard working mothers and fathers still pay at least half of what they earn in income tax!" This is indeed startling news. Now, according to KPMG's handy tax calculator, if you're a couple, you start paying more than half of what you earn at about the 770,000 euro mark (assuming you don't have a pension). So to all of you mothers and fathers out there earning less than that, sorry, you're not hard-working. Must do better. As to all of you who are neither mothers nor fathers, well, really, what more can be said?

It's good to know where Renua's definition of 'hard-working' falls, though.