Archive

The Dulin Report

Browsable archive from the WordPress export.

2015

On Managing Stress, Multitasking and Other New Year's Resolutions (Jan 1, 2015)
Configuring Master-Slave Replication With PostgreSQL (Jan 31, 2015)
Trying to Replace Cassandra with DynamoDB? Not so fast (Feb 2, 2015)
On apprenticeship (Feb 13, 2015)
Where AWS Elastic BeanStalk Could be Better (Mar 3, 2015)
Finding Unused Elastic Load Balancers (Mar 24, 2015)
Do not apply data science methods without understanding them (Mar 25, 2015)
Microsoft and Apple Have Everything to Lose if Chromebooks Succeed (Mar 31, 2015)
Two developers choose to take a class (Apr 1, 2015)
What can Evernote Teach Us About Enterprise App Architecture (Apr 2, 2015)
Exploration of the Software Engineering as a Profession (Apr 8, 2015)
Ordered Sets and Logs in Cassandra vs SQL (Apr 8, 2015)
Building a Supercomputer in AWS: Is it even worth it? (Apr 13, 2015)
Apple is (or was) the Biggest User of Apache Cassandra (Apr 23, 2015)
My Brief Affair With Android (Apr 25, 2015)
Why I am not Getting an Apple Watch For Now: Or Ever (Apr 26, 2015)
The Clarkson School Class of 2015 Commencement (May 5, 2015)
The Clarkson School Class of 2015 Commencement speech (May 5, 2015)
We Need a Cloud Version of Cassandra (May 7, 2015)
Guaranteeing Delivery of Messages with AWS SQS (May 9, 2015)
Smart IT Departments Own Their Business API and Take Ownership of Data Governance (May 13, 2015)
Big Data is not all about Hadoop (May 30, 2015)
The longer the chain of responsibility the less likely there is anyone in the hierarchy who can actually accept it (Jun 7, 2015)
Your IT Department's Kodak Moment (Jun 17, 2015)
Attracting STEM Graduates to Traditional Enterprise IT (Jul 4, 2015)
Book Review: "Shop Class As Soulcraft" By Matthew B. Crawford (Jul 5, 2015)
The Three Myths About JavaScript Simplicity (Jul 10, 2015)
Social Media Detox (Jul 11, 2015)
Big Data Should Be Used To Make Ads More Relevant (Jul 29, 2015)
On Maintaining Personal Brand as a Software Engineer (Aug 2, 2015)
Ten Questions to Consider Before Choosing Cassandra (Aug 8, 2015)
What Every College Computer Science Freshman Should Know (Aug 14, 2015)
We Live in a Mobile Device Notification Hell (Aug 22, 2015)
Top Ten Differences Between ActiveMQ and Amazon SQS (Sep 5, 2015)
Setting Up Cross-Region Replication of AWS RDS for PostgreSQL (Sep 12, 2015)
I Stand With Ahmed (Sep 19, 2015)
Banking Technology is in Dire Need of Standartization and Openness (Sep 28, 2015)
IT departments must transform in the face of the cloud revolution (Nov 9, 2015)
Operations costs are the Achille's heel of NoSQL (Nov 23, 2015)
Our civilization has a single point of failure (Dec 16, 2015)

Trying to Replace Cassandra with DynamoDB? Not so fast

February 2, 2015

In November last year I pointed out how tempted I was to replace Cassandra with DynamoDB. Since then I have done some research, and things are not as straightforward as they may seem at first.



I'd like to revisit my post and clarify a few things. On elasticity of Cassandra I said the following:



Scaling a Cassandra cluster involves adding new nodes. Each additional node requires hours of babysitting. The process of adding a node takes a few minutes, but bootstrapping can take hours. If you are using tokens, you are in a bigger pickle, since you have to compute just the right balance, move tokens around, and clean up (we are using tokens because this is a legacy production cluster, and there is no safe and easy way to migrate to vnodes). Once you have added a node, it becomes a fixed cost plus extra network charges. If you ever want to scale down, you have to work backwards and decommission the extra nodes, which takes hours, and then rebalance the cluster again if you are still using tokens.

Going back to DynamoDB, the only thing I need to care about is IOPS. What is my minimum? What is my maximum? How much am I willing to pay? Period. End of story.



Not so fast. The story doesn't actually end there. As it turns out, there is a very important factor I had not considered: Cassandra's burst performance. Allow me to explain.



Suppose your application experiences extended periods of low traffic, with significant bursts of activity every few hours. For example, overnight there are batch processes that update the data and then come morning thousands of mobile devices wake up and download the data.



For the sake of the conversation, let's say the number of devices is 1,000. Per the SLA, users expect to get their data within seconds; specifically, up to 250 concurrent requests must return in under 10 seconds. That means that if an overnight job ran for, say, 10 hours and updated 1,000 records per device, then when those devices wake up you will need to serve 250 * 1000 / 10 = 25,000 read units of data per second.
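The arithmetic above can be sketched as a quick back-of-the-envelope calculation, using exactly the numbers from the example:

```python
# Back-of-the-envelope read throughput needed to meet the SLA.
concurrent_requests = 250   # SLA: up to 250 concurrent requests...
sla_seconds = 10            # ...must return in under 10 seconds
records_per_device = 1000   # records updated overnight per device

required_read_units = concurrent_requests * records_per_device // sla_seconds
print(required_read_units)  # 25000 read units per second
```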



Now, Cassandra running on a c3.xlarge AWS instance with SSD storage will be more than happy to oblige. DynamoDB, on the other hand, is a more complicated story.



If you are willing to pay for 25,000 read capacity units, you don't really have a problem. However, a DynamoDB table with that much provisioned capacity is orders of magnitude more expensive than a manually configured Cassandra cluster capable of the same performance.



On the other hand, it may seem that you could use DynamoDB auto-scaling. The problem, however, is that it can take hours to go from 100 capacity units to 25,000 (at least per my benchmarks). Your users won't understand your excuses for missing the SLA while your table is still scaling up.
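One way to live within a fixed provisioned capacity is to buffer bursty writes in a queue and drain them at the rate the table accepts. Here is a toy model of why that helps; the burst size and write rate are made-up numbers, and the queue is just an in-memory stand-in for something like SQS:

```python
from collections import deque

# Toy model: a burst of writes lands in a queue and a worker drains it
# at the table's provisioned write rate, so DynamoDB never sees the spike.
provisioned_writes_per_sec = 100      # assumed provisioned write capacity
burst = deque(range(30_000))          # 30,000 writes arrive all at once

seconds_to_drain = 0
while burst:
    # Each simulated second, consume up to the provisioned rate.
    for _ in range(min(provisioned_writes_per_sec, len(burst))):
        burst.popleft()               # in real life: a batched write call
    seconds_to_drain += 1

print(seconds_to_drain)  # 300 -- five minutes to absorb the burst
```

The trade-off is latency: the writes are safely parked in the queue right away, but they are not readable from the table until the drain catches up.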



As it turns out, DynamoDB makes a heck of a lot of sense if you have steady read and write workloads. You may be able to write into DynamoDB via SQS so you can absorb bursts of activity. In fact, an electric utility is the best analogy I could come up with: imagine if your electric company took hours to ramp up capacity every morning when people woke up and turned on the lights. What I would like to see from Amazon is a DynamoDB pricing model that works like this:




  1. You provision a maximum “fuse” capacity you are willing to pay for. There is a one-time fee to buy the “fuse.” Continuing with the electrical utility analogy, this is like paying to get connected to the grid and purchasing a meter and a fuse panel.

  2. You are charged exclusively for utilization. Once the “fuse” is in place, you only pay for the capacity you actually use. If you go an hour without accessing your table at all, you pay zero for that hour. If you use 12,367 read units per second for 25 minutes, you pay for exactly that. If you reach the capacity of your “fuse,” you get an exception and you have to deal with it in your application.
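A minimal sketch of the metered “fuse” model proposed above; the price constant is hypothetical, chosen only to make the example concrete:

```python
FUSE_CAPACITY = 25_000         # read units/sec the "fuse" is rated for
PRICE_PER_UNIT_SECOND = 1e-7   # hypothetical metered rate, $ per unit-second

def metered_cost(units_this_second: int) -> float:
    """Cost of one second of reads; trips the 'fuse' if capacity is exceeded."""
    if units_this_second > FUSE_CAPACITY:
        raise RuntimeError("fuse capacity exceeded -- handle in the application")
    return units_this_second * PRICE_PER_UNIT_SECOND

# An idle hour costs nothing...
idle_hour_cost = sum(metered_cost(0) for _ in range(3600))
# ...while a 25-minute burst at 12,367 read units/sec is billed for
# exactly what was consumed.
burst_cost = sum(metered_cost(12_367) for _ in range(25 * 60))
print(idle_hour_cost, burst_cost)
```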



I am keeping an eye on changes to the DynamoDB pricing model and I look forward to Amazon improving the platform. Until then, I guess I am stuck with Cassandra.