The Dulin Report

Why I am Tempted to Replace Cassandra With DynamoDB

November 13, 2014

I have written about Cassandra in the past. I have been using Cassandra actively for the past three years, and I am one of its big advocates. However, as I have pointed out on this blog and on my Twitter page, if you plan on scaling Cassandra out, be prepared to recruit an army of Java developers to do devops. Cassandra becomes a devops nightmare beyond 3-4 nodes. In this post I am going to try to explain why.

I started seriously considering DynamoDB for my project when I began looking into seemingly excessive inter-zone network charges. We traced them to our three-node Cassandra cluster with a replication factor of 3, which essentially tripled our network charges on a regular basis. As we thought through optimization scenarios, and whether we needed Cassandra at all for some parts of our application, DynamoDB began to make sense. We had already replaced a custom ActiveMQ cluster with Amazon SQS, saving over $1,000 a month in AWS charges and even more in devops effort. Could we do the same with Cassandra?
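To see why a replication factor of 3 roughly triples the traffic, here is a back-of-the-envelope sketch. It assumes, hypothetically, that the coordinator is itself one of the replicas, so each write crosses the network once from client to coordinator and once more to each of the other RF - 1 replicas. It ignores reads, hinted handoff, repair and gossip, and how much of the volume is actually billed as inter-zone depends on where clients and replicas sit. The per-GB price is a parameter, not a real AWS figure.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: a 30-day month


def monthly_write_traffic_gb(writes_per_sec: float, avg_write_bytes: float, rf: int) -> float:
    """Rough network volume generated by writes, in GB per month.

    Hypothetical model: the coordinator is a replica, so every write travels
    client -> coordinator once, then coordinator -> each of the other rf - 1
    replicas. Reads, hints, repair and gossip traffic are ignored.
    """
    copies_on_wire = 1 + (rf - 1)  # client leg plus fan-out to the other replicas
    total_bytes = writes_per_sec * avg_write_bytes * copies_on_wire * SECONDS_PER_MONTH
    return total_bytes / 1e9


def monthly_transfer_cost(gb: float, price_per_gb: float) -> float:
    # price_per_gb is whatever your provider charges; no real price is assumed here
    return gb * price_per_gb


if __name__ == "__main__":
    # Hypothetical workload: 500 writes per second, 2 KB each
    for rf in (1, 3):
        print(rf, round(monthly_write_traffic_gb(500, 2048, rf)))
```

With that hypothetical workload the model gives about 2,654 GB a month at RF 1 and about 7,963 GB at RF 3: the factor of three comes straight from the replication factor.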

Cassandra devops revolves around the following areas: capacity and replication planning, consistency, scaling up and down, software upgrades, node replacements, and regular repairs.

Capacity and Replication Planning

To plan capacity with Cassandra, one must understand the performance of a single node, the performance impact of replicating across multiple nodes, and the consistency behavior when more than one node is involved. There is no document that says, "If you provision this instance type on AWS and configure it this way, you will get this many operations per second."

The configuration files contain a multitude of settings that require a graduate degree in computer science to comprehend and are best left at their defaults. In other words, there is no reliable way for me to say that if I want this many concurrent users performing this many concurrent operations, I need this type of cluster.

Contrast that with DynamoDB. As far as capacity planning goes, all I need to care about is the minimum IOPS my application requires for a particular table, the maximum I am willing to pay for, and how often and when I should auto scale it. Period. End of story.
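That whole planning exercise boils down to clamping a target between a floor and a ceiling. Below is a minimal sketch, assuming a hypothetical policy of provisioning the observed peak plus a headroom factor and waiting a minimum interval between changes (DynamoDB does limit how often a table's provisioned throughput can be decreased, so pacing adjustments matters). The function and parameter names are illustrative, not part of any AWS API.

```python
import math


def next_capacity(observed_peak: float, floor: int, ceiling: int, headroom: float = 1.2) -> int:
    """Choose the next provisioned capacity for one table.

    floor: the minimum my application needs.
    ceiling: the maximum I am willing to pay for.
    headroom: hypothetical safety margin on top of the observed peak.
    """
    target = math.ceil(observed_peak * headroom)
    return max(floor, min(ceiling, target))


def should_apply(current: int, target: int, seconds_since_last_change: float,
                 min_interval_s: float = 3600) -> bool:
    # Adjust only when the value actually changes and enough time has passed
    # since the previous adjustment (min_interval_s is an assumed policy knob).
    return target != current and seconds_since_last_change >= min_interval_s


if __name__ == "__main__":
    target = next_capacity(observed_peak=400, floor=100, ceiling=1000, headroom=1.5)
    print(target, should_apply(current=200, target=target, seconds_since_last_change=7200))
```

Three numbers (floor, ceiling, interval) fully describe the policy, which is exactly the point: there is no cluster topology to reason about.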

Consistency

In the Cassandra world, consistency revolves around two factors: consistency level and replication factor. You can have fast performance and eventual consistency, or slower performance and strong consistency. While the consistency level is specified per call, the replication factor is specified when the keyspace is created. If you ever want to change the replication factor, be prepared for hours of maintenance work, which becomes impossible on a live cluster once the number of nodes grows.
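The trade-off can be made concrete with the usual quorum arithmetic: a read is guaranteed to see the latest acknowledged write when the number of replicas read (R) plus the number written (W) exceeds the replication factor (RF). The sketch below is a simplified single-data-center model covering only the ONE, QUORUM and ALL levels; it is not Cassandra's own code.

```python
def replicas_for(level: str, rf: int) -> int:
    """Replicas that must acknowledge an operation at a given consistency level.

    Simplified single-data-center model covering ONE, QUORUM and ALL only.
    """
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1  # a majority of the replicas
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level: {level}")


def read_sees_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    # The read set and the write set must overlap in at least one replica: R + W > RF
    return replicas_for(read_level, rf) + replicas_for(write_level, rf) > rf


if __name__ == "__main__":
    for r, w in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
        print(r, w, read_sees_latest_write(r, w, rf=3))
```

Because the quorum size is derived from RF, raising RF later changes what QUORUM means, and the new replicas have to be populated (typically with a repair) before quorum reads can be trusted again.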

Again, this is an area where the DynamoDB model makes much more sense. If I want strongly consistent reads, I pay twice as much for read IOPS. That's it. It becomes a purely financial decision.
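In DynamoDB's provisioned-throughput model, one read capacity unit covers one strongly consistent read per second of an item up to 4 KB, or two eventually consistent reads of that size, with item sizes rounded up to the next 4 KB. A small sketch of that arithmetic (the function name is mine, not an AWS API):

```python
import math


def read_capacity_units(reads_per_sec: int, item_size_bytes: int, strongly_consistent: bool) -> int:
    """Provisioned read capacity units needed for a steady read rate."""
    units_per_read = math.ceil(item_size_bytes / 4096)  # charged in 4 KB increments
    units = reads_per_sec * units_per_read
    if not strongly_consistent:
        units = units / 2  # eventually consistent reads cost half
    return math.ceil(units)


if __name__ == "__main__":
    # Hypothetical workload: 100 reads per second of 3 KB items
    print(read_capacity_units(100, 3000, strongly_consistent=False))  # 50
    print(read_capacity_units(100, 3000, strongly_consistent=True))   # 100
```

Holding the workload fixed, switching to strongly consistent reads roughly doubles the required units, so the consistency choice shows up as a line item on the bill rather than as a cluster redesign.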

Scaling up and down

Scaling a Cassandra cluster up involves adding new nodes, and each additional node requires hours of babysitting. Adding a node takes a few minutes, but bootstrapping it can take hours. If you are using tokens, you are in a bigger pickle, since you have to compute just the right balance, move tokens around, and clean up. (We are still using tokens because this is a legacy production cluster, and there is no safe and easy way to migrate it to vnodes.) Once you have added a node, it becomes a fixed cost plus extra network charges. If you ever want to scale down, you have to work backwards and decommission the extra nodes, which takes hours, and then rebalance the cluster again if you are still using tokens.
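For a single-token cluster, "just the right balance" means evenly spaced initial tokens. The sketch below uses the commonly cited formula for the Murmur3Partitioner, whose token range runs from -2^63 to 2^63 - 1, and counts how many existing nodes would need a `nodetool move` when growing from N to M nodes. It assumes a single data center and ignores racks and uneven data volume.

```python
def balanced_tokens(node_count: int) -> list[int]:
    """Evenly spaced initial tokens for the Murmur3Partitioner's range [-2**63, 2**63)."""
    step = 2**64 // node_count
    return [step * i - 2**63 for i in range(node_count)]


def moves_needed(old_count: int, new_count: int) -> int:
    """Existing nodes whose token is absent from the new balanced layout.

    Assumes new_count >= old_count: nodes whose token survives stay put,
    every other existing node needs a `nodetool move`.
    """
    new_tokens = set(balanced_tokens(new_count))
    return sum(1 for t in balanced_tokens(old_count) if t not in new_tokens)


if __name__ == "__main__":
    print(balanced_tokens(4))
    print(moves_needed(3, 4), moves_needed(2, 4))  # 2 0
```

Doubling the cluster from 2 to 4 nodes leaves the existing tokens in place, whereas going from 3 to 4 nodes moves two of the three existing nodes, each move followed by a cleanup. That is the arithmetic behind the hours of babysitting.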

The tokens vs. vnodes situation is a particular annoyance of mine. Cassandra has effectively shut many of us out of vnodes because it does not offer a clean, safe, and seamless migration path.

Going back to DynamoDB, the only thing I need to care about is IOPS. What is my minimum? What is my maximum? How much am I willing to pay? Period. End of story.

Software upgrades

Each time I had to upgrade Cassandra, the process was the same tedious routine: go to each node, upgrade the software, verify the settings have migrated (Cassandra does not offer tools to cleanly port settings from older versions), start the new binaries, and run the upgradesstables process. It is a process that is bound to ruin a weekend for me. I am simply no longer interested.
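The per-node routine above is a rolling upgrade, and writing it down as a plan at least makes it repeatable. This is a dry-run sketch that only lists steps; the host names, the apt-based package upgrade and the `cassandra` service name are assumptions about a particular installation, while `nodetool drain`, `nodetool upgradesstables` and `nodetool status` are standard nodetool commands. The settings migration stays a manual step, since Cassandra offers no tool for it.

```python
def upgrade_plan(hosts: list[str], package: str = "cassandra") -> list[tuple[str, str]]:
    """Return (host, step) pairs for a one-node-at-a-time rolling upgrade.

    Dry run only: nothing is executed. The service and apt-get commands assume
    a Debian-style package install and are hypothetical for other setups.
    """
    plan: list[tuple[str, str]] = []
    for host in hosts:
        plan += [
            (host, "nodetool drain"),  # flush memtables and stop accepting writes
            (host, "sudo service cassandra stop"),
            (host, f"sudo apt-get install --only-upgrade {package}"),
            (host, "MANUAL: compare old and new cassandra.yaml and carry settings over"),
            (host, "sudo service cassandra start"),
            (host, "nodetool upgradesstables"),  # rewrite SSTables in the new on-disk format
            (host, "nodetool status"),  # confirm the node is Up/Normal before the next host
        ]
    return plan


if __name__ == "__main__":
    for host, step in upgrade_plan(["cass-1", "cass-2", "cass-3"]):  # hypothetical host names
        print(f"[{host}] {step}")
```

Even scripted, every step still runs per node and sequentially, so the wall-clock time grows with the cluster.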

One of my pet peeves with Cassandra is how it deprecated the Thrift API. Many of us have used the software for years and now have to either stay on a deprecated API or port our code to CQL. Some of us have chosen, wisely or not, to use a Thrift library that is no longer kept up to date. So to use the new API we have to port our code, and an obvious question comes up: if I have to port my code to a new library, do I still want to use Cassandra?

I do not need to concern myself with software upgrades with DynamoDB. Period. End of story.

Node replacements

This is similar to scaling, as I described above. In the Cassandra world, node replacement is an hours-long process. There is no such thing with DynamoDB.

Regular repairs

As a cluster grows larger, especially in multi-data-center scenarios, Cassandra recommends running a regular repair process on each node. Again, this is a long-running process that imposes a significant IO workload on all nodes in the cluster. It can run for days on end, consumes extra disk space, and requires babysitting. On more than one occasion it has ruined a weekend for me.
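The usual rule of thumb is that every node must be repaired at least once within `gc_grace_seconds` (10 days by default), otherwise deleted data can reappear. Below is a minimal sketch of a staggered schedule, assuming each node runs `nodetool repair -pr` (primary range only) one at a time and that you supply a pessimistic per-node duration; the scheduling logic is illustrative, not a Cassandra tool.

```python
from datetime import datetime, timedelta

DEFAULT_GC_GRACE_SECONDS = 864_000  # Cassandra's default gc_grace_seconds: 10 days


def repair_schedule(hosts: list[str], start: datetime, repair_duration: timedelta,
                    gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS) -> list[tuple[datetime, str, str]]:
    """Stagger `nodetool repair -pr` across nodes so runs never overlap.

    repair_duration is a pessimistic per-node estimate you supply. Raises if
    one full pass over the cluster cannot fit inside gc_grace_seconds.
    """
    if not hosts:
        return []
    window = timedelta(seconds=gc_grace_seconds)
    if repair_duration * len(hosts) > window:
        raise ValueError("a full repair pass does not fit inside gc_grace_seconds")
    spacing = window / len(hosts)  # >= repair_duration, so runs do not overlap
    return [(start + spacing * i, host, "nodetool repair -pr") for i, host in enumerate(hosts)]


if __name__ == "__main__":
    for when, host, cmd in repair_schedule(["cass-1", "cass-2", "cass-3"],
                                           datetime(2014, 11, 13), timedelta(hours=12)):
        print(when, host, cmd)
```

The schedule spreads runs across the entire window and leaves no margin; in practice you would shrink the window, and as the cluster grows the check eventually fails, which is when repairs start eating weekends.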

DynamoDB does not require me to do anything of the sort.

So what is the moral of this story?

From a data model perspective, DynamoDB and Cassandra are very similar. Cassandra certainly offers more flexibility, and I would much prefer Cassandra over DynamoDB. However, without a managed offering as simple as DynamoDB, I no longer have the patience for it.

Yes, there is Instaclustr, but that too misses the point. I have done the math: it is simply not cost-effective, and it still requires me to do the same capacity planning exercises I am trying to avoid.

What I am really looking for is a fully managed Cassandra service that works just like DynamoDB, where I pay only for the capacity I actually use and can scale up and down with simple API calls. Until that happens, I see DynamoDB on my horizon.