The Dulin Report


Ordered Sets and Logs in Cassandra vs SQL

April 8, 2015

I've written before that Cassandra's Achilles' heel is devops:
Storage, redundancy and performance are expanded by adding more nodes. This can happen during normal business hours as long as consistency parameters are met. The same applies to node replacements.

As the number of servers grows, be prepared to hire a devops army or look for a managed solution. The DataStax offering helps, but it is still not enough. Even in the cloud we have found no good managed solution: Cassandra.io requires that you give up Thrift and CQL, and Instaclustr, as of this moment, does not use third-generation SSD-backed instance types.

Since I have about two dozen Cassandra nodes to upgrade, plus an application to regression-test, I am not really looking forward to that exercise. So one of the things I am working on is finding out whether we actually need Cassandra at all.

Having eliminated all relational-style queries (which are not a use case for Cassandra) by moving some of the data to PostgreSQL (which, by the way, in some cases outperforms Cassandra by at least a factor of two on larger data sets [1]), I am now looking at some of the more obscure data structures that may be perfect for Cassandra but do not map easily onto SQL.

Frequently Updated Ordered Sets of Tuples


In Cassandra, there is a very convenient mechanism for maintaining an ordered set. You create a column family for a collection of ordered sets, with the row key being a set identifier and the column name being the name of an element in the set. Since columns are sorted and column names are unique within a row, you get an ordered set. In Cassandra, a column family with ordered sets of tuples looks like this:
OrderedSetsCF ->
    SetA -> (X=1),(Y=2),(Z=3)
    SetB -> (Y=10),(Z=4)
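To make the layout concrete, here is a toy in-memory model of it in Python. This is just an illustration with hypothetical names; a real cluster stores and sorts columns very differently.

```python
# Row key -> {column name -> value}; Cassandra's comparator keeps
# columns sorted by name, which we imitate by sorting on read.
ordered_sets_cf = {
    "SetA": {"X": 1, "Y": 2, "Z": 3},
    "SetB": {"Y": 10, "Z": 4},
}

def elements(set_id):
    # Column names are unique within a row and come back sorted,
    # which is exactly what makes the row behave like an ordered set.
    row = ordered_sets_cf[set_id]
    return sorted(row.items())

print(elements("SetB"))  # [('Y', 10), ('Z', 4)]
```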

In SQL, such a data structure would look like a table where the set identifier and set element name are parts of the primary key, and the value of the element is a third column. In this approach, the set identifier is denormalized and repeated for each set element. For large sets this can become a very big table, especially if you are using UUIDs for primary keys (a common technique with Cassandra):
OrderedSetsTable

SetID(PK)  ElementName(PK)  ElementValue
----------------------------------------
SetA       X                1
SetA       Y                2
SetA       Z                3
SetB       Y                10
SetB       Z                4
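For illustration, here is a minimal sketch of that table in SQLite. The table and column names follow the example above; everything else is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Composite primary key: set identifier plus element name.
conn.execute("""
    CREATE TABLE OrderedSetsTable (
        SetID        TEXT NOT NULL,
        ElementName  TEXT NOT NULL,
        ElementValue INTEGER,
        PRIMARY KEY (SetID, ElementName)
    )
""")
conn.executemany(
    "INSERT INTO OrderedSetsTable VALUES (?, ?, ?)",
    [("SetA", "X", 1), ("SetA", "Y", 2), ("SetA", "Z", 3),
     ("SetB", "Y", 10), ("SetB", "Z", 4)],
)
# Reading a set back in element order -- the composite primary key
# gives both uniqueness and ordering within a set.
rows = conn.execute(
    "SELECT ElementName, ElementValue FROM OrderedSetsTable "
    "WHERE SetID = ? ORDER BY ElementName", ("SetA",)
).fetchall()
print(rows)  # [('X', 1), ('Y', 2), ('Z', 3)]
```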

Note that you can mitigate the problem of primary-key denormalization when your keys are much bigger than the universe of your data actually needs (e.g. UUIDs). Create a separate lookup table that maps each UUID to a small integer code, and then use the integer code in the primary key of the sets table.
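A sketch of that key-compression trick, again in SQLite with hypothetical table and column names:

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
# Lookup table mapping each bulky UUID onto a compact integer code.
conn.execute("""
    CREATE TABLE SetKeyLookup (
        SetCode INTEGER PRIMARY KEY AUTOINCREMENT,
        SetUUID TEXT UNIQUE NOT NULL
    )
""")

def code_for(set_uuid):
    """Return the integer code for a UUID, assigning one on first sight."""
    conn.execute("INSERT OR IGNORE INTO SetKeyLookup (SetUUID) VALUES (?)",
                 (set_uuid,))
    return conn.execute("SELECT SetCode FROM SetKeyLookup WHERE SetUUID = ?",
                        (set_uuid,)).fetchone()[0]

set_a, set_b = str(uuid.uuid4()), str(uuid.uuid4())
print(code_for(set_a), code_for(set_b), code_for(set_a))  # e.g. 1 2 1
```

The sets table then stores the small SetCode instead of repeating a 16-byte UUID in every row, at the cost of one extra lookup per write.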

Setting the value of a tuple in Cassandra is an O(1) operation, because Cassandra writes to a commit log and a memtable and returns immediately. If you are writing into your set very rapidly, values may not be readable until considerably later, especially if you have a high replication factor and your reads do not always hit the same node as your writes. Eventually, over some period of time, your data becomes consistent across the cluster.

Since Cassandra is not ACID, a column does not need to be inserted before it can be updated; effectively, in Cassandra a create is the same as an update. SQL databases generally lack a standard UPSERT operation, and there is a lot of debate about why.

In SQL, an insert into such a table is an O(log(n)) operation (roughly speaking, depending on the type of index used), where n is the number of (SetID, ElementName) combinations. As the number and size of such sets grows, and if the size of the keys grows, inserts become increasingly slower. Depending on your rate of data ingestion this may or may not be much of an issue, considering that the upside of using SQL is that your data is immediately readable and you don't have an eventual-consistency problem [2].

Since there is no UPSERT in SQL -- and even if there were one, it would require a read-before-write pattern -- updating a value incurs a 2*O(log(n)) performance penalty: once to look up the existing row and once to update it, depending on how you do this and what your RDBMS query planner does. Again, this is the price you pay for ACID.
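For what it's worth, some dialects do offer a single-statement upsert -- newer versions of SQLite (3.24+) and PostgreSQL (9.5+) have a non-standard ON CONFLICT clause. A sketch in SQLite, reusing the hypothetical table from above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE OrderedSetsTable (
        SetID        TEXT NOT NULL,
        ElementName  TEXT NOT NULL,
        ElementValue INTEGER,
        PRIMARY KEY (SetID, ElementName)
    )
""")

def upsert(set_id, name, value):
    # One statement either inserts the element or updates it in place,
    # with no explicit read-before-write in application code.
    conn.execute("""
        INSERT INTO OrderedSetsTable VALUES (?, ?, ?)
        ON CONFLICT (SetID, ElementName)
        DO UPDATE SET ElementValue = excluded.ElementValue
    """, (set_id, name, value))

upsert("SetA", "X", 1)   # insert
upsert("SetA", "X", 4)   # update
row = conn.execute(
    "SELECT ElementValue FROM OrderedSetsTable "
    "WHERE SetID = 'SetA' AND ElementName = 'X'").fetchone()
print(row)  # (4,)
```

Note that this only removes the round trip from the application; the engine still locates the existing row internally, so the O(log(n)) lookup cost remains.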

Updates and deletes complicate the situation for both Cassandra and an SQL database, but differently. Cassandra uses append-only SSTable files, meaning that all writes are O(1). However, in the background it must perform compaction, and until that happens you may use more disk space than you actually need, and your reads can become orders of magnitude slower because they have to skip over tombstones.

To illustrate the tombstone penalty, let's pretend that value X in SetA above has been updated 3 times and value Y was deleted. Until compaction happens, what is actually stored on disk is this:
OrderedSetsCF ->
    SetA -> (X=1),(Y=2),(Z=3)
         -> (X=4)          // makes (X=1) obsolete
         -> (X=5)          // makes (X=4) obsolete
         -> (X=6)          // makes (X=5) obsolete
         -> (Y=tombstone)  // deletion marker for (Y=2)

Your set of 3 elements now takes up 4 extra units of storage until compaction happens. Furthermore, to get to the latest value of X, and to tell that Y has been deleted, Cassandra has to skip over the obsolete versions and tombstones. So what could have been an O(1)+O(log(n)) operation (first get to SetA using the hash key, then get to X using the column index) becomes O(1)+O(log(n))+O(m), where n is the number of columns (elements in your set) and m is the number of times you updated X. If you just did a few hundred thousand updates to the same value over a relatively short period of time, you have created a serious bottleneck for your read operations.
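The read penalty can be modeled in a few lines of Python. This is a toy model of the append-only layout described above, not how Cassandra actually stores data:

```python
TOMBSTONE = object()  # sentinel marking a deletion

# Append-only write log for row SetA, mirroring the fragments above:
# three updates to X and a delete of Y, none compacted yet.
writes = [("X", 1), ("Y", 2), ("Z", 3),
          ("X", 4), ("X", 5), ("X", 6),
          ("Y", TOMBSTONE)]

def read(column):
    # Scan from newest to oldest; the newest entry wins, so reads pay
    # O(m) for m stale versions until compaction cleans them up.
    for name, value in reversed(writes):
        if name == column:
            return None if value is TOMBSTONE else value
    return None

def compact(log):
    # Keep only the newest entry per column and drop tombstones.
    latest = {}
    for name, value in log:
        latest[name] = value
    return [(n, v) for n, v in latest.items() if v is not TOMBSTONE]

print(read("X"), read("Y"))               # 6 None
print(len(writes), len(compact(writes)))  # 7 2
```

Seven stored entries collapse to two live ones after compaction, which is exactly the storage and read overhead the row carries in the meantime.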

A Cassandra row compaction itself requires additional storage and significant I/O capacity. Without getting into the intricacies of the mechanism, consider the fact that Cassandra's SSTables are write-once: to compact the SetA row from the example above, Cassandra scans the row, skips over the obsolete values and tombstones, and writes the result into a new SSTable file. Once that file is written, the original fragmented SSTable is deleted. As a result, Cassandra may temporarily double the storage it requires. Furthermore, you have little control over when compaction actually happens.

Now, SQL databases are not immune to the problem of frequently updated data. For example, PostgreSQL's VACUUM operation does exactly that -- a mechanism very similar to Cassandra's compaction. There is really no escaping the problem. SQL databases like PostgreSQL may give you better control over storage reclamation than Cassandra, though, and because of the differences between VACUUM and compaction you do not incur the tombstone penalty on reads.

The reason one would want to use SQL for this type of structure is simple: a requirement to perform real-time analytics over your data. It is as simple as that -- are you accumulating rapidly changing data to process later (in which case Cassandra or DynamoDB is very appropriate), or are you accumulating the same data in order to gain meaningful insights from it in real time (in which case SQL is more appropriate)?

Log-style Structures


Let's suppose you want to store events from a multitude of remote sensors. These could be temperature sensors in different locations, market data by ticker symbol, or clicks in apps across thousands or millions of users. Suppose you only want to retain this data for 2 days.

Each event is identified by its source, time (in milliseconds), event type, event id, and some event value.

In a Cassandra column family, one would store it like this: the source is your row key, time+eventType+eventId form your composite column name, and eventValue is your column value. Each column has a TTL of 2 days (expressed in seconds). It would look like this:
EventsCF ->
    SourceA -> (0:TypeA:1, X, ttl=172800), (1:TypeA:2, Y, ttl=172800), (1:TypeB:3, X, ttl=172800)
    SourceB -> (0:TypeB:1, X, ttl=172800), (1:TypeA:2, Y, ttl=172800)

and so on and so forth. In SQL, it would look like this:
EventsTable

source(PK)  time(PK)  eventType(PK)  eventId(PK)  eventValue
------------------------------------------------------------
SourceA     0         TypeA          1            X
SourceA     1         TypeA          2            Y
SourceA     1         TypeB          3            X

and so on. You'd need a batch job to run regularly to delete rows older than 2 days.
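A sketch of that retention job in Python with SQLite, using the hypothetical table and column names from the example above:

```python
import sqlite3
import time

RETENTION_MS = 2 * 24 * 60 * 60 * 1000  # 2 days, matching ttl=172800 s
DAY_MS = 86_400_000

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE EventsTable (
        source     TEXT NOT NULL,
        time       INTEGER NOT NULL,  -- milliseconds since epoch
        eventType  TEXT NOT NULL,
        eventId    INTEGER NOT NULL,
        eventValue TEXT,
        PRIMARY KEY (source, time, eventType, eventId)
    )
""")

now_ms = int(time.time() * 1000)
conn.executemany(
    "INSERT INTO EventsTable VALUES (?, ?, ?, ?, ?)",
    [("SourceA", now_ms - 3 * DAY_MS, "TypeA", 1, "X"),  # 3 days old
     ("SourceA", now_ms - 1 * DAY_MS, "TypeA", 2, "Y"),  # 1 day old
     ("SourceB", now_ms,              "TypeB", 1, "X")],
)

# The batch retention job: delete everything older than 2 days.
deleted = conn.execute(
    "DELETE FROM EventsTable WHERE time < ?",
    (now_ms - RETENTION_MS,)).rowcount
remaining = conn.execute("SELECT COUNT(*) FROM EventsTable").fetchone()[0]
print(deleted, remaining)  # 1 2
```

Unlike Cassandra's TTL, the delete happens exactly when the job runs, and the space is reclaimed on the database's own vacuum schedule rather than at an unpredictable compaction.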

In Cassandra, all writes are O(1). In SQL, all writes are O(log(n)), where n is the number of PK combinations. As the size of your log grows, so will the time it takes to insert a row. One could mitigate this in SQL by forgoing the primary key and indexes, but then querying the table becomes nearly impossible.

Cassandra has a concept of a TTL on values, meaning that they logically disappear when the TTL is up. However, that does not mean the disk space is reclaimed. This too suffers from the compaction problem, and until compaction happens this data structure may consume an enormous amount of disk space. Suppose you accumulate 1 million log entries per day per source. Five days later, unless compaction has happened, you are actually storing 3 days more of data than you require.

Retrieving this data out of Cassandra becomes a bit tricky. If you naively assume that by reading the entire row you are only reading the last two days' worth of data, you are wrong: until compaction happens, Cassandra has to scan over the tombstones -- in this example, three days' worth of them! Even if you optimize a bit and use a slice query starting at two days ago, the best you will get out of Cassandra is O(log(n)), where n is the total number of log entries you made in the last five days (until compaction happens).

The disk-storage problem is further exacerbated here. Since data with an expired TTL won't actually be deleted until compaction happens, and compaction itself may temporarily double the disk-storage requirement, you need to make sure you leave extra space on each node. Furthermore, this type of structure may create an imbalance in the cluster if the amount of data varies a lot between sources.

Cassandra is a clear winner here from a performance perspective if the goal is to collect immense amounts of data, especially data that never expires. However, in a cloud environment like AWS I'd use Amazon's facilities such as DynamoDB, EMR, or Redshift. Cassandra, as it grows, does become a devops nightmare; over time you may end up with dozens or hundreds of nodes if you never expire or delete data.

Conclusion


So what am I really getting at here? Well, Cassandra really is a devops nightmare. I know I am going to stir up some debate on Twitter with what I just said. I'd love nothing more than to stop using it. However, it continues to be a useful tool for some of the use cases I deal with, and for all its flaws I have not found a better option yet. As I keep saying, all I want is a Cassandra that is a managed service like DynamoDB, where I don't have to worry about devops.






  1. Yes, I know I need to provide a benchmark. I wanted this post to spark a conversation; if I find the time, I'll post a benchmark.

  2. This is negated by the use of read replicas, which may experience replication lag.