- Apache Cassandra is a free, open-source, distributed wide-column store.
- It is a NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.
- Amazon Keyspaces (for Apache Cassandra) is a managed, Cassandra-compatible service that can be used to run Cassandra workloads on AWS.
- ⭐ DynamoDB vs Cassandra vs MongoDB
Feature | Description |
---|---|
⭐ Low-Latency, Faster Writes | Writes in Cassandra go to an append-only structure rather than being updated in place, so they are generally very fast. Cassandra provides low latency at the cost of consistency; refer to the PACELC theorem for more info. |
Rich Data Model | Cassandra is a wide-column store. It stores columns ordered by column name, which makes slicing very quick. Unlike traditional databases, where column names are only metadata, column names in Cassandra can also carry actual data. |
Peer-to-Peer Architecture | Cassandra has no single point of failure because it uses a P2P architecture (leaderless replication). Any number of servers/nodes can be added to a Cassandra cluster in any of its data centers. |
High Availability, Fault-Tolerance | Apache Cassandra provides high availability and fault tolerance with tunable consistency levels. Nodes can be added to or removed from the cluster without much disturbance. As the cluster scales, read and write throughput both increase, with no downtime or pause for applications. |
Scales Horizontally & Linearly | Apache Cassandra has a highly scalable architecture: a cluster can easily be scaled up or down by adding or removing nodes. Generally, doubling the size of the cluster results in half the latency (both at the median and at the 99th percentile). |
Cross-Site, Multi-Data-Center Replication | Cassandra offers robust support for clusters spanning multiple data centers, with asynchronous leaderless replication allowing low-latency operations for all clients. |
Integration with systems (like Spark, HDFS, etc.) | Cassandra offers options for bulk-importing data from other sources (such as HDFS) by building entire SSTables and then streaming them into the cluster. Streaming SSTables is much simpler, faster, and more efficient than sending millions of individual INSERT statements for all the data you want to load. |
Supported Consistency Patterns | Eventual consistency by default, with consistency levels tunable per operation |
Cassandra Query Language (CQL) | Cassandra ships with an interactive shell, cqlsh, for executing Cassandra Query Language (CQL) statements. With cqlsh you can define a schema, insert data, and run queries. Cassandra does not support joins or subqueries, so developers must denormalize or duplicate data for efficient access. |
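
The tunable consistency levels listed in the table can be illustrated with the usual quorum-overlap rule: with replication factor `RF`, a read that waits for `R` replicas is guaranteed to overlap a write acknowledged by `W` replicas when `R + W > RF`. The sketch below is only an arithmetic illustration; the helper names `replicas_required` and `is_strongly_consistent` are hypothetical (not part of any Cassandra driver), and only the `ONE`, `QUORUM`, and `ALL` levels are modeled.

```python
# Sketch of the quorum-overlap rule behind tunable consistency.
# Helper names are hypothetical; only ONE, QUORUM and ALL are modeled.

def replicas_required(level: str, replication_factor: int) -> int:
    """Number of replicas that must acknowledge an operation at this level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1  # strict majority of replicas
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def is_strongly_consistent(write_level: str, read_level: str,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    rf = 3
    for write_level, read_level in [("ONE", "ONE"),
                                    ("QUORUM", "QUORUM"),
                                    ("ALL", "ONE")]:
        print(write_level, read_level,
              is_strongly_consistent(write_level, read_level, rf))
```

With `RF = 3`, writing and reading at `ONE` gives the lowest latency but no overlap guarantee, which is the eventual-consistency trade-off described above; `QUORUM`/`QUORUM` trades some latency for overlapping read and write sets. The sketch ignores node failures and repair mechanisms.
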
Use Case |
---|
High-Write, Low-Read use cases |
Historical records |
Processing server logs |
Social media posts |
PDF documents |
Emails |
Time Series Data (with JSON as value) |
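
High-write workloads like these suit Cassandra because writes land in an append-only structure instead of being updated in place. The toy Python model below sketches that idea under simplifying assumptions: an append-only log, an in-memory buffer, and immutable flushed segments. Every class and method name is hypothetical, and this is not Cassandra's actual implementation.

```python
# Toy model of an append-only (log-structured) write path.
# All names are hypothetical; this is an illustration, not Cassandra's code.
from typing import Optional


class ToyLogStructuredStore:
    def __init__(self, memtable_limit: int = 2) -> None:
        self.commit_log = []   # append-only list of (key, value) records
        self.memtable = {}     # in-memory buffer of recent writes
        self.sstables = []     # immutable flushed segments, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key: str, value: str) -> None:
        """Append to the log and update the buffer; no read-before-write."""
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self) -> None:
        """Freeze the buffer into a key-sorted segment that is never modified."""
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}

    def read(self, key: str) -> Optional[str]:
        """Check the buffer first, then segments from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for segment in reversed(self.sstables):
            if key in segment:
                return segment[key]
        return None


if __name__ == "__main__":
    store = ToyLogStructuredStore(memtable_limit=2)
    store.write("sensor-1", "20.5")
    store.write("sensor-2", "19.0")   # reaches the limit and triggers a flush
    store.write("sensor-1", "21.0")   # newer value shadows the flushed one
    print(store.read("sensor-1"))
    print(len(store.sstables))
```

Note how a write only appends and never looks up the old value, while a read may have to consult several segments. That asymmetry is one reason the table lists high-write, low-read workloads first.
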
- How Uber Manages a Million Writes Per Second Using Mesos and Cassandra Across Multiple Datacenters?
- Netflix - Cassandra - Time Series Data
- Directi uses Cassandra to save HeatMaps (UI activities)
- Instagram - Social Media Posts
- Twilio - Send Message API Design Problem
- Twitter Hit Counter
- Discord Migrates Trillions of Messages from Cassandra to ScyllaDB
- Facebook originally built Cassandra to power its Inbox search feature, with over 200 nodes deployed.
- Facebook abandoned this in late 2010, when it built the Facebook Messaging platform on Apache HBase after finding Cassandra's eventual consistency model a difficult pattern to work with.