Apache Cassandra can be an integral part of your AWS big data workloads. In this article we’ll take a look at Apache Cassandra and the two deployment options for running it on AWS: as a managed service or self-managed.
Initially developed at Facebook for its inbox search feature, Apache Cassandra is now an open-source project of the Apache Software Foundation. Designed to combine the data model and storage engine of Google’s Bigtable with the distributed storage and replication techniques of Amazon’s Dynamo, Apache Cassandra is a highly scalable distributed NoSQL database, capable of handling large amounts of data with sub-millisecond performance.
With a masterless architecture, meaning that every node is the same, Apache Cassandra is linearly scalable whether it runs on commodity hardware or on any cloud infrastructure. Thanks to automatic data replication and high availability features that let failed nodes be replaced without downtime or performance impact, Apache Cassandra provides a fault-tolerant platform for mission-critical data scenarios. It also comes with its own query language, CQL (Cassandra Query Language), which is similar to SQL (Structured Query Language), for your database operations.
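To make the masterless idea concrete, here is a minimal Python sketch of a token ring with replica placement. It is a toy model, not Cassandra's implementation: Cassandra's default partitioner uses Murmur3 hashing and virtual nodes, whereas this sketch uses MD5 and one token per node purely for illustration. The `Ring` class, the `token` helper, and the node names are hypothetical.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (MD5 here, purely for illustration)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class Ring:
    """A toy masterless ring: every node is a peer that owns one token."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        self._ring = sorted((token(n), n) for n in nodes)

    def replicas(self, partition_key: str):
        """Return the nodes that hold a copy of the given partition key."""
        tokens = [t for t, _ in self._ring]
        start = bisect.bisect_right(tokens, token(partition_key))
        count = min(self.replication_factor, len(self._ring))
        # Walk clockwise from the key's position, wrapping around the ring.
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]

    def remove(self, node: str):
        """Simulate a failed node leaving the ring."""
        self._ring = [(t, n) for t, n in self._ring if n != node]


ring = Ring(["node-a", "node-b", "node-c", "node-d", "node-e"])
before = ring.replicas("customer:42")
ring.remove(before[0])  # the first replica for this key fails
after = ring.replicas("customer:42")
print(before, after)
```

Because no node plays a special role, losing the first replica leaves the other two copies in place, and the next node on the ring takes over as the third replica.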
Thanks to its distributed nature, linear scalability, and performance, Apache Cassandra addresses a wide variety of use cases. It is often used when globally distributed data is necessary, such as in an ecommerce platform where keeping data closer to customers provides lower latency, and for storing time-series data, such as logs or chat history.
To run Apache Cassandra on AWS, you have two options to choose from. You can either run your Apache Cassandra workloads on a managed service, such as the native Amazon Keyspaces or DataStax Astra, or deploy Apache Cassandra on AWS compute services and manage it yourself.
A fairly common question is whether Amazon DynamoDB is the same thing as Cassandra. While both DynamoDB and Cassandra are NoSQL databases, the answer is no. DynamoDB is a proprietary database engine from AWS. For customers that want to use a Cassandra-like managed service, AWS suggests Amazon Keyspaces.
Amazon Keyspaces is a managed database service from AWS, compatible with Apache Cassandra, that allows you to easily set up and scale Cassandra workloads without the administrative overhead of server management. As it is compatible with the most common CQL APIs and Cassandra drivers, you’ll be able to easily update your existing applications to start using Amazon Keyspaces.
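As a rough illustration of what updating an existing application can involve, the sketch below points the open-source Python driver (`cassandra-driver`) at a Keyspaces endpoint. It assumes service-specific credentials and a CA certificate file downloaded per the AWS documentation. The `keyspaces_endpoint` and `connect` helpers are hypothetical names. The TLS settings mirror a pattern commonly shown for this driver and should be checked against current AWS guidance before use.

```python
import ssl


def keyspaces_endpoint(region: str):
    """Return the (host, port) pair for the Amazon Keyspaces endpoint in a region."""
    return f"cassandra.{region}.amazonaws.com", 9142


def connect(region: str, username: str, password: str, ca_file: str):
    """Open a CQL session against Amazon Keyspaces (sketch, not production code)."""
    # Imported lazily so the helper above works without the driver installed.
    from cassandra.auth import PlainTextAuthProvider
    from cassandra.cluster import Cluster

    host, port = keyspaces_endpoint(region)

    # Keyspaces requires TLS. ca_file is an assumed path to the CA certificate
    # you downloaded; certificate verification stays on, hostname checking is off.
    tls = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    tls.check_hostname = False
    tls.load_verify_locations(ca_file)

    cluster = Cluster(
        [host],
        port=port,
        ssl_context=tls,
        auth_provider=PlainTextAuthProvider(username=username, password=password),
    )
    return cluster.connect()


print(keyspaces_endpoint("us-east-1"))
```

Compared with a self-managed cluster, the visible differences in this sketch are the contact point, the port, and the mandatory TLS and authentication settings. The CQL statements issued through the returned session would otherwise look the same.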
However, even though it is compatible with Apache Cassandra, there are some key aspects to be aware of when using Amazon Keyspaces:
You can also subscribe to DataStax Astra, available on AWS Marketplace. Astra is a fully managed DBaaS (Database-as-a-Service) from DataStax, built on Apache Cassandra, that allows you to easily deploy your database on the AWS cloud without the operational overhead. It adds exclusive features such as SAI (Storage-Attached Index) and multiple APIs that simplify your database operations and application development. You can choose from multiple database instances with different compute and storage capacities to best fit your workloads, and from multiple support packages for direct support from DataStax.
You can self-manage Apache Cassandra on AWS by deploying your clusters on Amazon Elastic Compute Cloud (Amazon EC2) instances. Amazon EC2 offers a wide variety of instances with different compute and network capacity, and Amazon Elastic Block Store (EBS) offers different volume types for storage, enabling you to pick the most appropriate configuration for your particular use cases.
With the self-managed option, you are not bound to the restrictions of a managed database service, so you can install Cassandra using the official instructions. From the AWS perspective, you can still leverage services such as compute (EC2), storage (EBS), and monitoring and logging (CloudWatch), among others. Alternatively, you can take a container-based approach using Amazon ECS or Amazon EKS.
There are a number of benefits to the self-managed option for Cassandra deployment:
Amazon Keyspaces is a good option when you are looking to run your existing Apache Cassandra workloads in AWS without the operational overhead of server management. However, it does come with its own limitations. On the other hand, by self-managing Apache Cassandra clusters on EC2, you will benefit from full control and flexibility, and access to all the features Apache Cassandra has to offer, at the cost of added operational overhead.
But is there another option?
Cloud Volumes ONTAP, the cloud-based data management solution from NetApp, is a popular option to further enhance your experience when choosing to self-manage Apache Cassandra on AWS.
As a data management platform that works on top of AWS, Azure, and Google Cloud IaaS resources, Cloud Volumes ONTAP lets you avoid the managed service limitations for Apache Cassandra deployment, while making it easy to deploy multicloud and hybrid cluster architectures for complex scenarios.
But you’re getting more than you would just by running Cassandra on raw EC2 instances. Cloud Volumes ONTAP provides essential enterprise-grade features that aren’t native to AWS, including:
When using Cloud Volumes ONTAP as a data management solution for your self-managed Cassandra clusters, you will benefit from additional features that will lower the operational overhead while also reducing storage costs and increasing performance.