Amazon DocumentDB is a managed service from Amazon Web Services (AWS) that is compatible with the popular NoSQL database engine MongoDB. In this article you will learn about the key features of DocumentDB, its architecture, and best practices for running MongoDB workloads on AWS.
Amazon DocumentDB is an AWS database service that is fully managed and compatible with MongoDB. You can use this service to migrate and host MongoDB workloads and application data while working with native Mongo code, tools, and drivers.
Through DocumentDB you gain access to the following features:
The DocumentDB architecture is based on clusters of MongoDB-compatible database instances that share a single storage layer, the cluster volume. Each DocumentDB cluster you create contains:
The instances that your cluster includes can serve one of two roles:
When provisioning cluster instances, you can use mixed instance classes to meet the needs of your individual regions or uses.
During write operations, your primary instance writes data to your cluster volume and replicates the state of the write to your replicas. It does not replicate the data itself to the replicas. Because replicas do not write and only learn of changes after the primary has made them, reads from replicas are eventually consistent. Eventual consistency does not satisfy the ACID properties of transactional databases. It means that a replica may return stale data at a given point in time but will be consistent with the primary "eventually".
The lag between a write on the primary instance and its visibility on replicas is typically less than 100 milliseconds. However, if you are performing many writes at once, this lag may increase.
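To make the read path concrete, here is a minimal sketch of a connection string that sends reads to replicas and so accepts the replica lag described above. The helper name `build_docdb_uri`, the host name, and the credentials are hypothetical. The query parameters follow the MongoDB connection-string conventions that DocumentDB accepts.

```python
from urllib.parse import quote_plus, urlencode


def build_docdb_uri(user, password, host, port=27017, read_from_replicas=True):
    """Build a MongoDB-style URI for a DocumentDB cluster endpoint.

    Hypothetical helper for illustration; not part of any AWS SDK.
    """
    options = {
        "tls": "true",
        "replicaSet": "rs0",
        # secondaryPreferred routes reads to replicas (eventually consistent);
        # primary gives read-after-write consistency at the cost of primary load.
        "readPreference": "secondaryPreferred" if read_from_replicas else "primary",
        "retryWrites": "false",
    }
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    return f"mongodb://{credentials}@{host}:{port}/?{urlencode(options)}"


# Placeholder values only; substitute your own cluster endpoint.
uri = build_docdb_uri("app_user", "p@ss", "example-cluster.cluster-abc.us-east-1.docdb.amazonaws.com")
```

If a particular query must see the latest write, pass `read_from_replicas=False` for that client rather than relying on the lag being short.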
DocumentDB and DynamoDB are both services that you can use as document databases. Both provide data portability and support migration through the AWS Database Migration Service. Both services also provide encryption with AWS Key Management Service and auditing with CloudTrail and VPC Flow Logs, and both can be provisioned through CloudFormation.
Despite these similarities, the use cases for the two services differ slightly. DynamoDB is both a document database and a key-value database. It is optimized for applications that rely on unique keys, but it is not as good at scan or query operations. In contrast, DocumentDB allows more flexible data indexing and is optimized for queries.
Another difference is the cost structure of the services. DynamoDB is priced by read/write units, with on-demand, provisioned, or reserved capacity pricing models. You can maintain small capacities to keep costs low, and the first 25 GB of storage is free.
In contrast, DocumentDB is based on a pay-per-instance pricing model. The smallest available instance sizes are the r4.large or r5.large instances. You can provision these instances or use bill-per-hour pricing.
When implementing DocumentDB, the following best practices can help you ensure that your database provides high performance, secure data, and the lowest possible costs.
Backups created for DocumentDB are incremental and continuous. This enables you to restore to any point in time in the retention period (up to 35 days). Setting a minimum retention period can help you ensure that backups are available when you need them, and that data meets compliance requirements.
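As a rough sketch of setting the retention period programmatically, the helper below builds the arguments for the boto3 DocumentDB client's `modify_db_cluster` call and rejects values outside the 1 to 35 day range. The function names and the cluster identifier are hypothetical, and `apply_backup_retention` assumes boto3 is installed and AWS credentials are configured.

```python
MAX_RETENTION_DAYS = 35  # upper bound stated above
MIN_RETENTION_DAYS = 1


def backup_retention_request(cluster_id, retention_days):
    """Return keyword arguments for modify_db_cluster (hypothetical helper)."""
    if not MIN_RETENTION_DAYS <= retention_days <= MAX_RETENTION_DAYS:
        raise ValueError(
            f"retention_days must be between {MIN_RETENTION_DAYS} and {MAX_RETENTION_DAYS}"
        )
    return {
        "DBClusterIdentifier": cluster_id,
        "BackupRetentionPeriod": retention_days,
        "ApplyImmediately": True,
    }


def apply_backup_retention(cluster_id, retention_days):
    """Send the change to AWS. Requires boto3 and valid credentials."""
    import boto3  # third-party; imported here so the helper above runs without it

    client = boto3.client("docdb")
    return client.modify_db_cluster(**backup_retention_request(cluster_id, retention_days))


# "example-cluster" is a placeholder identifier.
request = backup_retention_request("example-cluster", 14)
```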
Indexing enables you to decrease query times by making it easier to locate the data you need. However, when documents are indexed, each write or modification requires the index to be updated. This means that write times increase according to the number of indexes that must be updated each time. Indexes can also increase I/O operations and storage use.
Keeping the number of indexes small lets you speed up queries without drastically slowing writes. As a general guideline, use no more than five indexes per collection.
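One way to keep an eye on this guideline is a small check over per-collection index counts. The sketch below is illustrative: the function name and the sample counts are hypothetical, and it counts every index it is given, which is an assumption about how the guideline is applied. With the pymongo driver, a count could be obtained as `len(collection.index_information())`.

```python
RECOMMENDED_MAX_INDEXES = 5  # guideline from the text above


def collections_over_index_limit(index_counts, limit=RECOMMENDED_MAX_INDEXES):
    """Return the names of collections whose index count exceeds the limit."""
    return sorted(name for name, count in index_counts.items() if count > limit)


# Hypothetical counts gathered from a database.
sample_counts = {"orders": 7, "customers": 3, "events": 6, "sessions": 5}
over_limit = collections_over_index_limit(sample_counts)
```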
Auditing through Log Exports is disabled by default in your DocumentDB clusters, but you can enable it at any time. Once enabled, your database sends authentication, authorization, and user management events, along with Data Definition Language (DDL) events, to CloudWatch Logs.
You can then access CloudWatch Logs to audit, analyze, and visualize data as needed. You can also export data from CloudWatch Logs to third-party monitoring solutions if desired.
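A minimal sketch of turning on the export with boto3 is shown below. It only builds the arguments for the DocumentDB client's `modify_db_cluster` call. The helper name and cluster identifier are hypothetical, and depending on your setup the cluster parameter group may also need its `audit_logs` parameter enabled before events appear.

```python
def audit_export_request(cluster_id):
    """Return keyword arguments that enable audit log export (hypothetical helper)."""
    return {
        "DBClusterIdentifier": cluster_id,
        # Ask the cluster to publish its audit log stream to CloudWatch Logs.
        "CloudwatchLogsExportConfiguration": {"EnableLogTypes": ["audit"]},
        "ApplyImmediately": True,
    }


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   boto3.client("docdb").modify_db_cluster(**audit_export_request("example-cluster"))
audit_request = audit_export_request("example-cluster")
```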
After deployment, you should verify that encryption is enabled for your clusters. Once enabled, encryption secures your data, replicas, logs, indexes, and backups. This encryption is managed transparently, and key management can be modified through AWS KMS.
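The verification step can be sketched as a filter over the response of the boto3 DocumentDB client's `describe_db_clusters` call, which reports a `StorageEncrypted` flag per cluster. The function name and the sample response are hypothetical. A real response contains many more fields.

```python
def unencrypted_clusters(describe_response):
    """Return identifiers of clusters whose storage is not encrypted."""
    return [
        cluster["DBClusterIdentifier"]
        for cluster in describe_response.get("DBClusters", [])
        if not cluster.get("StorageEncrypted", False)
    ]


# Trimmed, made-up response for illustration. In practice:
#   import boto3
#   response = boto3.client("docdb").describe_db_clusters()
sample_response = {
    "DBClusters": [
        {"DBClusterIdentifier": "prod-cluster", "StorageEncrypted": True},
        {"DBClusterIdentifier": "dev-cluster", "StorageEncrypted": False},
    ]
}
needs_attention = unencrypted_clusters(sample_response)
```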
There are a few different methods for optimizing cost in DocumentDB. To start, you should set up billing alerts at 50% and 75% of your monthly budget. These alerts warn you before you exceed your budget and give you time to look for wasted resources.
When evaluating resource use, remember that you can scale storage and compute independently. For example, if some clusters (such as development or test clusters) do not need AWS high availability, you can reduce their number of instances. Additionally, for these clusters you might consider leaving instances live only when they are in use.
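The alert arithmetic is simple enough to sketch directly. The function names and the budget figures below are hypothetical; in practice the thresholds would be entered into whatever billing alert tool you use.

```python
ALERT_FRACTIONS = (0.50, 0.75)  # the two alert levels suggested above


def alert_thresholds(monthly_budget, fractions=ALERT_FRACTIONS):
    """Return the spend amounts at which each alert should fire."""
    return [round(monthly_budget * fraction, 2) for fraction in fractions]


def triggered_alerts(spend_to_date, monthly_budget, fractions=ALERT_FRACTIONS):
    """Return the alert fractions that current spend has already reached."""
    return [f for f in fractions if spend_to_date >= monthly_budget * f]


thresholds = alert_thresholds(400)   # made-up monthly budget of 400
reached = triggered_alerts(250, 400) # made-up spend of 250 so far
```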
Another aspect to consider is whether you have enabled change streams or time to live (TTL) features. When live, these features incur costs any time data is read, written, or deleted. If your application doesn’t require these features, you should deactivate them.
NetApp Cloud Volumes ONTAP, the leading enterprise-grade storage management solution, delivers secure, proven storage management services on AWS, Azure and Google Cloud. Cloud Volumes ONTAP supports up to a capacity of 368TB, and supports various use cases such as file services, databases, DevOps or any other enterprise workload, with a strong set of features including high availability, data protection, storage efficiencies, Kubernetes integration, and more.
In particular, Cloud Volumes ONTAP helps address the challenges of running database workloads in the cloud, filling the gap between your cloud-based database capabilities and the public cloud resources they run on.
Cloud Volumes ONTAP supports advanced features for managing SAN storage in the cloud, catering for NoSQL database systems, as well as NFS shares that can be accessed directly from cloud big data analytics clusters.
In addition, the built-in storage efficiency features have a direct impact on costs for NoSQL in cloud deployments. The data protection and flexibility provided by features such as snapshots and data cloning give NoSQL database administrators and big data engineers the power to manage large volumes of data effectively.