Google Cloud Platform (GCP) offers a wide variety of database services. Of these, its NoSQL database services are designed to rapidly process very large, dynamic datasets with no fixed schema.
This post describes GCP’s main NoSQL managed database services, their key features, and important best practices.
This is part of our series of articles on Google Cloud database services.
Google Cloud provides the following NoSQL database services: Cloud Firestore, Cloud Datastore, Cloud Bigtable, and MongoDB Atlas.
In the rest of this article we cover the first three database offerings in more detail.
Cloud Firestore is a NoSQL database that stores data in documents, arranged into collections.
Firestore is optimized for storing large collections of small documents. Each document contains a set of key-value pairs. Field values can be simple types such as strings and numbers, or complex nested objects and lists, and a document can also contain subcollections.
Firestore creates these documents and collections implicitly. That is, when you assign data to a document or collection, Firestore creates the document or collection if it does not exist.
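The implicit-creation behavior described above can be illustrated with a small in-memory model. This is only a sketch in plain Python, not the Firestore client library: the `ToyStore` class and its `set_document` and `get_document` methods are hypothetical names invented for this example.

```python
from typing import Any, Dict, Optional


class ToyStore:
    """In-memory stand-in that mimics implicit creation (not the real Firestore API)."""

    def __init__(self) -> None:
        # collection name -> document id -> document fields
        self.collections: Dict[str, Dict[str, Dict[str, Any]]] = {}

    def set_document(self, collection: str, doc_id: str, fields: Dict[str, Any]) -> None:
        # setdefault creates the collection on first use, so no explicit
        # "create collection" step is needed before writing a document.
        self.collections.setdefault(collection, {})[doc_id] = dict(fields)

    def get_document(self, collection: str, doc_id: str) -> Optional[Dict[str, Any]]:
        # Reading never creates anything; a missing document yields None.
        return self.collections.get(collection, {}).get(doc_id)


store = ToyStore()
# Neither the "users" collection nor the "alice" document exists yet.
store.set_document("users", "alice", {
    "name": "Alice",                               # primitive field
    "tags": ["admin", "beta"],                     # list
    "address": {"city": "Paris", "zip": "75001"},  # nested object
})
doc = store.get_document("users", "alice")
```

The single `set_document` call is enough to bring both the collection and the document into existence, which mirrors the behavior the text describes.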
Key features of Google Cloud Firestore include:
Here are a few best practices that will help you make the most of Cloud Firestore:
Select a database location closest to your users, to reduce latency. You can select one of two types of locations: a regional location or a multi-region location.
Minimize the number of indexes—too many indexes can increase write latency and storage costs. Do not index numeric values that increase monotonically, because this can impact latency in high throughput applications.
In general, when using Firestore, write to a single document no more than once per second. If possible, use asynchronous calls rather than synchronous ones, because they minimize the impact of latency. For example, if a lookup and a query have no data dependency, there is no need to wait until the lookup completes before running the query.
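The advice about independent calls can be sketched with Python's `asyncio`. The two coroutines below, `lookup_user` and `query_recent_orders`, are hypothetical stand-ins that simulate latency with a sleep. They are not Firestore client calls, and a real application would substitute its own asynchronous database operations.

```python
import asyncio
import time


async def lookup_user(user_id: str) -> dict:
    # Hypothetical stand-in for a document lookup; the sleep simulates network latency.
    await asyncio.sleep(0.2)
    return {"id": user_id, "name": "Alice"}


async def query_recent_orders(limit: int) -> list:
    # Hypothetical stand-in for a query that does not depend on the lookup result.
    await asyncio.sleep(0.2)
    return [{"order": n} for n in range(limit)]


async def load_page() -> tuple:
    # The two calls share no data dependency, so start both and wait once.
    user, orders = await asyncio.gather(
        lookup_user("alice"),
        query_recent_orders(3),
    )
    return user, orders


start = time.perf_counter()
user, orders = asyncio.run(load_page())
elapsed = time.perf_counter() - start  # close to one simulated round trip, not two
```

Because both operations are in flight at the same time, the total wait is roughly that of the slower call rather than the sum of the two.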
Related content: read our guide to Google Cloud Firestore
Cloud Datastore offers high performance and automatic scaling, with a simplified user experience. It is well suited to applications that must process structured data at large scale. Datastore supports ACID transactions, enabling rollback of complex multi-step operations. Behind the scenes, it stores data in Google Bigtable.
Cloud Datastore’s key features include:
Here are a few best practices that can help you work with Cloud Datastore more effectively:
Do not write to an entity group more than once per second. Sustained writes above this rate can cause timeouts for strongly consistent reads and degrade your application’s performance. A batch write or a transaction counts as a single write operation against this limit.
For hot Datastore keys, you can use sharding or replication to read keys at a higher rate than allowed by Bigtable, the underlying storage. For example, you can replicate a key three times to allow up to 3x the read throughput of a single copy. Alternatively, you can use sharding to break up the key range into several parts.
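The replication idea can be sketched as follows. A plain dictionary stands in for the database, and the helper names (`replica_key`, `write_replicated`, `read_replicated`) and the `#r` key suffix are assumptions made for this illustration, not part of any Datastore API.

```python
import random
from typing import Any, Dict

REPLICAS = 3  # assumed replication factor, matching the example in the text

# Plain dict standing in for the database; keys are entity key names.
store: Dict[str, Any] = {}


def replica_key(key: str, replica: int) -> str:
    # Derive a distinct key for each copy of the hot entity.
    return f"{key}#r{replica}"


def write_replicated(key: str, value: Any) -> None:
    # Every write goes to all copies, so writes cost REPLICAS times more.
    for replica in range(REPLICAS):
        store[replica_key(key, replica)] = value


def read_replicated(key: str) -> Any:
    # Each read picks one copy at random, spreading read load across copies.
    return store[replica_key(key, random.randrange(REPLICAS))]


write_replicated("config/homepage", {"banner": "sale"})
value = read_replicated("config/homepage")
```

The trade-off is visible in the code: reads are spread over three keys, but each update must touch all three copies, so this pattern fits data that is read far more often than it is written.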
Cloud Bigtable is a managed NoSQL database, intended for analytics and operational workloads. It is an alternative to HBase, a columnar database system that runs on HDFS.
Cloud Bigtable is suitable for applications that need high throughput and scalability, for values under 10 MB in size. It can also be used for batch MapReduce, stream processing, and machine learning.
Cloud Bigtable has several advantages over a traditional HBase deployment.
Related content: read our guide to Google Cloud Data Lake
Cloud Bigtable can support low-latency access to terabytes or even petabytes of single-keyed data. Because it is built as a sparsely populated table, it is able to scale to thousands of columns and billions of rows. Each row is indexed by a single value, the row key, which enables very high read and write throughput and makes Bigtable a good data source for MapReduce operations.
Cloud Bigtable has several client libraries including an extension of Apache HBase for Java. This allows it to integrate with multiple open source big data solutions.
When planning Cloud Bigtable capacity, consider your goals: there is a trade-off between throughput and latency, and you can optimize for one at the expense of the other. Cloud Bigtable offers optimal latency when CPU load is under 70%, or preferably under 50% for latency-sensitive applications. If latency is less important, you can load CPUs to higher than 70%, to get higher throughput for the same number of cluster nodes.
If you have several datasets with a similar schema, store them in one big table for better performance. Give each dataset a unique row key prefix so that its rows are stored contiguously, with the datasets stacked one after the other in the table.
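The trade-off can be made concrete with a rough sizing calculation. The sketch below assumes that CPU load grows linearly with queries per second, which is a simplification, and the workload and per-node figures are hypothetical placeholders that you would replace with your own measurements.

```python
import math


def nodes_needed(peak_qps: float, qps_per_node: float, target_cpu: float) -> int:
    """Estimate node count so that peak load stays at or below target_cpu.

    Assumes CPU load grows linearly with QPS, which is a simplification.
    """
    if not 0 < target_cpu <= 1:
        raise ValueError("target_cpu must be in (0, 1]")
    return math.ceil(peak_qps / (qps_per_node * target_cpu))


PEAK_QPS = 60_000      # hypothetical workload
QPS_PER_NODE = 10_000  # hypothetical per-node capacity at 100% CPU; measure your own

latency_sensitive = nodes_needed(PEAK_QPS, QPS_PER_NODE, target_cpu=0.5)  # 12 nodes
balanced = nodes_needed(PEAK_QPS, QPS_PER_NODE, target_cpu=0.7)           # 9 nodes
throughput_first = nodes_needed(PEAK_QPS, QPS_PER_NODE, target_cpu=0.9)   # 7 nodes
```

Under these assumptions, a lower CPU target buys latency headroom at the cost of more nodes, while a higher target serves the same load with fewer nodes.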
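Because Bigtable stores rows sorted by row key, a shared prefix keeps each dataset in one contiguous range. The sketch below mimics this with a sorted Python list. The dataset names, the `#` separator, and the `scan_prefix` helper are illustrative assumptions, not Bigtable client calls.

```python
from bisect import bisect_left
from typing import List, Tuple

# Rows from two hypothetical datasets, inserted in arbitrary order.
rows: List[Tuple[str, str]] = [
    ("sensors#device42#20240102", "21.5"),
    ("clicks#user7#20240101", "home"),
    ("sensors#device17#20240101", "19.0"),
    ("clicks#user3#20240103", "cart"),
]

# Bigtable keeps rows sorted by row key; sorting the list mimics that layout.
table = sorted(rows)


def scan_prefix(table: List[Tuple[str, str]], prefix: str) -> List[Tuple[str, str]]:
    """Return the contiguous run of rows whose key starts with prefix."""
    keys = [key for key, _ in table]
    start = bisect_left(keys, prefix)  # first key that sorts at or after the prefix
    result = []
    for key, value in table[start:]:
        if not key.startswith(prefix):
            break  # past the end of this dataset's range
        result.append((key, value))
    return result


sensor_rows = scan_prefix(table, "sensors#")
```

Reading one dataset then becomes a single range scan that starts at the prefix and stops at the first key that no longer matches.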
If you have rows with multiple related values, it is best to group those columns into a column family. Grouping data as closely as possible avoids the need for complex filters—you can get exactly the data you need in a single read request.
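As a rough illustration, a row can be modeled as a mapping from column family to columns. The family names (`profile`, `stats`) and the `read_family` helper are hypothetical, and the nested dictionary is only a stand-in for a real table.

```python
from typing import Dict

# row key -> column family -> column qualifier -> value
Row = Dict[str, Dict[str, str]]

table: Dict[str, Row] = {
    "user#1001": {
        # Related values live together in one family...
        "profile": {"name": "Alice", "email": "alice@example.com"},
        # ...while unrelated values go in a separate family.
        "stats": {"logins": "42", "last_seen": "2024-01-03"},
    },
}


def read_family(table: Dict[str, Row], row_key: str, family: str) -> Dict[str, str]:
    """Return all columns of one family for a row, or {} if absent."""
    return dict(table.get(row_key, {}).get(family, {}))


profile = read_family(table, "user#1001", "profile")
```

With the related columns grouped, one request for the `profile` family returns exactly the needed values without filtering individual columns.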
NetApp Cloud Volumes ONTAP, the leading enterprise-grade storage management solution, delivers secure, proven storage management services on AWS, Azure and Google Cloud. Cloud Volumes ONTAP supports a capacity of up to 368TB, and supports various use cases such as file services, databases, DevOps or any other enterprise workload, with a strong set of features including high availability, data protection, storage efficiencies, Kubernetes integration, and more.
Cloud Volumes ONTAP supports advanced features for managing SAN storage in the cloud, catering for NoSQL database systems, as well as NFS shares that can be accessed directly from cloud big data analytics clusters.
In addition, Cloud Volumes ONTAP provides storage efficiency features, including thin provisioning, data compression and deduplication, and data tiering, reducing the storage footprint and costs by up to 70%.