Google Cloud NoSQL: Firestore, Datastore, and Bigtable

Written by Yifat Perry, Technical Content Manager | Mar 18, 2021 8:33:24 AM

What is Google Cloud NoSQL?

Google Cloud Platform (GCP) offers a wide variety of database services. Its NoSQL database services are designed to rapidly process very large, dynamic datasets that have no fixed schema.

This post describes GCP’s main managed NoSQL database services, their key features, and important best practices.

This is part of our series of articles on Google Cloud database services.

In this article, you will learn:

Google Cloud NoSQL Database Options
What is Google Cloud Firestore?
What is Google Cloud Datastore?
What is Google Cloud Bigtable?
Google Cloud NoSQL with NetApp Cloud Volumes ONTAP

Google Cloud NoSQL Database Options

Google Cloud provides the following NoSQL database services:

Cloud Firestore—a document-oriented database storing key-value pairs. Optimized for small documents and easy to use with mobile applications.
Cloud Datastore—a document database built for automatic scaling, high performance, and ease of use.
Cloud Bigtable—an alternative to HBase, a columnar database system running on HDFS. Suitable for high-throughput applications.
MongoDB Atlas—a managed MongoDB service, hosted by Google Cloud and built by the original makers of MongoDB.

In the rest of this article we cover the first three database offerings in more detail. You can learn more about MongoDB Atlas here.

What is Google Cloud Firestore?

Cloud Firestore is a NoSQL database that stores data in documents, arranged into collections.

Firestore is optimized for storing large collections of small documents. Each document contains a set of key-value pairs. Documents can contain subcollections and nested objects, both of which can include primitive fields such as strings or complex objects such as lists.

Firestore creates documents and collections implicitly: when you write data to a document, Firestore creates the document, and the collection that holds it, if they do not already exist.
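To make the document/collection model and implicit creation concrete, here is a small in-memory sketch. It is not the Firestore client API: `set_document` and `get_document` are hypothetical helpers, and paths follow Firestore's alternating collection/document form (for example, `users/alovelace`). As in Firestore, writing a document inside a subcollection does not create its ancestor document.

```python
from typing import Optional

# Hypothetical in-memory model of Firestore's layout; NOT the Firestore client API.
# A "level" maps collection name -> {document id -> document node}.


def _new_node() -> dict:
    # "exists" stays False for ancestors that only hold subcollections.
    return {"exists": False, "fields": {}, "subcollections": {}}


def _split_document_path(path: str) -> list:
    parts = path.split("/")
    # Document paths alternate collection/document, so their length is even.
    if len(parts) < 2 or len(parts) % 2 != 0:
        raise ValueError(f"not a document path: {path!r}")
    return parts


def set_document(store: dict, path: str, data: dict) -> None:
    parts = _split_document_path(path)
    level = store
    node = None
    for i in range(0, len(parts), 2):
        collection = level.setdefault(parts[i], {})  # collection created implicitly
        node = collection.setdefault(parts[i + 1], _new_node())
        level = node["subcollections"]
    node["exists"] = True
    node["fields"] = dict(data)  # like a set() call: replaces the stored fields


def get_document(store: dict, path: str) -> Optional[dict]:
    parts = _split_document_path(path)
    level = store
    node = None
    for i in range(0, len(parts), 2):
        node = level.get(parts[i], {}).get(parts[i + 1])
        if node is None:
            return None
        level = node["subcollections"]
    return node["fields"] if node["exists"] else None


store: dict = {}
set_document(store, "users/alovelace", {"first": "Ada", "born": 1815, "address": {"city": "London"}})
set_document(store, "rooms/roomA/messages/m1", {"from": "alovelace"})
print(get_document(store, "users/alovelace")["first"])  # Ada
print(get_document(store, "rooms/roomA"))  # None: ancestor document never written
```

Note that the `rooms` collection and the `roomA` path both appear after the second write, yet `rooms/roomA` itself is not a document until something is written to it.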

Features

Key features of Google Cloud Firestore include:

Automatic scaling—Firestore scales data storage automatically, retaining the same query performance regardless of database size.
Serverless development—client-side SDKs handle networking and authentication, reducing the amount of code you need to write.
Backend security rules—enabling complex validation rules on data.
Offline support—data remains accessible on user devices while they are offline, in web browsers as well as on iOS and Android.
Datastore mode—support for the Cloud Datastore API, enabling applications that currently work with Google Cloud Datastore to switch to Firestore without code changes.

Best Practices

Here are a few best practices that will help you make the most of Cloud Firestore:

Database Location

Select a database location close to your users to reduce latency. You can choose between two types of location:

Multi-regional location—for improved availability, deploys the database in at least two Google Cloud regions.
Regional location—provides lower cost and better write latency, because there is no need to synchronize with another region.

Indexes

Keep the number of indexes to a minimum: too many indexes increase write latency and storage costs. Avoid indexing fields whose values increase monotonically, such as sequential numeric IDs or timestamps, because they can create hotspots that increase latency in high-throughput applications.

Optimizing Write Performance

As a general rule, avoid writing to the same Firestore document more than once per second. Where possible, use asynchronous calls, which minimize the latency impact on your application. For example, if a query does not depend on the result of a lookup, there is no need to wait for the lookup to complete before running the query.
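When a value such as a counter must change faster than one write per second, Firestore's documentation recommends spreading it across several documents, a pattern it calls a distributed counter. The sketch below models that idea in memory; the class name and the shard count of 10 are illustrative assumptions, and each list slot stands in for a separate shard document.

```python
import random


class ShardedCounter:
    """In-memory sketch of the distributed-counter pattern.

    Each slot in self.shards stands in for one shard document, e.g. a
    hypothetical counters/page_views/shards/{i} path. Spreading increments
    over num_shards documents lets the counter absorb roughly num_shards
    writes per second while each document stays near one write per second.
    """

    def __init__(self, num_shards: int) -> None:
        if num_shards < 1:
            raise ValueError("num_shards must be at least 1")
        self.shards = [0] * num_shards

    def increment(self, amount: int = 1) -> int:
        # Pick a random shard so concurrent writers rarely touch the same document.
        shard_id = random.randrange(len(self.shards))
        self.shards[shard_id] += amount
        return shard_id

    def total(self) -> int:
        # Reading the count means reading and summing every shard document.
        return sum(self.shards)


page_views = ShardedCounter(num_shards=10)
for _ in range(1000):
    page_views.increment()
print(page_views.total())  # 1000
```

The trade-off sits on the read side: more shards allow more writes per second, but every read of the total must touch more documents.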

Related content: read our guide to Google Cloud Firestore

What is Google Cloud Datastore?

Cloud Datastore offers high performance and automatic scaling with a simplified user experience. It is well suited to applications that must process structured data at large scale. Datastore supports ACID transactions, so a complex multi-step operation can be rolled back if any step fails. Behind the scenes, it stores data in Google Bigtable.

Features

Cloud Datastore’s key features include:

Atomic transactions—executing operation sets which must succeed in their entirety, or be rolled back.
High read and write availability—uses a highly redundant design to minimize the impact of component failure.
Automatic scalability—highly distributed with scaling transparently managed.
High performance—mixes index and query constraints to ensure that queries scale according to result-set size rather than the size of the data-set.
Flexible storage and querying of data—besides offering a SQL-like query language, maps naturally to object-oriented scripting languages.
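The all-or-nothing behavior of an atomic transaction can be illustrated with a local Python analogy: every step runs against a working copy, and the copy replaces the original only if all steps succeed. `run_transaction` and `transfer` are hypothetical names, and this is not how Datastore implements transactions internally.

```python
import copy
from typing import Callable, Dict, List

Step = Callable[[Dict[str, int]], None]


def run_transaction(store: Dict[str, int], steps: List[Step]) -> bool:
    """Apply all steps or none of them (local analogy only)."""
    working = copy.deepcopy(store)  # steps never touch the committed data directly
    try:
        for step in steps:
            step(working)
    except Exception:
        return False  # roll back: discard the working copy
    store.clear()  # commit: publish every change at once
    store.update(working)
    return True


def transfer(src: str, dst: str, amount: int) -> List[Step]:
    def credit(data: Dict[str, int]) -> None:
        data[dst] += amount

    def debit(data: Dict[str, int]) -> None:
        if data[src] < amount:
            raise ValueError("insufficient funds")
        data[src] -= amount

    # Credit first, so a failing debit shows the earlier step being undone.
    return [credit, debit]


accounts = {"alice": 100, "bob": 0}
print(run_transaction(accounts, transfer("alice", "bob", 30)))   # True
print(run_transaction(accounts, transfer("alice", "bob", 500)))  # False
print(accounts)  # {'alice': 70, 'bob': 30}
```

In the failed transfer, the credit to `bob` had already been applied to the working copy, but it never reaches `accounts` because the debit step raised.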

Best Practices

Here are a few best practices that can help you work with Cloud Datastore more effectively:

API Calls

Use batch operations—a batch incurs the overhead of a single call rather than one call per operation, which makes it more efficient.
Roll back failed transactions—if another request is contending for the same resources, rolling back the failed transaction reduces the latency of that request's retry.
Use asynchronous calls—as with Firestore, prefer asynchronous calls when later operations do not depend on the result of a query.
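A sketch of the batching and asynchronous-call practices, assuming an asyncio-based application: `chunked` groups keys so that each network call carries many of them, and `asyncio.gather` runs the lookups and an unrelated query concurrently. `lookup` and `run_query` are hypothetical coroutines that only simulate latency, and the batch size of 500 is an example value; check the current per-request limits for your database.

```python
import asyncio
from typing import Dict, Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of at most `size` items, one list per batch call."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


# Hypothetical stand-ins for network calls; a real app would use its client library.
async def lookup(keys: List[str]) -> Dict[str, str]:
    await asyncio.sleep(0.01)  # simulated round trip
    return {k: f"entity:{k}" for k in keys}


async def run_query(kind: str) -> List[str]:
    await asyncio.sleep(0.01)  # simulated round trip
    return [f"{kind}-1", f"{kind}-2"]


async def main(keys: List[str]) -> Tuple[Dict[str, str], List[str]]:
    # One lookup call per batch instead of one call per key.
    lookups = [lookup(batch) for batch in chunked(keys, size=500)]
    # The query does not depend on the lookups, so everything runs concurrently.
    *lookup_results, query_result = await asyncio.gather(*lookups, run_query("Task"))
    entities: Dict[str, str] = {}
    for result in lookup_results:
        entities.update(result)
    return entities, query_result


entities, rows = asyncio.run(main([f"key{i}" for i in range(1200)]))
print(len(entities), rows)  # 1200 ['Task-1', 'Task-2']
```

Here 1,200 keys become three lookup calls instead of 1,200, and the query does not wait for any of them.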

Entities

Avoid writing to an entity group more than once per second. Exceeding this rate can cause strongly consistent reads to time out, degrading your application's performance. A batch write or a transaction counts as a single write operation against this limit.
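One way to respect the once-per-second guideline is to buffer updates per entity group and merge them into a single commit. The class below is a hypothetical, in-memory illustration: it takes the current time as a parameter so its behavior is deterministic, and its `writes` list stands in for real commits.

```python
from typing import Any, Dict, List, Tuple


class EntityGroupWriteThrottle:
    """Coalesces updates so each entity group is written at most once per interval."""

    def __init__(self, min_interval: float = 1.0) -> None:
        self.min_interval = min_interval
        self.last_write: Dict[str, float] = {}
        self.pending: Dict[str, Dict[str, Any]] = {}
        self.writes: List[Tuple[str, Dict[str, Any]]] = []  # stands in for commits

    def update(self, group: str, changes: Dict[str, Any], now: float) -> None:
        # Later changes to the same property overwrite earlier buffered ones.
        self.pending.setdefault(group, {}).update(changes)
        self.flush(group, now)

    def flush(self, group: str, now: float) -> bool:
        last = self.last_write.get(group)
        if group not in self.pending or (last is not None and now - last < self.min_interval):
            return False
        # One commit carries every buffered change, so it counts as a single write.
        self.writes.append((group, self.pending.pop(group)))
        self.last_write[group] = now
        return True


throttle = EntityGroupWriteThrottle()
throttle.update("Guestbook:main", {"a": 1}, now=0.0)  # written immediately
throttle.update("Guestbook:main", {"b": 2}, now=0.3)  # buffered
throttle.update("Guestbook:main", {"a": 3}, now=0.6)  # buffered, replaces a
throttle.flush("Guestbook:main", now=1.2)             # one write with b=2, a=3
print(len(throttle.writes))  # 2
```

A production version would also need a timer that flushes groups whose buffered changes are still waiting once the interval has passed.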

Sharding and Replication

For hot Datastore keys, you can use replication or sharding to read keys at a higher rate than Bigtable, the underlying storage, would otherwise allow. For example, replicating a key three times allows roughly three times the read throughput. Alternatively, you can use sharding to break the key range into several parts.
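Both techniques can be sketched in a few lines. The `#`-suffix naming scheme for replicas and the hash-derived shard prefix are assumptions made for illustration, not Datastore conventions. Replication multiplies write cost by the number of copies, which is why it suits keys that are read far more often than they are written.

```python
import random
import zlib
from typing import Any, Dict, List


def replica_keys(key: str, replicas: int) -> List[str]:
    # Hypothetical naming scheme: each copy of the hot entity gets a suffix.
    return [f"{key}#{i}" for i in range(replicas)]


def write_replicated(store: Dict[str, Any], key: str, value: Any, replicas: int) -> None:
    # Every write fans out to all copies, so writes cost `replicas` times more.
    for k in replica_keys(key, replicas):
        store[k] = value


def read_replicated(store: Dict[str, Any], key: str, replicas: int) -> Any:
    # Each read goes to one randomly chosen copy, spreading the read load.
    return store[random.choice(replica_keys(key, replicas))]


def sharded_key(key: str, num_shards: int) -> str:
    # zlib.crc32 is stable across runs, unlike Python's salted hash() for str.
    shard = zlib.crc32(key.encode("utf-8")) % num_shards
    # The shard prefix splits otherwise adjacent keys into num_shards key ranges.
    return f"{shard:02d}:{key}"


store: Dict[str, Any] = {}
write_replicated(store, "config/global", {"theme": "dark"}, replicas=3)
print(sorted(store))  # ['config/global#0', 'config/global#1', 'config/global#2']
print(read_replicated(store, "config/global", replicas=3))  # {'theme': 'dark'}
```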

What is Google Cloud Bigtable?

Cloud Bigtable is a managed NoSQL database, intended for analytics and operational workloads. It is an alternative to HBase, a columnar database system that runs on HDFS.

Cloud Bigtable is suitable for applications that need high throughput and scalability, for values under 10 MB in size. It can also be used for batch MapReduce, stream processing, and machine learning.

Cloud Bigtable has several advantages over a traditional HBase deployment.

Scalability—scales linearly in proportion to the number of machines in the cluster. HBase clusters reach a size beyond which adding nodes no longer improves read/write throughput.
Automation—handles upgrades and restarts automatically, and ensures data durability via replication. Whereas in HBase you would need to manage replicas and regions, in Cloud Bigtable you only need to design your table schemas and add a second cluster to an instance; replication is then configured automatically.
Dynamic cluster resizing—can grow and shrink cluster size on demand. Rebalancing load across the nodes in the cluster takes only a few minutes. In HBase, cluster resizing is a complex operation that requires downtime.

Related content: read our guide to Google Cloud Data Lake

How it Works

Cloud Bigtable supports low-latency access to terabytes or even petabytes of single-keyed data. Because it is a sparsely populated table, it can scale to billions of rows and thousands of columns. Each row is indexed by a single value, the row key, which enables very high read and write throughput and makes Bigtable an ideal data source for MapReduce operations.

Cloud Bigtable offers several client libraries, including an extension of the Apache HBase library for Java, which allows it to integrate with many open source big data solutions.

Best Practices

Here are some best practices to make better use of Cloud Bigtable as an HBase replacement:

Trade-off Between High Throughput and Low Latency

When planning Cloud Bigtable capacity, decide which goal matters more: you can optimize for throughput at the expense of latency, or for latency at the expense of throughput. Cloud Bigtable delivers its best latency when CPU load stays below 70%, and ideally around 50%. If latency is less important, you can run CPUs above 70% to get higher throughput from the same number of cluster nodes.
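The trade-off translates into simple sizing arithmetic: nodes = ceil(peak load / (per-node throughput × target CPU)). In the sketch below, the function name is hypothetical and the per-node figure of 10,000 operations per second is a placeholder, not a quoted Bigtable number; measure the real value for your workload.

```python
import math


def nodes_needed(peak_ops_per_sec: float, ops_per_node: float, target_cpu: float) -> int:
    """Nodes required so that peak load keeps average CPU at or below target_cpu.

    ops_per_node is the throughput one node sustains at 100% CPU for your
    workload (an assumption you should replace with a measured value).
    """
    if not 0 < target_cpu <= 1:
        raise ValueError("target_cpu must be in (0, 1]")
    return math.ceil(peak_ops_per_sec / (ops_per_node * target_cpu))


for cpu in (0.5, 0.7, 0.9):
    print(cpu, nodes_needed(50_000, 10_000, cpu))
```

With these placeholder numbers, a 50% CPU target needs 10 nodes, 70% needs 8, and 90% needs 6: the lower the target, the more nodes you pay for in exchange for latency headroom.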

Tables and Schemas

If you have several datasets with a similar schema, store them in a single large table for better performance. Give each dataset a unique row key prefix, so that its rows are stored contiguously and the datasets sit one after another in the table.
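Because Bigtable stores rows sorted by row key, a per-dataset prefix keeps each dataset's rows together and lets you read one dataset with a single contiguous range scan. The toy `SortedRowStore` class below is a hypothetical in-memory model of that ordering, not the Bigtable client library; the `clicks#` and `views#` prefixes are illustrative.

```python
import bisect
from typing import Dict, List, Tuple


class SortedRowStore:
    """Toy model of a Bigtable table: rows kept in lexicographic row-key order."""

    def __init__(self) -> None:
        self.keys: List[str] = []
        self.rows: Dict[str, dict] = {}

    def put(self, row_key: str, cells: dict) -> None:
        if row_key not in self.rows:
            bisect.insort(self.keys, row_key)  # keep keys sorted on insert
        self.rows[row_key] = cells

    def scan_prefix(self, prefix: str) -> List[Tuple[str, dict]]:
        # Rows sharing a prefix are adjacent, so the scan starts at the first
        # matching key and stops at the first key that no longer matches.
        start = bisect.bisect_left(self.keys, prefix)
        out = []
        for key in self.keys[start:]:
            if not key.startswith(prefix):
                break
            out.append((key, self.rows[key]))
        return out


table = SortedRowStore()
table.put("views#user2#20210318", {"count": 5})
table.put("clicks#user1#20210318", {"count": 3})
table.put("clicks#user2#20210317", {"count": 1})
table.put("views#user1#20210318", {"count": 7})
print([key for key, _ in table.scan_prefix("clicks#")])
# ['clicks#user1#20210318', 'clicks#user2#20210317']
```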

Column Families

If your rows hold multiple related values, group those columns into a column family. Grouping related data this closely avoids the need for complex filters, so you can get exactly the data you need in a single read request.

Google Cloud NoSQL with NetApp Cloud Volumes ONTAP

NetApp Cloud Volumes ONTAP, the leading enterprise-grade storage management solution, delivers secure, proven storage management services on AWS, Azure and Google Cloud. Cloud Volumes ONTAP supports capacities of up to 368 TB and various use cases such as file services, databases, DevOps, or any other enterprise workload, with a strong set of features including high availability, data protection, storage efficiencies, Kubernetes integration, and more.

Cloud Volumes ONTAP supports advanced features for managing SAN storage in the cloud, catering to NoSQL database systems, as well as NFS shares that can be accessed directly from cloud big data analytics clusters.

In addition, Cloud Volumes ONTAP provides storage efficiency features, including thin provisioning, data compression and deduplication, and data tiering, reducing the storage footprint and costs by up to 70%.
