Should You Still Be Using Google Cloud Datastore?

Written by Yifat Perry, Technical Content Manager | Jul 4, 2022 3:54:18 PM

What Is Google Cloud Datastore?

Google Cloud Datastore is a highly scalable, managed NoSQL database hosted on the Google Cloud Platform. It provides a durable, highly available data store for applications, which handles aspects like scaling, replication, and sharding fully automatically.

Google Cloud Datastore provides convenient SQL-like queries, supports ACID transactions, and enables indexing to improve query performance.

Google has released Firestore, a new version of Datastore with several improvements and additional features. Existing Datastore users can access these features by creating a database in “Firestore in Datastore” mode. In the future, existing Datastore databases will be automatically upgraded to Firestore.

Related content: Read our guides to Cloud Firestore and Google Cloud NoSQL options

In this article:

Cloud Datastore Features
Cloud Datastore vs Firestore: Should You Migrate?
Google Cloud Datastore Best Practices
Google Cloud Storage Optimization with Cloud Volumes ONTAP

Cloud Datastore Features

Google Cloud Datastore is a NoSQL database designed for high performance, auto-scaling, and easy software development. It includes the following features:

Atomic transactions: executes a series of operations so that either all of them succeed or none of them take effect.
High read and write availability: runs in multiple data centers, providing redundancy that reduces the impact of a single point of failure.
High performance and scalability: the distributed architecture of Datastore manages scaling automatically. It uses a combination of query constraints and indexes to ensure that query performance scales with the size of the result set rather than the size of the dataset.
Flexible data querying and storage: maps naturally to scripting and object-oriented languages, and is exposed to applications via multiple clients. Datastore also offers a SQL-like query language.
Consistency: ensures that lookups by key and ancestor queries always receive strongly consistent data, while all other queries are eventually consistent. These consistency types allow applications to deliver robust user experiences when handling many users and large volumes of data.
At-rest encryption: automatically encrypts data before writing it to disk, and automatically decrypts data when authorized users read it.
Fully managed service without downtime: Google administers Datastore as a service, allowing users to focus on building their applications. The service remains available even during planned upgrades.

Cloud Datastore vs Firestore: Should You Migrate?

Firestore is a newer version of Datastore and is backward compatible with Datastore (except for its new features). New features added in Firestore include:

Real-time updates
Ability to automatically scale to millions of concurrent clients
Strongly consistent storage layer
Collection and document data model
Mobile and web client libraries

Firestore provides two modes:

Firestore in Native mode: allows you to use all the new features. Suitable for new applications.
Firestore in Datastore mode: uses Datastore system behavior on top of Firestore's storage layer. This removes several Datastore limitations: eventual consistency (all queries become strongly consistent), the requirement that queries inside transactions be ancestor queries, the limit of 25 entity groups per transaction, and the limit of one write per second to an entity group. Suitable for existing Datastore implementations.

Google provides an automated migration path for upgrading legacy Datastore databases to Firestore in Datastore mode.

A few important notes when using Firestore in Datastore mode:

The database accepts Datastore API requests and denies Firestore API requests.
The database uses Datastore indexes, not Firestore indexes.
The database supports Datastore client libraries and not Firestore client libraries.
Google Cloud console uses the Datastore viewer when accessing the database.

Related content: Read our guide to Google Cloud migration tools

Google Cloud Datastore Best Practices

The following best practices will help you use your database more effectively whether you migrate to Firestore in Datastore mode or continue using Cloud Datastore.

Limit Updates to Entity Groups

Don’t update a single Datastore entity group too quickly. Google recommends designing applications so that they do not need more than one update per second to any entity group. An entity with no parent and no children is an entity group of its own. Updating an entity group too rapidly causes problems for Datastore writes, such as high latency and timeouts.
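One way to stay under that rate is to buffer changes per entity group and write them out at most once per interval. The sketch below is illustrative only: `EntityGroupThrottle` and its `write` callback are hypothetical names, not part of any Datastore client library, and the callback stands in for whatever code performs the real write.

```python
import time
from typing import Any, Callable, Dict


class EntityGroupThrottle:
    """Coalesces updates so each entity group is written at most once per interval."""

    def __init__(self, write: Callable[[str, Dict[str, Any]], None],
                 min_interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._write = write  # hypothetical callback that performs the real write
        self._min_interval = min_interval
        self._clock = clock
        self._last_write: Dict[str, float] = {}
        self._pending: Dict[str, Dict[str, Any]] = {}

    def update(self, group: str, changes: Dict[str, Any]) -> bool:
        """Buffer changes, then try to flush. Returns True if a write was issued."""
        self._pending.setdefault(group, {}).update(changes)
        return self.flush(group)

    def flush(self, group: str) -> bool:
        """Write buffered changes if the group is outside its cool-down window."""
        if group not in self._pending:
            return False
        now = self._clock()
        last = self._last_write.get(group)
        if last is not None and now - last < self._min_interval:
            return False  # too soon: keep the changes buffered
        self._write(group, self._pending.pop(group))
        self._last_write[group] = now
        return True
```

Buffered changes are only written when `update` or `flush` is called again for that group, so a real application would also call `flush` from a timer or at shutdown.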

Apply Replication or Sharding

Replication and sharding are useful for dealing with hot keys in Datastore. Replication keeps multiple copies of the same entity so that you can read a portion of the key range at a higher rate. Sharding breaks an entity up into smaller pieces so that you can write to a portion of the key range at a higher rate.

Avoid common sharding mistakes such as:

Using time prefixes: when the time rolls over to a new prefix, the new, unsplit portion of the key range becomes a hotspot.
Sharding only the hot entities: if you shard only a small proportion of the total number of entities, there may not be enough rows between the hot entities to keep them on different splits.
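A sharded counter is a common illustration of the sharding idea: writes go to a randomly chosen shard, and reads sum all shards. This is a minimal sketch using a plain dictionary as a stand-in for the database; the shard count and the `shard_key` naming scheme are assumptions made for the example.

```python
import random
from typing import Dict

NUM_SHARDS = 8  # assumed shard count; tune to the expected write rate


def shard_key(counter_name: str, shard: int) -> str:
    """Build the key name for one shard of a counter (hypothetical naming scheme)."""
    return f"{counter_name}-shard-{shard}"


def increment(store: Dict[str, int], counter_name: str, rng: random.Random) -> None:
    """Add 1 to a randomly chosen shard, spreading writes across NUM_SHARDS entities."""
    key = shard_key(counter_name, rng.randrange(NUM_SHARDS))
    store[key] = store.get(key, 0) + 1


def total(store: Dict[str, int], counter_name: str) -> int:
    """Read every shard and sum the values to get the logical counter."""
    return sum(store.get(shard_key(counter_name, s), 0) for s in range(NUM_SHARDS))
```

The trade-off is that each read now touches `NUM_SHARDS` entities instead of one, so this pattern suits values that are written far more often than they are read.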

Handle the Datastore Index Correctly

Use the following practices to manage your Datastore index:

Exclude properties not required for queries: indexing a property unnecessarily can increase write latency and the storage cost of index entries.
Limit composite indexes: using too many composite indexes can increase the latency of achieving consistency. If you need to run ad hoc queries on large datasets without predefined indexes, use BigQuery instead.
Avoid indexing properties with monotonically increasing values: such an index can create hotspots that affect latency for applications with high read and write rates.
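For the monotonic-value case, one commonly used workaround is to prepend a short hash so that consecutive values no longer sort next to each other. The sketch below assumes you can store the scattered string in place of the raw number; `scattered_value` and `original_value` are hypothetical helper names.

```python
import hashlib


def scattered_value(sequence_number: int, prefix_len: int = 4) -> str:
    """Prefix a monotonically increasing number with a short hash of itself.

    The prefix is deterministic, so the same number always maps to the same string.
    """
    raw = str(sequence_number)
    prefix = hashlib.sha256(raw.encode("utf-8")).hexdigest()[:prefix_len]
    return f"{prefix}-{raw}"


def original_value(scattered: str) -> int:
    """Recover the original number from a scattered value."""
    return int(scattered.split("-", 1)[1])
```

Note the cost of this design: range queries and ordering on the original value no longer work against the scattered property, so it only fits properties that are looked up by equality.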

Avoid Deleting Too Many Datastore Entities

Datastore is built on Bigtable, which periodically rewrites its tables to remove deleted entries and reorganize data so that reads and writes are more efficient. This process is known as compaction. Deleting a large number of entities in a small key range makes queries over that range slower until Bigtable completes compaction.

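Avoid using a timestamp in an indexed field to represent an entity's expiration time. Retrieving expired entities would require querying that indexed field, which is likely to overlap the part of the index that holds entries for the most recently deleted entities. Sharded queries improve performance: prepend a fixed-length string to the expiration timestamp so that entities with the same timestamp are spread throughout the index's key range, then run one query per shard in parallel.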
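The prefixing idea can be sketched with a sorted list standing in for the index. Everything here is an assumption made for illustration: the shard count, the `NN|timestamp` layout, and the use of CRC32 to pick a shard. The per-shard scans run sequentially in this sketch, whereas a real application would issue them in parallel.

```python
import bisect
import zlib
from typing import List, Tuple

NUM_SHARDS = 4  # assumed shard count


def expiry_index_value(entity_id: str, expires_at: int) -> str:
    """Prepend a fixed-length shard prefix to a zero-padded expiration timestamp."""
    shard = zlib.crc32(entity_id.encode("utf-8")) % NUM_SHARDS
    return f"{shard:02d}|{expires_at:010d}"


def expired_ids(index: List[Tuple[str, str]], now: int) -> List[str]:
    """Run one range scan per shard over a sorted (index_value, entity_id) list."""
    keys = [value for value, _ in index]
    found: List[str] = []
    for shard in range(NUM_SHARDS):
        # Each shard is a contiguous slice: from timestamp 0 up to and including `now`.
        start = bisect.bisect_left(keys, f"{shard:02d}|{0:010d}")
        end = bisect.bisect_right(keys, f"{shard:02d}|{now:010d}")
        found.extend(entity_id for _, entity_id in index[start:end])
    return found
```

Zero-padding the timestamp matters because the index is sorted as a string; without it, `9` would sort after `10`.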

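Alternatively, use a generation number, a global counter that is updated periodically, and prepend it to the expiration timestamp. Queries are then sorted by generation number first and by timestamp second. Old entities are deleted in a prior generation, and any expired entity that is not deleted should have its generation number incremented. Queries should target the current generation, because queries against an older generation perform poorly until compaction completes.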
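A rough sketch of how a generation-prefixed value might be composed and queried follows. The field widths, the `|` separator, and all three function names are assumptions chosen for the example, not a format defined by Datastore.

```python
from typing import Tuple


def generation_index_value(generation: int, shard: int, expires_at: int) -> str:
    """Compose a value that sorts by generation, then shard, then timestamp."""
    return f"{generation:06d}|{shard:02d}|{expires_at:010d}"


def generation_range(generation: int) -> Tuple[str, str]:
    """Return (inclusive low, exclusive high) bounds covering one generation."""
    return (f"{generation:06d}|", f"{generation + 1:06d}|")


def carry_forward(value: str) -> str:
    """Move a surviving entity's index value into the next generation."""
    generation, shard, expires_at = value.split("|")
    return generation_index_value(int(generation) + 1, int(shard), int(expires_at))
```

Restricting a query to the bounds from `generation_range` keeps it away from the key range where the previous generation's deletions are still waiting for compaction.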

Google Cloud Storage Optimization with Cloud Volumes ONTAP

NetApp Cloud Volumes ONTAP, the leading enterprise-grade storage management solution, delivers secure, proven storage management services on AWS, Azure and Google Cloud. Cloud Volumes ONTAP capacity can scale into the petabytes, and it supports various use cases such as file services, databases, DevOps or any other enterprise workload, with a strong set of features including high availability, data protection, storage efficiencies, Kubernetes integration, and more.

In particular, Cloud Volumes ONTAP helps address database workload challenges in the cloud, filling the gap between your cloud-based database capabilities and the public cloud resources they run on. Learn more in these Cloud Volumes ONTAP Databases Case Studies.
