At any time, your network and applications are vulnerable to events that disrupt services, ranging from network outages and critical application bugs to natural disasters that damage your physical infrastructure. To help protect your systems and business operations during such events, it is essential to have a tested, effective disaster recovery (DR) plan.
Google Cloud offers a number of features that are useful for planning your DR, described in the sections below.
This is part of our series of articles about Google Cloud backup.
Google doesn’t offer any products specifically intended for disaster recovery, but it does offer guidance for teams building a cloud DR system. Google Cloud provides several products and features that are useful as building blocks for DR architectures.
Compute Engine is the core compute service of Google Cloud, providing virtual machine (VM) instances as well as a number of features you can leverage for your DR plan. For example, you can set the deletion protection flag to prevent the accidental deletion of VM instances.
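As a minimal sketch of how the deletion protection flag could be set programmatically, the snippet below builds the URL for the Compute Engine REST method `instances.setDeletionProtection`. The project, zone, and instance names are hypothetical, and the snippet only constructs the URL; sending an authenticated POST request is left out.

```python
from urllib.parse import quote, urlencode

COMPUTE_API = "https://compute.googleapis.com/compute/v1"


def deletion_protection_url(project: str, zone: str, instance: str,
                            enabled: bool = True) -> str:
    """Build the URL for the instances.setDeletionProtection REST method."""
    path = "/".join([
        "projects", quote(project, safe=""),
        "zones", quote(zone, safe=""),
        "instances", quote(instance, safe=""),
        "setDeletionProtection",
    ])
    # The API expects the flag as a lowercase boolean query parameter.
    query = urlencode({"deletionProtection": str(enabled).lower()})
    return f"{COMPUTE_API}/{path}?{query}"


if __name__ == "__main__":
    # Hypothetical names, used only for illustration.
    print(deletion_protection_url("my-dr-project", "us-central1-a", "dr-standby-vm"))
```

The same flag can be toggled from the command line with `gcloud compute instances update INSTANCE_NAME --deletion-protection`.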
You can use Google Cloud Storage to store objects like backup files in various storage classes. For DR, you can leverage lower-cost classes like Nearline storage (as well as Coldline and Archive) to save on storage costs while enabling periodic DR stress testing. Note that retrieving data can incur extra costs.
A Google Cloud Filestore instance is a fully-managed network file system (NFS) server for applications that run on Google Kubernetes Engine (GKE) clusters or Compute Engine instances. This is helpful for disaster recovery, as applications can switch to Filestore in a failover region to restore Filestore volume access before a restore process is completed.
Cloud Load Balancing distributes requests across multiple instances to provide high availability for Compute Engine workloads. It can be configured with health checks so that traffic is not routed to failing instances.
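The idea behind health-check-aware routing can be illustrated with a small simulation. This is a conceptual sketch only, not the Cloud Load Balancing API: the `Backend` class and `route` function are hypothetical names, and a simple round-robin policy stands in for whatever balancing mode a real deployment uses.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import List


@dataclass
class Backend:
    name: str
    healthy: bool  # result of the most recent health check


def route(backends: List[Backend], requests: int) -> List[str]:
    """Distribute requests round-robin across healthy backends only."""
    healthy = [b for b in backends if b.healthy]
    if not healthy:
        raise RuntimeError("no healthy backends available")
    rotation = cycle(healthy)
    return [next(rotation).name for _ in range(requests)]


if __name__ == "__main__":
    pool = [Backend("vm-a", True), Backend("vm-b", False), Backend("vm-c", True)]
    # vm-b failed its health check, so it receives no traffic.
    print(route(pool, 4))
```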
Traffic Director is a service mesh traffic control plane that handles the configuration of proxies running in GKE and Compute Engine. You can make a service highly available by deploying it in multiple regions. When instances become unhealthy, Traffic Director reconfigures the proxies to fail over and redirect traffic away from them.
Cloud DNS allows you to programmatically manage DNS entries in an automated recovery process. Cloud DNS uses redundant locations globally via an Anycast name server network for low latency and high availability.
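As a rough sketch of what programmatic DNS management could look like during failover, the snippet below builds the request body for a Cloud DNS `changes.create` call that swaps an A record from the production address to the recovery address. The hostname and IP addresses are placeholders from documentation ranges, the helper names are hypothetical, and submitting the authenticated request is not shown.

```python
from typing import Dict, List


def record_set(name: str, ips: List[str], ttl: int) -> Dict:
    """Describe an A record in the shape used by the Cloud DNS API."""
    return {"name": name, "type": "A", "ttl": ttl, "rrdatas": list(ips)}


def failover_change(name: str, old_ips: List[str], new_ips: List[str],
                    ttl: int = 60) -> Dict:
    """Build a change that replaces the production IPs with the DR IPs.

    The deleted record set must match the existing record exactly
    (including TTL), so this sketch assumes both use the same TTL.
    """
    if not name.endswith("."):
        raise ValueError("DNS names in Cloud DNS must be fully qualified (end with '.')")
    return {
        "deletions": [record_set(name, old_ips, ttl)],
        "additions": [record_set(name, new_ips, ttl)],
    }


if __name__ == "__main__":
    change = failover_change("www.example.com.", ["203.0.113.10"], ["198.51.100.20"])
    print(change)
```

A short TTL on the record keeps clients from caching the production address for long after the switch.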
Cloud Monitoring tracks events and metrics (with metadata) from Google Cloud and various application components. With the proper configurations, it can send alerts to third-party apps and tools that trigger automated DR processes in response to the alerts.
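One way such automation might be wired up is a webhook receiver that inspects the alert and starts a DR runbook. The sketch below assumes the webhook notification body is JSON with an `incident` object containing `state` and `policy_name` fields; the policy name, the `handle_alert` function, and the `start_failover` callback are hypothetical.

```python
import json
from typing import Callable, Set


def handle_alert(raw_body: str, dr_policies: Set[str],
                 start_failover: Callable[[str], None]) -> bool:
    """Trigger failover when an open incident belongs to a DR-relevant policy.

    Returns True if failover was started, False otherwise.
    """
    payload = json.loads(raw_body)
    incident = payload.get("incident", {})
    # Ignore notifications that report an incident closing.
    if incident.get("state") != "open":
        return False
    policy = incident.get("policy_name", "")
    if policy not in dr_policies:
        return False
    start_failover(policy)
    return True


if __name__ == "__main__":
    body = json.dumps({"incident": {"state": "open", "policy_name": "prod-db-down"}})
    handle_alert(body, {"prod-db-down"}, lambda p: print(f"starting failover for {p}"))
```

Restricting the trigger to an explicit set of policies keeps routine alerts from starting a failover by accident.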
Deployment Manager provides templates for defining Google Cloud environments. The templates allow you to easily create or dismantle your environment with a simple command.
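Deployment Manager templates can be written in Python, where a `GenerateConfig(context)` function returns the resources to create. The following is a minimal sketch of a template for a single standby VM; the property names (`zone`, `machineType`, `sourceImage`) and the use of the `default` network are assumptions made for illustration.

```python
def GenerateConfig(context):
    """Return the resources for a single standby VM in the DR environment."""
    project = context.env["project"]
    zone = context.properties["zone"]
    base = "https://www.googleapis.com/compute/v1/projects/" + project

    instance = {
        "name": context.env["name"] + "-vm",
        "type": "compute.v1.instance",
        "properties": {
            "zone": zone,
            "machineType": "{}/zones/{}/machineTypes/{}".format(
                base, zone, context.properties["machineType"]),
            "disks": [{
                "deviceName": "boot",
                "type": "PERSISTENT",
                "boot": True,
                "autoDelete": True,
                # Image to boot from, for example a custom image holding the app.
                "initializeParams": {"sourceImage": context.properties["sourceImage"]},
            }],
            "networkInterfaces": [{"network": base + "/global/networks/default"}],
        },
    }
    return {"resources": [instance]}
```

A configuration that imports this template can then be created with `gcloud deployment-manager deployments create` and torn down with `gcloud deployment-manager deployments delete`.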
There are several ways you can leverage Google Cloud to back up your data if you have an on-prem production environment, with the cloud serving as a recovery site. The following are two potential solutions.
You can back up on-premises data to Cloud Storage with Storage Transfer Service, which supports transfers from on-premises file systems. This is useful given the complexity of transferring large volumes of data across networks and the associated risk of data loss. This managed service is reliable and scalable, allowing you to transfer data from a data center to Cloud Storage buckets.
You can use a partner gateway solution to back up your data to Cloud Storage. Integrated third-party backup and recovery solutions can apply tiered storage strategies to prioritize recent backups while saving costs on older backups (for instance by using slower storage tiers like Archive). Backup data can be recovered in the event of a failure, with a DR environment serving production traffic while the production environment is being restored.
A partner gateway facilitates the transfer of data from on-premises to cloud storage, as illustrated in the following diagram.
If both your production and disaster recovery environments run in Google Cloud, you can leverage storage tiering for data backups. You can migrate backup data to cheaper storage tiers, because the likelihood of accessing it is lower. Nearline, Coldline and Archive are useful for storing infrequently used data, but they require minimum storage durations and have additional costs for retrieving data.
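Tiering can be automated with a Cloud Storage lifecycle configuration. The sketch below generates one that moves backups to Nearline, Coldline, and Archive as they age. The age thresholds of 30, 90, and 365 days are illustrative assumptions, chosen so that an object stays in each class at least as long as that class's minimum storage duration; the function name is hypothetical.

```python
import json
from typing import Dict, List, Optional, Tuple

# (storage class, age in days at which objects move to it); illustrative values.
DEFAULT_TIERS: List[Tuple[str, int]] = [
    ("NEARLINE", 30),
    ("COLDLINE", 90),
    ("ARCHIVE", 365),
]


def lifecycle_config(tiers: List[Tuple[str, int]] = DEFAULT_TIERS,
                     delete_after_days: Optional[int] = None) -> Dict:
    """Build a lifecycle configuration that demotes objects as they age."""
    rules = [
        {
            "action": {"type": "SetStorageClass", "storageClass": storage_class},
            "condition": {"age": age_days},
        }
        for storage_class, age_days in tiers
    ]
    if delete_after_days is not None:
        # Optionally expire backups that are past their retention period.
        rules.append({"action": {"type": "Delete"},
                      "condition": {"age": delete_after_days}})
    return {"rule": rules}


if __name__ == "__main__":
    print(json.dumps(lifecycle_config(delete_after_days=730), indent=2))
```

The resulting JSON can be saved to a file and applied to a bucket with `gsutil lifecycle set FILE gs://BUCKET_NAME`.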
The storage tiers for a production workload in Google Cloud are illustrated in the following diagram.
If your production environment runs in another cloud, you can still use Google Cloud as a recovery site for your disaster recovery plan. It is common for DR strategies to involve transferring data from one object store to another.
You can use Storage Transfer Service to transfer data to Google Cloud from Amazon S3. You can configure transfer jobs to periodically synchronize the data source and data sink, and apply filters (e.g., by file name or creation date) to control how and when data is transferred.
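A transfer job is described by a JSON body submitted to the Storage Transfer Service API. The following sketch builds such a body for an S3-to-Cloud-Storage job with an optional name-prefix filter. The bucket names and project ID are hypothetical, AWS credentials are deliberately omitted, and the recurrence behavior of the schedule should be checked against the API documentation.

```python
from datetime import date
from typing import Dict, Iterable, Optional


def s3_to_gcs_job(project_id: str, s3_bucket: str, gcs_bucket: str,
                  start: date, prefixes: Optional[Iterable[str]] = None) -> Dict:
    """Build a transferJobs.create request body for an S3 to Cloud Storage sync."""
    spec: Dict = {
        # AWS credentials (access key or role) would also go in this object.
        "awsS3DataSource": {"bucketName": s3_bucket},
        "gcsDataSink": {"bucketName": gcs_bucket},
    }
    if prefixes:
        # Only transfer objects whose names start with one of these prefixes.
        spec["objectConditions"] = {"includePrefixes": list(prefixes)}
    return {
        "description": "DR sync from s3://{} to gs://{}".format(s3_bucket, gcs_bucket),
        "projectId": project_id,
        "status": "ENABLED",
        "schedule": {
            "scheduleStartDate": {
                "year": start.year, "month": start.month, "day": start.day,
            },
        },
        "transferSpec": spec,
    }


if __name__ == "__main__":
    job = s3_to_gcs_job("my-dr-project", "prod-backups", "dr-backups",
                        date(2024, 1, 15), prefixes=["db/", "config/"])
    print(job)
```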
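You can use boto, an open source Python library that provides an interface to Amazon S3, to transfer data to Google Cloud Storage from AWS. It works together with the gsutil command-line tool, which reads its credentials from a boto configuration file and can copy objects from S3 buckets when AWS credentials are supplied.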
NetApp Cloud Volumes ONTAP, the leading enterprise-grade storage management solution, delivers secure, proven storage management services on AWS, Azure and Google Cloud. Cloud Volumes ONTAP capacity can scale into the petabytes, and it supports various use cases such as file services, databases, DevOps or any other enterprise workload, with a strong set of features including high availability, data protection, storage efficiencies, Kubernetes integration, and more.
Cloud Volumes ONTAP makes it easier and faster to create an up-to-date secondary copy of your ONTAP data on Google Cloud using NetApp Snapshot™ technology and Cloud Manager, while its storage efficiency features and data tiering reduce the cost of updates and storage.
Download our guide to Disaster Recovery in Google Cloud with Cloud Volumes ONTAP to learn more.