The aim of disaster recovery is to create a replica of an organization’s primary data to which operations can fail over and continue running during a failure, whether that failure is caused by the network, storage, infrastructure, communications, a virtual machine, or human error.
The ideal cloud-based disaster recovery plan is one that is optimized and properly tested so that any points of failure (POFs) are discovered before a real disaster takes place. Many sites skip proper disaster recovery testing and fall victim to these POFs.
In this article, we will discuss five points of failure for cloud-based disaster recovery architectures, show how to avoid them, and explain how to support your DR setup and disaster recovery plan testing with NetApp’s Cloud Volumes ONTAP (formerly ONTAP Cloud).
Creating a secondary copy involves migrating the primary data to the cloud and performing asynchronous replication of the data to keep the secondary copy up to date.
However, when a data migration strategy includes the live migration of large amounts of data, speed and cost become trade-offs. The larger the architecture that has to be replicated in the cloud, the greater the costs involved. Latency is also affected, since replication requires ongoing connections between the primary data and the cloud-based disaster recovery copy.
The possible POF here is that a replication process that takes too long leaves your primary system vulnerable, without a DR copy that meets your SLAs. It will also rack up costs.
Cloud Volumes ONTAP helps you in this situation by providing SnapMirror® data replication. SnapMirror is a fast and efficient data replication method for backup and disaster recovery. It minimizes latency by reducing network traffic and transporting only the changed data blocks. Only updating the delta means that SnapMirror is also cost-effective when it comes to transfers and storage consumption.
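To make the replication step concrete, here is a minimal sketch of creating an asynchronous SnapMirror relationship through the ONTAP REST API. The cluster address, credentials, SVM and volume names are placeholders, and the endpoint and field names are based on the documented /api/snapmirror/relationships resource; verify them against your ONTAP release.

```python
# Minimal sketch: create an async SnapMirror relationship via the ONTAP REST API.
# All hostnames, credentials, and volume names below are placeholders.
import requests

ONTAP_HOST = "https://cluster-mgmt.example.com"  # hypothetical cluster management LIF
AUTH = ("admin", "password")                     # use a vault or API token in practice

relationship = {
    "source": {"path": "svm_primary:vol_app_data"},
    "destination": {"path": "svm_dr:vol_app_data_dst"},
    "policy": {"name": "MirrorAllSnapshots"},    # asynchronous mirror policy
}

# Create the relationship; after the baseline transfer, each update ships
# only the changed blocks (the delta) to the DR copy.
resp = requests.post(
    f"{ONTAP_HOST}/api/snapmirror/relationships",
    json=relationship,
    auth=AUTH,
    verify=False,  # use a proper CA bundle in production
)
resp.raise_for_status()
print("SnapMirror relationship created:", resp.json())
```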
Many companies rely on a disaster recovery strategy that transfers data from the source to another dataset in the cloud using file copy tools or custom scripts. But file copy tools are generally best suited for one-time migrations, and custom scripts require significant ongoing manual effort.
To make sure data synchronization problems are avoided, Cloud Volumes ONTAP users can leverage SnapMirror to run regularly scheduled syncs. You can schedule these syncs so that your data protection needs and SLAs are met.
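As an illustration, the sketch below attaches a transfer schedule to an existing relationship and triggers an on-demand incremental update. The relationship UUID and schedule name are placeholders, and the transfer_schedule field and /transfers sub-resource are assumptions based on the documented SnapMirror REST API.

```python
# Sketch: schedule recurring SnapMirror transfers and trigger an ad hoc update.
# UUIDs, schedule names, and credentials are placeholders.
import requests

ONTAP_HOST = "https://cluster-mgmt.example.com"
AUTH = ("admin", "password")
REL_UUID = "00000000-0000-0000-0000-000000000000"  # hypothetical relationship UUID

# Attach a cron-style schedule so incremental transfers run automatically.
resp = requests.patch(
    f"{ONTAP_HOST}/api/snapmirror/relationships/{REL_UUID}",
    json={"transfer_schedule": {"name": "hourly"}},
    auth=AUTH, verify=False,
)
resp.raise_for_status()

# Kick off an immediate incremental transfer; only the delta is sent.
resp = requests.post(
    f"{ONTAP_HOST}/api/snapmirror/relationships/{REL_UUID}/transfers",
    json={},
    auth=AUTH, verify=False,
)
resp.raise_for_status()
```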
The best disaster recovery solution is the one which works when a disaster actually takes place. Many times, disaster recovery solutions fail exactly when they’re needed the most.
The best way to prevent that from happening is to test the DR solution and strategies before a disaster takes place. For example, a test should fail over an exact replica of the entire site to the DR copy to confirm that it will work in a live scenario. Skipping this step leaves a potentially major POF.
Cloud Volumes ONTAP uses FlexClone® volumes to make disaster recovery plan testing easy and cost-efficient. FlexClone technology instantly provisions test environments at zero capacity: the clones don’t allocate additional storage, so they avoid additional costs. That zero-cost, near-instant creation and general ease of use make disaster recovery plan testing practical and help assure that data will be available and synchronized in case of a disaster.
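For instance, a writable clone of the DR volume for a recovery drill might be created as sketched below. Volume and SVM names are placeholders, and the clone fields are assumptions based on the documented /api/storage/volumes resource; check them against your ONTAP version.

```python
# Sketch: create a writable FlexClone of the DR destination volume for testing.
# All names and credentials are placeholders.
import requests

ONTAP_HOST = "https://cluster-mgmt.example.com"
AUTH = ("admin", "password")

clone = {
    "name": "vol_app_data_drtest",
    "svm": {"name": "svm_dr"},
    "clone": {
        "is_flexclone": True,
        "parent_volume": {"name": "vol_app_data_dst"},
    },
}

# The clone shares blocks with its parent volume, so it consumes no extra
# capacity until the DR test starts writing new data.
resp = requests.post(
    f"{ONTAP_HOST}/api/storage/volumes",
    json=clone, auth=AUTH, verify=False,
)
resp.raise_for_status()
```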
Failover and failback processes come into play when considering your RPO and RTO.
Failover and failback processes are a key point of failure when we talk about disaster recovery architecture. The disaster recovery process should be designed so that you can fail over to the secondary copy in case of disaster and fail back to the primary once the disaster is resolved. It’s also a process that needs to be included in any disaster recovery testing.
This to-and-from process must also take the loss of data into account. There is always some data loss while an application is being failed over or failed back: the challenge is to keep those losses minimal, keep the data consistent, and get back to normal operation as quickly and smoothly as possible.
This demands a reliable, synchronized solution capable of syncing data continuously according to the user’s predefined schedules. Without that capability, this POF will keep your site offline too long, causing you to miss your RTO and RPO.
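At the storage layer, a failover typically breaks the mirror so the DR volume becomes writable, and a failback later resynchronizes the relationship. The sketch below shows that sequence against the SnapMirror REST API; the UUID is a placeholder, and the state values ("broken_off", "snapmirrored") are assumptions based on the documented relationship state machine.

```python
# Sketch: drive SnapMirror failover and failback by changing relationship state.
# The UUID and credentials are placeholders; state values assume the documented API.
import requests

ONTAP_HOST = "https://cluster-mgmt.example.com"
AUTH = ("admin", "password")
REL_UUID = "00000000-0000-0000-0000-000000000000"

def set_state(state: str) -> None:
    """PATCH the SnapMirror relationship into the requested state."""
    resp = requests.patch(
        f"{ONTAP_HOST}/api/snapmirror/relationships/{REL_UUID}",
        json={"state": state},
        auth=AUTH, verify=False,
    )
    resp.raise_for_status()

# Failover: break the mirror so applications can write to the DR copy.
set_state("broken_off")

# Failback (once the primary is healthy again): resynchronize the relationship
# so the copies converge before cutting operations back to the primary site.
set_state("snapmirrored")
```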
There are many important uses for SnapMirror besides disaster recovery, such as data migration, data analysis, data restoration, application testing, disaster recovery testing, load balancing, remote data access and more.
In disaster recovery solutions, organizations can end up overspending on the storage that holds the replicated data and on the data transfer costs of keeping the DR copy up to date.
It is important for the recovery solution to optimize storage and data transfer costs. Inefficiently managed storage will raise costs considerably; at the same time, a cost-optimized storage solution shouldn’t compromise the speed or latency of the solution.
Cloud Volumes ONTAP does a number of things to avoid this concern with its storage efficiency features. Data compression, data deduplication, thin provisioning and data tiering to lower-cost storage types can help dramatically cut storage spending, whether using Azure storage or AWS storage.
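As an illustration of how these settings look in practice, the sketch below enables deduplication, compression, thin provisioning, and a tiering policy on the DR destination volume. The volume UUID is a placeholder, and the efficiency, guarantee, and tiering field values are assumptions based on the documented volume resource; confirm the exact names and enums for your ONTAP release.

```python
# Sketch: enable storage-efficiency features and tiering on the DR volume.
# The UUID and credentials are placeholders; field values are assumptions.
import requests

ONTAP_HOST = "https://cluster-mgmt.example.com"
AUTH = ("admin", "password")
VOL_UUID = "11111111-1111-1111-1111-111111111111"  # hypothetical volume UUID

settings = {
    "efficiency": {
        "compression": "both",   # inline + background compression
        "dedupe": "both",        # inline + background deduplication
    },
    "guarantee": {"type": "none"},   # thin provisioning (no space guarantee)
    "tiering": {"policy": "auto"},   # tier cold blocks to lower-cost object storage
}

resp = requests.patch(
    f"{ONTAP_HOST}/api/storage/volumes/{VOL_UUID}",
    json=settings, auth=AUTH, verify=False,
)
resp.raise_for_status()
```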
SnapMirror is optimized as well: it transfers only data deltas, and those deltas are themselves compressed and deduplicated, further reducing transfer costs. In addition, NetApp FlexClone provides cost-efficient cloned copies for disaster recovery plan testing of the DR environment.
Disaster recovery protects your application from downtime and increases its availability. It helps you meet your SLAs and provides reliable failover and failback mechanisms. But a POF can cause any DR solution to fail.
As a key NetApp cloud solution, Cloud Volumes ONTAP helps you avoid DR architecture POFs with its data protection and data replication technologies for less-expensive and more-effective disaster recovery testing and solutions.