High availability across multiple zones with Amazon FSx for NetApp ONTAP

Building infrastructure that aligns with a recovery point objective (RPO) of zero and the lowest possible recovery time objective (RTO) isn’t easy. One reason is that an entire data center or an availability zone (AZ) can fail. The solution is for your infrastructure to span multiple AZs, which can be a challenge.

There are methods for achieving this level of availability, but building it on your own takes time and careful management, and this approach can cause performance and latency issues.

In this post, we discuss how Amazon FSx for NetApp ONTAP provides a better solution: a built-in option for multi-AZ deployment that provides high availability on AWS.

Multi-AZ high availability isn’t easy

Enterprise-level deployments need systems that can handle failures—from individual components to entire AZs—without losing data (RPO=0). These systems also need to quickly recover from disruptions to maintain minimal RTO, and that requires resilience across hardware, networks, and geographic locations.

There are methods for achieving this kind of multi-AZ high availability on AWS. However, to build this kind of infrastructure, you would need to architect the mechanisms for real-time data synchronization, seamless data security, and recovery across AZs. These operations all grow more complex when you add another zone.

The underlying tasks are complex:

  • Redundancy through synchronized replication. To achieve absolute redundancy, your data needs to be synchronously mirrored across zones—a process typically managed by an intelligent orchestration layer. The setup must guarantee that replication is accurate and timely, eliminating the risk of data loss to consistently achieve an RPO of 0.
  • Addressing latency in data access. Real-time data synchronization across zones adds latency that is hard to overcome. Although caching and content delivery networks (CDNs) can partially offset this latency, the overarching challenge lies in developing a high-performance network with dedicated interconnects, capable of consistently reducing latency for cross-zone communication.
  • Failover dynamics. Achieving a seamless transition between zones is a nuanced challenge, particularly when the goal is a minimal RTO. The storage framework needs to be resilient and able to mirror data between environments without discrepancies. It must also be able to fail over and fail back without losing data or interrupting the user experience.
  • Data protection with no point of failure.
    Building your own multi-AZ architecture introduces new components to your infrastructure—and any of them can become an additional point of failure. To avoid exposing or losing data, it’s important to have solutions in place for point-in-time snapshots, consistent backups, and seamless disaster recovery (DR) processes.

    You’ll also need uniform security standards such as encryption in transit and at rest, access controls, ransomware protection, and write-once, read-many (WORM) data locks.
  • Containing costs and data copies. With a redundant system that spans multiple AZs, there are two major cost concerns: high overheads and redundant costs. Not only will you spend significant resources to build and maintain a multi-AZ system, but after it’s set up, all the costs of running a single deployment will be duplicated—from the data being stored to the network traffic between AZs.
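
To make the latency point above concrete, here is a minimal sketch of why synchronous mirroring costs more per write than asynchronous replication. The function names and all timings are hypothetical and chosen only for illustration. The model assumes the local and remote writes run in parallel.

```python
def sync_write_latency_ms(local_write_ms, remote_write_ms, cross_zone_rtt_ms):
    """Latency of a write acknowledged only after both zones hold the data.

    Both copies are written in parallel, so the slower path sets the latency.
    """
    remote_path_ms = cross_zone_rtt_ms + remote_write_ms
    return max(local_write_ms, remote_path_ms)


def async_write_latency_ms(local_write_ms):
    """Latency when the remote copy is updated later, which allows RPO > 0."""
    return local_write_ms


# Hypothetical timings in milliseconds, not measurements of any real system.
print(sync_write_latency_ms(0.5, 0.5, 1.0))  # synchronous mirror
print(async_write_latency_ms(0.5))           # asynchronous replication
```

In this simplified model, every synchronous write pays at least the cross-zone round trip, which is the price of an RPO of 0.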

Considering these factors, multi-AZ high availability isn’t an easy solution to configure on your own. But on AWS, there’s an easier option: built-in multi-AZ high availability with FSx for ONTAP.

Achieving multi-AZ high availability with FSx for ONTAP

FSx for ONTAP is a fully managed service from AWS. Using signature NetApp® ONTAP® data management features, it delivers high-performance shared storage options for files and block storage. One of these features is multi-AZ high availability.

Infrastructure resources for FSx for ONTAP nodes are provisioned in different AZs within the same AWS Region, and data is synchronously mirrored across both nodes. Write operations complete only after the data has been written to both nodes, so no acknowledged data is lost if an outage occurs.

If a disruption occurs, even something as massive as an entire AZ failure, FSx for ONTAP automatically and seamlessly fails over to the healthy FSx for ONTAP node and continues to serve data.

This operating node can continue to serve all data requests from its own independent copy of the data, allowing you to maintain an RPO of 0. When the failed node recovers, it’s automatically refreshed with the up-to-date data from the healthy node, and the system fails back to dual-node operation.

Figure: The FSx for ONTAP multi-AZ High Availability architecture.
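
The write, failover, and failback behavior described above can be sketched as a toy model. This is not how FSx for ONTAP is implemented internally. The `Node` and `MirroredPair` classes and the zone names are hypothetical, and the sketch only illustrates why acknowledging a write after both copies are updated preserves an RPO of 0.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One storage node in one availability zone (toy model)."""
    zone: str
    healthy: bool = True
    data: dict = field(default_factory=dict)


class MirroredPair:
    """Toy model of a synchronously mirrored two-node pair."""

    def __init__(self, preferred, standby):
        self.preferred = preferred
        self.standby = standby
        self.active = preferred

    def write(self, key, value):
        """Update every healthy node, then acknowledge the write."""
        targets = [n for n in (self.preferred, self.standby) if n.healthy]
        if not targets:
            raise RuntimeError("no healthy node available")
        for node in targets:
            node.data[key] = value
        return True  # returned only after all healthy copies are updated

    def read(self, key):
        """Serve reads from whichever node is currently active."""
        return self.active.data[key]

    def fail(self, node):
        """Simulate an outage; fail over if the active node was hit."""
        node.healthy = False
        if self.active is node:
            survivor = self.standby if node is self.preferred else self.preferred
            if not survivor.healthy:
                raise RuntimeError("both nodes are down")
            self.active = survivor

    def recover(self, node):
        """Resync the recovered node from the active one, then fail back."""
        node.data = dict(self.active.data)
        node.healthy = True
        if self.preferred.healthy:
            self.active = self.preferred


pair = MirroredPair(Node("az-a"), Node("az-b"))
pair.write("invoice-1", "paid")
pair.fail(pair.preferred)          # simulated outage of the first zone
pair.write("invoice-2", "open")    # served by the surviving node
pair.recover(pair.preferred)       # resync, then fail back
```

After the simulated outage, the surviving node still holds every acknowledged write, and the recovered node is refreshed before it becomes active again.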

If you don’t need such a high level of availability, you can opt for a dual-node structure that resides within a single AZ.
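
As a rough illustration of how this choice shows up at provisioning time, the sketch below builds request parameters for the Amazon FSx `CreateFileSystem` API. The helper name `build_ontap_request`, the subnet IDs, and the capacity and throughput values are assumptions for illustration. The field names and the `MULTI_AZ_1` and `SINGLE_AZ_1` deployment types reflect our understanding of the API, so check the current AWS reference before relying on them.

```python
from __future__ import annotations


def build_ontap_request(subnet_ids: list[str], multi_az: bool = True,
                        storage_gib: int = 1024,
                        throughput_mbps: int = 128) -> dict:
    """Build keyword arguments for an FSx CreateFileSystem call (sketch)."""
    expected = 2 if multi_az else 1
    if len(subnet_ids) != expected:
        raise ValueError(f"expected {expected} subnet(s), got {len(subnet_ids)}")

    ontap_config = {
        "DeploymentType": "MULTI_AZ_1" if multi_az else "SINGLE_AZ_1",
        "ThroughputCapacity": throughput_mbps,
    }
    if multi_az:
        # The preferred subnet decides which AZ hosts the active node.
        ontap_config["PreferredSubnetId"] = subnet_ids[0]

    return {
        "FileSystemType": "ONTAP",
        "StorageCapacity": storage_gib,
        "SubnetIds": list(subnet_ids),
        "OntapConfiguration": ontap_config,
    }


# Placeholder subnet IDs; replace with subnets in two different AZs.
params = build_ontap_request(["subnet-aaaa1111", "subnet-bbbb2222"])
# With boto3 installed and credentials configured, the call would look like:
#   boto3.client("fsx").create_file_system(**params)
```

The helper only assembles a dictionary, so it can be inspected without an AWS account. Passing `multi_az=False` with a single subnet yields the single-AZ variant.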

The benefits of multi-AZ High Availability deployment with FSx for ONTAP

When you use FSx for ONTAP for multi-AZ deployments, you get the following benefits:

  • High availability: Achieve an RPO of 0 by synchronously mirroring data across multiple AZs in real time, preventing data loss even during disruptions. This contributes to 99.99% availability, which is crucial for mission-critical applications.

    FSx for ONTAP stores mirrored replicas of your data in multiple AZs simultaneously. If one AZ fails, the system automatically routes data access to the replica in the other AZ.
  • Data resilience: Through its automatic and seamless failover and failback processes, FSx for ONTAP can help you achieve an RTO of less than 60 seconds. The solution automatically switches over to the redundant node (failover) and returns to dual-node operation (failback) when the failed node recovers.
  • Robust security: In addition to enforcing strict access controls, FSx for ONTAP also encrypts data at rest and in transit. The service also secures data with immutable NetApp Snapshot™ copies to prevent unauthorized data changes, and it provides malware protection to safeguard against cyberthreats.
  • Comprehensive data protection: Local Snapshot copies enable quick data recovery, whereas optimized backup and cross-region DR options keep data safe across the board.
  • Cost-optimized data copies: FSx for ONTAP optimizes costs through ONTAP storage efficiency features. Data deduplication, compression, and compaction reduce storage usage and costs by up to 65%, and automatic tiering of infrequently used data to the capacity tier reduces charges for premium SSD storage. These features don’t compromise the data’s availability in any way.
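
The cost effect of storage efficiency and tiering can be estimated with simple arithmetic. The sketch below is illustrative only. The per-GiB prices and the share of cold data are hypothetical placeholders, not AWS pricing, and 65% is the upper bound cited above rather than a guaranteed saving.

```python
def physical_gib(logical_gib, reduction_pct):
    """Capacity consumed after efficiency savings of `reduction_pct` percent."""
    return logical_gib * (100 - reduction_pct) / 100


def monthly_cost(stored_gib, cold_pct, ssd_price, capacity_price):
    """Split stored data between SSD and capacity tier and price each part."""
    cold_gib = stored_gib * cold_pct / 100
    hot_gib = stored_gib - cold_gib
    return hot_gib * ssd_price + cold_gib * capacity_price


# Hypothetical inputs: 1,000 GiB of logical data, 80% of it rarely accessed.
SSD_PRICE = 0.25           # placeholder price per GiB-month
CAPACITY_PRICE = 0.03125   # placeholder price per GiB-month

baseline = monthly_cost(1000, 0, SSD_PRICE, CAPACITY_PRICE)
stored = physical_gib(1000, 65)
optimized = monthly_cost(stored, 80, SSD_PRICE, CAPACITY_PRICE)
print(baseline, stored, optimized)
```

With these placeholder numbers, 1,000 GiB of logical data occupies 350 GiB after a 65% reduction, and most of that lands on the cheaper tier.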

FSx for ONTAP helps your operations withstand the worst outages. Here’s how one company is taking advantage of that.

How a software company maintains multi-AZ high availability with FSx for ONTAP 

One company using FSx for ONTAP to keep business operations running smoothly is a software developer of workforce engagement solutions. This global company’s software-as-a-service (SaaS) technology offers tools for efficient workforce management and compliant customer engagement—and those tools need high availability.

The company needed a unified storage solution that could provide scalability, maintain data integrity across multi-AZ setups, and help it adhere to stringent compliance and security standards.

The solution was to shift to the cloud and adopt FSx for ONTAP, which offers several benefits:

  • Enterprise-grade resilience. The multi-AZ high availability and cross-region DR features of FSx for ONTAP keep the company’s data protected.
  • Simplified operations. With its move to FSx for ONTAP, the company eliminated the need for hands-on management of its storage infrastructure. Now, it has a single fully managed storage service that handles both its modernized cloud-native Kubernetes workloads and the SaaS applications migrated from its legacy systems.
  • Cost efficiency. With its storage efficiency features, FSx for ONTAP offered considerable cost savings, reducing the company’s total cost of ownership (TCO) for cloud storage.

An easier way to maintain business continuity

You need to be sure your applications are always available and secure. FSx for ONTAP can help you do that, even in the worst outages.

With the multi-AZ deployment option, FSx for ONTAP achieves RPO=0 and RTO<60 seconds—right out of the box. That means your operations aren’t affected by major disruptions, and there’s no additional overhead for you to worry about.
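
To put these figures in perspective, the following sketch converts the 99.99% availability figure cited earlier into a yearly downtime budget and compares it with a 60-second failover. It is plain arithmetic over a 365-day year, with hypothetical function names, and says nothing about how AWS measures or guarantees availability.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def downtime_budget_s(availability_pct):
    """Seconds of downtime per year allowed at a given availability level."""
    return (100 - availability_pct) / 100 * SECONDS_PER_YEAR


def failovers_within_budget(availability_pct, rto_s):
    """How many outages of length `rto_s` fit into the yearly budget."""
    return int(downtime_budget_s(availability_pct) // rto_s)


budget_minutes = downtime_budget_s(99.99) / 60
print(round(budget_minutes, 2), failovers_within_budget(99.99, 60))
```

At 99.99%, the budget is about 52.6 minutes per year, so a failover that completes in under 60 seconds uses only a small fraction of it.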

Yifat Perry, Technical Content Manager