
Azure Resiliency Capabilities: A Deep Dive

As business functions become increasingly data-driven, ensuring the availability and reliability of the storage layer is a top priority for enterprises. A number of enterprise-class data storage solutions cater to these high availability requirements on Azure.

What data resiliency features does Azure storage offer out of the box? In this blog post, we’ll explore the data resiliency capabilities of Azure storage, how they protect your data from regional and zonal failures, and how they help meet stringent availability demands.

What Is Azure Resiliency?

Azure resiliency can be broadly defined as the capability of the platform to recover from failures. These failures can be local to a resource, confined to a zone, or affect an entire Azure region. While certain resiliency features are built into the platform, customers are also expected to design their application architecture so that resources are deployed with no single point of failure.

For example, application servers should be deployed across different zones, placed behind load balancers, and backed by zone-resilient storage.

How Do You Achieve Resilience in Azure?

To protect against planned and unplanned outages, Azure storage maintains several copies of your data by default. The number of copies and their location depend on the storage type selected. Note that more copies mean higher costs, so evaluate this tradeoff before selecting a storage type.

Based on your application design, you also need to identify the access pattern for storage. For example, note whether your application needs read access from secondary storage, or if the application needs data to be replicated to a different region for disaster recovery. We will explore some of the data resiliency capabilities offered by Azure storage for these use cases in the next section.
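
The access-pattern questions above can be turned into a small decision helper. The following is a minimal sketch, not an Azure API: the `ResiliencyNeeds` class and `choose_redundancy` function are hypothetical names introduced for illustration, and the returned abbreviations are the redundancy options described in the next section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResiliencyNeeds:
    """Hypothetical requirement flags for illustration; not an Azure type."""
    survive_zone_failure: bool = False
    survive_region_failure: bool = False
    read_from_secondary: bool = False


def choose_redundancy(needs: ResiliencyNeeds) -> str:
    """Return the abbreviation of the least redundant option that meets the needs."""
    # Read access to a secondary region only exists for geo-replicated options.
    if needs.read_from_secondary and not needs.survive_region_failure:
        raise ValueError("read access to a secondary region requires geo-replication")
    if needs.survive_region_failure:
        base = "GZRS" if needs.survive_zone_failure else "GRS"
        return f"RA-{base}" if needs.read_from_secondary else base
    return "ZRS" if needs.survive_zone_failure else "LRS"


if __name__ == "__main__":
    dr_with_reads = ResiliencyNeeds(survive_region_failure=True, read_from_secondary=True)
    print(choose_redundancy(ResiliencyNeeds()))
    print(choose_redundancy(dr_with_reads))
```

The helper deliberately ignores cost and data residency, which also need to be weighed in a real selection.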

Azure Data Resiliency Capabilities

Azure storage represents a pool of storage resources consisting of blobs, file shares, tables, and queues. Depending on the number of copies, regions, resiliency, and access patterns, customers can choose from the following Azure storage types.

  • Locally Redundant Storage (LRS)
    This is the default Azure resiliency level for storage. LRS stores three copies of your data that all reside within a single data center. The data is written across all three copies synchronously.

    LRS provides durability of at least 11 nines (99.999999999%) over a given year. While LRS protects your data from drive or server rack failures, data availability would be impacted if an outage ever affects the data center.

  • Zone-Redundant Storage (ZRS)
    ZRS offers additional levels of resiliency within a region. In this case, the data is replicated across data centers that are located in three different Azure availability zones. The use of multiple availability zones prevents having a single point of failure, as each zone has its own independent network, power, and cooling.

    ZRS data replication is synchronous and offers durability of at least 12 nines (99.9999999999%) over a given year. Should a zone fail, your data will still be available. ZRS can be considered for highly available workloads that need to be deployed within a region due to data residency requirements.

  • Geo-Redundant Storage (GRS)
    For applications that need resiliency from region failures, you can opt for GRS. In GRS, three copies of the data are kept in a single location within a primary region, with another three copies of the data stored in another location in a paired secondary region. Note that the secondary region is pre-determined and is paired based on the primary region.

    The data is replicated asynchronously to the secondary region and is made available if the primary region should ever experience an outage. GRS provides durability of at least 16 nines over a given year.

  • Geo-Zone-Redundant Storage (GZRS)
    GZRS combines cross-zone high availability within a region with an added level of protection against regional outages thanks to geo-replication. Similar to ZRS, with GZRS three copies of data are maintained across three availability zones within a region and updated synchronously.

    This data is then replicated asynchronously to the secondary region for resiliency against regional outages. Similar to GRS, GZRS offers durability of at least 16 nines over a given year.

  • Read-Access Geo-Redundant Storage (RA-GRS)
    Though GRS and GZRS replicate data to a secondary region, the data in the secondary region is not accessible until a failover is triggered. If your applications need access to the data in the secondary region as part of your high availability design, you can use RA-GRS.

    While using RA-GRS, the data in the secondary region can always be accessed by your applications, not just when a failover happens.

  • Read-Access Geo-Zone-Redundant Storage (RA-GZRS)
    The RA-GZRS option can be used if you want read access to data in the secondary region for geo-zone-redundant storage. Note that when using RA-GRS or RA-GZRS, you need to design the application to read from both regions to ensure resiliency.
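
Designing an application to read from both regions can be sketched as a primary-first read with a fallback to the secondary. This is an illustrative sketch under stated assumptions, not SDK code: `fetch` is a hypothetical caller-supplied function that downloads a URL, the account name is a placeholder, and the secondary endpoint is assumed to follow the convention of appending `-secondary` to the account name.

```python
from __future__ import annotations

from typing import Callable


def blob_endpoints(account: str) -> tuple[str, str]:
    """Return (primary, secondary) blob endpoints for a read-access account.

    Assumption: the secondary endpoint is the account name plus "-secondary".
    """
    primary = f"https://{account}.blob.core.windows.net"
    secondary = f"https://{account}-secondary.blob.core.windows.net"
    return primary, secondary


def read_with_fallback(
    fetch: Callable[[str], bytes], account: str, path: str
) -> tuple[bytes, str]:
    """Try the primary endpoint first, then the secondary.

    Returns (data, endpoint_used) so callers know which region answered.
    """
    last_error: Exception | None = None
    for endpoint in blob_endpoints(account):
        try:
            return fetch(f"{endpoint}/{path}"), endpoint
        except OSError as exc:  # covers ConnectionError and TimeoutError
            last_error = exc
    raise ConnectionError(f"both regions failed for {path}") from last_error
```

Because replication to the secondary region is asynchronous, data returned by the secondary may lag behind the primary, so callers should treat a fallback read as potentially stale.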

Storage Account Failover

Azure resiliency designs require that access to data be uninterrupted in the event of any planned or unplanned maintenance activities or outages. This requires storage account failover to be possible so that the additional copies of data can be accessed by applications for business continuity.

Let’s explore some of the possible outage scenarios and how storage account failovers happen in such cases.

  • Zone Failures
    While using storage types that are zone redundant, Azure manages the failover process with no manual intervention. All backend tasks, such as DNS repointing and network updates, are handled by the platform.

    These processes will happen with minimal disruption to user access; however, if applications try to access data before the backend changes are completed, it could result in errors. That means applications should be designed to include retry mechanisms and fault handling logic to manage this.

  • Region Failure
    In geo-redundant storage, data is replicated asynchronously, so some data loss is possible if the primary region fails. The RPO is typically less than 15 minutes, but there is no SLA on this figure, so it may not satisfy every business’s RPO requirements.

    If the primary region fails, account failover to the secondary region must be initiated by the customer. This failover can be performed using the Azure portal, the Azure CLI, or the Azure Storage APIs. The failover process updates the DNS record, making the secondary endpoint the new primary for the storage account.

    After failover, the storage account in the new primary region is configured as locally redundant. You can then reconfigure the account to be geo-redundant and fail back to the original primary region once it becomes available again.

  • Restrictions
    Account failover to a new region is not supported for Azure Data Lake Storage Gen2, which uses a hierarchical namespace. Storage accounts with premium blobs or WORM immutable containers are also not supported for failover.
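
The retry mechanism recommended above for zone failovers can be sketched as a retry loop with exponential backoff and jitter. This is a minimal illustration, not the behavior of any particular SDK: `retry_with_backoff` and its parameter defaults are assumptions chosen for the example, and `operation` stands in for any storage call that may fail while backend changes are in progress.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation until it succeeds or attempts run out.

    The delay doubles after each failure, capped at max_delay, plus jitter.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except OSError:  # transient network errors such as ConnectionError
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so that the backoff schedule can be tested without waiting. Many client libraries, including the Azure Storage SDKs, ship configurable retry policies, so a hand-written loop like this is mainly useful where no such policy is available.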

What Are the Patterns That Come Under Resiliency in Azure?

Various design patterns can be leveraged to ensure Azure redundancy and resiliency. While designing applications, consider resiliency at each layer. This article has largely focused on storage redundancy across zones and regions, such as Azure geo-replication. You can add further resiliency by configuring Azure autoscaling, using Azure Load Balancer to handle incoming traffic, and leveraging native Azure scalability features to ensure availability even during usage spikes.

For another Azure HA use case, read about Azure proximity placement groups here.

Ensuring Azure Resiliency with NetApp Cloud Volumes ONTAP

NetApp BlueXP Cloud Volumes ONTAP delivers the trusted NetApp storage management capabilities in Azure, with enterprise-class performance, security, and resiliency. Cloud Volumes ONTAP can be deployed in a resilient architecture that can help you achieve an RPO of zero and RTO of less than 60 seconds.

The Azure high availability deployment of Cloud Volumes ONTAP uses a dual-node architecture, where data is synchronously replicated between two redundant nodes to avoid a single point of failure. You can configure HA pairs in an active-active configuration so that clients can access data from both nodes. The deployment can also use an active-passive configuration, where the read requests are handled by the passive node.

Should service ever be disrupted, the data can still be accessed from one of the nodes. Cloud Volumes ONTAP eliminates the need for complex application high availability configurations because the storage layer is resilient to failures.

Cloud Volumes ONTAP ensures the much-needed resiliency for business critical applications in Azure with minimal operational overhead. It can be used to host enterprise databases such as SQL, Oracle, SAP HANA, and NoSQL databases. It helps to ensure high availability of your DevOps and container ecosystem through easy integration with solutions such as Azure Kubernetes Service. With multi-protocol support Cloud Volumes ONTAP is a trusted solution that can cater to common enterprise requirements for home directories, VDI, application shared drives, and much more.

Conclusion

Ensuring data resilience is becoming a non-negotiable requirement. For Azure users, there are many options available, each with a different level of resilience. But some enterprises will need to go even further than the out-of-the-box data resilience offered by Azure.

In addition to the native availability offered by Azure at the storage level, the Cloud Volumes ONTAP high availability configuration provides an additional level of resiliency for mission-critical workloads in Azure. With minimal recovery time (less than 60 seconds), zero data loss, and a seamless failover process, Cloud Volumes ONTAP helps meet your Azure data resiliency goals. Cloud Volumes ONTAP's integrated storage efficiency, security, and data protection capabilities ensure that you get a great value proposition for your cloud storage spend.

Yifat Perry, Technical Content Manager
