BlueXP Blog

Tier Inactive Data from AFFs to Azure Blob

Written by Oded Berman, Product Evangelist | Aug 20, 2019 1:42:00 PM

To meet the challenges of exponential data growth, many organizations consider cloud storage a cheaper alternative to on-premises storage systems. But even though managing and hosting on-premises storage is a CAPEX and operational overhead, some enterprise use cases still demand high-performance on-premises storage. The optimal approach for balancing cost and performance is a hybrid architecture in which data is distributed across on-premises storage and cloud storage.

Manually segregating data by performance requirements and access patterns, with the intention of migrating the least-accessed data to the cloud, is not a practical approach: it takes too much time and effort and is far too complicated to configure. The new NetApp Cloud Tiering service for NetApp AFF and SSD-backed FAS systems addresses this issue by seamlessly moving inactive data to low-cost cloud-based object storage. This blog explores the cost and performance considerations of tiering data from on-premises AFF storage to Azure Blob storage.

Cloud Tiering to Azure Blob

NetApp AFF systems are leading on-premises storage solutions trusted by enterprises to provide high-performance storage for their critical line of business (LOB) applications and other use cases. The Cloud Tiering service for NetApp AFF is aligned with the “Lift and DON’T shift” value proposition: customers get an easy option to efficiently manage their data estate by leveraging Azure cloud storage for inactive data without making any changes to their applications and processes.

Cloud Tiering leverages the proven NetApp FabricPool technology behind the scenes to tier data from the on-premises ONTAP SSD-based tier, known as the performance tier, to a low-cost capacity tier, also known as the cloud tier, that uses Azure Blob storage.

By default, Cloud Tiering uses the Auto tiering policy. With Auto tiering, all data, including snapshots, is tiered once it has been inactive for 31 days or another user-defined cooling period. When this tiered data is randomly accessed by an application, it is automatically tiered back to the on-premises storage system. If it is accessed sequentially (as in indexing or antivirus scan operations), the data stays cold and remains in the cloud tier.

Cloud Tiering also offers a Snapshot-only tiering policy, in which only inactive snapshot data blocks are tiered to Azure Blob storage after a short cooling period of two days. If these cold snapshot blocks are accessed, whether randomly or sequentially, they are tiered back to the on-premises performance tier.
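To make the behavior of the two policies concrete, here is a minimal Python sketch of the decision logic described above. The 31-day and 2-day cooling periods and the random-versus-sequential distinction come from the service description; the data structure and function names are hypothetical, and the sketch is an illustration only, not NetApp's implementation.

    # Simplified model of the Auto and Snapshot-only tiering policies described above.
    # Illustrative sketch only, not NetApp's implementation.
    from dataclasses import dataclass

    AUTO_COOLING_DAYS = 31          # default cooling period for the Auto policy
    SNAPSHOT_ONLY_COOLING_DAYS = 2  # cooling period for the Snapshot-only policy

    @dataclass
    class Block:
        is_snapshot: bool   # does the block belong only to a snapshot copy?
        idle_days: int      # days since the block was last accessed
        in_cloud: bool      # is the block currently in the cloud tier?

    def should_tier_to_cloud(block: Block, policy: str) -> bool:
        """Decide whether a cold block is moved to the Azure Blob cloud tier."""
        if policy == "auto":
            return block.idle_days >= AUTO_COOLING_DAYS
        if policy == "snapshot-only":
            return block.is_snapshot and block.idle_days >= SNAPSHOT_ONLY_COOLING_DAYS
        return False

    def should_tier_back(block: Block, policy: str, access: str) -> bool:
        """Decide whether a cloud-tier block is moved back to the performance tier on access."""
        if not block.in_cloud:
            return False
        if policy == "auto":
            # Random reads warm the data; sequential reads (indexing, AV scans) do not.
            return access == "random"
        if policy == "snapshot-only":
            # Any read of a cold snapshot block brings it back on-premises.
            return access in ("random", "sequential")
        return False

    # Example: a 40-day-idle block under the Auto policy is tiered to Azure Blob,
    # and a later random read brings it back to the performance tier.
    blk = Block(is_snapshot=False, idle_days=40, in_cloud=False)
    print(should_tier_to_cloud(blk, "auto"))            # True
    blk.in_cloud = True
    print(should_tier_back(blk, "auto", "sequential"))  # False -> stays cold
    print(should_tier_back(blk, "auto", "random"))      # True  -> tiered back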

The value proposition of Cloud Tiering can be summarized as follows:

  • Get the maximum ROI on NetApp AFF by extending the storage capacity to infinitely scalable cloud storage.
  • Manage tiering with minimal configuration overhead using easy automation processes and tools.
  • Use an OPEX spending model so that customers only pay for the data that is moved to Azure Blob storage.
  • Reduce on-premises data center storage footprint while meeting capacity demands of enterprise applications.
  • Adopt a hybrid cloud architecture safely, without refactoring applications, since the tiering happens automatically and transparently.

The Cost Benefits of Using Azure Blob Storage

Azure Blob storage offers an inexpensive object storage solution in the cloud with built-in data encryption, availability across regions, and scalability to store large amounts of data. It is primarily used for unstructured data such as images, videos, and application log files, but storing backup and archival data at minimal cost is an equally popular use case. Cloud Tiering currently supports moving infrequently accessed data to Azure Blob storage's Hot access tier. Tiering data to the Cool access tier will be supported in a future release.

It is estimated that 80% of the data in any organization is cold data, rarely accessed by applications. For a LOB application that uses 10 TB of data, that means approximately 8 TB could be cold. Storing this data on-premises would take multiple 3.8 TB SSDs (the most commonly used capacity) to hold the data and protect it against disk failures using RAID-DP, NetApp's RAID 6 implementation. The number of disks required would keep increasing over the years as data grows, and additional storage arrays would be needed once the limits of the first array are reached. This calls for additional CAPEX investment in disks, storage arrays, hosting space, and so on. Using Cloud Tiering, that same cold data can be tiered to Azure Blob storage at a significantly lower cost than keeping it on local SSDs.
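As a rough back-of-the-envelope illustration of that arithmetic, the sketch below estimates how many 3.8 TB SSDs the cold portion of the 10 TB example would consume on-premises as it grows. The 30% annual growth rate and the single-RAID-group assumption are placeholders for illustration, not figures from the article; substitute your own numbers.

    # Back-of-the-envelope estimate for the example above: 10 TB of application
    # data, ~80% of which is cold. Growth rate and RAID layout are illustrative
    # assumptions, not quoted figures.
    import math

    TOTAL_TB = 10.0
    COLD_FRACTION = 0.8     # ~80% of data is rarely accessed
    SSD_TB = 3.8            # commonly used SSD capacity
    ANNUAL_GROWTH = 0.30    # assumed 30% data growth per year (placeholder)
    YEARS = 3

    cold_tb = TOTAL_TB * COLD_FRACTION
    print(f"Cold data today: {cold_tb:.1f} TB")

    for year in range(YEARS + 1):
        grown_cold_tb = cold_tb * (1 + ANNUAL_GROWTH) ** year
        # Assume a single RAID-DP group: two parity disks on top of the data disks.
        ssds_needed = math.ceil(grown_cold_tb / SSD_TB) + 2
        print(f"Year {year}: {grown_cold_tb:.1f} TB cold -> ~{ssds_needed} SSDs on-premises, "
              f"or no extra SSDs if tiered to Azure Blob")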

With the Cloud Tiering service enabled, organizations don't need to go through annual storage planning and forecasting exercises, where miscalculations can lead to either insufficient or over-provisioned storage capacity. Storage hardware purchases can be postponed, since Cloud Tiering effectively increases the capacity of NetApp AFF systems by up to 20X. And because the TCO savings are assured, storage investment can be directed toward flash-based systems like NetApp AFF rather than toward less-expensive physical storage units chosen purely for cost reasons.

AFF with Azure Storage: Performance Considerations

NetApp AFF uses an all-flash, NVMe-based architecture that outperforms the competition with benchmarks of sub-200 microsecond latency, 11.4M IOPS, and 300 GB/s throughput. Notably, these performance levels are achieved with ONTAP storage efficiency features, such as deduplication and compression, enabled by default.

Data can be tiered back to on-premises storage in multiple use cases. When AFF is used as backend storage for databases, hot data remains on-premises while data such as archived logs, database backups, or even database blocks that haven't been accessed for a period of time is automatically tiered to Azure Blob storage. That data is tiered back when it is accessed again, for example during a DB restore operation that retrieves old data, which might happen sporadically. Similarly, backup and archival data tiered to Azure Blob storage may need to be retrieved occasionally during restore tests or compliance audits.

Best-in-class performance is assured while data is served from on-premises storage. Obviously, an I/O operation served by an AFF system from its local SSDs will be many times faster than the same I/O served from object storage. However, once the data is brought back to the performance tier, subsequent access delivers sub-millisecond response times. One performance-related consideration is therefore the sizing of the performance tier: data is not tiered back on-premises if the performance tier is more than 70% full; it remains cold, and subsequent access is served from the cloud tier. For most environments, a 1:10 ratio between the performance tier and the cloud tier is conservative enough while still providing substantial savings.
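A quick way to reason about that sizing guideline is sketched below: it checks whether the performance tier has the headroom to accept tiered-back data (using the 70% utilization threshold mentioned above) and whether the performance-to-cloud-tier ratio stays within the suggested 1:10. The helper names and example figures are hypothetical.

    # Hypothetical sizing helpers based on the guidelines above: data is not
    # written back to the performance tier once it is more than 70% full, and a
    # roughly 1:10 performance-to-cloud-tier ratio is a conservative target.

    PERF_TIER_WRITEBACK_LIMIT = 0.70   # utilization above which data stays in the cloud tier
    TARGET_RATIO = 10                  # cloud tier up to ~10x the performance tier

    def can_tier_back(perf_used_tb: float, perf_total_tb: float) -> bool:
        """Return True if cold data read randomly can be written back on-premises."""
        return (perf_used_tb / perf_total_tb) < PERF_TIER_WRITEBACK_LIMIT

    def ratio_is_conservative(perf_total_tb: float, cloud_tb: float) -> bool:
        """Return True if the cloud tier is within ~10x the performance tier."""
        return cloud_tb <= TARGET_RATIO * perf_total_tb

    # Example: a 50 TB performance tier that is 40 TB full (80% utilized) keeps
    # serving cold reads from Azure Blob instead of tiering the data back.
    print(can_tier_back(40, 50))            # False -> reads served from the cloud tier
    print(ratio_is_conservative(50, 400))   # True  -> 1:8 ratio, within guidance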

The second consideration that greatly affects performance when accessing data from Azure Blob storage is the available network connectivity. Though not a prerequisite for Cloud Tiering, an ExpressRoute connection, which offers assured bandwidth between on-premises networks and Azure, is recommended for better performance. Should the network connection to Azure Blob storage ever become unavailable, applications and users would only see data access errors for data that resides in the cloud tier; there would not, under any circumstances, be any data loss.
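To see why bandwidth matters, the short sketch below estimates how long it would take to read a given amount of tiered data back from Azure Blob storage over different link speeds. The link speeds, the 2 TB figure, and the 70% efficiency factor are illustrative assumptions.

    # Rough estimate of how long reading tiered data back from Azure Blob takes
    # over different links. Link speeds and the efficiency factor are assumptions.

    def retrieval_hours(data_gb: float, link_gbps: float, efficiency: float = 0.7) -> float:
        """Time to transfer data_gb over a link of link_gbps at the given efficiency."""
        effective_gbps = link_gbps * efficiency
        seconds = (data_gb * 8) / effective_gbps   # GB -> gigabits, then divide by Gb/s
        return seconds / 3600

    # Example: restoring 2 TB of tiered backups over a 1 Gbps internet link versus
    # a 10 Gbps ExpressRoute circuit (illustrative figures).
    for gbps in (1, 10):
        print(f"{gbps} Gbps: ~{retrieval_hours(2000, gbps):.1f} hours for 2 TB")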

Conclusion

The Cloud Tiering service helps optimize the storage capacity of AFF systems while maintaining a fine balance between cost and performance. It reduces costs by enabling a CAPEX-to-OPEX conversion in which a major chunk of the data is moved to low-cost Azure storage based on access patterns. This zero-effort data extension can be implemented without any changes to applications, tools, or processes. The service can be easily configured, managed, and automated from a unified, user-friendly interface.