
Why You Want (and Don’t Want) to Keep Data in Archive Storage

March 4, 2021

Topics: Cloud Tiering Advanced

7 minute read

Whether you use it as your first step toward a cloud-based strategy or as a way to free up space on your on-prem ONTAP system and extend its capacity, Cloud Tiering is becoming easier, richer, and more flexible with each update. With its latest release, Cloud Tiering shows no signs of cooling off, but its tiering options have: Cloud Tiering now supports tiering to Google Cloud Storage’s Coldline and Archive storage classes.

That new support for colder storage classes raises a question: why doesn’t Cloud Tiering support all of the deep cold storage classes? In this article we explore the ins and outs of deep cold storage classes such as S3 Glacier and the Azure Archive access tier, and explain why these archive tiers have not been added as Cloud Tiering targets while Google Cloud Storage’s Coldline and Archive storage classes have. We will look at the features these colder storage layers offer, their most common use cases, and why Cloud Tiering mainly targets other types of workloads.

Archiving to Object Storage

For most workloads, data becomes less likely to be accessed as time passes. The longer it has been since the data was processed, the lower the chance that it will ever be used again. In many cases, that inactive data accumulates over time in on-prem storage arrays, where it continues to consume energy, disk space, floor space, and admin overhead.

Thanks to Cloud Tiering, companies with NetApp AFF or SSD-backed FAS storage systems have the option to move this inactive data to low-cost object storage based in the public cloud. But once that inactive data is tiered to the cloud, it will age there as well, until it becomes subject to archiving. In general, data classified for archiving can be further tiered to additional, more cost-efficient object storage classes that the major cloud providers offer, including:

  • Google Cloud Storage Coldline and Archive on Google Cloud
  • Amazon S3 Glacier and S3 Glacier Deep Archive on AWS
  • Archive access tier for Azure Blob storage on Azure

But exactly what type of data belongs in such archive storage tiers? We are not talking about data that is accessed constantly for a few days and then needs to sit in a cooler tier in case it is needed again. That kind of data, which includes short-term backups and video content, must be promptly available if somebody requests it. It can’t go into an archive tier, where retrieval times are too long.

The type of data that can be stored in archive storage tiers is different. It doesn’t need to be immediately available to the user, so restore times of minutes or even hours must be acceptable. One example is data that needs to be kept for legal, regulatory, or compliance reasons in fields such as healthcare, finance, and law. Long-term backups, secondary backups, and historical data pools for data analytics follow the same pattern.

Archival cloud storage services such as the Google Cloud Archive storage class, Amazon S3 Glacier, Amazon S3 Glacier Deep Archive, and the Azure Archive access tier are aimed at that type of demand. They all have significantly lower data-at-rest costs, but that lower storage pricing comes with trade-offs. For example, compared to the Azure Hot tier, which costs $0.0184 per GB per month in the West US 2 region, the Azure Archive tier costs $0.00099 per GB per month in that same region. However, if data sitting in the archive tier is ever accessed, it incurs additional read and data retrieval charges, plus any applicable network transfer costs. Amazon S3 Glacier Deep Archive has the same storage cost of $0.00099 per GB per month in the US West 2 region, and it also adds data read and data retrieval charges plus any network transfer costs. The same goes for Google Cloud.

The common denominator for these services is that you need to budget additional costs for operations and data retrieval charges on top of the storage costs in case you need to access the archived data. These charges tend to be higher than their hotter-tier counterparts.
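To make the trade-off concrete, here is a minimal Python sketch of the arithmetic, using the two Azure prices quoted above and treating them as per GB per month. The 100,000 GB data set and the `retrieval_fee_per_gb` parameter are illustrative assumptions, not figures from any provider's price list.

```python
# Storage prices quoted in this article (treated as USD per GB per month).
HOT_PRICE_PER_GB = 0.0184       # Azure Hot tier, West US 2
ARCHIVE_PRICE_PER_GB = 0.00099  # Azure Archive tier, West US 2


def monthly_storage_cost(size_gb: float, price_per_gb: float) -> float:
    """Data-at-rest cost for one month, ignoring operations and egress."""
    return size_gb * price_per_gb


def breakeven_retrieval_gb(size_gb: float, retrieval_fee_per_gb: float) -> float:
    """GB that can be read back per month before the archive savings are spent.

    retrieval_fee_per_gb is a hypothetical combined read + retrieval fee;
    substitute the real figure from the provider's current price list.
    """
    savings = monthly_storage_cost(size_gb, HOT_PRICE_PER_GB - ARCHIVE_PRICE_PER_GB)
    return savings / retrieval_fee_per_gb


if __name__ == "__main__":
    size_gb = 100_000  # assumed size of the inactive data set
    hot = monthly_storage_cost(size_gb, HOT_PRICE_PER_GB)
    archive = monthly_storage_cost(size_gb, ARCHIVE_PRICE_PER_GB)
    print(f"Hot tier:     ${hot:,.2f} per month")
    print(f"Archive tier: ${archive:,.2f} per month")
    # 0.02 USD/GB is a placeholder retrieval fee, not a quoted price.
    print(f"Break-even retrieval: {breakeven_retrieval_gb(size_gb, 0.02):,.0f} GB per month")
```

The break-even figure is only a rough guide: it leaves out per-operation charges, network transfer, and any minimum storage duration fees.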

But how is this all related to Cloud Tiering? Before getting there let’s take a look into archived data retrieval.

Retrieving Archived Data

Let's take a high-level look at how data is retrieved from each service, to get a better sense of the steps involved.

  • Amazon S3 Glacier and S3 Glacier Deep Archive: Restoring data from S3 Glacier involves placing a restore request and then waiting for S3 to create a temporary copy of the requested objects. In addition to the retrieval charges mentioned above, you will also need to pay for the storage that holds this temporary copy. A restore request can be issued through the AWS Management Console or the AWS CLI. Depending on the restore option chosen, the process can take anywhere from a few minutes to several hours or even days.
  • Azure Archive Tier: Restoring data from this offline tier is also known as rehydrating the data. Rehydration can be accomplished by changing the tier of a blob with a Set Blob Tier operation or by copying the blob to an online tier (hot or cool) with a Copy Blob operation. With either operation you can choose between two rehydration priorities, standard and high. A standard-priority request can take up to 15 hours to complete.
  • Google Cloud Archive Storage: Unlike the other options, this storage class is not an offline tier, but rather an online archival option. This means that restore operations can be done within milliseconds and with no additional steps to perform other than the ones required by the application in use, such as a database program or a disaster recovery app. However, the request operations and data retrieval fees on Google Cloud Archive Storage are still higher than with other storage classes.

Unlike Google's Archive Storage class, the AWS and Azure options involve separate restore procedures with restore times that range from several minutes to several hours or even several days. Effectively, this makes Google's Archive Storage class less cold than the archive options on AWS and Azure, though it does still charge similar retrieval fees.
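The S3 Glacier restore flow described above can also be scripted. The sketch below is a hedged illustration, not part of Cloud Tiering: it builds arguments in the shape that the boto3 S3 client's `restore_object` call accepts and reads the `Restore` field that `head_object` returns while a restore is pending. The helper names, bucket, and key are hypothetical, and the client is passed in so the logic can be exercised without AWS credentials.

```python
from typing import Any, Dict

# Retrieval options accepted in GlacierJobParameters.
VALID_TIERS = {"Expedited", "Standard", "Bulk"}


def build_restore_request(bucket: str, key: str, days: int,
                          tier: str = "Standard") -> Dict[str, Any]:
    """Build keyword arguments shaped like a boto3 S3 restore_object call.

    days is how long the temporary restored copy should be kept.
    """
    if tier not in VALID_TIERS:
        raise ValueError(f"unknown retrieval tier: {tier!r}")
    if days < 1:
        raise ValueError("days must be at least 1")
    return {
        "Bucket": bucket,
        "Key": key,
        "RestoreRequest": {
            "Days": days,
            "GlacierJobParameters": {"Tier": tier},
        },
    }


def request_restore(s3_client: Any, bucket: str, key: str, days: int,
                    tier: str = "Standard") -> Any:
    """Submit the restore request; the object is NOT readable when this returns."""
    return s3_client.restore_object(**build_restore_request(bucket, key, days, tier))


def is_restore_in_progress(head_response: Dict[str, Any]) -> bool:
    """Inspect a head_object response for a restore that is still running."""
    return 'ongoing-request="true"' in head_response.get("Restore", "")


if __name__ == "__main__":
    # With real credentials this would be: s3 = boto3.client("s3")
    # followed by request_restore(s3, ...). Here we only print the request.
    print(build_restore_request("example-bucket", "logs/archive.tar", days=7, tier="Bulk"))
```

The point of the sketch is the extra step itself: something has to issue the request, poll until the temporary copy exists, and only then read the object.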

Cloud Tiering and the Archive Storage Classes

Now that we have looked at how restores work in deep storage layers, we can discuss and understand why Cloud Tiering hasn’t incorporated these storage classes as options for cold data tiering targets.

Leveraging NetApp’s FabricPool technology, Cloud Tiering provides a high-performance data tiering solution in which data is intelligently and seamlessly moved to and from a cloud tier. Data residing in any of the supported storage classes can be accessed without any further action and is always ready for immediate use.

Cloud Tiering moves this data between tiers based on the application activities, making it hot and available again on the on-prem performance tier with millisecond latencies, without any extra work. As was explained above, the archive layers on AWS and Azure involve separate and additional restore procedures which involve manual requests or additional admin work, making them essentially offline tiers. This is the reason why the only archive object storage offerings included in Cloud Tiering are Google Cloud Storage’s Coldline and Archive classes. With these Google Storage classes, no additional restore procedures are involved, and they keep access times to latencies of milliseconds.
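The selection rule described above can be expressed as a small filter. This is only an illustrative model built from the classes named in this article; it is not how Cloud Tiering or FabricPool is implemented, and the `StorageClass` type and `needs_restore_request` flag are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class StorageClass:
    provider: str
    name: str
    # True when objects stay offline until a separate restore or
    # rehydration request completes (as described in this article).
    needs_restore_request: bool


ARCHIVE_CLASSES = [
    StorageClass("Google Cloud", "Coldline", False),
    StorageClass("Google Cloud", "Archive", False),
    StorageClass("AWS", "S3 Glacier", True),
    StorageClass("AWS", "S3 Glacier Deep Archive", True),
    StorageClass("Azure", "Archive access tier", True),
]


def transparent_tiering_candidates(classes: Iterable[StorageClass]) -> List[StorageClass]:
    """Keep only classes that can be read back without a separate restore step."""
    return [c for c in classes if not c.needs_restore_request]


if __name__ == "__main__":
    for c in transparent_tiering_candidates(ARCHIVE_CLASSES):
        print(f"{c.provider}: {c.name}")
```

Under this model only the two Google Cloud classes pass the filter, which matches the targets the article says Cloud Tiering supports.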

In terms of use cases, think about the health industry, where patient files need to be readily accessible after long periods without access. Other examples are databases whose archived logs must be readily available for a database recovery, and short-term backups such as snapshots, which provide quick recovery from accidental data deletion, corruption, or ransomware lockouts. All of these use cases can benefit from Cloud Tiering. The data can rest on lower-cost object storage tiers, but when it is called for it must become available again right away, with no additional processes to follow.

Conclusion

Cloud Tiering is an intelligent solution that fits into a wide range of use cases where data becomes infrequently accessed, but when needed, it has to be immediately available, without any extra effort. Some of the cloud archive offerings can’t meet those performance demands.

Because Cloud Tiering is a high-performance cold data tiering solution, Google Cloud's Archive storage class is the only deep archive storage class that has been incorporated as a tiering option. With this addition, Cloud Tiering can now span more use cases, including long-term archives, secondary backups, and tape replacement. However, bear in mind that restores from this type of storage class incur additional data retrieval costs, which must be taken into consideration so they don’t catch you off guard.


Oded Berman, Cloud Evangelist
