March 8, 2020
Topics: Cloud Tiering, Azure, Data Tiering, AWS, Google Cloud, Advanced | 9 minute read
Tiering is about finding the best storage option for data throughout its lifecycle. Not all data is always in active use and keeping it on a performant tier can become a big cost factor. For this reason, the big three cloud vendors have different storage tiering options within their cloud storage environments.
In this post we will look at how tiering is accomplished within AWS, Azure, and Google Cloud storage environments, and whether the process is automated or manual. As these vendors only offer storage tiering for their object storage offerings, we will concentrate on object storage here.
We’ll also investigate NetApp’s Cloud Tiering service, which connects on-premises equipment, such as NetApp AFF and SSD-backed FAS systems, to these storage services so they can take advantage of all of them.
Object Storage in the Public Cloud
All of the major cloud vendors offer data lifecycle management options where policies are configured by the user, and each policy is a set of rules describing actions to be taken on an object, such as migrating objects over 30 days old to a less-costly but also less-performant storage tier.
Google Cloud Storage
In Google Cloud Storage the closest process to storage tiering is Object Lifecycle Management of storage classes. This process applies to objects stored in buckets and downgrades the storage class of objects based on user-defined rules to reduce long-term costs. Several metrics can be used in a rule definition, such as an object's age, its time in a particular class, and others that apply to versioned objects only.
An object can exist in one of four different storage classes. These classes can be set upon upload or inherited from the storage bucket if left unspecified. The class can also be changed manually in the web GUI or via CLI commands.
The object storage classes available for tiering are as follows:
Standard: Hot data or short storage periods.
Nearline: Lower cost, good for data accessed once a month or less.
Coldline: Very low cost, better for data read or modified once per quarter.
Archive: Lowest cost, for data you access less than once a year. Still millisecond access times.
For storage classes other than Standard there are data access charges and a minimum storage duration charge, which applies if you delete or move an object early. For example, if you upload an object into Nearline and move or delete it after 20 days, you will still be charged for the 30-day minimum. Google bills these charges from the date of object creation, so if an object spends 20 days in the Standard class, is changed to Nearline, and is then deleted or moved back to Standard 10 days later, you will not incur early deletion charges.
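The rules described above are expressed as a lifecycle configuration attached to a bucket. Below is a minimal sketch of such a configuration, built in Python as plain JSON; the bucket name and class progression are illustrative assumptions, not values from this article.

```python
import json

# Sketch of a Google Cloud Storage lifecycle configuration: move objects to
# Nearline after 30 days in Standard, then to Coldline after 90 days.
# A hypothetical bucket would receive it with, e.g.:
#   gsutil lifecycle set lifecycle.json gs://my-example-bucket
lifecycle = {
    "lifecycle": {
        "rule": [
            {
                "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
                "condition": {"age": 30, "matchesStorageClass": ["STANDARD"]},
            },
            {
                "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
                "condition": {"age": 90, "matchesStorageClass": ["NEARLINE"]},
            },
        ]
    }
}

with open("lifecycle.json", "w") as f:
    json.dump(lifecycle, f, indent=2)
```

Note that the 90-day rule only matches objects already in Nearline, so the two rules chain into a simple tier-down path.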
Azure Blob Storage
Azure objects are called blobs and they support blob-level tiering, also called Lifecycle Management. On Azure, this process is very similar to Google Cloud Storage’s object lifecycle management and is likewise designed to downgrade storage to reduce long-term storage costs. It is implemented by a set of rules, executed once a day, which can transition blobs to cooler storage (i.e., hot to cool, hot to archive, or cool to archive) or delete them.
Blobs have three storage classes, which can be set on upload; if no class is specified, it is inferred from the account's access tier setting. The class can be changed via API, CLI, or the web GUI.
The blob storage classes for tiering are:
Hot: Data in active use or expected to be accessed frequently.
Cool: Lower cost but higher access cost, monthly access or less.
Archive: Lowest cost but higher access cost. Access time is several hours.
When a blob is transitioned to cooler storage, Azure charges for writing to the destination tier; when you transition a blob to warmer storage (i.e., Archive to Cool, Cool to Hot, or Archive to Hot), Azure charges for reading from the source tier.
Minimum storage charges apply to each tier, and the clock resets at each transition. If you transition a blob from Cool to Archive after 20 days and then to Hot after a further 60 days, you will be charged for 30 days in Cool storage plus 180 days in Archive, so make sure your lifecycle management rules reflect your usage and actually save you money.
A blob located in the Archive access tier is considered offline and cannot be read or modified. Transitioning a blob from Archive to a warmer tier, so it can be read and modified, is called "rehydration". Currently, there are two rehydrate priority levels: standard and high-priority. Standard rehydration requests are processed in the order they come in and may take up to 15 hours to complete, while high priority requests are prioritized over standard requests and may be done in under one hour, depending on the blob size and current demand.
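An Azure lifecycle management policy with the transitions described above can be sketched as follows; the rule name and prefix are hypothetical placeholders, not values from this article.

```python
import json

# Sketch of an Azure Blob Storage lifecycle management policy: tier block
# blobs to Cool 30 days after last modification, to Archive after 90 days,
# and delete them after 365 days. The rule name and "logs/" prefix are
# made-up examples. One way to apply it is via the Azure CLI, e.g.:
#   az storage account management-policy create ... --policy @policy.json
policy = {
    "rules": [
        {
            "name": "example-tiering-rule",
            "enabled": True,
            "type": "Lifecycle",
            "definition": {
                "filters": {"blobTypes": ["blockBlob"], "prefixMatch": ["logs/"]},
                "actions": {
                    "baseBlob": {
                        "tierToCool": {"daysAfterModificationGreaterThan": 30},
                        "tierToArchive": {"daysAfterModificationGreaterThan": 90},
                        "delete": {"daysAfterModificationGreaterThan": 365},
                    }
                },
            },
        }
    ]
}

with open("policy.json", "w") as f:
    json.dump(policy, f, indent=2)
```

Because the policy engine runs only once a day, transitions take effect on the next daily evaluation rather than at the exact threshold.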
AWS Storage
AWS has two options for automatic object storage tiering within the cloud: S3 Lifecycle management and S3 Intelligent Tiering. S3 Intelligent Tiering is itself a storage class and is enabled by setting it as an object's storage class; a lifecycle policy rule can even migrate objects into the Intelligent Tiering class.
AWS objects have five storage classes between which lifecycle policies can move them:
S3 Standard: Low latency and high throughput, used for frequently accessed data.
S3 Standard-IA: Same performance as Standard with lower storage charges but per-GB retrieval charges. Ideal for long-lived, infrequently accessed data.
S3 Glacier: Ideal for data archiving. Low cost with a retrieval range from one minute up to 12 hours.
S3 Glacier Deep Archive: Lowest cost storage. Data is restored within 12 hours.
S3 Intelligent Tiering: Two-tiered storage in a single class. Ideal for long-lived data with unknown or unpredictable access patterns.
There are a number of different cost factors to keep in mind when using Amazon S3 Lifecycle policies. Object lifecycle transitions incur a small charge, the cost of storage varies per class, some classes have charges for retrieval of data, and there are minimum charging periods for different classes. Use the AWS cost calculators to ensure the policies you set will save you money.
S3 Intelligent Tiering has two access tiers: one optimized for frequent access (a hot tier) and one optimized for infrequent access (a cold tier). Objects put into this class start in the hot tier; AWS automatically migrates objects that have not been accessed for 30 consecutive days to the cold tier, and moves them back to the hot tier when they are accessed. For this, AWS charges a small management fee covering access-pattern monitoring and the automated tier changes.
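An S3 lifecycle policy implementing the tier progression described above can be sketched like this; the rule ID and "archive/" prefix are hypothetical examples, not from this article.

```python
# Sketch of an S3 lifecycle configuration: transition objects under a
# hypothetical "archive/" prefix to Standard-IA after 30 days and to
# Glacier after 90 days.
lifecycle_config = {
    "Rules": [
        {
            "ID": "tier-down-old-objects",
            "Filter": {"Prefix": "archive/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
        }
    ]
}

# Applying it requires boto3 and AWS credentials, e.g.:
#   import boto3
#   s3 = boto3.client("s3")
#   s3.put_bucket_lifecycle_configuration(
#       Bucket="my-example-bucket",
#       LifecycleConfiguration=lifecycle_config,
#   )
```

Replacing a transition's `StorageClass` with `INTELLIGENT_TIERING` would instead hand the object over to the automatic hot/cold tiering described above.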
Comparison Chart
The most comparable storage offerings from the three cloud vendors are shown below. Cloud storage costs can become complex, so this comparison is kept very simple and omits the many small costs that come with cloud-based storage, such as network egress and operations/API charges. Listed here are the storage and retrieval rates, to show the differences in the most basic way.
(Note: The example rates listed below are for storage within a single US West region.)
Amazon Web Services S3 storage
Amazon Simple Storage Service (Amazon S3) was the initial public cloud storage offering. As such, it is the most widely used low-cost cloud-based object storage. There are five relevant storage classes that data can be tiered to that vary by cost, availability, and data retrieval charges.
| Class | Storage $ per GB | Retrieval $ per GB | Minimum Days Charged |
| --- | --- | --- | --- |
| S3 Standard | 0.026 | 0 | - |
| S3 Intelligent Tiering | 0.026/0.019 | 0 | 30 |
| S3 Standard - Infrequent Access | 0.019 | 0.01 | 30 |
| S3 Glacier | 0.005 | 0.011 | 90 |
| S3 Glacier Deep Archive | 0.002 | 0.022 | 180 |
Azure Blob Storage
Azure Blob storage offers three classes of object storage that vary by cost, retrieval charges, and the minimum number of days charged for usage (even if deleted).
| Class | Storage $ per GB | Retrieval $ per GB | Minimum Days Charged |
| --- | --- | --- | --- |
| Hot | 0.0184 | 0 | - |
| Cool | 0.01 | 0.01 | 30 |
| Archive | 0.00099 | 0.02 | 180 |
Google Cloud Storage
On Google Cloud, object storage is offered in the form of Google Cloud Storage. There are four classes of Cloud Storage that vary by cost, the minimum number of days charged for usage (even if deleted), the data retrieval charge, and a slight decrease in potential availability.
| Class | Availability | Storage $ per GB | Retrieval $ per GB | Minimum Days Charged |
| --- | --- | --- | --- | --- |
| Standard | 99.9% | 0.026 | 0 | - |
| Nearline | 99.0% | 0.01 | 0.01 | 30 |
| Coldline | 99.0% | 0.007 | 0.02 | 90 |
| Archive | N/A | 0.004 | 0.05 | 365 |
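The example rates in the tables above can be turned into a quick back-of-envelope comparison. The sketch below computes the cost of holding 1 TB for one month in a few classes and reading it all back once; it deliberately ignores egress, API operations, and minimum-duration charges, just as the tables do.

```python
# Back-of-envelope cost of storing 1 TB (1,024 GB) for one month and
# retrieving it once, using the example per-GB rates from the tables above.
SIZE_GB = 1024

classes = {
    # name: (storage $/GB/month, retrieval $/GB)
    "S3 Standard":    (0.026, 0.0),
    "S3 Standard-IA": (0.019, 0.01),
    "Azure Cool":     (0.01, 0.01),
    "GCS Nearline":   (0.01, 0.01),
    "GCS Coldline":   (0.007, 0.02),
}

for name, (storage_rate, retrieval_rate) in classes.items():
    total = SIZE_GB * (storage_rate + retrieval_rate)
    print(f"{name}: ${total:.2f}")
```

Note how a single full retrieval makes Standard-IA more expensive than Standard for that month, which is why these cooler classes only pay off for data that is genuinely accessed rarely.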
The Weak Spot of Cloud Provider Tiering: Lack of On-Prem Support
While all of the object storage offerings mentioned above do a good job of tiering data between different tiers based on access frequency, the major drawback to these services is that on their own they don’t have the capability to work with on-prem deployments. On-prem storage users looking to take advantage of the scale and economy of cloud-based object storage have to turn to additional solutions.
There are solutions offered by the cloud providers that can be used to tier infrequently accessed data from on-prem to the cloud, such as AWS Storage Gateway, which tiers data to Amazon S3, or Hubstor, a Microsoft partner, which tiers to Azure Blob Storage. But these products are add-ons that may require changes to existing applications, add costs, and are mostly used at the file level to tier secondary copies. NetApp storage system users have a much more flexible and efficient option: the Cloud Tiering service for AFF and SSD-backed FAS systems from NetApp.
Cloud Tiering makes it possible for data center systems to extend into the cloud, using the object offerings covered above as capacity tiers for cold or infrequently used data. Cloud Tiering operates automatically and seamlessly at the block level and can be used to tier both production and secondary data.
This tiering service can automatically detect infrequently used data and move it seamlessly to any of the three cloud services covered in this article: Google Cloud Storage, Amazon S3, or Azure Blob storage. When data is needed for performant use, it is automatically shifted back to the performance tier on-prem.
In effect, Cloud Tiering turns your on-prem system into part of a hybrid cloud storage architecture. And since Cloud Tiering is already part of NetApp AFF and SSD-backed FAS systems, there is no additional cost to use it: you only pay for the storage you use, when you use it. Plus, there is no need to refactor or rearchitect existing applications, making this the easiest and most cost-effective way to get to the cloud if you’re shifting towards a cloud strategy.
Summary
Google and Azure use lifecycle management to automate migrating objects between different classes of storage, reducing storage costs over short and long retention periods. AWS offers lifecycle management as well, but also has a more modern tiering solution that moves objects back and forth between hot and cold tiers as required. Note that none of the three cloud providers enables tiering between their object stores and more performant disk storage; to take advantage of these services on-prem, you need to turn to an additional service.
NetApp SSD-backed FAS and AFF users can activate Cloud Tiering service to begin leveraging the object storage services in the public cloud seamlessly.
Find out more on Cloud Tiering to AWS, Azure, or Google Cloud here.