
EFS IA vs Cloud Volumes ONTAP File Share Data Tiering on AWS

Written by Aviv Degani, Cloud Solutions Architecture Manager, NetApp | Jun 18, 2020 2:24:26 PM

Having the ability to automatically move data between hot and cold tiers with no user intervention and with no impact on existing workflows is a key part of a data lifecycle management plan. This technology can also help enact the “80/20” Pareto principle that prevails across the IT industry: the rule that assumes 80% of data in an organization is rarely ever used and 20% is hot or frequently accessed. The same is true when it comes to file share data like the kind stored on AWS EFS.

In this article we will go through two intelligent and advanced data tiering solutions that achieve just that: Cloud Volumes ONTAP data tiering and the recently released EFS IA, which can be used as a capacity tier for AWS EFS. If you are an advanced AWS user and your organization runs comprehensive workloads and storage in the cloud, comparing these features will help you gain a clearer perspective on these two cloud tiering products, on tiering in general, and on the added benefits that Cloud Volumes ONTAP can bring to you when looking for a storage solution with tiering capabilities.

One of the main differences between tiering with EFS and Cloud Volumes ONTAP is that EFS tiers at the file level, and Cloud Volumes ONTAP tiers blocks at a default size of 4 KB. As such, block-level tiering is a more granular approach, enabling parts of a file to be tiered, while in a file-level approach, the only unit that can be tiered is the entire file.
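To make the granularity difference concrete, here is a minimal sketch in Python. It is an illustrative model only, not how either product is implemented: the function name `tierable_bytes` and the idea of passing in a set of recently accessed block indexes are assumptions made for this example. It compares how much of a partially hot file could move to the cold tier under each approach.

```python
BLOCK_SIZE = 4 * 1024  # 4 KB, the block size quoted above


def tierable_bytes(file_size, hot_block_indexes):
    """Return (file_level, block_level) bytes that could move to the cold tier.

    hot_block_indexes is a set of 4 KB block indexes that were accessed
    recently. Hypothetical helper; assumes file_size is a multiple of 4 KB.
    """
    total_blocks = file_size // BLOCK_SIZE
    # File-level: any recent access keeps the entire file in the hot tier.
    file_level = 0 if hot_block_indexes else file_size
    # Block-level: only the blocks that were touched stay hot.
    block_level = (total_blocks - len(hot_block_indexes)) * BLOCK_SIZE
    return file_level, block_level


# A 1 MiB file where only the first three blocks were read recently.
print(tierable_bytes(1024 * 1024, {0, 1, 2}))  # (0, 1036288)
```

In this toy case the file-level approach tiers nothing, while the block-level approach could tier all but 12 KB of the file.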

Let’s take a look at how each of these approaches works in practice.

File-Level Tiering with AWS EFS

AWS Elastic File System (EFS) provides a shared, scalable NFS file system for Linux workloads that can be accessed from both AWS cloud services and on-prem applications. The file system is designed to be accessed in parallel from many Amazon EC2 instances while keeping latency consistently low and delivering high aggregate throughput.

Automatic scaling and ease of use were among the reasons large enterprises opted for EFS storage. However, these customers told AWS that their bills went up significantly when they stored large amounts of data for long periods. This feedback led AWS to implement a less expensive way to store data on EFS.

What Is AWS EFS Infrequent Access?

AWS added a new storage class for EFS called EFS Infrequent Access (EFS IA) to serve as the capacity tier. EFS IA is a lower-cost storage class based on magnetic HDD storage designed for keeping infrequently accessed files stored long-term. The performant EFS tier is now referred to as EFS Standard storage.

Moving data from EFS Standard to EFS IA is a form of storage tiering that optimizes the data lifecycle. Using the US East (N. Virginia) region as an example, storage on EFS IA is just $0.025 per GB/month. With industry research pointing out that 80% of data in an organization is infrequently accessed, the estimated average price for storing all your data in EFS with the lifecycle management feature enabled (i.e., using both EFS Standard and EFS IA) could be as low as $0.08 per GB/month. Compare that with storing all the data in EFS Standard alone, which costs $0.30 per GB/month, nearly four times the price.
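The blended figure follows directly from the 80/20 split. As a quick check, here is a small Python sketch; the function name `blended_price` is just a label for this example, and the prices are the N. Virginia figures quoted above.

```python
def blended_price(hot_price, cold_price, cold_fraction=0.8):
    """Average $/GB-month when cold_fraction of the data sits in the cold tier."""
    return (1 - cold_fraction) * hot_price + cold_fraction * cold_price


efs_blended = blended_price(hot_price=0.30, cold_price=0.025)
print(round(efs_blended, 3))         # 0.08
print(round(0.30 / efs_blended, 2))  # 3.75
```

With 20% of the data at $0.30 and 80% at $0.025, the average comes to $0.08 per GB/month, and the all-Standard price is about 3.75 times that.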

To start using this data tiering service, existing EFS users can enable the EFS lifecycle management feature from the AWS Management Console or using API calls.

Once enabled, lifecycle management tracks, for every file, the time elapsed since it was last accessed. Operations that only list a file within a directory do not reset the timer; for example, running the ls command in Linux leaves it untouched. When the timer reaches the age-off policy you defined, the file is automatically and seamlessly moved to the EFS IA storage class. The available age-off policies are 7, 14, 30, 60, and 90 days.
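For readers who prefer the API route, the following is a minimal sketch using boto3's EFS client and its `put_lifecycle_configuration` call. The helper names and the validation against the five age-off policies listed above are assumptions made for this example, and the call itself needs boto3 installed, valid AWS credentials, and a real file system ID.

```python
ALLOWED_DAYS = (7, 14, 30, 60, 90)  # age-off policies listed in this article


def lifecycle_policies(days):
    """Build the LifecyclePolicies payload for a given age-off policy."""
    if days not in ALLOWED_DAYS:
        raise ValueError(f"unsupported age-off policy: {days} days")
    return [{"TransitionToIA": f"AFTER_{days}_DAYS"}]


def enable_lifecycle_management(file_system_id, days):
    """Apply the policy to an EFS file system (hypothetical wrapper)."""
    import boto3  # imported lazily so the payload helper works without it

    efs = boto3.client("efs")
    return efs.put_lifecycle_configuration(
        FileSystemId=file_system_id,
        LifecyclePolicies=lifecycle_policies(days),
    )


print(lifecycle_policies(30))  # [{'TransitionToIA': 'AFTER_30_DAYS'}]
```

Keeping the payload builder separate from the AWS call makes it easy to check the policy value before anything is sent to the service.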

However, once a file is placed in the EFS IA storage class, it stays there, even if it is accessed again. The only way to bring that data back to the hot tier is to create a copy of it from EFS IA storage, which takes time and adds manual effort and cost to the process. In addition, because EFS IA resides on separate servers with slower magnetic HDDs, every read or write request to a file stored in EFS IA incurs a data transfer fee. For example, in the Ohio region, pricing is $0.025 per GB per month for storage in EFS IA, and $0.01 per GB transferred.
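The per-GB transfer fee means the savings depend on how often cold data is actually read. Here is a rough Python sketch of that trade-off. It uses the Ohio IA prices above and assumes, for illustration only, that EFS Standard in Ohio costs the same $0.30 per GB/month quoted earlier for N. Virginia; the function names are made up for this example.

```python
IA_STORAGE = 0.025  # $/GB-month in EFS IA, from the article
IA_ACCESS = 0.01    # $/GB transferred to or from EFS IA, from the article
STANDARD = 0.30     # $/GB-month, ASSUMED equal to the N. Virginia figure


def ia_monthly_cost(size_gb, gb_accessed):
    """Monthly EFS IA bill for size_gb stored and gb_accessed read or written."""
    return size_gb * IA_STORAGE + gb_accessed * IA_ACCESS


def break_even_full_reads():
    """Full reads per month at which IA costs as much as Standard."""
    return (STANDARD - IA_STORAGE) / IA_ACCESS


print(round(ia_monthly_cost(100, 50), 2))  # 3.0
print(round(break_even_full_reads(), 1))   # 27.5
```

Under these assumptions, a file would have to be read in full more than 27 times a month before keeping it in EFS IA cost more than keeping it in EFS Standard.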

Below are some of the important characteristics of tiering data from EFS Standard to EFS IA:

  • File-level tiering moves entire files from Standard to IA.
  • Once files are moved to IA they do not move back to Standard.
  • Files smaller than 128 KB are not eligible for tiering and remain in EFS Standard storage.
  • Double-digit millisecond latency when reading data from the EFS IA storage class, in contrast to single-digit millisecond latency when reading it from EFS Standard.
  • Metadata (file ownership, directory structures, file names) is kept in the EFS Standard storage class.
  • EFS IA charges are based on the amount of storage used and extra costs for data transfers to and from EFS IA.
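The eligibility rules in the list above can be summarized in a few lines of Python. This is a simplified model of the behavior described in this article, not AWS code; the function name and parameters are hypothetical.

```python
from datetime import datetime, timedelta

MIN_TIERABLE_SIZE = 128 * 1024  # files smaller than 128 KB stay in Standard


def is_eligible_for_ia(size_bytes, last_access, policy_days, now):
    """Return True if a file would be moved to EFS IA under the given policy."""
    if size_bytes < MIN_TIERABLE_SIZE:
        return False
    # The whole file moves once it has gone unread for the age-off period.
    return now - last_access >= timedelta(days=policy_days)


now = datetime(2020, 6, 18)
print(is_eligible_for_ia(1_000_000, datetime(2020, 5, 1), 30, now))  # True
print(is_eligible_for_ia(50_000, datetime(2020, 5, 1), 30, now))     # False
```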

Block-Level Tiering with Cloud Volumes ONTAP on AWS

Cloud Volumes ONTAP data tiering is based on NetApp’s FabricPool technology, which automatically tiers block data to lower-cost object storage. The tiering process regularly scans the data blocks on Amazon EBS, noting how long it has been since each block was accessed. Once it identifies a data block that has reached its preconfigured cooling period, that block is automatically and seamlessly moved to an Amazon S3 object store.

But here is a major difference between tiering data to EFS IA and tiering with Cloud Volumes ONTAP: if a data block now residing in object storage is requested again, it is automatically and seamlessly moved back to the EBS performance tier.

Cloud Volumes ONTAP data tiering.

The system is also designed to differentiate between random read/write operations and sequential reads, such as a virus scan, so that only the necessary data blocks move back to EBS. As mentioned above, the smallest unit scanned under this storage tiering technology is the individual 4 KB block; blocks are packed together into 4 MB objects when moved to Amazon S3.
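At those sizes, each 4 MB object holds 1,024 blocks of 4 KB. The sketch below illustrates the packing arithmetic in Python. It is not FabricPool's implementation: the function name is hypothetical, and emitting a shorter final object for leftover blocks is an assumption of this example, since the article does not describe how partially filled objects are handled.

```python
BLOCK_SIZE = 4 * 1024                          # 4 KB block
OBJECT_SIZE = 4 * 1024 * 1024                  # 4 MB object
BLOCKS_PER_OBJECT = OBJECT_SIZE // BLOCK_SIZE  # 1024


def pack_cold_blocks(cold_block_ids):
    """Group cold block IDs into lists of up to 1,024, one list per object."""
    ids = list(cold_block_ids)
    return [
        ids[i:i + BLOCKS_PER_OBJECT]
        for i in range(0, len(ids), BLOCKS_PER_OBJECT)
    ]


objects = pack_cold_blocks(range(2500))
print(len(objects), len(objects[0]), len(objects[-1]))  # 3 1024 452
```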

Using the same 80/20 principle discussed above, and compared to the standard cost of storing data on EBS at $0.10 per GB/month, overall storage costs with Cloud Volumes ONTAP can be as low as $0.03 per GB/month when inactive data is tiered to Amazon S3. That is roughly one-third of the EBS-only price.
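The article does not state the capacity-tier price behind the $0.03 figure, but it can be backed out from the 80/20 split. The Python sketch below does that arithmetic; the function name is made up for this example, and the result is simply what the quoted numbers imply, not a published price.

```python
def implied_cold_price(blended, hot_price, cold_fraction=0.8):
    """Cold-tier $/GB-month implied by a blended price and a hot-tier price."""
    return (blended - (1 - cold_fraction) * hot_price) / cold_fraction


print(round(implied_cold_price(blended=0.03, hot_price=0.10), 4))  # 0.0125
print(round(0.10 / 0.03, 2))                                       # 3.33
```

In other words, with 20% of the data on EBS at $0.10, the remaining 80% would need to cost about $0.0125 per GB/month for the average to land at $0.03.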

Plus, Cloud Volumes ONTAP only tiers compressed and deduplicated blocks, not the full file size, making it faster to move and extremely space efficient to store on Amazon S3—you’re not just paying less to store cold data on Amazon S3 than you would on EFS IA, you’re also using less space to do it. Note that those space efficiencies also extend to the data on Amazon EBS, making it less expensive than EFS to store the important data in the hot tier as well.

Below are the main features of Cloud Volumes ONTAP storage tiering: 

  • Block-level tiering moves 4 KB blocks of data.
  • Brings back data blocks to the hot tier automatically when they are requested. No user management or manual intervention is needed.
  • In addition to the block to object storage tiering, another level of tiering is possible between three different AWS S3 storage classes which reduces costs even further: S3 Standard, Standard Infrequent Access, and One Zone-Infrequent Access.
  • Three different tiering policies are available. These policies are applicable to different use cases and scenarios, and you can choose each one at the volume level:
  1. Auto: Moves any data block that was not accessed in the pre-defined cooling period. Data is moved back automatically and seamlessly when access is needed again. Note: System IOs such as antivirus scans or backups will not pull the data back to the performance tier.
  2. All: Moves all the data blocks to Amazon S3. This is useful for use cases such as DR or backup. Data is moved back to EBS automatically and seamlessly in case of a disaster, or when needed for production.
  3. Snapshot only: Moves cold data blocks that belong to volume snapshots only. Data is moved back automatically and seamlessly when recovery is required.
  • Adjustable cooling periods can range from 2 to 63 days, allowing for a more specific configuration.
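The three policies and the cooling period described above can be modeled as a simple decision function. The Python below is a simplified sketch of the rules as this article states them, not ONTAP's actual logic; the enum, function, and parameter names are all hypothetical.

```python
from enum import Enum


class TieringPolicy(Enum):
    AUTO = "auto"
    ALL = "all"
    SNAPSHOT_ONLY = "snapshot-only"


MIN_COOLING_DAYS, MAX_COOLING_DAYS = 2, 63  # range quoted in the list above


def should_tier(policy, days_since_access, belongs_to_snapshot, cooling_days):
    """Return True if a block would move to Amazon S3 under this model."""
    if not MIN_COOLING_DAYS <= cooling_days <= MAX_COOLING_DAYS:
        raise ValueError("cooling period must be between 2 and 63 days")
    if policy is TieringPolicy.ALL:
        return True  # every block is tiered, hot or cold
    cold = days_since_access >= cooling_days
    if policy is TieringPolicy.AUTO:
        return cold
    # Snapshot only: cold blocks that belong to volume snapshots.
    return cold and belongs_to_snapshot


print(should_tier(TieringPolicy.AUTO, 40, False, 31))           # True
print(should_tier(TieringPolicy.SNAPSHOT_ONLY, 40, False, 31))  # False
print(should_tier(TieringPolicy.ALL, 0, False, 31))             # True
```

Because the policy is chosen per volume, a function like this would be evaluated with that volume's policy and cooling period for each block the scan visits.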

More Benefits to Cloud Volumes ONTAP

Along with tiering, Cloud Volumes ONTAP offers additional benefits for file share storage on AWS:

Conclusion

Both AWS tiered storage solutions using EFS and Cloud Volumes ONTAP are effective in reducing storage costs transparently with no impact on existing workflows. If you are dedicating hot tier storage for aging data such as document housing for an HR department, database log archiving or historical purchase data for future analysis, you should explore these tiering solutions.

Comparing the two solutions, each gives users the ability to store that massive cold 80% of the data set less expensively. Cloud Volumes ONTAP lets you do it at lower cost, while using less space.