Tiering Cold Data: AWS Storage Gateway vs Cloud Tiering

April 23, 2020

Topics: Cloud Tiering, AWS | Advanced | 7 minute read

With data growing faster than ever, more and more companies are looking for ways to reduce the total cost of ownership (TCO) of their on-prem storage systems and to improve their return on investment (ROI). A hybrid cloud architecture, which combines on-prem data centers with the cloud, is one of the solutions that makes this possible. An integral part of such an architecture is tiering data between the two environments.

Looking to construct a hybrid cloud using AWS? In this article, we will take a look at two data tiering solutions for building a hybrid cloud on the Amazon S3 object storage service. Find out what AWS Storage Gateway and NetApp’s Cloud Tiering service for NetApp AFF and SSD-backed FAS storage systems can bring to a hybrid cloud deployment.

AWS Storage Gateway

The AWS Storage Gateway service brings virtually unlimited AWS cloud storage to your on-premises data center. It runs either as a VM provisioned on your on-premises virtual infrastructure or as a hardware appliance that can be purchased from Amazon. The service provides three types of configurable gateways for various use cases (file gateways, volume gateways, and tape gateways) that connect your applications to cloud storage through standard data access protocols.

File Gateway

File gateways are NAS-style gateways that present Amazon S3 buckets as network file shares, allowing you to connect Linux and Windows clients to cloud storage. Through NFS (v3 and v4.1) or SMB (v2 and v3) bucket shares, applications can read and write files and directories, which are retrieved from and stored in an Amazon S3 bucket. The gateway’s local storage is used as an upload buffer and as a cache for low-latency access, holding the most recently used data.

The file gateway is useful in several use cases, such as online content repositories, backup to the cloud, business intelligence, analytics, and machine learning, all using native AWS services via API. It suits the media industry, where large amounts of photographic or video media must be stored, and industries such as oil and gas and manufacturing, where applications generate large numbers of files that need to be distributed and accessed from multiple locations.

Volume Gateway

A volume gateway is a SAN-style gateway that provides an iSCSI target and lets you provision block storage volumes that hosts and applications can use over standard iSCSI connectivity from your chosen operating system.

Volume gateways can be used where scalable on-premises storage for file services is required, as well as for backup and disaster recovery of local applications. They can also serve as a means to migrate on-premises data to Amazon EBS for use by applications running on Amazon EC2.

Volume gateways come in two flavors: cached volumes and stored volumes.

Cached Volumes

  • Fully stored in Amazon S3 buckets
  • Maximum volume size of 32 TB (in multiples of 1 GB)
  • 32 volumes per gateway, totaling 1 PB of virtual storage
  • Local storage is split between a cache, which stores recent data, and an upload buffer
  • Useful for data science and AI development

Stored Volumes

  • Provisioned volumes are fully available on the gateway device
  • Incremental snapshots back up the volumes to S3
  • Maximum volume size of 16 TB (in multiples of 1 GB)
  • 32 volumes per gateway, totaling 512 TB
  • Useful for large databases that ingest only a small percentage of that data
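The per-gateway totals in the two lists above follow directly from the per-volume limits. As a minimal sketch, assuming binary units (1 PB = 1024 TB) and using hypothetical helper names that are not part of any AWS tooling, the arithmetic can be checked like this:

```python
import math

# Per-gateway limits as quoted in the lists above.
LIMITS = {
    "cached": {"max_volume_tb": 32, "max_volumes": 32},
    "stored": {"max_volume_tb": 16, "max_volumes": 32},
}


def max_gateway_capacity_tb(volume_type: str) -> int:
    """Largest total capacity a single gateway can present, in TB."""
    limit = LIMITS[volume_type]
    return limit["max_volume_tb"] * limit["max_volumes"]


def gateways_needed(volume_type: str, dataset_tb: float) -> int:
    """Rough lower bound on gateways for a dataset of the given size.

    Ignores how the data is actually split across volumes.
    """
    return max(1, math.ceil(dataset_tb / max_gateway_capacity_tb(volume_type)))


if __name__ == "__main__":
    for kind in LIMITS:
        print(kind, max_gateway_capacity_tb(kind), "TB per gateway")
```

For example, a 600 TB dataset fits on one gateway with cached volumes but would need at least two gateways with stored volumes.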

Tape Gateway

The tape gateway acts as a Virtual Tape Library (VTL) that spans from the on-premises data center to Amazon S3, allowing users to move backup jobs to the cloud while eliminating the hassle of operating and maintaining physical backup technologies.

When connected to the gateway, your existing backup software is presented with an industry-standard iSCSI VTL containing virtual tape drives and a virtual media changer. It can then back up data to virtual tapes as it would to any on-premises iSCSI tape solution. A tape gateway can maintain an aggregated capacity of 1 PB across up to 1,500 virtual tapes (sizes range from 100 GiB to 5 TiB) within a single VTL.
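The tape count limit and the aggregate capacity limit interact: with small tapes you run out of tapes long before you reach 1 PB. The sketch below illustrates this under stated assumptions (1 PB is treated as 1024 TiB, and the function name is hypothetical); it only restates the limits quoted above.

```python
GIB_PER_TIB = 1024
MAX_TAPES = 1500                      # tapes per VTL, as quoted above
MAX_TOTAL_GIB = 1024 * GIB_PER_TIB    # 1 PB, assumed here to mean 1024 TiB
MIN_TAPE_GIB = 100
MAX_TAPE_GIB = 5 * GIB_PER_TIB


def usable_tapes(tape_size_gib: int) -> int:
    """How many tapes of one size fit under both the count and capacity limits."""
    if not MIN_TAPE_GIB <= tape_size_gib <= MAX_TAPE_GIB:
        raise ValueError("tape size must be between 100 GiB and 5 TiB")
    return min(MAX_TAPES, MAX_TOTAL_GIB // tape_size_gib)


if __name__ == "__main__":
    for size in (MIN_TAPE_GIB, 1024, MAX_TAPE_GIB):
        count = usable_tapes(size)
        print(f"{size} GiB tapes: {count} tapes, {count * size / GIB_PER_TIB:.1f} TiB total")
```

Under these assumptions, 100 GiB tapes are capped by the 1,500-tape limit at roughly 146 TiB in total, while 5 TiB tapes are capped by the capacity limit at 204 tapes.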

Backed-up data is stored in the Amazon S3 Standard storage class and can be accessed immediately. When it no longer requires frequent access, it can be archived to Amazon S3 Glacier or S3 Glacier Deep Archive. Archiving reduces costs but increases the time to restore from virtual tape: data needs 3–5 hours to be restored from S3 Glacier and up to 12 hours from S3 Glacier Deep Archive, and an archived tape is read-only. Also, to recover data from a tape in S3 Glacier you have to retrieve the entire tape, which may cost more than retrieving only the data you need.

This can be useful for long-term and very long-term backups, where you would normally need to access a tape less than twice a year, or for data retained for regulatory reasons that you may never need to recover.

Tiering with AWS Storage Gateway

The key to managing storage capacity effectively in any storage system is finding the best format for the data at each stage of its lifecycle. AWS Storage Gateway uses three different tiering methods depending on the type of gateway: block level, file level, and backup. Let’s take a look at each of them below.

Block Level Tiering

The AWS Volume Gateway uses block-level tiering for both stored volumes and cached volumes. The difference between the volume types is that a stored volume keeps the entire volume of data locally, while a cached volume keeps the entire volume of data in S3 and caches only the most recently accessed data locally.

When blocks of data are written to on-prem storage, the same blocks are also written to the upload buffer, from which they are asynchronously moved to an Amazon S3 bucket. The S3 buckets effectively become point-in-time backups of your on-prem volumes, and snapshots can be created to back up the data as Amazon EBS snapshots stored in S3.

A cached volume appears to the iSCSI-connected host as the size of the provisioned volume. Cold blocks in the cache that have already been copied to S3 are deleted, and if the host later requests these blocks, they are copied from S3 back to the cache. This process can incur some latency, so for optimal performance it is essential to size the cache disk to hold the hot data of every cached volume provisioned on-prem.
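The cache behavior described above can be illustrated with a toy model. This is a sketch only, not how the gateway is implemented: the class and attribute names are hypothetical, a dictionary stands in for the S3 bucket, the upload is triggered explicitly rather than asynchronously, and least-recently-used order is assumed as the definition of "cold".

```python
from collections import OrderedDict


class CachedVolumeSketch:
    """Toy cached volume: S3 holds every block, the cache holds the hot ones."""

    def __init__(self, cache_blocks: int):
        self.cache_blocks = cache_blocks
        self.cache = OrderedDict()  # block_id -> data, least recently used first
        self.dirty = set()          # written locally, not yet copied to S3
        self.s3 = {}                # stand-in for the S3 bucket
        self.s3_reads = 0           # cache misses that had to go to S3

    def write(self, block_id, data):
        self.cache[block_id] = data
        self.cache.move_to_end(block_id)
        self.dirty.add(block_id)
        self._evict()

    def upload(self):
        """Copy pending blocks to S3 (asynchronous in the real service)."""
        for block_id in self.dirty:
            self.s3[block_id] = self.cache[block_id]
        self.dirty.clear()
        self._evict()

    def read(self, block_id):
        if block_id in self.cache:
            self.cache.move_to_end(block_id)
            return self.cache[block_id]
        self.s3_reads += 1          # miss: fetch the block back from S3
        data = self.s3[block_id]
        self.cache[block_id] = data
        self._evict()
        return data

    def _evict(self):
        # Only blocks already copied to S3 may be dropped, coldest first.
        for block_id in list(self.cache):
            if len(self.cache) <= self.cache_blocks:
                break
            if block_id not in self.dirty:
                del self.cache[block_id]


if __name__ == "__main__":
    vol = CachedVolumeSketch(cache_blocks=2)
    for i in range(3):
        vol.write(i, f"block-{i}")
    vol.upload()
    vol.read(0)   # evicted earlier, so this read goes to S3
    vol.read(2)   # still cached
    print("reads served from S3:", vol.s3_reads)
```

The sizing advice follows from the model: if the working set is larger than `cache_blocks`, reads keep falling through to S3 and pay the extra latency.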

For both types of volumes, once the data is in Amazon S3, point-in-time snapshots can be created, either manually or on a schedule, to back up the data as Amazon EBS snapshots stored in S3. In DR scenarios your volumes can be restored or mapped to a new volume gateway, which may be at another location.

File Level Tiering

The AWS File Gateway uses file-level tiering for each file share. File shares created on the gateway are mapped to an S3 bucket, and based on AWS’ recommendation, users should only map one NAS share per S3 bucket.

Files created on the file share are written to both the on-prem cache storage and the upload buffer. The cache storage holds hot files locally for performance and reduced egress charges; old files are removed from the cache but still appear in directories. Make sure the cache is large enough to hold your regular working files, with some spare room, so that your hot files stay on-prem. Files in the upload buffer are turned into objects (with a one-to-one file-to-object relation) and are asynchronously transferred into the mapped S3 bucket.

When files are modified, the changed bytes are uploaded to keep the S3 object in sync. If a read operation requests files that are no longer in the cache, a byte-range GET request downloads only the required data, which reduces data transfer and costs (some latency may be introduced, depending on your connectivity to AWS).
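A byte-range GET is an ordinary HTTP GET with a `Range` header whose end offset is inclusive. The sketch below shows how such a header is formed and how much transfer it avoids; the helper names are hypothetical, and the gateway issues these requests itself, so this is purely illustrative.

```python
def byte_range_header(offset: int, length: int) -> str:
    """Build an HTTP Range header value for `length` bytes starting at `offset`.

    The end position in an HTTP byte range is inclusive.
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    return f"bytes={offset}-{offset + length - 1}"


def transfer_avoided(object_size: int, length: int) -> int:
    """Bytes not downloaded compared with fetching the whole object."""
    return max(0, object_size - length)


if __name__ == "__main__":
    # Reading 64 KiB from the middle of a 1 GiB object.
    header = byte_range_header(offset=512 * 1024 * 1024, length=64 * 1024)
    print(header)
    print(transfer_avoided(1024 ** 3, 64 * 1024), "bytes avoided")
    # With boto3, the same value could be passed as the Range argument of
    # s3.get_object(Bucket=..., Key=..., Range=header).
```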

Each object in S3 carries the file’s metadata, which contains its NFS or SMB ownership and permissions. If the mapped S3 bucket already contains objects, these objects are given custom default permissions and are visible to clients that use the NAS share.

Files stored in S3 as objects can be versioned or replicated as any other object. AWS lifecycle management policies could be applied to the objects, moving them to longer-term storage or deleting them, ensuring compliance with relevant regulations.
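As an illustration of such a lifecycle policy, the sketch below builds a rule in the dictionary shape that boto3's `put_bucket_lifecycle_configuration` accepts. The function name, rule ID, prefix, storage class, and day counts are all assumptions chosen for the example, and whether a given transition is appropriate for objects still read through a file share should be checked against AWS guidance.

```python
def lifecycle_config(prefix: str, transition_days: int, expire_days: int) -> dict:
    """Build an S3 lifecycle configuration with one transition and one expiration."""
    if expire_days <= transition_days:
        raise ValueError("objects must expire after they transition")
    return {
        "Rules": [
            {
                "ID": f"tier-{prefix.strip('/') or 'all'}",   # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Move objects to a cheaper storage class after N days...
                "Transitions": [
                    {"Days": transition_days, "StorageClass": "STANDARD_IA"}
                ],
                # ...and delete them once the retention period ends.
                "Expiration": {"Days": expire_days},
            }
        ]
    }


if __name__ == "__main__":
    config = lifecycle_config("projects/archive/", transition_days=30, expire_days=365)
    print(config)
    # With boto3 this could be applied as:
    # s3.put_bucket_lifecycle_configuration(
    #     Bucket="my-bucket", LifecycleConfiguration=config)
```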

Backups

Backups are not really storage tiering but an additional functionality for compatibility with existing processes and software. The AWS Tape Gateway uses this method of “tiering”.

The method is very similar to block-level tiering with cached volumes, in that data is written to both the upload buffer and the cache. The difference is that the S3 bucket contains your virtual tape, and once the virtual tape is full or you would like to start the next tape, the virtual tape is “ejected.” Ejecting a tape means marking it for archiving, which turns it read-only and moves it into S3 Glacier or S3 Glacier Deep Archive.

Tiering On-Prem Data to the Cloud with Cloud Tiering

While AWS Storage Gateway offers several options for tiering to the cloud, existing NetApp AFF and SSD-backed FAS users have another option to consider: the NetApp Cloud Tiering service.

The Cloud Tiering service can tier data to AWS as well as to Azure and Google Cloud. Cloud Tiering connects cloud-based object storage (such as Amazon S3), referred to as the “cloud tier,” to a NetApp AFF or SSD-backed FAS appliance, whose SSDs are referred to as the “performance tier.” The service configures ONTAP for that object storage provider and associates a data tiering policy with the volumes of your choice.

There are four tiering policies in NetApp Cloud Tiering. One of them must be applied to each volume to be tiered, and it determines how that volume is tiered. The policies are no-tiering, cold snapshots, cold user data, and all data.
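One way to picture the four policies is as a rule that decides, block by block, whether data may move to the cloud tier. The sketch below is an interpretation based on the policy names, not NetApp's implementation: the enum, the function, and the two block attributes (`is_cold`, `in_active_file_system`) are hypothetical.

```python
from enum import Enum


class TieringPolicy(Enum):
    NO_TIERING = "no-tiering"
    COLD_SNAPSHOTS = "cold snapshots"
    COLD_USER_DATA = "cold user data"
    ALL_DATA = "all data"


def eligible_for_cloud_tier(
    policy: TieringPolicy, is_cold: bool, in_active_file_system: bool
) -> bool:
    """Decide whether a block may move from the performance tier to the cloud tier.

    Assumed semantics: cold-snapshots tiers only cold blocks that are held
    solely by snapshots; cold-user-data also tiers cold blocks that are
    still part of the active file system.
    """
    if policy is TieringPolicy.NO_TIERING:
        return False
    if policy is TieringPolicy.COLD_SNAPSHOTS:
        return is_cold and not in_active_file_system
    if policy is TieringPolicy.COLD_USER_DATA:
        return is_cold
    return True  # ALL_DATA: everything is eligible, hot or cold


if __name__ == "__main__":
    for policy in TieringPolicy:
        verdict = eligible_for_cloud_tier(
            policy, is_cold=True, in_active_file_system=True
        )
        print(f"{policy.value}: cold active block tiered = {verdict}")
```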

Summary

A NetApp on-prem system with cloud-tiered iSCSI target volumes is quite similar to the AWS Volume Gateway, and a NetApp AFF system sharing cloud-tiered volumes via NFS or SMB is comparable to the AWS File Gateway.

An ONTAP volume backed up to a destination volume via SnapMirror/SnapVault is commonly used as a replacement for tape backups. Tiering the destination volume to the cloud with the all data (All) policy creates an alternative to the AWS Tape Gateway with low-latency access to all your backed-up data.

As we have seen here, AWS Storage Gateway does effectively tier data to the cloud, but existing NetApp AFF and SSD-backed FAS users already have a powerful, built-in capability to extend their on-premises systems to the cloud, one that’s seamless and cost-effective. The Cloud Tiering service lets users reserve high-performance data center storage for critical applications, while keeping infrequently used data and data in the later stages of the data lifecycle stored inexpensively in the cloud, at unlimited scale.

To find out more about tiering in hybrid environments, sign up for Cloud Tiering today.

Oded Berman, Cloud Evangelist