April 27, 2020
Storage tiering makes it possible to match data with the most suitable type of storage based on how that data is used. Tiering infrequently used data to the cloud is the latest development in storage tiering, and many of today’s largest storage vendors offer services for it.
In this article, we will cover the different tiering choices from major storage providers, including NetApp and its Cloud Tiering service for performant AFF and SSD-backed FAS systems.
Different Forms of Tiering
Storage tiering (also called data tiering) describes the use of at least two storage technologies with different performance levels that are presented together as a single fabric. Data can be moved between these tiers to optimize usage and cost.
In two-tier systems, the higher-performance tier is where data that is needed frequently is located. This is called the hot tier. The lower-performance tier that houses data that isn’t in frequent use is called the cold tier. The data is moved between these two tiers based on an algorithm written by the storage vendor so the movement is completely transparent to users. Three-tier systems may have a hot tier for performance, a warm tier for capacity, and the cold tier for long term storage.
In storage tiering, a piece of data is either a whole file or one of the blocks that make up a file, and each piece can exist on only one tier. Unlike caching and backup, where data is copied between tiers, storage tiering moves a single copy of the data from one tier to another.
The algorithm has to place data on the best tier to provide an overall performance or capacity benefit. A common approach is to move the most frequently accessed data to the hot tier. Some algorithms move data older than a defined age to the cold tier, and if a backup facility is integrated into the storage operating system, the backups can be moved to cold storage immediately. Data moved to archive storage is still effectively online and will be retrieved by the system when accessed.
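The age-based placement described above can be sketched in a few lines of Python. This is only an illustration of the idea, not any vendor's algorithm: the `Item` class, the `place` function, and the 30-day threshold are all assumptions made for the example.

```python
from dataclasses import dataclass

HOT, COLD = "hot", "cold"


@dataclass
class Item:
    name: str
    days_since_access: int
    tier: str = HOT


def place(items, cold_after_days=30):
    """Put each item on exactly one tier, based on how recently it was accessed.

    cold_after_days is a hypothetical threshold chosen for this sketch.
    Returns the list of moves as (name, from_tier, to_tier).
    """
    moves = []
    for item in items:
        target = COLD if item.days_since_access >= cold_after_days else HOT
        if target != item.tier:
            moves.append((item.name, item.tier, target))
            item.tier = target  # a move, not a copy: the item lives on one tier only
    return moves


items = [
    Item("db-index", days_since_access=1),
    Item("q1-report", days_since_access=90),
    Item("old-logs", days_since_access=400, tier=COLD),
    Item("revived-project", days_since_access=2, tier=COLD),
]
moves = place(items)
for move in moves:
    print(move)
```

Only two items change tier here: the stale report is demoted and the recently accessed project is promoted. Because the item's `tier` field is overwritten rather than duplicated, the sketch mirrors the single-copy behavior that distinguishes tiering from caching or backup.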
Storage Vendor Approaches to Cloud Tiering
So what technology do the major storage makers offer for tiering data to the cloud? The following is a summary of how the five top storage array vendors tier data to the cloud.
Dell EMC
When Dell acquired EMC in 2016, it added EMC's storage hardware and cloud tiering product to its own existing storage hardware and cloud tiering products. Dell EMC now has the following three systems capable of cloud tiering:
Isilon Storage Array
A hardware storage appliance providing NAS storage, which includes a data tiering framework between the nodes of an Isilon Cluster, called SmartPools.
An extension of SmartPools called CloudPools allows files to be tiered to cloud storage. A file is moved to cloud storage if it matches the criteria specified in the CloudPools policy, and an 8 KB stub file, called a SmartLink file, is left behind. The criteria can be any combination of file metadata attributes such as timestamps, file name, file type, and file size. By default, CloudPools policies run once a day.
Dell EMC Virtustream, AWS, Google Cloud, Alibaba Aliyun, Federal C2S clouds, Azure, and private clouds based on Dell EMC ECS are all supported as cloud tiers.
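To make the file-level, policy-driven approach more concrete, here is a rough Python sketch of matching files against metadata criteria and leaving a stub behind. It does not use any CloudPools API; `FileMeta`, `Policy`, and `tier_files` are hypothetical names, and only the 8 KB stub size comes from the description above.

```python
from dataclasses import dataclass
from fnmatch import fnmatch

STUB_SIZE = 8 * 1024  # 8 KB stub left on the primary tier, as described above


@dataclass
class FileMeta:
    path: str
    size: int                 # bytes
    days_since_modified: int


@dataclass
class Policy:
    """Hypothetical policy combining several file metadata attributes."""
    name_pattern: str = "*"
    min_size: int = 0
    min_age_days: int = 0

    def matches(self, f: FileMeta) -> bool:
        return (
            fnmatch(f.path, self.name_pattern)
            and f.size >= self.min_size
            and f.days_since_modified >= self.min_age_days
        )


def tier_files(files, policy):
    """Return (paths tiered, bytes freed on the primary tier)."""
    tiered, freed = [], 0
    for f in files:
        # Tiering a file smaller than its stub would free no space.
        if policy.matches(f) and f.size > STUB_SIZE:
            tiered.append(f.path)
            freed += f.size - STUB_SIZE
    return tiered, freed


files = [
    FileMeta("/proj/a.mp4", 500_000_000, 200),
    FileMeta("/proj/b.mp4", 700_000_000, 3),
    FileMeta("/proj/notes.txt", 4_000, 900),
]
policy = Policy(name_pattern="*.mp4", min_age_days=90)
tiered, freed = tier_files(files, policy)
print(tiered, freed)
```

Only the old video matches both criteria. The newer video fails the age test and the text file fails the name pattern, which shows how combining attributes narrows what gets moved.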
Data Domain Storage System
The PowerProtect DD backup storage appliances, which run the Data Domain Operating System and use the Data Domain Filesystem, have a built-in cloud tiering mechanism named Cloud Tier. This mechanism allows these systems to use two tiers for storing data: the active tier and a cloud tier.
The Data Domain Cloud Unit is the feature that connects these systems to various S3 providers. Data movement policies, which run on a daily schedule, use the Cloud Unit to move data to the cloud tier once it reaches a minimum age. Any data that matches the policy is deduplicated and moved to the cloud, then brought back when requested.
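The combination of an age threshold and deduplication before upload can be illustrated with a small Python sketch. It is not the Data Domain implementation: the function names, the dictionary standing in for an object store, and the 14-day minimum age are assumptions made for the example.

```python
import hashlib


def move_to_cloud(segments, cloud_store, min_age_days=14):
    """Move segments that have reached min_age_days, uploading each unique chunk once.

    segments: iterable of (name, age_days, list_of_chunk_bytes)
    cloud_store: dict acting as a stand-in object store, keyed by chunk digest
    Returns (recipes, bytes_uploaded); a recipe lists the digests that rebuild a segment.
    """
    recipes, uploaded = {}, 0
    for name, age_days, chunks in segments:
        if age_days < min_age_days:
            continue  # too young: stays on the active tier
        recipe = []
        for chunk in chunks:
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in cloud_store:  # deduplication: skip chunks already stored
                cloud_store[digest] = chunk
                uploaded += len(chunk)
            recipe.append(digest)
        recipes[name] = recipe
    return recipes, uploaded


def recall(recipe, cloud_store):
    """Rebuild a segment from the cloud tier when it is requested."""
    return b"".join(cloud_store[d] for d in recipe)


segments = [
    ("backup-mon", 30, [b"A" * 4, b"B" * 4]),
    ("backup-tue", 29, [b"A" * 4, b"C" * 4]),
    ("backup-today", 0, [b"D" * 4]),
]
cloud = {}
recipes, uploaded = move_to_cloud(segments, cloud)
print(uploaded, len(cloud))
print(recall(recipes["backup-tue"], cloud))
```

The two older backups share a chunk, so that chunk is uploaded once, and the newest backup is left on the active tier because it has not reached the minimum age.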
The Data Domain Storage System can also present the iSCSI target of a backup device, which allows the tiered cloud storage to function as a virtual tape library.
Unity Storage Arrays
For tiering to the cloud, Unity arrays use the Cloud Tiering Appliance (CTA) software. The CTA software can be installed on a physical server or deployed as a virtual machine on a VMware ESXi host. CTA can also be installed in high-availability deployments.
For file-level tiering, CTA is similar to CloudPools on Isilon in that files are moved to cloud storage and 8 KB stub files are left in their place on the on-prem system. When a file is accessed, it is restored to the physical storage. Supported cold tiers in CTA are Amazon S3, Azure Blob, IBM Cloud Object Storage, Dell EMC Virtustream, and ECS.
CTA also enables block data tiering, which in Dell EMC terminology is known as block archiving. Block archiving can be performed on Dell EMC Unity storage systems to move snapshot data to the cloud based on an archive policy. By default, block archiving runs as a daily scheduled task. Restoring a snapshot can be done manually or on a schedule.
Unity provides a data reduction feature that can cut down physical storage usage by deduplicating and compressing the data stored on it. However, if the data to be tiered is "reduced" in Unity, it is decompressed in memory and sent to the CTA, which recompresses it in a way better suited to cold storage before sending it to cloud storage.
HPE
HPE has two products for tiering data to cloud storage. Together they cover most of the HPE storage hardware range.
HPE 3PAR and Nimble Storage Systems
With HPE 3PAR StoreServ arrays and Nimble Storage systems, tiering data to the cloud is achieved through HPE’s StoreOnce Data Protection Backup Appliances, using HPE’s Cloud Bank Storage feature. HPE Cloud Bank Storage can tier backup/DR data to Amazon S3, Microsoft Azure Blob storage, and the Scality object storage solution. To optimize the use of cloud storage, it includes changed block tracking and data deduplication, which reduce the amount of data that is migrated to cloud storage and potentially back.
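The benefit of sending only changed blocks can be shown with a short Python sketch. Note the simplification: real changed block tracking records writes as they happen, whereas this example approximates the result by comparing two versions block by block. The tiny block size and the function names are assumptions for illustration.

```python
BLOCK = 4  # unrealistically small block size, chosen so the example is easy to read


def split(data: bytes):
    """Cut a byte string into fixed-size blocks."""
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def changed_blocks(previous: bytes, current: bytes):
    """Return {block_index: block} for blocks that are new or differ from the previous version."""
    prev = split(previous)
    changes = {}
    for i, block in enumerate(split(current)):
        if i >= len(prev) or prev[i] != block:
            changes[i] = block
    return changes


previous = b"aaaabbbbcccc"
current = b"aaaaBBBBccccdddd"
changes = changed_blocks(previous, current)
sent = sum(len(b) for b in changes.values())
print(changes)
print(f"sent {sent} of {len(current)} bytes")
```

Only the modified block and the newly appended block need to travel to the cloud tier, which is the reduction in migrated data that the paragraph above refers to.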
Hitachi
Hitachi Content Platform
The Hitachi Content Platform (HCP) is a software-defined object storage solution rather than a classic NAS or SAN system. It supports various data access protocols such as S3, Swift, and REST API interfaces, as well as CIFS, NFS, WebDAV, and SMTP.
With HCP’s adaptive cloud tiering (ACT) functionality, hybrid storage pools can be constructed that leverage storage from third-party sources on-site or off-site. ACT is available on the HCP G Node physical appliances (HCP G10s) and on HCP VMs running on either ESXi or KVM. Each HCP node contains some internal storage and can be connected to off-site public cloud resources to tier data into Amazon S3, Azure Blob, Google Cloud Storage, or other S3-compatible storage.
Hitachi NAS Platform (HNAS) and VSP N series
This platform comes with an intelligent file tiering feature that is policy-based, allowing for data to migrate to private cloud object storage based on HCP or public cloud object storage such as Amazon S3, Azure Blob and IBM Cloud Object Storage.
Tiering to the cloud is done at the file level and is a policy-based process that runs on a predefined schedule. Data is migrated when the criteria defined in the business rule are met. The criteria can be based on attributes such as timestamps, file name, path, and owner, or on a combination of these.
IBM
From the IBM Spectrum Storage family, which now covers all of IBM's enterprise storage under a single brand, comes IBM Spectrum Virtualize software. IBM Spectrum Virtualize pools storage from multiple storage systems, even from multiple vendors. This allows it to apply compression and automatic data tiering across an entire heterogeneous storage environment, providing a single point of management.
Using IBM Easy Tier, which supports up to three tiers, data extents (blocks) are moved automatically to the appropriate tier. Easy Tier can also automatically rebalance extents to ensure an even I/O load across all disks in the same tier.
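As a rough picture of extent placement across three tiers, the following Python sketch ranks extents by I/O activity and fills the fastest tier first. This is not how Easy Tier is implemented; the tier names, capacities, heat counters, and the `assign_tiers` function are all hypothetical.

```python
def assign_tiers(extent_heat, capacities):
    """Place the hottest extents on the fastest tier until it is full, then spill downward.

    extent_heat: {extent_id: io_count}, a hypothetical activity counter per extent
    capacities: list of (tier_name, max_extents), ordered fastest first
    """
    ranked = sorted(extent_heat, key=lambda e: extent_heat[e], reverse=True)
    placement, start = {}, 0
    for tier, cap in capacities:
        for extent in ranked[start:start + cap]:
            placement[extent] = tier
        start += cap
    return placement


heat = {"e1": 900, "e2": 5, "e3": 300, "e4": 40, "e5": 0}
capacities = [("flash", 1), ("enterprise", 2), ("nearline", 10)]
placement = assign_tiers(heat, capacities)
for extent in sorted(placement):
    print(extent, placement[extent])
```

Rerunning the assignment as the heat counters change would promote or demote extents, which is the automatic movement the paragraph above describes. Balancing load across disks within a tier is a separate step that this sketch leaves out.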
This solution doesn't tier to cloud storage directly. Instead, it replicates to IBM Spectrum Virtualize for Public Cloud instances running in AWS or IBM Cloud, so it could be considered more of a backup or migration solution.
IBM's portfolio also includes IBM Spectrum Protect, a software appliance from the IBM Spectrum family that can tier to cloud storage. With its many plugins, it can back up many different types of enterprise environments, from applications to virtual hosts to databases. It can then replicate the backups to other IBM Spectrum Protect servers or tier them to Amazon S3 and Azure Blob, with tiering based on the age of the data.
NetApp
NetApp All Flash FAS (AFF) and FAS Hybrid Flash Arrays running NetApp’s ONTAP data management software can tier data to cloud-based object storage via the Cloud Tiering service, which is based on NetApp FabricPool technology.
Tiering is completely seamless and automated and offers users three policies for tiering data to the cloud:
- Auto Tiering Policy: Blocks that have been cold for a predefined period are moved to the cloud tier.
- Snapshot-Only Tiering Policy: Snapshot blocks are moved to the cloud tier after a default or specified cooling period.
- All Tiering Policy: Moves all volume data blocks to the cloud, normally on volumes that contain finished projects, historical data, backups or archive data.
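The three policies above can be read as a simple decision rule per block. The Python sketch below is an interpretation of that list, not ONTAP code: the `Block` fields, the policy strings, and the cooling periods used in the example are assumptions, since the article does not give specific values.

```python
from dataclasses import dataclass

POLICIES = {"auto", "snapshot-only", "all"}


@dataclass
class Block:
    days_cold: int      # days since the block was last read or written
    in_active_fs: bool  # False means only snapshots still reference the block


def should_tier(block: Block, policy: str, cooling_days: int) -> bool:
    """Decide whether a block is eligible for the cloud tier under the given policy."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    if policy == "all":
        return True  # everything in the volume is tiered, regardless of age
    if block.days_cold < cooling_days:
        return False  # still within the cooling period
    if policy == "snapshot-only":
        return not block.in_active_fs
    return True  # "auto": any block that has cooled long enough


snap = Block(days_cold=5, in_active_fs=False)
live = Block(days_cold=40, in_active_fs=True)
fresh = Block(days_cold=0, in_active_fs=True)

# The cooling periods below are illustrative values, not taken from the article.
print(should_tier(snap, "snapshot-only", cooling_days=2))
print(should_tier(live, "snapshot-only", cooling_days=2))
print(should_tier(live, "auto", cooling_days=31))
print(should_tier(fresh, "all", cooling_days=31))
```

The same cold block in the active file system is skipped by the snapshot-only policy but tiered by the auto policy, which is the practical difference between the first two options.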
When blocks that have been tiered are accessed, only those 4KB blocks are returned, not the whole object. This ensures both latency and costs are minimized. Cloud Tiering supports tiering cold data to Amazon S3, Azure Blob storage, and Google Cloud Storage.
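To see why returning only the requested blocks keeps transfers small, consider how a byte-range read maps onto 4 KB blocks. The Python sketch below is plain arithmetic under the block size stated above; the function name and the example offsets are made up for illustration.

```python
BLOCK_SIZE = 4096  # 4 KB blocks, as stated above


def blocks_for_read(offset: int, length: int) -> range:
    """Return the indexes of the blocks that cover bytes [offset, offset + length)."""
    if length <= 0:
        return range(0)
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    return range(first, last + 1)


small = list(blocks_for_read(offset=10_000, length=100))
spanning = list(blocks_for_read(offset=4_000, length=5_000))
print(small, len(small) * BLOCK_SIZE, "bytes fetched")
print(spanning, len(spanning) * BLOCK_SIZE, "bytes fetched")
```

A 100-byte read touches a single block, and a 5,000-byte read that straddles block boundaries touches three, so the amount retrieved tracks the size of the read rather than the size of the file or object that holds it.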
One of the major differences with NetApp Cloud Tiering is that tiering is performed at the block level, without any “stubs” or “smartlinks” left behind. Instead of entire files being migrated to an external cloud tier, only cold blocks are moved. All of this is done while preserving the namespace, which makes the entire process transparent to applications that need to access their data.
Cloud Tiering is designed to handle both primary and secondary copies of data, so production ONTAP-based systems can be utilized more efficiently, allowing you to reclaim performant storage space to be used by performance-sensitive applications. Cloud Tiering also allows on-prem machines to tier data directly to the cloud, without data having to move to another intermediary storage level before it ever gets to the cloud.
Cloud Tiering is an excellent way to manage the data lifecycle, and lower the costs of hosting infrequently used data in the data center.
All Tiering Isn’t on the Same Level
The first thing to notice is that there are different interpretations of tiering by different vendors.
File tiering is useful and can regain space on your performance tier, but if you regularly access older files, the default configuration could become expensive in cloud storage egress charges.
Storage or data tiering is a way to intelligently manage data growth and be cost-effective, as you move cold data to a less performant, less expensive storage tier. Some technologies, including NetApp Cloud Tiering, optimize this by only moving the infrequently accessed data blocks to the public cloud tier and back to the performance tier when requested, so unneeded data is always stored cost efficiently.
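A back-of-the-envelope comparison shows how egress charges can differ between recalling whole files and recalling only the blocks that are read. Every number in this Python sketch is an assumption chosen for the arithmetic, including the price per GB, the file sizes, and the access pattern. Actual pricing varies by provider and region.

```python
PRICE_PER_GB = 0.09          # hypothetical egress price in dollars per GB
BLOCK_SIZE = 4096            # 4 KB blocks
FILE_SIZE = 2_000_000_000    # each tiered file is 2 GB in this example
FILES_ACCESSED = 100         # distinct cold files touched in a month
BYTES_READ_PER_FILE = 1_048_576  # each access actually reads 1 MiB


def egress_cost(bytes_recalled: int, price_per_gb: float) -> float:
    """Cost of pulling bytes_recalled out of the cloud tier (1 GB = 10**9 bytes)."""
    return bytes_recalled / 1e9 * price_per_gb


# File-level tiering: touching a stub recalls the whole file.
file_level = egress_cost(FILE_SIZE * FILES_ACCESSED, PRICE_PER_GB)

# Block-level tiering: only the blocks covering the bytes read are recalled.
blocks_per_access = -(-BYTES_READ_PER_FILE // BLOCK_SIZE)  # ceiling division
block_level = egress_cost(blocks_per_access * BLOCK_SIZE * FILES_ACCESSED, PRICE_PER_GB)

print(f"file-level recall:  ${file_level:.2f}")
print(f"block-level recall: ${block_level:.4f}")
```

Under these assumptions the gap comes entirely from how much data each access pulls back, so the difference shrinks when applications read most of each file and grows when they read only small portions of large files.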
For organizations that are just beginning to work on a digital transformation, Cloud Tiering can offer an easy, low-cost first step into the cloud.