September 10, 2019
When NetApp first introduced the NetApp AFF A-Series systems, they offered various advantages that users couldn’t find anywhere else. The main advantage was an outstanding improvement in performance to serve intensive use cases such as data analytics and artificial intelligence. Later, AFFs received a significant boost with the introduction of end-to-end NVMe to support latency-sensitive workloads. All of that was achieved without compromising other advantages such as unified SAN/NAS support, storage efficiencies, and data protection. They also came as an integral part of the NetApp Data Fabric architecture, making them capable of doing even more in conjunction with cloud environments and cloud-based services.
What the AFFs couldn’t do was distinguish how your data was being used at different points in its lifecycle. The data that you needed for immediate use was stored no differently than the data that you kept in the system for long-term retention. That has all changed with the introduction of a powerful new data tiering technology and the new Cloud Tiering service for AFF and SSD-backed FAS systems.
In this article we’ll review the importance of data lifecycle management, including the benefits enterprises get from a proper approach to this challenge and how Cloud Tiering offers a cost-effective and intelligent solution to this IT management need.
The Importance of Data Lifecycle Management: Challenges and Benefits
IT heads often encounter all types of challenges and demands that require a practical solution. Handling the lifecycle of a data set effectively and in a cost-efficient way is one of those challenges.
In data lifecycle management, admins need to observe the useful lifespan of their data, from the moment it is generated, passing through periods of sporadic access, to the moment it’s archived or even permanently eliminated. To do that, admins need to define policies to manage, categorize, and prioritize data over its lifetime.
What possible drivers contribute to choosing one data management approach over another? Here are some general questions you can ask to help you gain some insight into how data flows through your organization:
- What is the access frequency of the majority of the data? Is it accessed frequently, or is most of it just sitting around unused?
If new data keeps coming in but is only used for a short time before becoming completely inactive, that is a driver that could shift you to one approach instead of another. Data needs prioritization. How do you intelligently and automatically distinguish between hot and cold data?
- What is the growth rate of the data and is it linked to a current rapid expansion of your organization’s objectives?
You could be forecasting a short or long term need for additional storage capacity.
- What is generating the data?
Is it customer-generated data, data coming from development and research teams as part of your organization’s day-to-day operations, or a combination of both?
Categorizing data for future access is important. There could be inactive data from old projects or data that needs to be kept for regulatory purposes which you can tier down to a lower, more cost-effective level.
- What are your company’s data retention and backup policies for each kind of data set?
A strong backup policy is very important for redundancy and compliance reasons, but backup hardware is costly and can incrementally increase your TCO (Total Cost of Ownership).
- What are your currently available on-prem storage resources and what is your company’s current budget plan?
If storage expansion is needed due to a rapid increase in data, budget and TCO can drive a decision towards investing short-term in new equipment or shifting to a cloud-based, OPEX option.
- Is your company undertaking a hybrid-cloud approach or planning to migrate to the cloud?
This storage architecture question is another decision driver that can determine if a business prefers to have all its data on proprietary hardware or if they can move some of it to the cloud.
A clear understanding of these aspects provides context for choosing one data lifecycle management strategy over another. To summarize, the key aspects to keep in mind are:
- Retention, Backup and Archives
- Data Categorization
- Data Prioritization
Data Lifecycle Management with Cloud Tiering Service
With ever-increasing amounts of data, users of on-prem storage systems need to make better use of the limited space on their storage boxes. Identifying when data enters the cold portion of its lifecycle is essential here, because it determines which data can be moved out of a high-performance storage system, such as a NetApp AFF, to more cost- and space-efficient capacity storage.
NetApp Cloud Tiering service for AFF and SSD-backed FAS systems can help your business seamlessly integrate data tiering and cost reduction approaches into your data lifecycle management strategy by allowing inactive data on-prem to be automatically tiered to object storage in the cloud.
This service is available for Amazon S3 (ONTAP 9.2 or later) and Azure Blob (ONTAP 9.4 or later), with more clouds on the way. Cloud Tiering, by leveraging NetApp’s powerful FabricPool data tiering technology, automatically and seamlessly tiers cold blocks of data in your AFF or FAS SSD clusters to S3 Standard or Standard-Infrequent-Access storage classes or to Azure Blob’s Hot access tier, depending on your cloud of choice.
Currently, Cloud Tiering leverages two tiering policies:
- Snapshot-Only: Cloud Tiering tiers only snapshot blocks that are not associated with the active file system. These blocks are considered cold after a two-day cooling period.
- Auto-Tiering: Any data not accessed for a 31-day period, whether in the active file system or in snapshots, is tiered to lower-cost object storage in the cloud.
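To make the difference between the two policies concrete, here is a minimal Python sketch of how blocks might be classified as cold. It is a toy model and not how FabricPool is implemented: the `Block` class, its fields, the policy names, and the helper functions are all hypothetical, and only the two-day and 31-day cooling periods come from the policy descriptions above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Cooling periods taken from the policy descriptions in this article.
SNAPSHOT_ONLY_COOLING = timedelta(days=2)
AUTO_COOLING = timedelta(days=31)


@dataclass
class Block:
    """Hypothetical stand-in for a block of data on the performance tier."""
    block_id: int
    last_access: datetime
    in_active_fs: bool  # False means only snapshots still reference the block


def is_cold(block: Block, policy: str, now: datetime) -> bool:
    """Return True if the block would be eligible for tiering under the policy."""
    idle = now - block.last_access
    if policy == "snapshot-only":
        # Only snapshot blocks outside the active file system qualify.
        return not block.in_active_fs and idle >= SNAPSHOT_ONLY_COOLING
    if policy == "auto":
        # Any block qualifies once it has been idle long enough.
        return idle >= AUTO_COOLING
    raise ValueError(f"unknown policy: {policy}")


def select_cold_blocks(blocks, policy, now):
    """Return the IDs of all blocks the given policy would tier to the cloud."""
    return [b.block_id for b in blocks if is_cold(b, policy, now)]
```

In this model, a snapshot-only block that has been idle for three days is selected under the snapshot-only policy but not under the auto policy, while an active block that has been idle for 40 days is selected only under the auto policy.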
A third option to tier entire volumes of data will be released soon. This option will play a big part in the data lifecycle for projects that consume a lot of data during their creation but will later on need to be shelved, such as volumes used for film production, architecture and building projects, and so on. This policy will play a major role as part of an efficient data protection strategy as well.
If a read request comes in for a cold block of data, Cloud Tiering pulls it from the object storage layer, known as the cloud tier, and places it in the SSD layer of the AFF or FAS system, known as the performance tier. This rewarming process is carried out seamlessly, with no changes required to the application using the data. Everything happens intelligently in the underlying data layer.
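The read path described above can be sketched as a small two-tier store. This is an illustrative model only: the `TieredStore` class and its methods are hypothetical, plain dictionaries stand in for the SSD and object storage layers, and details such as what happens to the cloud copy after rewarming are simplifications rather than a description of the actual service.

```python
class TieredStore:
    """Toy two-tier store: a performance tier plus a cloud tier."""

    def __init__(self):
        self.performance_tier = {}  # block_id -> bytes (stands in for SSD)
        self.cloud_tier = {}        # block_id -> bytes (stands in for object storage)

    def write(self, block_id, data):
        """New or rewritten data always lands on the performance tier."""
        self.performance_tier[block_id] = data
        self.cloud_tier.pop(block_id, None)

    def tier_out(self, block_id):
        """Move a cold block from the performance tier to the cloud tier."""
        self.cloud_tier[block_id] = self.performance_tier.pop(block_id)

    def read(self, block_id):
        """Return the block's data; the caller never sees which tier held it."""
        if block_id in self.performance_tier:
            return self.performance_tier[block_id]
        # Cold read: fetch from the cloud tier and rewarm the block.
        # Simplification: this model drops the cloud copy on rewarm.
        data = self.cloud_tier.pop(block_id)  # raises KeyError for unknown blocks
        self.performance_tier[block_id] = data
        return data

    def location(self, block_id):
        """Report which tier currently holds the block (for inspection only)."""
        if block_id in self.performance_tier:
            return "performance"
        if block_id in self.cloud_tier:
            return "cloud"
        return None
```

The application only ever calls `read`, which mirrors the point made above: the same call works whether the block is hot or cold, and a cold block ends up back on the performance tier as a side effect of being read.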
Data lifecycle management can be a major factor in how well storage system usage in the data center is optimized. The Cloud Tiering service from NetApp gives you a strong hand with your data lifecycle management strategy. Having this service integrated into your AFF or FAS SSD systems provides several benefits:
- Lower TCO: an estimated 40% reduction from tiering cold data and freeing up space on your high-performance on-prem storage for other workloads.
- Intelligent, automatic tiering: IT admins don’t need to track which data is cold and which is hot, so the tiering phase of the data lifecycle is handled with easy management and no upfront costs.
- Ease of use: a cloud service managed through NetApp Cloud Central, with a centralized management UI that provides automation and reporting capabilities. The UI shows graphically how much data is tiered, how much is hot, how much you are saving, and what further savings you could gain by tiering more data.
- Flexible pay-as-you-go pricing: a company can avoid the upfront costs involved in provisioning new storage boxes. Cloud Tiering charges only for the data that is tiered.
- A simple path to hybrid cloud: a deployment model that works for AFF/FAS clusters at any scale, and at large scale in particular.