BlueXP Blog

How to Handle Cloud Storage Cost Sprawl

Written by Gali Kovacs | Mar 4, 2018 8:11:15 AM

Sprawl in the IT world isn’t new: we know it well from physical servers in data centers, as well as VMs in virtualized environments. However, the public cloud, with its low-friction, pay-as-you-go, micro-pricing model, has introduced a new sprawl challenge: storage cost and cloud sprawl. In the cloud there are no “free lunches”: every running OS instance, every gigabyte of storage, and every gigabit of transmitted data costs money.

The cloud sprawl problem is exacerbated by two major factors: Shadow IT — resources provisioned by development (or other) teams without explicit organizational approval — and the unprecedented volume, velocity, and variety of data that has to be managed across diverse cloud services, each with its own API and pricing policies.

It’s no wonder, therefore, that the public cloud bill at the end of the month is often much higher than planned. And what is particularly irksome is that it frequently includes wastage.

What’s the best way to handle cloud data storage cost and cloud sprawl? This blog post will explore the special challenges and risks of cloud storage cost sprawl, and describe how Cloud Volumes ONTAP (formerly ONTAP Cloud) helps to meet those challenges and reduce those risks.

More and More Data

The growth in data is breathtaking, with the digital universe expected to double every two years at least—amounting to a 50X increase from 2010 to 2020. This exponentially-growing amount of data (streams, files, objects, records, and so on) needs to be stored, protected, and, in many cases, to be accessible to multiple workloads. As a result, data movement and cloud data processing can become highly complex.

The figure below shows how a single application running in the AWS cloud may, in many cases, require a multitude of siloed services to handle its data and workflow, consuming a great deal of capacity and processing resources.

Complexity of siloed services to support an application. (Source: AWS)

A Wealth of Cloud Data Storage Services

AWS, Azure, and other public cloud providers offer a rich variety of cloud data storage and processing services and platforms—each with its own API and pricing policies. If we take file or object storage as an example, both AWS and Azure give the customer options that reduce the cloud data storage cost for infrequently accessed “cold” data, at the expense of less speedy data accessibility.

Conversely, the customer can choose to pay more for the storage of “hot” data with low-latency access that enhances the performance of apps or ongoing dev/test projects.

The table below summarizes the key options for each provider:


 


| AWS | Azure Blob Storage |
|---|---|
| Amazon EFS: scalable network file storage for Amazon EC2 instances | Hot |
| Amazon S3: scalable, highly durable object storage | Cool |
| Amazon Glacier: low-cost storage for archival or backup purposes | Archive |
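The hot, cool, and archive trade-off described above can be sketched as a simple age-based rule. This is only an illustration: the `StoredObject` type, the `suggest_tier` function, and the 30-day and 180-day thresholds are assumptions made for the example, not values published by AWS, Azure, or NetApp.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical thresholds, chosen only for illustration.
COOL_AFTER_DAYS = 30
ARCHIVE_AFTER_DAYS = 180


@dataclass
class StoredObject:
    name: str
    last_accessed: date


def suggest_tier(obj: StoredObject, today: date) -> str:
    """Suggest 'hot', 'cool', or 'archive' from the days since last access."""
    idle_days = (today - obj.last_accessed).days
    if idle_days >= ARCHIVE_AFTER_DAYS:
        return "archive"
    if idle_days >= COOL_AFTER_DAYS:
        return "cool"
    return "hot"


if __name__ == "__main__":
    today = date(2018, 3, 4)
    inventory = [
        StoredObject("orders.db", date(2018, 3, 1)),
        StoredObject("january-report.pdf", date(2018, 1, 15)),
        StoredObject("2016-backup.tar", date(2017, 6, 1)),
    ]
    for obj in inventory:
        print(f"{obj.name}: {suggest_tier(obj, today)}")
```

In practice the right thresholds depend on each tier's retrieval fees and minimum storage durations, so a rule like this would need to be tuned against the provider's current price list.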

The Cost Impact of Choosing the Wrong Service for the Wrong Use Case

However, each of these services has its own API and a complex pricing plan based not only on storage capacity, but also on network throughput and the number and type of requests.

This siloed approach makes it difficult to manage cloud storage and processing costs efficiently, and contributes to cloud storage sprawl. There is a wide variety of data use cases, ranging from primary, real-time uses, such as transactional or search data, to diverse secondary, offline use cases such as backups, file shares, dev/test, disaster recovery, analytics, and more. Each use case has its own requirements for response time, cost, and retention, so placing data on a service that does not match its use case is one way data sprawl takes hold.

The following chart clearly shows the significant cost differences among various on-premises and cloud-based solutions for storing 1PB of data over a period of three years:

Cost to Store 1PB Over Three Years (Source: Storage Switzerland)

With secondary data storage use cases making up about 80% of an enterprise’s data requirements, putting secondary data on storage solutions designed for primary use cases can needlessly rack up substantial costs.

Compare, for example, two data storage cases, each with a total capacity of 10 TB. Case 1 has an object size of 2KB and 500 reads and 500 writes per second, while Case 2 has an object size of 64KB but only 50 reads and 50 writes per second:

 

| | Object Size | Reads/Writes per Second | Total Capacity |
|---|---|---|---|
| Case 1 | 2KB | 500 of each | 10 TB |
| Case 2 | 64KB | 50 of each | 10 TB |
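To see why the two cases behave so differently under request-based pricing, it helps to work out how many objects and how many requests per month each one implies. The sketch below is plain arithmetic on the figures above; it assumes decimal units (1 TB = 10^12 bytes, 1 KB = 10^3 bytes) and a 30-day month, and the `workload_shape` helper is a name introduced here for illustration.

```python
# Assumptions: decimal units and a 30-day month.
TB = 10**12
KB = 10**3
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def workload_shape(object_size_bytes, reads_per_sec, writes_per_sec, capacity_bytes):
    """Return (object_count, requests_per_month) for a storage workload."""
    object_count = capacity_bytes // object_size_bytes
    requests_per_month = (reads_per_sec + writes_per_sec) * SECONDS_PER_MONTH
    return object_count, requests_per_month


case1 = workload_shape(2 * KB, 500, 500, 10 * TB)
case2 = workload_shape(64 * KB, 50, 50, 10 * TB)

for label, (objects, requests) in (("Case 1", case1), ("Case 2", case2)):
    print(f"{label}: {objects:,} objects, {requests:,} requests per month")
```

Under these assumptions Case 1 holds 32 times as many objects as Case 2 and issues ten times as many requests each month for the same 10 TB, which is why per-request charges weigh so much more heavily on it.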

If you take into account the end-to-end costs (capacity, requests, transfer out), then Case 1 (with the high level of reads and writes) costs less than half as much on Amazon DynamoDB as on Amazon S3. Case 2, on the other hand, costs more than twice as much on Amazon DynamoDB as on Amazon S3:

 

| | Amazon S3 | Amazon DynamoDB |
|---|---|---|
| Case 1 | $0.76/GB/month | $0.34/GB/month |
| Case 2 | $0.18/GB/month | $0.41/GB/month |
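As a rough worked example, the per-GB figures above can be turned into a monthly bill for the full 10 TB. The rates are copied from the table; the decimal conversion (1 TB = 1,000 GB) and the `monthly_cost` helper are assumptions made for this sketch.

```python
# Assumption: decimal units, so 10 TB = 10,000 GB.
CAPACITY_GB = 10 * 1000

# End-to-end rates in USD per GB per month, copied from the table above.
RATES = {
    "Case 1": {"Amazon S3": 0.76, "Amazon DynamoDB": 0.34},
    "Case 2": {"Amazon S3": 0.18, "Amazon DynamoDB": 0.41},
}


def monthly_cost(rate_per_gb, capacity_gb=CAPACITY_GB):
    """Monthly cost in USD for the given per-GB rate and capacity."""
    return rate_per_gb * capacity_gb


for case, rates in RATES.items():
    s3 = monthly_cost(rates["Amazon S3"])
    dynamodb = monthly_cost(rates["Amazon DynamoDB"])
    print(
        f"{case}: S3 ${s3:,.0f}/month, DynamoDB ${dynamodb:,.0f}/month, "
        f"DynamoDB/S3 ratio {dynamodb / s3:.2f}"
    )
```

With the table's figures, DynamoDB comes to roughly 0.45 times the S3 cost in Case 1 and about 2.3 times the S3 cost in Case 2, a difference of thousands of dollars a month in either direction for the same 10 TB.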

Dealing with Cloud Sprawl

To deal with cloud sprawl, the cloud providers themselves offer tools and services, such as server tagging, through which IT can limit the resources provisioned by others. When IT management sets limits on the number of CPUs, amount of memory, amount of storage, or bandwidth that can be used at any given point in time, developers learn to conserve these resources and use them more efficiently.

However, the root cause of cloud storage cost sprawl continues to be paying for resources that aren’t being used at all or aren’t being used efficiently: forgotten or abandoned workloads, lack of data tiering, and duplication of data across the different architectures and repositories maintained for the different users of the same data.

IT teams struggle to keep track of what data is stored where and for what purpose. According to RightScale’s 2017 State of the Cloud Report, on average organizations underestimate their cloud waste by more than 30%. To address the root cause, you need a powerful enterprise data management platform such as NetApp’s Cloud Volumes ONTAP.

With its rich set of cost-saving storage efficiency features for AWS storage, Azure storage, and private cloud storage, Cloud Volumes ONTAP cuts down on cloud data storage cost and cloud sprawl. These storage efficiencies include:

  • Automated data tiering: Cloud Volumes ONTAP automatically tiers your backups or infrequently accessed data to a cheaper storage tier, and automatically moves it back into higher performance tiers as needed.
  • Thin provisioning: Avoids pre-allocating storage capacity per app (much of which will be unoccupied at any given point in time) by allocating storage capacity dynamically from a single shared storage pool only when data is actually being written to a volume.
  • Data compression, deduplication, and compaction: These inline, adaptive processes work together to reduce storage costs by compressing eligible data blocks by 50% or more; saving pointers to duplicate blocks in the storage media rather than writing the block again; and fitting smaller chunks into 4KB physical blocks before sending the block to storage.
  • FlexClone®: Leveraging NetApp’s proprietary, storage-efficient Snapshot® technology, Cloud Volumes ONTAP’s FlexClone lets you create cloned volumes instantly (which also saves you time) with virtually no increase in storage needs. Data sets can be used for multiple purposes without adding to storage costs.
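
The idea behind deduplication and compression in the list above can be shown with a toy example. This is a minimal sketch of the general technique, not how Cloud Volumes ONTAP implements it: the `store_blocks` and `read_blocks` functions are hypothetical, SHA-256 and zlib are stand-ins chosen for convenience, and compaction is not modeled.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # 4KB blocks, matching the block size mentioned above


def store_blocks(data: bytes):
    """Split data into fixed-size blocks and keep one compressed copy per unique block.

    Returns (block_store, pointers): block_store maps a content digest to the
    compressed block, and pointers lists one digest per logical block, in order.
    """
    block_store = {}
    pointers = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in block_store:
            # First time this content is seen: compress and store it.
            block_store[digest] = zlib.compress(block)
        # Duplicates only add a pointer, not another copy of the block.
        pointers.append(digest)
    return block_store, pointers


def read_blocks(block_store, pointers) -> bytes:
    """Rebuild the original data by following the pointers."""
    return b"".join(zlib.decompress(block_store[d]) for d in pointers)


# Four logical blocks, of which three are identical.
data = (b"A" * BLOCK_SIZE) * 3 + (b"B" * BLOCK_SIZE)
store, ptrs = store_blocks(data)
stored_bytes = sum(len(block) for block in store.values())
print(f"logical blocks: {len(ptrs)}, unique blocks stored: {len(store)}")
print(f"logical bytes: {len(data)}, stored bytes: {stored_bytes}")
```

The sample data is deliberately repetitive, so the savings here are far larger than real workloads would see; the point is only that duplicate blocks cost a pointer rather than a second copy.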

The OnCommand® Cloud Manager (OCCM) provides an intuitive, single-pane data management interface across hybrid and multi-cloud environments, allowing an enterprise to optimize its data storage costs even further by seamlessly leveraging both on-premises and cloud-based storage solutions.

With its newly enhanced Cloud Storage Automation Report, OnCommand Cloud Manager can show you recommendations for how to save on storage.

By identifying and highlighting unused or underutilized volumes, unattached volumes, and unassociated snapshots, the Cloud Storage Automation Report helps you easily delete such abandoned or forgotten resources, and more.
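
The kind of check such a report performs can be sketched against a simple inventory. The example below is an assumption-laden illustration and does not use or reflect the OnCommand Cloud Manager API: the `Volume` and `Snapshot` records, the function names, and the sample IDs are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Volume:
    volume_id: str
    size_gb: int
    attached_to: Optional[str]  # instance ID, or None if the volume is unattached


@dataclass
class Snapshot:
    snapshot_id: str
    source_volume_id: str


def find_unattached_volumes(volumes: List[Volume]) -> List[Volume]:
    """Volumes that are provisioned (and billed) but attached to nothing."""
    return [v for v in volumes if v.attached_to is None]


def find_unassociated_snapshots(
    snapshots: List[Snapshot], volumes: List[Volume]
) -> List[Snapshot]:
    """Snapshots whose source volume no longer exists in the inventory."""
    known_ids = {v.volume_id for v in volumes}
    return [s for s in snapshots if s.source_volume_id not in known_ids]


# Hypothetical inventory for illustration.
volumes = [
    Volume("vol-a", 500, "instance-1"),
    Volume("vol-b", 200, None),
]
snapshots = [
    Snapshot("snap-1", "vol-a"),
    Snapshot("snap-2", "vol-deleted"),
]

orphan_volumes = find_unattached_volumes(volumes)
orphan_snapshots = find_unassociated_snapshots(snapshots, volumes)
wasted_gb = sum(v.size_gb for v in orphan_volumes)
print(f"unattached volumes: {[v.volume_id for v in orphan_volumes]} ({wasted_gb} GB)")
print(f"unassociated snapshots: {[s.snapshot_id for s in orphan_snapshots]}")
```

A flagged resource is a candidate for review rather than automatic deletion, since a detached volume or an old snapshot may still be kept on purpose.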

A Final Note

We have seen that cloud storage costs can easily spiral out of control if data storage use cases are not managed efficiently across the entire enterprise.

Cloud sprawl is a major concern for any organization. As IT departments strive for ever higher levels of visibility and automation, they turn to powerful, enterprise-grade data management platforms such as NetApp’s Cloud Volumes ONTAP and its data compression, deduplication, and thin provisioning storage efficiency features.

We invite you to use our AWS calculator and Azure calculator to see how layering Cloud Volumes ONTAP over the public cloud services can provide cost-effective cloud storage.