December 1, 2019
Topics: Cloud Tiering, Data Tiering, Google Cloud | Advanced | 7 minute read
Object storage offers the best price and reliability for storing virtually unlimited amounts of unstructured data in the cloud. It does so without the constraints and limitations of traditional file systems or block storage systems. All data is stored in objects using a universally unique identifier and optionally tagged with additional metadata. High availability, extreme durability, and strong consistency are some of the key features expected with object storage.
Any serious cloud provider today offers an object storage solution, and Google Cloud is no exception with its Google Cloud Storage (GCS) service. With the new support for GCS in NetApp Cloud Tiering, on-premises high-performance NetApp AFF and SSD-backed FAS storage systems can now automatically offload infrequently accessed data to Google Cloud, increasing overall storage capacity and lowering costs.
This blog post introduces Google Cloud Storage and shows how to adopt a hybrid cloud storage solution by integrating it as a capacity tier for NetApp AFF and SSD-backed FAS systems through the Cloud Tiering service.
What is Google Cloud Storage?
Google Cloud Storage is an object storage solution providing exabyte-scale storage to businesses of any size. It offers multiple storage classes depending on the use case, all with the same eleven 9's of durability (99.999999999%). In addition to storing data in one of the many available regions, GCS also supports storing data in a geo-redundant fashion by automatically replicating it across dual or multiple regions, which decreases latency and increases availability.
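To put eleven 9's in perspective, here is a small back-of-the-envelope sketch. It assumes the durability figure can be read as an annual, independent per-object survival probability, which is a simplification made for illustration only; the function names are hypothetical.

```python
from fractions import Fraction

# Eleven 9's of durability, written as an exact fraction to avoid
# floating-point rounding. Assumption: this is an annual, per-object figure.
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_losses(object_count: int) -> Fraction:
    """Expected number of objects lost per year under the assumption above."""
    return object_count * (1 - DURABILITY)


def years_per_lost_object(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)


if __name__ == "__main__":
    one_billion = 1_000_000_000
    print(float(expected_annual_losses(one_billion)))  # 0.01
    print(years_per_lost_object(one_billion))          # 100
```

Under these assumptions, a bucket holding one billion objects would expect to lose a single object roughly once every 100 years.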
Data can be stored and retrieved using a unified API regardless of storage class or location. In addition to its RESTful JSON API, an Amazon S3-compatible XML API is also available. Google Cloud also provides a command-line interface (CLI) and client libraries for C++, C#, Go, Java, Node.js, PHP, Python, and Ruby.
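As a rough illustration of the two APIs, the sketch below builds the download URL for the same object in both styles using only the Python standard library. The bucket and object names are hypothetical, and real requests against private buckets would also need authentication, which is omitted here.

```python
from urllib.parse import quote

GCS_HOST = "https://storage.googleapis.com"


def json_api_media_url(bucket: str, object_name: str) -> str:
    """URL for downloading object data through the JSON API."""
    # The JSON API takes the object name as one percent-encoded path
    # segment, so slashes inside the name are encoded as %2F.
    encoded_bucket = quote(bucket, safe="")
    encoded_object = quote(object_name, safe="")
    return f"{GCS_HOST}/storage/v1/b/{encoded_bucket}/o/{encoded_object}?alt=media"


def xml_api_url(bucket: str, object_name: str) -> str:
    """Path-style URL for the same object through the XML API."""
    # The XML API addresses objects as /BUCKET/OBJECT; slashes are kept.
    return f"{GCS_HOST}/{quote(bucket, safe='')}/{quote(object_name)}"


if __name__ == "__main__":
    # "example-bucket" and the object name are placeholders.
    print(json_api_media_url("example-bucket", "logs/2019/app.log"))
    print(xml_api_url("example-bucket", "logs/2019/app.log"))
```

In practice, the client libraries listed above hide these details, so hand-built URLs like these are mostly useful for understanding what the libraries do under the hood.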
While there are a plethora of use cases for Google Cloud Storage, some of the more common include:
- Low-cost storage for backup and disaster recovery.
- Long-lasting log and data archives.
- Data lakes for analytics and machine learning.
- Storage and delivery of content to apps and websites, such as images or music and video streaming, across geographic regions.
In GCS, buckets are where objects are stored. While the geographic location of a bucket cannot change after creation, the storage class is set per object, with a default storage class defined on the bucket. While a single object has a limit of 5 TB, there is no limit to the total storage capacity or the number of objects in a bucket. A newly created bucket supports around 1,000 object writes and 5,000 object reads per second, and these rates scale automatically as the request rate for the bucket grows.
Storage Classes in Google Cloud Storage
Storing infrequently accessed objects is a use case that Google Cloud takes seriously. It provides two storage classes precisely for this purpose, in addition to its standard storage class used for frequently accessed objects. These different classes allow for cost savings based on the data retention needs and forecasted object access frequency.
All storage classes offer the same millisecond-access responsiveness, so objects can be retrieved immediately. Where they differ is in the minimum storage duration, data retrieval costs, and API operation costs. The lower the at-rest storage cost, the higher the cost of reading and writing the objects.
The Nearline storage class is ideal for objects that will be accessed less than once per month and need to remain available and unchanged for at least 30 days. The at-rest storage price is between $0.01 and $0.02 per GB per month, depending on the location and geo-redundancy settings. Object write operations (Class A operations) cost $0.10 per 10,000, while object read operations (Class B operations) cost $0.01 per 10,000. In addition to the standard network charges, there is an added cost of $0.01 per GB of data retrieved. Billing always includes a minimum at-rest storage duration of 30 days, even if the object is deleted before 30 days have passed. Beware that because objects in GCS are immutable, modifying an object effectively deletes the old one, so the same minimum-duration billing applies.
All this makes Nearline the right choice for recent backups, rarely accessed files, and any other kind of content that is retrieved infrequently, perhaps a few times per year.
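The way these Nearline charges combine can be sketched with a small estimator. It uses the prices quoted above (the low end of the at-rest range), ignores network charges, and assumes a 30-day month for proration; the function and constant names are hypothetical, and actual billing may differ.

```python
from decimal import Decimal

# Nearline prices quoted in the text (low end of the at-rest range).
STORAGE_PER_GB_MONTH = Decimal("0.01")
WRITE_PER_10K_OPS = Decimal("0.10")   # Class A operations
READ_PER_10K_OPS = Decimal("0.01")    # Class B operations
RETRIEVAL_PER_GB = Decimal("0.01")
MIN_STORAGE_DAYS = 30
DAYS_PER_MONTH = 30  # assumption: prorate storage over a 30-day month


def nearline_cost(gb_stored: int, days_stored: int, writes: int,
                  reads: int, gb_retrieved: int) -> Decimal:
    """Estimate Nearline charges in USD, excluding network charges."""
    # Objects deleted early are still billed for the minimum duration.
    billed_days = max(days_stored, MIN_STORAGE_DAYS)
    storage = Decimal(gb_stored) * STORAGE_PER_GB_MONTH * billed_days / DAYS_PER_MONTH
    operations = (Decimal(writes) * WRITE_PER_10K_OPS
                  + Decimal(reads) * READ_PER_10K_OPS) / 10_000
    retrieval = Decimal(gb_retrieved) * RETRIEVAL_PER_GB
    return storage + operations + retrieval


if __name__ == "__main__":
    # 1,000 GB deleted after only 10 days is still billed for 30 days.
    print(nearline_cost(1000, 10, writes=10_000, reads=20_000, gb_retrieved=100))
```

In this example, storage accounts for $10.00 of the $11.12 total even though the data was kept for only 10 days, which shows why the minimum storage duration matters for short-lived objects.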
The Coldline storage class offers the lowest at-rest storage price, at only $0.004 to $0.014 per GB per month, again depending on location and geo-redundancy. Write operations cost the same $0.10 per 10,000 as in the Nearline class, but read operations are five times more expensive at $0.05 per 10,000. Similarly, the data retrieval cost goes up fivefold to $0.05 per GB. The minimum storage duration for Coldline objects is 90 days.
Coldline is best for long-term storage and archiving of objects that are expected to be accessed less than once per year, such as archive or tape migrations and disaster recovery.
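The trade-off between the two classes can be made concrete with a break-even sketch. It compares only the at-rest and retrieval prices quoted above, using the low end of each range (which assumes the same location for both classes), and ignores operation costs, network charges, and minimum storage durations. All names are hypothetical.

```python
from decimal import Decimal

# Per-GB prices quoted in the text (low end of each at-rest range).
NEARLINE = {"storage": Decimal("0.01"), "retrieval": Decimal("0.01")}
COLDLINE = {"storage": Decimal("0.004"), "retrieval": Decimal("0.05")}


def monthly_cost_per_gb(prices: dict, retrieved_fraction: Decimal) -> Decimal:
    """Monthly cost of one stored GB when a fraction of it is retrieved."""
    return prices["storage"] + prices["retrieval"] * retrieved_fraction


def break_even_fraction() -> Decimal:
    """Fraction of stored data retrieved per month at which both classes cost the same."""
    storage_saving = NEARLINE["storage"] - COLDLINE["storage"]
    retrieval_penalty = COLDLINE["retrieval"] - NEARLINE["retrieval"]
    return storage_saving / retrieval_penalty


if __name__ == "__main__":
    print(break_even_fraction())  # 0.15
```

Under these simplified assumptions, Coldline is the cheaper class as long as less than 15% of the stored data is retrieved in a given month.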
In addition to Google Cloud Storage being available in 20 regions across five continents, buckets can also be multi-region. A multi-region bucket spans a large geographic area, by automatically replicating data across various data centers within different regions. At the moment, three multi-region locations are available to GCS users: United States, European Union, and Asia.
The advantages of a multi-region bucket include higher availability as defined in the Google Cloud Storage SLA, which rises from 99% for a single region to 99.9% for a multi-region bucket. Latency is also better, but only for data accessed by users spread across a wide geographical area, such as an entire continent.
If the data will be accessed primarily from the same location, then choosing a single region can provide additional cost savings without sacrificing data durability.
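To see what the availability figures above mean in practice, the following sketch converts an availability percentage into the downtime it would permit. It assumes a 30-day month, and the function name is hypothetical.

```python
from decimal import Decimal

HOURS_PER_MONTH = Decimal(30 * 24)  # assumption: a 30-day month


def max_downtime_hours(availability_percent: str) -> Decimal:
    """Hours of unavailability per month permitted by an availability level."""
    unavailable = (Decimal(100) - Decimal(availability_percent)) / Decimal(100)
    return unavailable * HOURS_PER_MONTH


if __name__ == "__main__":
    print(max_downtime_hours("99"))    # 7.2 hours for a single region
    print(max_downtime_hours("99.9"))  # 0.72 hours, about 43 minutes, for multi-region
```

The step from 99% to 99.9% therefore shrinks the permitted monthly downtime from about seven hours to well under one hour.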
Cloud Tiering with NetApp AFF
While NetApp AFF and SSD-backed FAS systems offer the best possible performance and lowest latency, not all data is accessed with the same frequency. Without tiering, however, all of it must reside on those systems at all times. The Cloud Tiering service, powered by NetApp's FabricPool technology, automatically and seamlessly moves data between ONTAP’s performance tier (SSDs) and a cloud tier (object storage) such as GCS. This "Lift and DON'T shift" approach requires no changes to the applications using the storage systems.
With Cloud Tiering’s intuitive GUI, it is possible to enable tiering per volume by choosing an appropriate volume tiering policy. The tiering policy defines which type of cold data should move to the cloud, thus freeing space in the SSD-backed performance tier. The available tiering policies are:
- Snapshot only: As the name implies, only cold Snapshot blocks in the volume are tiered to cloud storage. Tiering only happens when the aggregate hosting the volume is more than 50% full, and after the Snapshot data has reached its cooling period, which is two days by default. If and when a cold Snapshot block is read, it becomes hot and is automatically moved back to the performance tier.
- Auto: In this tiering policy, all cold blocks are tiered to cloud storage. This includes not only Snapshots but also all cold data in the active file system. In the case of a random read, data becomes hot, just like in the previous policy, and moves to the performance tier. However, this tiering policy is intelligent enough to recognize sequential reads, such as an index or antivirus scan, and keep the data cold (i.e., in the cloud tier). Data is only considered cold after a cooling period of 31 days by default, and is also only tiered if the aggregate hosting the volume is more than 50% full.
- All: This policy tiers the entire volume to the cloud, and is useful in cases where completed projects (such as film projects in the media and entertainment industry) that occupy entire volumes are only needed for historical purposes and review.
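The rules in the list above can be summarized as a small decision function. This is only a toy model of the behavior described in this post, not ONTAP's actual implementation; the class, field, and function names are hypothetical, and the cooling periods are the defaults mentioned above.

```python
from dataclasses import dataclass
from enum import Enum


class Policy(Enum):
    SNAPSHOT_ONLY = "snapshot-only"
    AUTO = "auto"
    ALL = "all"


# Default cooling periods, in days, as described in the list above.
COOLING_DAYS = {Policy.SNAPSHOT_ONLY: 2, Policy.AUTO: 31}
FULLNESS_THRESHOLD = 0.50  # the aggregate must be more than 50% full


@dataclass
class Block:
    snapshot_only: bool        # True if the block belongs only to Snapshot data
    days_since_last_read: int  # how long the block has been cooling


def should_tier(policy: Policy, block: Block, aggregate_fullness: float) -> bool:
    """Return True if the block is eligible to move to the cloud tier."""
    if policy is Policy.ALL:
        # The entire volume is tiered.
        return True
    if aggregate_fullness <= FULLNESS_THRESHOLD:
        return False
    if block.days_since_last_read < COOLING_DAYS[policy]:
        return False
    if policy is Policy.SNAPSHOT_ONLY:
        return block.snapshot_only
    return True  # AUTO: any cold block qualifies


if __name__ == "__main__":
    active_block = Block(snapshot_only=False, days_since_last_read=40)
    print(should_tier(Policy.SNAPSHOT_ONLY, active_block, 0.6))  # False
    print(should_tier(Policy.AUTO, active_block, 0.6))           # True
```

The example shows how the same block in the active file system stays on the performance tier under the Snapshot only policy but becomes eligible for tiering under Auto.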
Based on industry standards, we estimate that about 70% of data on an ONTAP cluster is cold data that could benefit from Cloud Tiering, offering significant cost-savings. By more efficiently utilizing your on-premises ONTAP clusters, you can switch from a CAPEX spending model to an OPEX model. This will decrease your upfront investments and transfer them instead to ongoing monthly cloud charges. The total cost of ownership (TCO) for AFF storage also declines, especially when long-term data retention is needed.
Start Cloud Tiering with Google Cloud Storage
Starting with ONTAP 9.6, it is possible to configure Cloud Tiering with GCS. All that is required is the installation of the NetApp Service Connector inside your Google Cloud Platform VPC, which will connect to the on-premises ONTAP cluster. Cloud Tiering then tiers data according to the volumes you select and the tiering policy assigned to each.
Currently, the Standard storage class is supported. Nearline and Coldline storage classes are coming soon, as are single and multi-region buckets.
For users of ONTAP clusters, enabling Cloud Tiering could not be easier. It quickly provides excellent benefits in terms of cost, all without any management overhead or application changes. From your Cloud Tiering dashboard, you can see the data-saving opportunities and, once tiering is enabled, the current savings in terms of both capacity and monetary cost.