June 28, 2020
Topics: Cloud Tiering, Data Tiering | Advanced | 8 minute read
The concept of data gravity was posited by software engineer Dave McCrory back in 2010 to describe how applications and services are attracted to data in much the same way that the planets of the solar system are held within the gravitational pull of the sun. Since then, the effect of “data gravity” has become increasingly noticeable across the IT industry, and the growing use of the term has coincided with the exponential growth of data across the board.
This article will take a look at the concept of data gravity in today's enterprise IT, why data creates gravity, and how to mitigate some of the unintended consequences of data gravity in the data center through next-generation technologies such as NetApp Cloud Tiering.
The “Gravity” of Data
By nature, every object in the universe exerts a gravitational pull on the objects around it. The greater an object's mass, the stronger the pull it exerts, and that pull is felt more strongly the closer another body comes to it. McCrory argued that this natural phenomenon also applies to data in the computing universe: as data accumulates in one place, it behaves like an object building mass, and the applications and services that would typically process and use that data are naturally drawn toward it, moving physically closer to where it resides.
There are two main reasons why data exerts this gravitational pull on the applications and services that consume it:
- Latency: The delay in reading data from, or writing data to, its resting location (typically a storage platform) by the application or service accessing it. This access is often over a storage fabric or an Ethernet network that sits between the application/service and the storage platform. Due to the simple laws of physics, the longer that distance, the higher the latency.
High latency leads to poor performance and a degraded user experience, because the application/service is constantly waiting for a write acknowledgement or for requested data to arrive. As such, applications that constantly process, read, and store data and are sensitive to this delay prefer to sit physically closer to the data in order to achieve lower latency and therefore better performance.
- Throughput: The rate at which data is transferred through a system (such as a computer network) or a process (such as an application or a service). In the context of applications/services accessing data from a storage platform, throughput is typically limited by the bandwidth of the storage fabric/network between the two endpoints.
Throughput is also closely related to latency: the longer the distance and the higher the latency, the lower the achievable throughput. As a general rule, the closer the data is to its applications/services, the lower the latency and the higher the overall throughput, as the rough numbers in the sketch below illustrate.
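To put rough numbers on that relationship, here is a minimal back-of-the-envelope sketch in Python. It assumes light travels through optical fiber at roughly two-thirds of its vacuum speed and that a single window-limited flow (a 1 MiB in-flight window, chosen purely for illustration) is moving the data; neither figure comes from this article.

```python
# Back-of-the-envelope model of how distance drives latency and caps throughput.
# Assumptions (not from the article): propagation at ~2/3 the vacuum speed of
# light, and a single flow limited by a 1 MiB in-flight window.

SPEED_OF_LIGHT_KM_S = 300_000      # km/s in a vacuum
FIBER_FACTOR = 2 / 3               # typical propagation factor for optical fiber
WINDOW_BYTES = 1 * 1024 * 1024     # assumed in-flight window (1 MiB)

def round_trip_ms(distance_km: float) -> float:
    """Propagation delay out and back, in milliseconds."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR)
    return 2 * one_way_s * 1000

def max_throughput_mbps(distance_km: float) -> float:
    """Throughput ceiling for a single window-limited flow: window / RTT."""
    rtt_s = round_trip_ms(distance_km) / 1000
    return (WINDOW_BYTES * 8) / rtt_s / 1_000_000

# Same rack, campus, regional, and cross-continent distances
for km in (0.1, 10, 500, 4000):
    print(f"{km:>7} km: RTT ~ {round_trip_ms(km):6.2f} ms, "
          f"per-flow ceiling ~ {max_throughput_mbps(km):10.1f} Mbit/s")
```

Even under these simplified assumptions, moving from a few hundred meters to a few thousand kilometers turns sub-millisecond round trips into tens of milliseconds and cuts the per-flow throughput ceiling by orders of magnitude, which is exactly the pull that keeps applications close to their data.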
In the world of enterprise IT, there are plenty of examples where the gravity of data has been a deciding factor in where the applications and services accessing that data reside.
From mainframes to client-server computing, data and application services have always been closely co-located and interconnected in one physical location, paving the way for today's data centers, which pair storage platforms holding enterprise data with compute engines (application servers) that process that adjacent data on the same layer 2 network. Hyper-converged infrastructure (HCI) was practically invented to bring the data even closer to the compute layer (the hypervisor), and the significant adoption and growth of the HCI industry over the last five years is a clear example of the effect of data gravity.
Cloud computing, which initially started with cloud-based data storage offerings, took this to a new level. The phenomenal growth in the number of applications and data services offered on these platforms over the last few years alone is a great example of data gravity at play at scale within the industry. The same is true of cloud-based SaaS platforms such as Salesforce and Office 365, along with the various add-on applications and services that have enjoyed monumental growth since their inception.
Relatively new technologies such as IoT are also driving the adoption of edge computing, which places small pods of compute physically closer to the IoT sensors themselves in order to provide quick, low-latency local data processing. A Tesla automobile, for example, is packed with compute power to locally analyze the torrent of sensor data generated in the car.
As such, the effect of data gravity is all but inevitable: wherever the bulk of the data is, applications and services will tend to follow.
Side Effects of Data Gravity in the Enterprise
While the gravity of data is somewhat inevitable in enterprise IT today, that doesn't mean it's without issues that can present challenges to many organizations. One of the most common is the restricted mobility of that data, and of the applications that use it, between different platforms, along with the associated complexity that translates into increased total cost of ownership for many organizations.
Enterprises gather vast sums of data, generated by end-user applications or IoT devices, for example. They can then derive a tremendous amount of intelligence by storing and analyzing this data for patterns and characteristics, which helps those organizations provide more unique and tailored services to their customers and in turn drives their businesses forward.
However, most modern IT infrastructures are spread across multiple platforms (cloud, data center, edge, etc.), across multiple geographies (Availability Zones, Regions, data centers, edge locations), often deployed on different storage platforms (all-flash storage, hybrid storage, NAS, object storage, HCI, etc.), and accessed by multiple applications using different protocols (SMB, iSCSI, NFS, HTTPS, etc.). Data doesn't naturally move freely across all these environments, and designing and building distributed data fabrics that can move large amounts of data and workloads across this multitude of environments cost-efficiently is often too complicated, too cost prohibitive, too slow to implement, and, above all else, too risky due to security and availability concerns.
Each of these side effects has a direct impact on an organization and its business objectives. Many CIOs and CTOs around the world can testify to cloud migration journeys that have stalled or run over budget because of challenges encountered during the cloud data migration phase. As a result, many organizations are struggling to fully realize the benefits of their cloud and digital transformation strategies today.
Embrace Data Gravity with NetApp Cloud Tiering
One of the easiest ways to address the data mobility challenges created by data gravity hotspots is to introduce intelligent data tiering and placement that is transparent to the front-end application or service consuming that data. NetApp Cloud Tiering aims to address this problem by providing a seamless, automated, policy-driven, and intelligent data tiering solution for NetApp AFF and SSD-backed FAS systems.
Cloud Tiering is a key component of the NetApp Data Fabric: an overall solution stack designed to minimize or, in some instances, completely eliminate the side effects of data gravity by providing an underlying, platform-neutral data pathway that spans multiple data centers and cloud platforms across multiple geographies, and that can store and present data to its consumers over multiple access protocols.
NetApp Cloud Tiering automatically tiers infrequently accessed data to a much cheaper cloud-based object storage platform such as Amazon S3, Azure Blob, or Google Cloud Storage. The tiering process is seamless to the applications consuming the data, requiring no changes or modifications at the front end and enabling data center customers to embrace cloud platforms without the associated cost. Cloud Tiering also provides tight security, with data encrypted both at rest (in the cloud tier) and in flight, ensuring there are no security risks for customers' valuable enterprise data when it is spread across both the data center and the cloud.
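To make the idea of policy-driven tiering concrete, below is a deliberately simplified, hypothetical sketch in Python of the core decision any cooling-period-based policy makes. This is not how Cloud Tiering itself is implemented: ONTAP identifies and tiers cold blocks transparently inside the storage OS, whereas this sketch operates on whole files, and the cooling period, volume path, and ObjectStoreClient class are all assumptions made purely for illustration.

```python
import time
from pathlib import Path

# Hypothetical, illustration-only sketch of a cooling-period tiering policy.
# NetApp Cloud Tiering works at the block level inside ONTAP; this toy version
# applies the same decision -- "has this data gone cold?" -- to whole files.

COOLING_PERIOD_DAYS = 31  # assumed policy: tier anything not read for ~a month


class ObjectStoreClient:
    """Hypothetical stand-in for an S3/Azure Blob/Google Cloud Storage client."""

    def put(self, key: str, data: bytes) -> None:
        print(f"uploaded {len(data)} bytes to the cold tier as {key!r}")


def tier_cold_files(volume_path: str, store: ObjectStoreClient) -> None:
    cutoff = time.time() - COOLING_PERIOD_DAYS * 86400
    for path in Path(volume_path).rglob("*"):
        if not path.is_file():
            continue
        if path.stat().st_atime < cutoff:            # last read before the cutoff?
            store.put(str(path), path.read_bytes())  # push the cold data to object storage
            # A real tiering engine would now replace the local copy with lightweight
            # metadata so that future reads can transparently pull the data back.


if __name__ == "__main__":
    tier_cold_files("/mnt/volume1", ObjectStoreClient())  # hypothetical volume path
```

Because the policy runs behind the storage interface, the applications above it keep reading and writing exactly as before; that transparency is what lets tiering loosen data gravity without moving the applications themselves.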
Much like how HCI blurred the lines between compute and storage by bringing data storage right next to the compute subsystem, thereby embracing data gravity, NetApp Cloud Tiering brings cloud object storage tiers directly into the primary storage system itself, which in most cases is already physically adjacent to the compute subsystems in the data center. This provides customers with 50x more space on their existing data center storage platform with no additional hardware spend, while also providing up to 80% capacity savings to NetApp AFF customers.
Find Your (Data) Center of Gravity
Enforced by the fundamental laws of physics, data gravity is an inevitability: applications and services are drawn to follow the mass of data they interact with. This brings a number of side effects, but NetApp Cloud Tiering helps address some of them, enabling customers to embrace the cloud without unnecessary data mobility headaches and to achieve TCO savings of up to 30%.
For additional information, refer to key use cases for tiering data to cloud here. For the NetApp Cloud Tiering overview and to get started with a free trial, visit the Cloud Tiering page here.