
Collaborate better, everywhere: Data caching with Amazon FSx for NetApp ONTAP

Today’s ever-expanding data estates and remote, distributed teams have changed the demands placed on data. Teams need to retrieve data quickly and collaborate effectively on shared datasets across diverse environments, no matter where the data is hosted.

That kind of data distribution can be a nightmare to orchestrate, with issues of data integrity, incompatibility, and performance all causing difficulties. Organizations need ways to accelerate data access and promote data agility and collaboration without those challenges.

NetApp and AWS have partnered to offer a solution: data caching with Amazon FSx for NetApp ONTAP. This post explores the data caching capabilities of FSx for ONTAP that help address the challenges of working with globally dispersed data.


The complex challenge of distributed data

Distributed data presents a number of challenges that organizations need to overcome:

  • Data consolidation. When data is spread across locations, you need to consolidate the data from various sources. If you aren’t able to view the data coherently as a single file system, you won’t be able to efficiently read and write the data—or analyze it to derive any meaningful business value from it.
  • Multiple namespaces. Lack of coherent namespaces across data in different sources adds an extra layer of complexity. To be able to access and use data consistently, you need to unify the different naming conventions and structures, but this process can require intensive effort.
  • Performance degradation. Making data available to all your users can be difficult without performance degrading somewhere along the way. The further users are from the dataset, the more latency they will experience.

    You need a fine balance between low-latency access, optimized bandwidth, and cost. Creating data silos across different environments and geographies isn’t a solution; although it’ll give local users faster access to data, it causes synchronization problems.
  • Data replication. Data that’s replicated across multiple environments needs to be consistent and up to date. The biggest risk is that discrepancies arise during replication, which can compromise data integrity.
  • Cost increases. The distributed nature of edge and cloud systems introduces cost challenges. You might end up paying for multiple full copies of the same data in different locations, and you also need to manage the costs of data transfer and centralized management.

Navigating the complexities of working with distributed data can be extremely challenging. That’s where FSx for ONTAP can help.

Data caching with FSx for ONTAP

Amazon FSx for NetApp ONTAP is the fully managed storage service from AWS that delivers trusted NetApp® ONTAP® data management solutions.

FSx for ONTAP is equipped with data caching capabilities that enable faster access to data and seamless, real-time collaboration across multiple environments. There are two main ways that FSx for ONTAP does this: consolidating data at the edge and caching writable copies of data locally.

Consolidating data at the edge

FSx for ONTAP helps organizations consolidate unstructured data by using NetApp Global File Cache (GFC) technology, delivering high performance, operational efficiency, and collaboration at scale.

With GFC, FSx for ONTAP caches only the data that each location needs and serves it over the SMB protocol. Caching is transparent to users: collaborating through GFC feels just like working with local files.

GFC can support hybrid cloud architectures composed of on-premises ONTAP and FSx for ONTAP systems by providing a centralized storage solution with a distributed data cache at edge locations. Users across the globe can access this single set of data, with scalability into the petabytes.

Because frequently used data is cached locally, users get better performance from collaborative apps. The intelligent file-locking feature maintains data integrity even when the same data is accessed from multiple locations through a global namespace.

Figure: NetApp Global File Cache high-level architecture.
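To make the file-locking idea more concrete, here is a minimal Python sketch of centralized locking: a single authority grants the write lock for a file path to one site at a time, and any other site must wait or open the file read-only. It is a simplified model under stated assumptions, not GFC’s actual locking protocol; the `LockManager` class, the site names, and the file path are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LockManager:
    """Hypothetical central lock table: at most one writing site per file path.

    This is an illustrative model only, not GFC's real locking protocol.
    """

    # Maps each locked file path to the site that holds its write lock.
    holders: dict[str, str] = field(default_factory=dict)

    def acquire(self, path: str, site: str) -> bool:
        """Grant the write lock if the path is free or already held by this site."""
        current = self.holders.get(path)
        if current is None or current == site:
            self.holders[path] = site
            return True
        return False  # another site is writing; this site should wait or open read-only

    def release(self, path: str, site: str) -> None:
        """Release the lock, but only if this site is the current holder."""
        if self.holders.get(path) == site:
            del self.holders[path]


locks = LockManager()
print(locks.acquire("/projects/pcb/board.brd", "frankfurt"))  # True
print(locks.acquire("/projects/pcb/board.brd", "singapore"))  # False
locks.release("/projects/pcb/board.brd", "frankfurt")
print(locks.acquire("/projects/pcb/board.brd", "singapore"))  # True
```

Routing every write-lock request through one authority is what prevents two sites from editing the same file in parallel, at the cost of a round trip to that authority when a lock is first taken.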

Fully writable cached data in remote locations

FSx for ONTAP lets you create a writable, persistent cache in a remote location that serves the latest, consistent copy of your data. These sparsely populated, writable cache volumes can reside on the same system as the source data or on a different one, providing quicker data access. NetApp FlexCache® technology makes this possible.

Figure: FlexCache in FSx for ONTAP.

The cached data is accessible over NFS and SMB/CIFS, so you can use it without re-architecting your systems. Caching is most beneficial in read-intensive environments where data is shared by multiple hosts and read more than once.

To optimize the size of the cached data copy, only the data read by the client is cached. Clients can mount any of the volumes to access the same prepopulated, up-to-date data from multiple locations. The cached volume acts as a temporary storage location between a host and the data source, and it stores the frequently accessed data chunks so that they can be served faster than fetching from the source.

Figure: Cache copies point to relevant data blocks in the source data to optimize the size of the copy.
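The sparse, read-through behavior described above can be sketched as a small Python model. It assumes fixed 4 KiB blocks and a write-around policy, in which writes are forwarded to the origin and the origin invalidates any stale cached copies. `OriginVolume`, `SparseCache`, and `BLOCK_SIZE` are hypothetical names, and the sketch does not reflect how FlexCache is implemented internally.

```python
BLOCK_SIZE = 4096  # assumed block size, for illustration only


class OriginVolume:
    """Hypothetical stand-in for the origin: the authoritative copy of every block."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)
        self.caches = []  # caches to notify when a block changes

    def read_block(self, n: int) -> bytes:
        return bytes(self.data[n * BLOCK_SIZE:(n + 1) * BLOCK_SIZE])

    def write_block(self, n: int, payload: bytes) -> None:
        if len(payload) != BLOCK_SIZE:
            raise ValueError("payload must be exactly one block")
        self.data[n * BLOCK_SIZE:(n + 1) * BLOCK_SIZE] = payload
        for cache in self.caches:
            cache.invalidate(n)  # drop stale copies so the next read re-fetches


class SparseCache:
    """Hypothetical sparse cache: holds only blocks that clients have actually read."""

    def __init__(self, origin: OriginVolume):
        self.origin = origin
        self.blocks: dict[int, bytes] = {}
        self.hits = 0
        self.misses = 0
        origin.caches.append(self)

    def read_block(self, n: int) -> bytes:
        if n in self.blocks:
            self.hits += 1  # served locally
        else:
            self.misses += 1
            self.blocks[n] = self.origin.read_block(n)  # fetched from the origin once
        return self.blocks[n]

    def write_block(self, n: int, payload: bytes) -> None:
        # Assumed write-around policy: the write goes to the origin, which invalidates caches.
        self.origin.write_block(n, payload)

    def invalidate(self, n: int) -> None:
        self.blocks.pop(n, None)


origin = OriginVolume(bytes(BLOCK_SIZE * 1000))  # 1,000-block dataset at the origin
cache = SparseCache(origin)
for n in [0, 1, 0, 0, 2]:
    cache.read_block(n)
print(len(cache.blocks), cache.hits, cache.misses)  # 3 2 3

cache.write_block(0, b"\x01" * BLOCK_SIZE)  # forwarded to origin; cached block 0 invalidated
print(cache.read_block(0)[:1])               # b'\x01' (re-fetched from origin)
```

Only the three blocks that were read are held locally out of a 1,000-block dataset, which is the property that keeps cached copies small, while invalidation on write keeps repeat readers from seeing stale data.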

Use cases for data caching with FSx for ONTAP

Data caching with FSx for ONTAP can help in a wide range of scenarios:

  • Remote office or branch office (ROBO) locations
  • High-performance computing (HPC) workloads
  • Artificial intelligence/machine learning (AI/ML) and deep learning (DL) use cases
  • Cloud bursting

What you get with FSx for ONTAP and data caching

With FSx for ONTAP, you have a low-overhead solution for all your data caching requirements:

  • Quick access to remote data. Data caching makes remote data available closer to users—with minimal or no additional architectural requirements.
  • High performance. Data caching with FSx for ONTAP reduces the latency of accessing data from across the globe, without compromising data integrity or quality.
  • File locking. The FSx for ONTAP file-locking mechanism prevents parallel write operations that might cause data integrity problems.
  • Zero-touch consistency. FSx for ONTAP keeps datasets consistent across all environments, both cached and at the origin, without any user effort.
  • Data protection and resilience. FSx for ONTAP is highly available and resilient by default, with deployment options in a single Availability Zone or across multiple Availability Zones to maintain uptime. With its automated cross-regional backup and disaster recovery features, data remains available even if corruption or a regional disaster occurs.
  • Single namespace. FSx for ONTAP solves the namespace issue that occurs when data is stored in multiple locations. Data can be consolidated and accessed through a single namespace without the need for any infrastructure consolidation.
  • Reduced storage costs. Data caching with FSx for ONTAP saves space because it caches only active data, not full copies. Plus, built-in FSx for ONTAP storage efficiency features work with intelligent file caching. That reduces both storage and transfer costs.

How one manufacturer collaborates on AWS with FSx for ONTAP 

One company using the data caching capabilities of FSx for ONTAP is a European manufacturer of printed circuit board (PCB) equipment that operates in more than 40 countries.

But with data in the cloud dispersed across the European Union, United States, and Asia-Pacific, the company was experiencing latency and productivity issues. Teams spread out across ROBO locations couldn’t effectively collaborate on the same data.

FSx for ONTAP with Global File Cache solved the latency issues across the ROBO locations and provided several other advantages:

  • Global access to files. Integration with Distributed File System Namespaces (DFS-N) preserves namespaces and access control lists (ACLs). This means that employees can access files stored on any of the FSx for ONTAP file systems in the global centers as if they were stored locally.
  • File locking. This feature lets the company’s teams collaborate on shared project files without making conflicting changes, which improves productivity across globally distributed teams.
  • The ease of a fully managed service. The underlying resources, software updates, and maintenance are all handled by AWS, taking the operational burden off the customer. Likewise, GFC is a simple add-on that doesn’t require any special end-user training.
  • Cost savings from several factors:
    • FSx for ONTAP applies storage efficiencies and cold data tiering that lower the overall costs of storing shared files.
    • Each GFC instance caches only the frequently accessed files at that edge site, and whenever a file is changed, only the changed blocks are transferred. This approach results in minimal data traffic and egress transfer costs.
    • FSx for ONTAP file shares are protected using cost-efficient NetApp Snapshot™ technology. You don’t need to implement additional data protection solutions at the edge sites.
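The point that “only the changed blocks are transferred” can be illustrated with a simple fixed-size block diff in Python: hash each block of the old and new versions of a file and send only the blocks whose hashes differ. This is a generic technique chosen for illustration, assuming 4 KiB blocks; the post doesn’t describe GFC’s actual transfer mechanism, and `block_hashes` and `changed_blocks` are hypothetical helpers.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def block_hashes(data: bytes) -> list[str]:
    """Fingerprint each fixed-size block of a file with SHA-256."""
    return [
        hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
        for i in range(0, len(data), BLOCK_SIZE)
    ]


def changed_blocks(old: bytes, new: bytes) -> list[int]:
    """Return indexes of blocks in `new` that differ from `old` or are newly appended.

    Truncation (blocks removed from the end) is not reported in this sketch.
    """
    old_h, new_h = block_hashes(old), block_hashes(new)
    return [i for i, h in enumerate(new_h) if i >= len(old_h) or old_h[i] != h]


old = bytes(BLOCK_SIZE * 256)  # 1 MiB file, all zeros
new = bytearray(old)
new[5 * BLOCK_SIZE] = 0xFF     # edit one byte in block 5
delta = changed_blocks(old, bytes(new))
print(delta)                                                      # [5]
print(len(delta) * BLOCK_SIZE, "of", len(new), "bytes to send")   # 4096 of 1048576 bytes to send
```

Editing one byte in a 1 MiB file means sending a single 4 KiB block rather than the whole file. A fixed-offset diff like this one is weakest when data is inserted mid-file, because every later block shifts and appears changed.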

Bring your data and teams together with FSx for ONTAP

Your teams need a way to collaborate across your entire data estate without running into delays or creating data silos that will drive up costs and harm data integrity. For a diverse data estate, it’s easy to do that with Amazon FSx for NetApp ONTAP.

FSx for ONTAP uses data caching features powered by the NetApp FlexCache and Global File Cache technologies to deliver data caching as a seamless part of a first-party AWS service.

Build reliable distributed data architectures, keep your users in sync, and stop costs from spiraling out of control.


Yifat Perry, Technical Content Manager