
Looking for a Latency Reduction in AWS? Reduce Cloud Latency with Cloud Volumes ONTAP

Latency is defined as the time it takes for a data transaction request to complete a round trip between the sender and the receiver; the shorter the time, the better. Latency is a key aspect in judging a system's performance, and it becomes even more important for crucial business applications such as transactional databases. Since latency directly impacts user experience, any latency reduction that can be achieved is more than welcome for highly demanding use cases such as transactional workloads.
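As a rough illustration of the round-trip idea, you can approximate network latency by timing a TCP connection handshake to a remote endpoint. The sketch below is a minimal example for illustration only; the host and port are placeholders, not part of any NetApp tooling:

```python
import socket
import time

def measure_rtt(host, port=443, timeout=3.0):
    """Time a TCP connection handshake as a rough round-trip latency probe."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # the handshake alone is enough; close the socket immediately
    return (time.perf_counter() - start) * 1000  # milliseconds

# Example (requires network access):
# print(f"RTT: {measure_rtt('example.com'):.1f} ms")
```

A TCP connect includes one full round trip (SYN, SYN-ACK), so it is a reasonable first-order proxy when ICMP ping is blocked.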

In this article we are going to cover some general aspects of latency and show how Cloud Volumes ONTAP uses intelligent NVMe caching to minimize cloud storage latency and dramatically improve application performance and user experience.

The Challenges of Latency

Performance is measured by two main factors: throughput and latency. If bandwidth is the total size of the pipeline, throughput is the actual amount of data that traverses that pipeline over a given period of time, with bandwidth as the maximum possible throughput. Latency and throughput are closely related: for a single connection, the higher the latency, the lower the achievable throughput. As a result, a very high-bandwidth network can underperform a lower-bandwidth link that has lower latency and therefore higher effective throughput.
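To see why a lower-bandwidth, low-latency link can beat a high-bandwidth, high-latency one, consider the classic single-flow TCP bound of one window of data per round trip. The window size and link figures below are illustrative assumptions, not measurements:

```python
def tcp_throughput_mbps(window_bytes, rtt_ms):
    """Upper bound on single-flow TCP throughput: one window per round trip."""
    return (window_bytes * 8) / (rtt_ms / 1000) / 1_000_000

window = 64 * 1024  # assume a 64 KiB receive window

# A 10 Gbps link with 50 ms RTT vs. a 1 Gbps link with 5 ms RTT:
long_fat_link = min(10_000, tcp_throughput_mbps(window, 50))  # ~10.5 Mbps
short_thin_link = min(1_000, tcp_throughput_mbps(window, 5))  # ~104.9 Mbps
```

Despite having ten times the bandwidth, the high-latency link delivers roughly a tenth of the throughput for a single flow, because the sender spends most of its time waiting for acknowledgments.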

It is important to understand that on a packet journey there are multiple hops or multiple points that together add to the latency counter:

  • Physical medium: The different physical media used to transport the data have an inherent latency. For example, light takes around 5 microseconds to travel through one kilometer of fiber optic cable.
  • Routers: Data traversing the public internet passes through routers, which use BGP (Border Gateway Protocol) as their routing protocol. Packets need to be analyzed at each router they pass through, and a path has to be selected, which may not always be the fastest one.
  • Hardware resources on a given device: CPU overutilization, memory constraints, network congestion, and the storage medium are examples of hardware resource factors to take into account when considering latency.
  • Type of application being used: In a multi-tiered application, when a request hits the destination server, it has to go through different layers. For example, the request could first hit a web server which has to perform an operation and then fetch data from a database residing on a different server.
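A rough, back-of-the-envelope latency budget can make these contributions concrete. All of the figures below are illustrative assumptions for the sake of the example, not measurements:

```python
FIBER_US_PER_KM = 5  # light in fiber covers roughly 1 km in about 5 microseconds

def propagation_us(distance_km):
    """Propagation delay over fiber, in microseconds."""
    return distance_km * FIBER_US_PER_KM

# Hypothetical per-hop contributions for one request (illustrative numbers):
latency_budget_us = {
    "propagation (1,000 km of fiber)": propagation_us(1_000),  # 5,000 us
    "router hops (10 hops x ~50 us)": 10 * 50,
    "web tier processing": 500,
    "database query": 2_000,
}

total_ms = sum(latency_budget_us.values()) / 1000  # ~8 ms one-way in this sketch
```

The point of the exercise is that no single component dominates by default: propagation, forwarding, and application layers each add their share to the counter.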

There are some other factors such as firewalls, QoS, load balancers, and specific server and app configurations that could optimize or worsen the total latency within a network. If we are talking about the public cloud, problems with application performance related to cloud latency may translate into a bad user experience which, if considerable enough, could lead to customer churn to competitors and loss of credibility and image.

Cloud Latency

With more and more companies relying on the public cloud to either run their applications or store their data, latency reduction over these networks is a major concern. As with privately owned networks, cloud latency is the result of multiple contributing factors:

  • How close the end users are to the cloud entry point.
  • The type of connection with your cloud provider. For example, you can connect to your AWS services either through a Direct Connect private WAN link or go through a public ISP that gets you to your VPC Internet Gateway.
  • Additional hops within your cloud network such as load balancers, WAF protection, and jumps between multi-tiered app layers.
  • Type of storage behind your cloud-based apps. AWS offers EBS volumes backed by SSD drives, which are IOPS-optimized, or HDD drives, which are more cost-effective but offer less I/O performance.

If you are in charge of a high-performance application whose performance depends heavily on IOPS, such as a transactional workload, and you are trying to figure out how to improve latency, choosing optimized cloud storage is one of the keys to achieving that goal.

How to Improve Latency Using Amazon EBS Volumes and Cloud Volumes ONTAP with Intelligent NVMe Caching

AWS EBS volumes offer two main kinds of block storage: IOPS-optimized backed by SSDs or throughput-optimized backed by HDDs. Amazon EBS volumes backed by SSDs are designed for latency-sensitive workloads with a high number of IOPS. There are two disk type options you can choose from within this category:

  • Provisioned IOPS SSD (io1)
  • General Purpose SSD (gp2)

Applications attached to Provisioned IOPS volumes can achieve single-digit millisecond latencies, which is ideal for the many short, small I/O operations that characterize transactional workloads. The combination of EBS-optimized EC2 instances and io1 EBS volumes achieves the best possible latency at the storage and compute layers.

But you can still go a step further and improve your latency even more. With Cloud Volumes ONTAP NVMe caching, you can reduce latency to as low as 0.2 milliseconds, dramatically accelerating response times and improving application performance for your transactional workloads. NetApp intelligent NVMe caching can handle I/O operations much more effectively thanks to the underlying improvements NVMe technology provides, such as larger command queues and greater parallelism.
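A back-of-the-envelope calculation shows what a drop from roughly 1 ms to 0.2 ms means for a workload that issues one I/O at a time (queue depth 1); real workloads overlap I/Os, so treat this as an illustration of the latency effect, not a benchmark:

```python
def serial_iops_ceiling(latency_ms):
    """Max IOPS for a workload with one outstanding I/O (queue depth 1)."""
    return 1000 / latency_ms

ebs_io1 = serial_iops_ceiling(1.0)     # ~1,000 IOPS at 1 ms per read
nvme_cached = serial_iops_ceiling(0.2) # ~5,000 IOPS at 0.2 ms per read
speedup = nvme_cached / ebs_io1        # ~5x for cached reads at queue depth 1
```

For latency-bound, serialized read paths (a chain of dependent lookups, for example), response time scales almost directly with per-I/O latency, which is exactly where a local NVMe cache pays off most.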

To make use of intelligent NVMe caching, NetApp Cloud Volumes ONTAP needs to be deployed on an Amazon EC2 instance with NVMe storage and attached to the EBS volume. Currently, NVMe caching is supported by Cloud Volumes ONTAP version 9.5 P2 and above, with a Premium license, deployed on one of the following Amazon EC2 instance types:

Instance Type    NVMe Instance Store Volume
c5d.4xlarge      400 GB
c5d.9xlarge      900 GB
r5d.2xlarge      300 GB

During operation, Cloud Volumes ONTAP caches data that the application frequently reads from the Amazon EBS volumes on its local NVMe instance store. Whenever the application makes a read request for data that is already cached, Cloud Volumes ONTAP responds directly from the NVMe cache, considerably reducing latency. At the same time, this caching offloads some of the work from the underlying Amazon EBS volumes, freeing them to handle write requests and non-cached read requests, improving overall performance even further.
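Conceptually, this read-through caching behavior can be sketched in a few lines of Python. This is only an illustration of the general technique, not NetApp's implementation; `backend_read` is a stand-in for the slower EBS read path:

```python
from collections import OrderedDict

class ReadThroughCache:
    """Sketch of read caching: serve repeat reads from fast local storage
    and fall back to the slower backing store only on a cache miss."""

    def __init__(self, backend_read, capacity=1024):
        self.backend_read = backend_read  # called only on cache misses
        self.capacity = capacity
        self.cache = OrderedDict()        # insertion order doubles as LRU order
        self.hits = self.misses = 0

    def read(self, block_id):
        if block_id in self.cache:
            self.hits += 1
            self.cache.move_to_end(block_id)  # mark as most recently used
            return self.cache[block_id]
        self.misses += 1
        data = self.backend_read(block_id)    # slow path: the backing volume
        self.cache[block_id] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict least recently used block
        return data
```

Once a block is cached, repeat reads never touch the backing store, which is the source of both the latency win and the offloading effect described above.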

The main reason to use NVMe caching is to accelerate the performance of read-intensive transactional workloads. But you can also utilize this capability to store your data on lower-cost Amazon EBS volume types. Using Cloud Volumes ONTAP with NVMe caching to manage General Purpose SSD (gp2) data accelerates the read response time without spending I/O credits. The credits you save allow you to achieve a higher performance baseline for write requests and non-cached reads.
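For reference, a gp2 volume's baseline performance is 3 IOPS per GiB of volume size, with a floor of 100 IOPS and a cap of 16,000 IOPS (per AWS documentation at the time of writing). A quick sketch of that formula shows why offloading cached reads matters on smaller volumes:

```python
def gp2_baseline_iops(size_gib):
    """gp2 baseline: 3 IOPS per GiB, minimum 100, maximum 16,000."""
    return max(100, min(16_000, 3 * size_gib))

small = gp2_baseline_iops(100)    # 300 IOPS baseline on a 100 GiB volume
large = gp2_baseline_iops(1_000)  # 3,000 IOPS baseline on a 1 TiB volume
```

On a 100 GiB volume, every cached read that never reaches EBS leaves a meaningful slice of that 300-IOPS baseline (and of the burst credit balance) available for writes and non-cached reads.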


In today’s highly demanding application performance environments, lag in response time is not an option. As we’ve seen, latency is the sum of contributions from each leg of a data packet’s round trip, and writing and retrieving data from the storage medium is one of them. With Cloud Volumes ONTAP, the latency consumed in this part of the process is minimized through the locally attached NVMe instance store.

Not only does Cloud Volumes ONTAP help you reduce latency by using this technology, it also lowers your cloud data storage costs with built-in deduplication, compression, thin provisioning, and data tiering between AWS storage tiers. Additionally, the Cloud Volumes ONTAP HA configuration for AWS enables business continuity by maintaining synchronized copies of the data in two different AWS Availability Zones. This high-availability configuration supports automatic failover with no data loss and recovery times of less than a minute.

Robert Bell, Product Evangelist