BlueXP Blog

Azure Disk Performance: How to Analyze and Monitor Issues

Written by Aviv Degani, Cloud Solutions Architecture Manager, NetApp | Feb 5, 2019 2:24:32 PM

Selecting the right Azure disk type and monitoring disk performance for issues that could affect your application are essential in any enterprise Azure deployment. Azure offers Premium disks, backed by SSDs, that can provide up to 80,000 IOPS per VM. Even so, optimal configuration, management, and monitoring are still required to ensure that the application is not hit by disk performance bottlenecks.

This blog explores the factors that can affect the performance of disks in Azure VMs, how to monitor disks to identify and eliminate possible performance issues, and the benefits of using Cloud Volumes ONTAP for Azure.



Detecting Performance Issues with Azure Disks


In the backend, Azure disks are .vhd files stored as page blobs in Azure storage. Azure offers two service models for disks that can be attached to VMs, managed disks and unmanaged disks, each with two performance levels: Standard and Premium. Standard storage offers HDD- and SSD-based disks, while Premium storage offers SSD disks with assured performance levels based on the selected disk SKU.

Azure disk performance can be affected by factors such as Azure storage limits, storage throttling, VM scalability targets, cache restrictions, and workload demands. Azure Disk Encryption, which should be enabled to secure the data stored on the disks, does not usually lead to performance issues. Let us explore some of the possible causes of Azure disk performance issues.



Azure Storage Limits


Azure storage is subject to ingress and egress limits for data access requests. Because unmanaged disks are stored in a storage account, they share that account's scalability targets. Once those targets are reached, incoming IO requests are queued or throttled, which leads to performance issues.
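To get a feel for this limit, you can compare the combined IOPS that the unmanaged disks in one storage account could issue with the account's request-rate target. The sketch below is a hypothetical planning helper; the 20,000 IOPS account target and 500 IOPS per disk used in the example are assumptions for illustration, so check the current Azure scalability targets for your account type before relying on them.

```python
# Hypothetical planning helper for unmanaged disks that share one storage account.
# ASSUMPTION: the account-level target below is illustrative; look up the current
# Azure storage scalability targets for your account type and region.
ACCOUNT_IOPS_TARGET = 20_000


def max_disks_per_account(per_disk_iops: int,
                          account_target: int = ACCOUNT_IOPS_TARGET) -> int:
    """How many disks can each run at per_disk_iops without exceeding the target."""
    return account_target // per_disk_iops


def account_is_oversubscribed(disk_iops: list[int],
                              account_target: int = ACCOUNT_IOPS_TARGET) -> bool:
    """True if the disks together could demand more IOPS than the account allows,
    in which case further IO requests would be queued or throttled."""
    return sum(disk_iops) > account_target


print(max_disks_per_account(500))             # 40
print(account_is_oversubscribed([500] * 45))  # True: 22,500 > 20,000
```

In this model, spreading busy disks across more storage accounts is what keeps each account under its target.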



Azure Storage Throttling


VMs using Premium storage can hit performance bottlenecks if the IOPS and throughput limits defined by the VM and disk SKUs are exceeded. Throughput, the amount of data per second the application sends to the underlying disks, is the product of IOPS and IO size, so an application issuing many large IOs can exhaust the VM's throughput limit even while it stays under the IOPS limit. When either limit is exceeded, Azure storage throttles the IOPS or throughput, which can degrade application performance.
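Because throughput is IOPS multiplied by IO size, the IO size decides which limit an application reaches first. This minimal sketch computes the highest IOPS a disk can sustain for a given IO size; the 5,000 IOPS and 200 MiB/s limits in the example are assumed values, not quoted SKU specifications.

```python
# Sketch: which cap (IOPS or throughput) limits a disk or VM for a given IO size.
# Throughput = IOPS x IO size, so large IOs exhaust the throughput cap first.
# ASSUMPTION: limits are passed in by the caller; 1 MiB = 1024 KiB.


def iops_allowed_by_throughput(throughput_limit_mib_s: float, io_size_kib: float) -> float:
    """IOPS at which the throughput cap would be reached for this IO size."""
    return throughput_limit_mib_s * 1024 / io_size_kib


def effective_iops(iops_limit: int, throughput_limit_mib_s: float, io_size_kib: float) -> float:
    """Highest IOPS achievable before either cap triggers throttling."""
    return min(iops_limit, iops_allowed_by_throughput(throughput_limit_mib_s, io_size_kib))


def binding_limit(iops_limit: int, throughput_limit_mib_s: float, io_size_kib: float) -> str:
    """Name of the cap that is reached first: 'iops' or 'throughput'."""
    by_throughput = iops_allowed_by_throughput(throughput_limit_mib_s, io_size_kib)
    return "throughput" if by_throughput < iops_limit else "iops"


# Assumed limits: 5,000 IOPS and 200 MiB/s.
print(effective_iops(5_000, 200, 8), binding_limit(5_000, 200, 8))      # 5000 iops
print(effective_iops(5_000, 200, 256), binding_limit(5_000, 200, 256))  # 800.0 throughput
```

With 8 KiB IOs the disk is IOPS-bound; with 256 KiB IOs the same disk tops out at 800 IOPS because it reaches 200 MiB/s first.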



VM Scalability Targets and Workload Demands


The VM size you select determines the compute and storage capacity available to your application. For disks, the VM size also determines the maximum number of data disks that can be attached and the maximum IOPS and throughput supported. To avoid disk performance issues, select a VM SKU that can accommodate your workload’s highest performance demands.
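At the VM level, the IOPS a VM can deliver is the smaller of its size's disk IOPS limit and the sum of its attached disks' limits. The helper below is a hypothetical sketch; the limits and disk counts in the example are made-up inputs, and throughput can be checked the same way.

```python
# Sketch: does a VM size plus its data disks cover a workload's peak IOPS?
# ASSUMPTION: all limits are caller-supplied; use the published values for your SKUs.


def effective_vm_iops(vm_iops_limit: int, disk_iops_limits: list[int]) -> int:
    """Deliverable IOPS: capped by the VM size and by the disks themselves."""
    return min(vm_iops_limit, sum(disk_iops_limits))


def vm_fits_workload(vm_iops_limit: int, vm_max_data_disks: int,
                     disk_iops_limits: list[int], required_iops: int) -> bool:
    """True if the VM can attach these disks and still meet required_iops."""
    if len(disk_iops_limits) > vm_max_data_disks:
        return False  # this VM size cannot attach that many data disks
    return effective_vm_iops(vm_iops_limit, disk_iops_limits) >= required_iops


disks = [5_000] * 4                                 # four disks, 20,000 IOPS combined
print(effective_vm_iops(12_800, disks))             # 12800: the VM size is the bottleneck
print(vm_fits_workload(12_800, 8, disks, 15_000))   # False
print(vm_fits_workload(25_600, 8, disks, 15_000))   # True
```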



Cache Restriction


Disk caching helps to improve the performance of VMs that use Premium storage disks. By default, OS disks use the ReadWrite cache setting and data disks use ReadOnly. The cache setting of each data disk should be configured according to the nature of its workload, because incorrect cache settings can lead to disk performance issues and even data loss. For example, ReadWrite caching should be enabled on a data disk only when the application properly handles writing the cached data to the persistent disk; otherwise, data may be lost.
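The cache guidance can be condensed into a simple rule of thumb. The function below is a hypothetical sketch, not an Azure API: the 50% read threshold is an assumption, and it assumes the None setting (no host caching, Azure's third cache option alongside ReadOnly and ReadWrite) for write-heavy disks such as log disks.

```python
from enum import Enum


class CacheSetting(Enum):
    NONE = "None"
    READ_ONLY = "ReadOnly"
    READ_WRITE = "ReadWrite"


READ_HEAVY_THRESHOLD = 0.5  # ASSUMPTION: share of reads that counts as read-heavy


def recommend_data_disk_cache(read_fraction: float,
                              app_handles_write_cache: bool) -> CacheSetting:
    """Hypothetical rule of thumb for a Premium data disk's host cache setting.

    read_fraction: share of IOs that are reads (0.0 to 1.0), e.g. from a benchmark.
    app_handles_write_cache: True only if the application reliably writes cached
    data through to the persistent disk; otherwise ReadWrite risks data loss.
    """
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0.0 and 1.0")
    if read_fraction >= READ_HEAVY_THRESHOLD:
        return CacheSetting.READ_ONLY   # read-heavy data disks benefit from read caching
    if app_handles_write_cache:
        return CacheSetting.READ_WRITE  # only when the application supports it
    return CacheSetting.NONE            # write-heavy disks, e.g. transaction logs


print(recommend_data_disk_cache(0.9, False).value)  # ReadOnly
print(recommend_data_disk_cache(0.1, False).value)  # None
print(recommend_data_disk_cache(0.1, True).value)   # ReadWrite
```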



Analyzing Azure Disk Performance Issues


While Azure disk performance can be affected by any of the factors discussed above, Azure storage monitoring combined with careful analysis helps to troubleshoot these issues. Tools such as Azure storage diagnostics, Azure Monitor, and Log Analytics make it easier to analyze, identify, and remediate them.



Azure Storage Diagnostics


When using unmanaged disks, metrics from the underlying storage account can help identify resource contention that could lead to disk performance issues. For example, high ingress/egress traffic to a storage account could eventually cause performance issues for the disks stored in that account. Note that diagnostics metrics must be enabled at the storage account level to view this information. A sample Azure storage performance metrics chart is shown below:




Azure storage performance metrics.
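Once diagnostics are enabled, spotting contention comes down to comparing traffic against the account's limits over time. This sketch assumes per-minute ingress and egress totals have already been exported from the storage account's metrics; the sample data, the targets, and the 80% warning ratio are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StorageSample:
    minute: str         # timestamp label, e.g. "10:01"
    ingress_bytes: int  # bytes written to the storage account in that minute
    egress_bytes: int   # bytes read from the storage account in that minute


def flag_hot_minutes(samples: list[StorageSample], ingress_target_bps: float,
                     egress_target_bps: float, warn_ratio: float = 0.8) -> list[str]:
    """Minutes where ingress or egress exceeded warn_ratio of its per-second target."""
    ingress_cap = ingress_target_bps * 60 * warn_ratio  # bytes per minute
    egress_cap = egress_target_bps * 60 * warn_ratio
    return [s.minute for s in samples
            if s.ingress_bytes > ingress_cap or s.egress_bytes > egress_cap]


# Hypothetical data with assumed targets of 1,000,000 bytes/s each way.
samples = [
    StorageSample("10:00", 10_000_000, 5_000_000),
    StorageSample("10:01", 50_000_000, 1_000_000),  # ingress above 48,000,000
    StorageSample("10:02", 1_000_000, 60_000_000),  # egress above 48,000,000
]
print(flag_hot_minutes(samples, 1_000_000, 1_000_000))  # ['10:01', '10:02']
```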

Azure Monitor


Azure Monitor provides disk-level metrics that help isolate performance issues to the OS disk or to an individual data disk. At the time of writing, this feature was in public preview; the generally available (GA) feature provided only an aggregate of metrics across all disks attached to the VM. The information is available through the Metrics feature in Azure Monitor, where you can drill down to the target OS or data disk and choose a metric such as Disk Read Bytes/sec or Disk Write Bytes/sec. Congestion flagged in this metrics report can trigger an alert that notifies administrators to investigate.




Azure Monitor disk metrics.
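With per-disk metrics, a quick check is to express each disk's observed value as a fraction of its provisioned limit, which points directly at the congested disk. The sketch assumes you have already averaged a metric such as Disk Write Bytes/sec per disk and converted it to MiB/s; the disk names, limits, and 90% threshold are hypothetical, and a threshold like this is also a natural starting point for an alert rule.

```python
# Sketch: find disks running close to their provisioned throughput.
# ASSUMPTION: observed values (MiB/s) were exported from Azure Monitor per-disk
# metrics; provisioned limits come from the disk SKUs. Names are hypothetical.


def disk_utilization(observed: dict[str, float],
                     provisioned: dict[str, float]) -> dict[str, float]:
    """Observed value as a fraction of each disk's provisioned limit."""
    return {disk: observed[disk] / provisioned[disk] for disk in observed}


def disks_near_limit(observed: dict[str, float], provisioned: dict[str, float],
                     threshold: float = 0.9) -> list[str]:
    """Disks at or above threshold utilization, sorted by name."""
    util = disk_utilization(observed, provisioned)
    return sorted(disk for disk, u in util.items() if u >= threshold)


observed = {"osdisk": 12.0, "data0": 190.0, "data1": 45.0}
provisioned = {"osdisk": 100.0, "data0": 200.0, "data1": 200.0}
print(disks_near_limit(observed, provisioned))  # ['data0']
```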

Log Analytics


Performance counter monitoring in Azure Log Analytics provides insight into disk performance counters and can alert you to possible performance issues. A search query for the “Disk Transfers/sec” counter can be created in Log Analytics to view the IOPS of a disk.




The Log Analytics search screen.

The results can be further broken down by logical disk to get better insight into disk performance issues.
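In Log Analytics, this breakdown is typically written as a query along the lines of Perf | where CounterName == "Disk Transfers/sec" | summarize avg(CounterValue) by InstanceName. The Python sketch below performs the same aggregation on rows already exported from the Perf table, to show what such a query computes; the sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def avg_transfers_by_disk(records: list[dict]) -> dict[str, float]:
    """Average 'Disk Transfers/sec' (IOPS) per logical disk.

    Each record mimics a row of the Log Analytics Perf table, with
    CounterName, InstanceName, and CounterValue fields.
    """
    values: dict[str, list[float]] = defaultdict(list)
    for row in records:
        if row["CounterName"] == "Disk Transfers/sec":
            values[row["InstanceName"]].append(row["CounterValue"])
    return {disk: mean(vals) for disk, vals in values.items()}


rows = [
    {"CounterName": "Disk Transfers/sec", "InstanceName": "C:", "CounterValue": 100.0},
    {"CounterName": "Disk Transfers/sec", "InstanceName": "C:", "CounterValue": 200.0},
    {"CounterName": "Disk Transfers/sec", "InstanceName": "F:", "CounterValue": 4800.0},
    {"CounterName": "% Free Space", "InstanceName": "C:", "CounterValue": 35.0},  # ignored
]
print(avg_transfers_by_disk(rows))  # {'C:': 150.0, 'F:': 4800.0}
```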



Improving Azure Disk Performance


Disk performance in Azure can be improved by following a set of design and implementation best practices for Azure disks. These include the following:

Performance benchmarking: Benchmark your application during peak utilization hours, focusing on key indicators like IOPS, latency, and throughput. It is prudent to select a VM SKU and disk size that offer values above the peak utilization benchmark.

Disk striping: Higher IOPS and throughput limits can be achieved by striping data across multiple disks. Choose the stripe size based on the IO pattern: applications with random IO patterns benefit from a smaller stripe size, while a larger stripe size is recommended for sequential IO patterns.

Monitoring and alerting: Storage and disk performance metrics can be monitored with tools such as Azure Monitor and Log Analytics, which help identify bottlenecks and patterns of performance issues. These tools can also create alerts that notify administrators when remedial action is needed.

NetApp Cloud Volumes ONTAP: Cloud Volumes ONTAP provides enterprise-class data management capabilities in Azure. Its storage is provided by aggregates built from multiple disks, which distributes requests across several disks in the backend and thereby improves performance. Cloud Volumes ONTAP uses WAFL (Write Anywhere File Layout) to support large-scale, high-performance volumes with faster consistency checks and quick restarts in the event of failures. With WAFL, write operations are committed together at a single sequential consistency point. This improves write performance because each client request does not have to be acknowledged independently.

Additionally, the snapshot capability of Cloud Volumes ONTAP makes it possible to back up data quickly with zero performance impact. Cloud Volumes ONTAP’s storage tiering feature automatically tiers data between high-performance Azure disk block storage and a capacity tier on Azure Blob object storage, based on access patterns, frequency, and usage. Disk performance is never impacted during the tiering process, and you get optimal storage usage and cost efficiency.

Another Cloud Volumes ONTAP feature that improves write performance is the high write speed option. With this option, data is cached in memory before being written to disk, which boosts write performance. Together, these capabilities can significantly improve the disk performance of Azure VMs and help organizations confidently host mission-critical applications in the cloud.
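The benchmarking and striping practices described above can be combined into a rough sizing calculation: take the peak IOPS from the benchmark, add headroom, and work out how many disks a striped volume needs. This is a hypothetical sketch; the 25% headroom, the per-disk and VM limits, and the sample benchmark are assumptions, and a real sizing exercise should also check throughput and latency.

```python
import math


def required_iops(benchmark_samples: list[float], headroom: float = 0.25) -> float:
    """Peak IOPS observed during the benchmark plus a safety margin (assumed 25%)."""
    return max(benchmark_samples) * (1 + headroom)


def disks_to_stripe(target_iops: float, per_disk_iops: int, vm_iops_limit: int) -> int:
    """Smallest number of identical disks whose striped IOPS meets the target.

    Striping cannot raise the VM size's own cap, so a target above that cap
    means a larger VM size is needed rather than more disks.
    """
    if target_iops > vm_iops_limit:
        raise ValueError("target exceeds the VM size's IOPS limit; choose a larger VM")
    return math.ceil(target_iops / per_disk_iops)


peak_samples = [3_200, 7_400, 9_100, 6_800]   # hypothetical peak-hour IOPS readings
target = required_iops(peak_samples)          # 9,100 * 1.25 = 11,375
print(target, disks_to_stripe(target, per_disk_iops=5_000, vm_iops_limit=25_600))  # 11375.0 3
```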



Conclusion


In this blog we discussed some possible causes of Azure disk performance issues and how to mitigate them. With proper planning and implementation, Azure’s Standard and Premium disk offerings can be leveraged to deliver the best possible performance for your application’s requirements.

In addition to its cost-cutting efficiencies and new high availability option, Cloud Volumes ONTAP for Azure builds on the capabilities of Azure disks and adds proprietary NetApp data management technologies that improve disk performance.