BlueXP Blog

HPC on Azure: Best Practices for Successful Deployments

Written by Jeff Whitaker, Cloud Data Services | Sep 22, 2020 6:49:28 AM

What Is High Performance Computing (HPC) on Azure?

HPC systems are systems that you can create to run large and complex computing tasks with aggregated resources. These systems are made up of clusters of servers, devices, or workstations working together to process your workload in parallel.

While traditional HPC deployments are on-premises, many cloud vendors are beginning to offer HPC-compatible Infrastructure as a Service (IaaS) offerings, and Azure is one of them. With Azure resources, you can create a pure cloud HPC deployment or you can create hybrid deployments supplemented by cloud resources.

In Azure, you can run both massively parallel workloads and large workloads that cannot run in parallel with high speed interconnects. This results in shorter processing times and an ability to perform significantly more complex operations with compute-optimized virtual machines (VMs) or GPU-enabled instances.

This is part of an extensive series of guides about managed services.

In this article, you will learn:

Azure HPC Components

While HPC deployments in Azure can vary according to your specific workload needs and budget, there are some standard components in any deployment. These include:

  • Azure Resource Manager—enables you to deploy applications to your clusters via script files or templates.
  • HPC head node—enables you to schedule jobs and workloads to your worker nodes. This is a virtual machine (VM) that you use to manage HPC clusters.
  • Virtual Network—enables you to create an isolated network for your clusters and storage through secure connections with ExpressRoute or IPsec VPN. You can integrate established DNS servers and IP addresses in your network and granularly control traffic between subnets.
  • Virtual Machine Scale Sets—enables you to provision VMs for your clusters and includes features for autoscaling, multi-zone deployments, and load balancing. You can use scale sets to run several databases, including MongoDB, Cassandra, and Hadoop.
  • Storage—enables you to mount persistent storage for your clusters in the form of blob, disk, file, hybrid, or data lake storage.

Related content: read our guide to HPC storage.

Managing Azure HPC Deployments

Azure offers a few native services to help you manage your HPC deployments. These tools provide flexibility for your management and can help you schedule workloads in Azure as well as in hybrid resources.

Microsoft HPC Pack
A set of utilities that enables you to configure and manage VM clusters, monitor operations, and schedule workloads. HPC Pack includes features to help you migrate on-premises workloads or to continue operating with a hybrid deployment. The utility does not provision or manage VMs or network infrastructure for you.

Azure CycleCloud
An interface for the scheduler of your choice. You can use Azure CycleCloud with a range of native and third-party options, including HPC Pack, Grid Engine, Slurm, and Symphony. CycleCloud enables you to manage and orchestrate workloads, define access controls with Active Directory, and customize cluster policies.

Azure Batch
A managed tool that you can use to autoscale deployments and set policies for job scheduling. The Azure Batch service handles provisioning, assignment, runtimes, and monitoring of your workloads. To use it, you just need to upload your workloads and configure your VM pool.

Azure for the Semiconductor Industry
Azure provides a high performance computing (HPC) platform that comes with high availability and scalability, and it is available for users worldwide. The platform infrastructure is secure and provides fully managed supercomputing services.

Azure HPC workloads offer machine learning, visualization, and rendering, all of which can be leveraged for applications in the semiconductor industry. This enables seamless and resilient cloud integration of oil and gas workloads, as well as cloud-based genomic sequencing and semiconductor design.

Related content: read our guide to HPC use cases.

Best Practices for Azure HPC Deployments

When using HPC in Azure, these best practices can help you get the performance and value you expect.

Distribute Deployments Across Cloud Services

Distributing large deployments across cloud services can help you avoid limitations created by overloading or relying on a single service. By splitting your deployment into smaller segments, you can:

  • Stop idle instances after job completion without interrupting other processes
  • Flexibly start and stop node clusters
  • More easily find available nodes in your clusters
  • Use multiple data centers to ensure disaster recovery

When splitting services, aim for a maximum of 500 VMs or 1000 cores per service. If you deploy more resources than this, you may run into issues with IP address assignments and timeouts. You can reliably split deployments across up to 32 services. Larger splits are untested.

Use Multiple Azure Storage Accounts for Node Deployments

Similar to spreading deployments across services, it’s recommended to attach multiple storage accounts to each deployment. This can provide better performance for large deployments, applications restricted by input/output operations, and custom applications.

When setting up your storage accounts, you should have one account for node provisioning and another for moving job or task data. This ensures that both provisioning and data movement are consistent and low latency.

Increase Proxy Node Instances to Match Deployment Size

Proxy nodes enable communication between head nodes you are operating on-premises and Azure worker nodes. These nodes are attached automatically when you deploy workers in Azure.

If you are running large jobs that meet or exceed the resources provided by the proxy nodes, consider increasing the number you have running. Increasing is especially important as your deployment gets bigger.

Connect to Your Head Node With the HPC Client Utilities

The HPC Pack client utilities are the preferred method for connecting to your head node, particularly if you are running large jobs. You can install these utilities on your users’ workstations and remotely access the head node as needed rather than using Remote Desktop Services (RDS). These utilities are especially helpful if many users are connecting at once.

HPC on Azure with NetApp Azure Files

Azure NetApp Files is a Microsoft Azure file storage service built on NetApp technology, giving you the file capabilities in Azure even your core business applications require.

Get enterprise-grade data management and storage to Azure so you can manage your workloads and applications with ease, and move all of your file-based applications to the cloud.

Azure NetApp Files solves availability and performance challenges for enterprises that want to move mission-critical applications to the cloud, including workloads like HPC, SAP, Linux, Oracle and SQL Server workloads, Windows Virtual Desktop, and more.

In particular, Azure NetApp Files allows you to migrate more applications to Azure–even your business-critical workloads–with extreme file throughput with sub-millisecond response times.

Learn More About HPC on Azure

Read more in our series of guides about HPC on Azure

Migrate Legacy Apps to Cloud
Many organizations rely on legacy applications, those applications that are dependent on traditional IT structures. This reliance can make migrations to the cloud seem impossible or problematic for many companies. However, you can lift and shift applications to the cloud, gaining the benefits of cloud operations for other workloads while protecting your legacy investment.

In this article you’ll learn about what’s involved when moving legacy applications to the cloud, how to address cloud migration concerns, and how to speed your migration process with Azure NetApp Files.

Read “Migrate Legacy Apps to Cloud” here.

Solve Azure HPC Challenges eBook
In this eBook you’ll learn how to meet the requirements for an HPC file system without code changes, how to reduce your computational time, and how 3 EDA and Oil and Gas operations have successfully used Azure HPC.

Read “Solve Azure HPC Challenges eBook” here.

Solve Azure EDA Workload Challenges Guide
In this guide you’ll learn how semiconductor developers can gain low-latency, high performance and agility with Azure, what solutions are available to help you scale while controlling performance and costs in the cloud, and how Azure NetApp Files can optimize EDA workloads.

Read “Solve Azure EDA Workload Challenges Guide” here.

How Azure NetApp Files Supports HPC Workloads in Azure
To maximize the benefits of HPC resources in Azure, you need storage resources that can match HPC performance and resilience. While you can combine multiple Azure storage services to achieve this, it may be easier to centralize your storage management with Azure NetApp Files.

In this article you’ll learn how Azure NetApp Files complements HPC workloads, how to manage Azure NetApp Files, and how Azure Storage with NetApp works.

Read “How Azure NetApp Files Supports HPC Workloads in Azure” here.

Energy Leader Repsol Sees a Surge in Performance in Azure NetApp Files
Energy sector operations demand performance and reliability that are far beyond that of many organizations. This industry is one of the main users of HPC deployments and companies in it, like Repsol, demand innovative, effective solutions, like Azure NetApp Files.

In this article you’ll learn how Repsol leveraged Azure NetApp Files to manage mission-critical HPC applications and increase workload performance.

Read “Energy Leader Repsol Sees a Surge in Performance in Azure NetApp Files” here.

Chip Design and the Azure Cloud: An Azure NetApp Files Story
Chip design processes require high bandwidth and low latency to meet the fast workload processing times that the industry demands. The more parallel jobs a design firm can run, the faster their time to market and the more competitive they are.

In this article you’ll learn how Azure NetApp Files supports the workloads required for chip design and see what benchmarks the service is able to meet.

Read “Chip Design and the Azure Cloud: An Azure NetApp Files Story” here.

What is Cloud Performance and How to implement it in Your Organization
The performance of your cloud resources is critical to ensure business continuity. Learn how to ensure performance is not interrupted despite peaks in demand. This article explains which metrics to use, which testing is relevant to performance, and which techniques you can implement.

Read "What is Cloud Performance and How to Implement it in Your Organization" here.

Azure HPC Cache: Use Cases, Examples, and a Quick Tutorial
Azure HPC Cache is a distributed file system caching solution offered by Microsoft Azure. It is designed to improve the performance of high-performance computing (HPC) workloads that access frequently used files stored in remote Azure Blob storage. Azure HPC Cache supports multiple hybrid architectures including NFSv3, Dell EMC Isilon, Azure Blob Storage, and other NAS solutions.

Read "Azure HPC Cache: Use Cases, Examples, and a Quick Tutorial" here.

See Additional Guides on Key Managed Services Topics

Together with our content partners, we have authored in-depth guides on several other topics that can also be useful as you explore the world of managed services.

AWS Big Data

Authored by NetApp


AWS database

Authored by NetApp


Load Balanced Servers

Authored by Atlantic