Monday, March 31, 2025

Microsoft unveils its latest Azure virtual machines for AI supercomputing: the ND H200 v5 series.

We’re introducing a new generation of cloud-based AI supercomputing clusters, built on Azure ND H200 v5 virtual machines, to help customers develop cutting-edge AI solutions.

As AI technology evolves at a rapid pace, the need for scalable, high-performance infrastructure continues to grow. To power innovative AI solutions for our customers, we’re deploying new cloud-based AI supercomputing clusters built on the latest Azure ND H200 v5 series virtual machines (VMs), available now.

Tailored to the increasing complexity of advanced AI workloads, these VMs are designed to efficiently handle tasks such as foundation model training and generative inferencing, delivering greater performance and scalability. The ND H200 v5 VMs are already gaining traction among customers and Microsoft AI services such as Azure Machine Learning and Azure OpenAI Service, thanks to their greater scale, efficiency, and performance.

[Quote from Trevor Cai, Head of Infrastructure at OpenAI.]

The Azure ND H200 v5 VMs are built with Microsoft’s systems approach to maximize efficiency and performance, and feature eight NVIDIA H200 Tensor Core GPUs. They address a gap that has emerged as GPUs’ raw computational power has grown faster than their attached memory and memory bandwidth. Compared to the earlier Azure ND H100 v5 VMs, the ND H200 v5 series delivers a 76% increase in High Bandwidth Memory (HBM), to 141 GB, and a 43% increase in HBM bandwidth, to 4.8 TB/s. The higher HBM bandwidth lets GPUs ingest model parameters more quickly, reducing overall application latency, a crucial metric for real-time applications such as interactive agents. The ND H200 v5 VMs can also fit more complex large language models (LLMs) within a single VM’s memory, avoiding the overhead of distributing jobs across multiple VMs and improving overall performance.
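As a rough cross-check of the percentage gains above, here is a minimal sketch. The ND H100 v5 baseline figures of 80 GB HBM and roughly 3.35 TB/s bandwidth per GPU are assumptions drawn from NVIDIA's public H100 specifications, not from this announcement:

```python
# Sanity-check the stated HBM gains, assuming the publicly documented
# H100 per-GPU baseline of 80 GB HBM and ~3.35 TB/s HBM bandwidth.
h100_hbm_gb = 80.0   # per-GPU HBM capacity on H100 (assumed baseline)
h100_bw_tbs = 3.35   # per-GPU HBM bandwidth on H100 (assumed baseline)

h200_hbm_gb = h100_hbm_gb * 1.76   # +76% capacity, as stated in the post
h200_bw_tbs = h100_bw_tbs * 1.43   # +43% bandwidth, as stated in the post

print(f"H200 HBM ~ {h200_hbm_gb:.0f} GB")        # ~141 GB, matching the post
print(f"H200 bandwidth ~ {h200_bw_tbs:.1f} TB/s") # ~4.8 TB/s, matching the post
```

Both derived values line up with the 141 GB and 4.8 TB/s figures quoted above, which supports the stated percentages.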

Our H200 supercomputing clusters are also designed for efficient management of GPU memory across model weights, key-value cache, and batch size, each of which directly affects the throughput, latency, and cost-effectiveness of large language model (LLM) generative AI inference workloads. With its larger HBM capacity, the ND H200 v5 VM supports larger batch sizes, increasing GPU utilization and overall throughput for inference on both small language models (SLMs) and LLMs relative to the ND H100 v5 series. In early tests, we observed up to a 35% throughput improvement with ND H200 v5 VMs over the ND H100 v5 series for inference workloads running LLAMA 3.1 models on an 8x128x8 configuration, using batch sizes of 32 for H100 and 96 for H200. For detailed Azure high-performance computing benchmarks, see our GitHub repository.
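The batch-size effect described above comes down to a memory budget: once model weights are resident, the remaining HBM bounds how many concurrent sequences' KV caches fit. A minimal sketch, with purely illustrative (hypothetical) sizes rather than measured figures from this post:

```python
def max_batch_size(hbm_gb: float, weights_gb: float, kv_cache_gb_per_seq: float) -> int:
    """Largest batch of concurrent sequences that fits after model weights
    are resident in HBM. All sizes are illustrative; real deployments also
    reserve memory for activations, CUDA context, and framework overhead."""
    free_gb = hbm_gb - weights_gb
    return max(0, int(free_gb // kv_cache_gb_per_seq))

# Hypothetical per-GPU numbers for a sharded LLM: 50 GB of weights resident,
# 0.9 GB of KV cache per concurrent sequence.
print(max_batch_size(hbm_gb=80, weights_gb=50, kv_cache_gb_per_seq=0.9))   # H100-class: 80 GB HBM
print(max_batch_size(hbm_gb=141, weights_gb=50, kv_cache_gb_per_seq=0.9))  # H200-class: 141 GB HBM
```

Under these assumed sizes the 141 GB GPU fits roughly three times the batch of the 80 GB GPU, which is the mechanism behind the larger-batch, higher-utilization claim above.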

The ND H200 v5 VMs come pre-integrated with Azure Batch, Azure Kubernetes Service, Azure OpenAI Service, and Azure Machine Learning, so organizations can get started immediately. Visit this page for a comprehensive overview of the new Azure ND H200 v5 VMs.
