Microsoft delivers the first at-scale production cluster with more than 4,600 NVIDIA GB300 NVL72 systems, featuring NVIDIA Blackwell Ultra GPUs connected through the next-generation NVIDIA InfiniBand network. This cluster is the first of many, as we scale to hundreds of thousands of Blackwell Ultra GPUs deployed across Microsoft's AI datacenters globally, reflecting our continued commitment to redefining AI infrastructure and our collaboration with NVIDIA. These large-scale clusters of Blackwell Ultra GPUs will enable model training in weeks instead of months, while delivering high throughput for inference workloads. We are also unlocking larger, more powerful models, and will be the first to support training models with hundreds of trillions of parameters.
This was made possible through collaboration across hardware, systems, supply chain, facilities, and many other disciplines, as well as with NVIDIA.
Microsoft Azure's launch of the NVIDIA GB300 NVL72 supercluster is an exciting step in the advancement of frontier AI. This co-engineered system delivers the world's first at-scale GB300 production cluster, providing the supercomputing engine needed for OpenAI to serve multitrillion-parameter models. This sets the definitive new standard for accelerated computing.
Ian Buck, Vice President of Hyperscale and High-Performance Computing at NVIDIA
From NVIDIA GB200 to GB300: A new standard in AI performance
Earlier this year, Azure introduced ND GB200 v6 virtual machines (VMs), accelerated by NVIDIA's Blackwell architecture. These quickly became the backbone of some of the most demanding AI workloads in the industry, including for organizations like OpenAI and Microsoft, which already use massive clusters of GB200 NVL72 on Azure to train and deploy frontier models.
Now, with ND GB300 v6 VMs, Azure is raising the bar again. These VMs are optimized for reasoning models, agentic AI systems, and multimodal generative AI. Built on a rack-scale system, each rack contains 18 VMs with a total of 72 GPUs:
- 72 NVIDIA Blackwell Ultra GPUs (with 36 NVIDIA Grace CPUs).
- 800 gigabits per second (Gb/s) of cross-rack scale-out bandwidth per GPU via next-generation NVIDIA Quantum-X800 InfiniBand (2x that of GB200 NVL72).
- 130 terabytes (TB) per second of NVIDIA NVLink bandwidth within the rack.
- 37 TB of fast memory.
- Up to 1,440 petaflops (PFLOPS) of FP4 Tensor Core performance.
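A quick back-of-envelope check ties the bullet figures above together; the sketch below uses only the numbers quoted in this list (no measured values):

```python
# Sanity-check the per-rack figures from the spec list above.
GPUS_PER_RACK = 72            # Blackwell Ultra GPUs per NVL72 rack
VMS_PER_RACK = 18             # VMs per rack
SCALE_OUT_PER_GPU_GBPS = 800  # Quantum-X800 InfiniBand, per GPU

# Each VM exposes 72 / 18 = 4 GPUs.
gpus_per_vm = GPUS_PER_RACK // VMS_PER_RACK

# Aggregate cross-rack bandwidth: 72 GPUs x 800 Gb/s.
rack_scale_out_gbps = GPUS_PER_RACK * SCALE_OUT_PER_GPU_GBPS
rack_scale_out_tb_s = rack_scale_out_gbps / 8 / 1000  # Gb/s -> TB/s

print(gpus_per_vm)                   # 4 GPUs per VM
print(rack_scale_out_gbps)           # 57600 Gb/s leaving each rack
print(round(rack_scale_out_tb_s, 1)) # 7.2 TB/s of scale-out bandwidth
```

So each rack can inject roughly 7.2 TB/s into the scale-out fabric, an order of magnitude below the 130 TB/s available inside the rack over NVLink, which is why tightly coupled work stays rack-local.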

Building for AI supercomputing at scale
Building infrastructure for frontier AI requires us to reimagine every layer of the stack as a unified system: compute, memory, networking, datacenters, cooling, and power. The ND GB300 v6 VMs are a clear illustration of this transformation, the result of years of collaboration across silicon, systems, and software.
At the rack level, NVLink and NVSwitch reduce memory and bandwidth constraints, enabling up to 130 TB per second of intra-rack data transfer connecting a total of 37 TB of fast memory. Each rack becomes a tightly coupled unit, delivering higher inference throughput at reduced latencies on larger models and longer context windows, and empowering agentic and multimodal AI systems to be more responsive and scalable than ever.
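To get a feel for what 130 TB/s of intra-rack bandwidth means in practice, here is a rough illustration; the 10-trillion-parameter FP4 model is a hypothetical workload chosen for the arithmetic, not a figure from this post:

```python
# Illustrative timings over rack-local NVLink (hypothetical workload).
NVLINK_AGG_TB_S = 130   # aggregate intra-rack NVLink bandwidth (TB/s)
FAST_MEMORY_TB = 37     # total fast memory in the rack (TB)

# Streaming the rack's entire fast memory once over NVLink:
full_sweep_s = FAST_MEMORY_TB / NVLINK_AGG_TB_S

# A hypothetical 10-trillion-parameter model in FP4 (0.5 byte/param):
params = 10e12
weights_tb = params * 0.5 / 1e12          # 5 TB of weights
weight_move_s = weights_tb / NVLINK_AGG_TB_S

print(round(full_sweep_s, 3))   # 0.285 s to sweep all fast memory
print(round(weight_move_s, 3))  # 0.038 s to move the full weight set
```

At these rates, redistributing even multi-terabyte model state within a rack takes tens to hundreds of milliseconds, which is what makes the rack viable as a single tightly coupled inference unit.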
To scale beyond the rack, Azure deploys a full fat-tree, non-blocking architecture using 800 Gb/s NVIDIA Quantum-X800 InfiniBand, the fastest networking fabric available today. This ensures that customers can efficiently scale training of ultra-large models to tens of thousands of GPUs with minimal communication overhead, delivering better end-to-end training throughput. Reduced synchronization overhead also translates to maximum GPU utilization, which helps researchers iterate faster and at lower cost despite the compute-hungry nature of AI training workloads. Azure's co-engineered stack, including custom protocols, collective libraries, and in-network computing, ensures the network is highly reliable and fully utilized by applications. Features like NVIDIA SHARP accelerate collective operations and double effective bandwidth by performing math in the switch, making large-scale training and inference more efficient and reliable.
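The "double effective bandwidth" claim for in-network reduction can be seen from standard collective-communication arithmetic. As a minimal sketch (illustrative math only, with a hypothetical 1 GB buffer and 1,024-GPU job): a ring all-reduce makes each GPU inject about 2(n-1)/n times the buffer size into the network, while switch-side reduction injects roughly 1x, since each GPU sends one copy up and receives the reduced result back.

```python
# Compare per-GPU network traffic for an all-reduce of one gradient buffer.

def ring_allreduce_traffic(buffer_gb: float, n_gpus: int) -> float:
    """GB each GPU injects with a ring all-reduce: 2*(n-1)/n * buffer."""
    return 2 * (n_gpus - 1) / n_gpus * buffer_gb

def in_network_allreduce_traffic(buffer_gb: float) -> float:
    """GB each GPU injects when the switch performs the reduction (~1x)."""
    return buffer_gb  # one copy up; the reduced result flows back down

buf_gb, n = 1.0, 1024  # hypothetical 1 GB buffer across 1,024 GPUs
ring = ring_allreduce_traffic(buf_gb, n)        # ~2.0 GB per GPU
sharp_like = in_network_allreduce_traffic(buf_gb)  # 1.0 GB per GPU

print(round(ring / sharp_like, 2))  # ~2.0x effective-bandwidth gain
```

For large GPU counts the ratio approaches exactly 2x, which matches the doubling of effective bandwidth described above.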
Azure's advanced cooling systems use standalone heat-exchanger units and facility cooling to minimize water usage while maintaining thermal stability for dense, high-performance clusters like GB300 NVL72. We also continue to develop and deploy new power-distribution models capable of supporting the high energy density and dynamic load balancing required by the ND GB300 v6 VM class of GPU clusters.
Further, our reengineered software stacks for storage, orchestration, and scheduling are optimized to fully utilize compute, networking, storage, and datacenter infrastructure at supercomputing scale, delivering unprecedented levels of performance at high efficiency to our customers.

Looking ahead
Microsoft has invested in AI infrastructure for years, enabling fast adoption of, and transition to, the latest technology. It is also why Azure is uniquely positioned to deliver GB300 NVL72 infrastructure at production scale and at a rapid pace, meeting the demands of frontier AI today.
As Azure continues to ramp up GB300 deployments worldwide, customers can expect to train and deploy new models in a fraction of the time compared to previous generations. The ND GB300 v6 VMs are poised to become the new standard for AI infrastructure, and Azure is proud to lead the way, supporting customers in advancing frontier AI development.
Stay tuned for more updates and performance benchmarks as Azure expands production deployment of NVIDIA GB300 NVL72 globally.