Microsoft drives innovation and contributes to the broader artificial intelligence and data center ecosystem, benefiting the entire technology industry.
To deliver AI-powered solutions at cloud scale, the need for swift technological adaptation has never been more pressing than it is today. As a leader in cloud infrastructure, we recognize the importance of embracing technological advancements while learning from the past, fostering community-driven innovation, and promoting industry-wide standardization. Over the past decade, Microsoft has championed deep collaboration through industry-wide initiatives such as the Open Compute Project (OCP). Through this work, we are driving hardware innovation across the computing stack, spanning server and rack infrastructure, networking and storage, and designs for reliability, availability, and serviceability (RAS). We are also establishing supply chain evaluation frameworks for security,1 sustainability,2 and reliability3 across the cloud value chain.
As we continue innovating in AI, we are returning this year with expanded contributions: new power and cooling solutions that address the evolving profile of AI datacenters, and hardware security frameworks that put trust and resilience at the core of our infrastructure for accelerated computing.
Elevating data center cooling through innovative, modular designs for seamless global deployment.
As AI demands evolve, we are rethinking our data centers to balance rising rack density with cooling efficiency. When we introduced the Azure Maia 100 system last fall, we paired it with a dedicated liquid cooling "sidekick," a closed-loop system that recirculates fluid to carry heat away from the rack. Since then, we have continued advancing cooling technology, working with partners to design datacenter cooling solutions that address escalating AI power consumption and prioritize deployment simplicity. We are pleased to contribute to OCP the design of an advanced liquid cooling heat exchanger unit, enabling the community to leverage collective knowledge and accelerate innovation as AI evolves rapidly. Learn more here.
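The relationship between rack heat load, coolant flow, and temperature rise that sizes a closed-loop system like this can be sketched with basic thermodynamics. The sketch below uses entirely hypothetical numbers (it does not reflect Maia or heat exchanger unit specifications) and applies the standard relation Q = ṁ·c_p·ΔT for a water-like coolant.

```python
# Illustrative arithmetic only: estimating the coolant flow a closed-loop
# liquid cooling system needs in order to carry away a given rack heat load.
# All numbers are hypothetical, not Maia or heat exchanger specifications.

def required_flow_lpm(heat_load_kw: float, delta_t_c: float) -> float:
    """Liters per minute of water-like coolant needed to absorb
    heat_load_kw with a delta_t_c temperature rise from inlet to outlet."""
    cp = 4186.0       # specific heat of water, J/(kg*K)
    density = 1000.0  # kg/m^3
    kg_per_s = heat_load_kw * 1000.0 / (cp * delta_t_c)  # Q = m_dot * cp * dT
    return kg_per_s / density * 1000.0 * 60.0            # kg/s -> L/min

# A hypothetical 100 kW rack with a 10 C coolant temperature rise:
print(round(required_flow_lpm(100.0, 10.0), 1))  # -> 143.3 (L/min)
```

The takeaway is that flow requirements scale linearly with heat load, which is why racks in the tens-of-kilowatts range push facilities from air toward liquid cooling.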
Disaggregated power architectures enabling next-generation systems and applications.
Advances in AI have driven a significant increase in power density within large-scale data centers, and as systems evolve we have found new opportunities for flexibility and modularity in system architecture. While traditional cloud compute and storage racks typically draw less than 20 kW, AI hardware has pushed rack power into the tens of kilowatts and beyond. To keep pace, we are rethinking how we build and deliver power to these high-density racks. Mt. Diablo, our joint engineering effort with Meta, is a disaggregated power rack design that tackles these spatial and power limitations. By moving power delivery into a separate rack that supplies 400-volt DC and scales from kilowatts to megawatts, the design frees space for 15% to 35% more AI accelerators in each IT rack. This modular approach also allows power in the disaggregated rack to be adjusted to the changing demands of different inference and training SKUs. We are excited to advance this contribution to the Open Compute Project community together with Meta. Learn more here.
Harnessing the power of confidential computing: charting a secure path forward for AI.
Last month, Microsoft outlined its vision for cloud computing in which security is anchored in hardware-based Trusted Execution Environments (TEEs) and transparency of the Confidential Computing Boundary. We build on this vision with innovative open-source silicon technologies, specifically the Adams Bridge quantum-resistant cryptography accelerator, integrated into Caliptra 2.0, the next generation of the open-source silicon root of trust.
The growing potential of quantum computers poses a significant threat to traditional hardware security, as the classical cryptographic algorithms widely deployed throughout hardware security could be broken by a sufficiently powerful quantum computer. To address this threat, NIST has published standards for quantum-resistant cryptographic algorithms.
These quantum-resistant algorithms differ significantly from their classical counterparts, with larger keys and signatures and different performance characteristics. As hardware builders, we must address these changes early, because they affect foundational hardware security capabilities such as the immutable root-of-trust anchors that guarantee code integrity and hardware identity. The challenge is more pressing for silicon than for software: hardware development cycles are long, and once deployed, silicon cannot be patched the way software can, so quantum resistance must be designed in now.
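To make the root-of-trust role concrete, the sketch below illustrates the general measurement-chain pattern such silicon implements: each boot stage is hashed into a running measurement register (an "extend" operation), so the final value commits to the entire boot sequence. This is a conceptual sketch only; real silicon roots of trust such as Caliptra differ in design details, and the stage names here are placeholders.

```python
# Conceptual sketch of a root-of-trust measurement chain: each firmware
# stage is hashed into a running measurement register ("extend"), so the
# final value commits to the whole boot sequence. Real silicon roots of
# trust such as Caliptra differ in detail; this only shows the pattern.
import hashlib

def extend(register: bytes, firmware_image: bytes) -> bytes:
    # new_register = H(old_register || H(image))
    return hashlib.sha384(
        register + hashlib.sha384(firmware_image).digest()
    ).digest()

register = bytes(48)  # reset value of the measurement register
for stage in [b"bootloader", b"firmware", b"os-loader"]:  # placeholder images
    register = extend(register, stage)

print(register.hex()[:16])
```

The measurement chain itself relies only on hashing, which holds up well against quantum attack at sufficient digest sizes; it is the attestation signature over the final register value that must migrate to a quantum-resistant scheme, which is where an accelerator like Adams Bridge comes in.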
As part of Microsoft's Secure Future Initiative, Microsoft and its partners in the Caliptra consortium are releasing Adams Bridge, a new silicon block designed to accelerate the adoption of quantum-resistant cryptography. Learn more about Adams Bridge and how we are building quantum resistance into future hardware here.
Building on Caliptra 2.0 and Adams Bridge, Microsoft is taking further steps to strengthen security across its hardware supply chain with the OCP Security Appraisal Framework and Enablement (OCP SAFE) initiative. Co-founded by Microsoft, OCP SAFE defines rigorous, continuous security assessments of hardware and firmware components. Together with Caliptra, OCP SAFE advances transparency and security assurances on the path toward hardware Supply Chain Integrity, Transparency, and Trust (SCITT). Learn more here.
The driving force behind AI advancements.
Over the past few years, Microsoft has significantly expanded its supercomputing capabilities, empowering individuals and organizations globally to harness the transformative power of generative AI across industries including education, healthcare, and business. As part of this commitment, we have continually refined and expanded our infrastructure, building some of the world's most powerful supercomputers equipped with a growing array of high-performance accelerators tailored to a wide range of AI workloads. As demand for AI grows, our teams have achieved significant efficiency gains through system-level optimizations, many of which have been shared with the open-source community.
With our custom silicon and architecture on Azure Maia, we optimized performance per watt through co-design of hardware and software. Through an early commitment to open standards, we contributed low-precision math formats to the industry, work advanced alongside AMD, Arm, Intel, Meta, NVIDIA, and Qualcomm in the Open Compute Project (OCP).
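The core idea behind block-scaled low-precision formats of this kind can be sketched in a few lines: a block of values shares a single power-of-two scale, and each element is stored as a narrow integer. The sketch below mimics only the general concept with an 8-bit element width; the OCP Microscaling (MX) specification defines the actual formats, block sizes, and encodings.

```python
# Illustrative sketch of block-scaled low-precision quantization in the
# spirit of shared-scale formats: a block of values shares one power-of-two
# scale and each element is stored as a narrow signed integer. This mimics
# the concept only; the OCP MX specification defines the real formats.
import math

def quantize_block(values, bits=8):
    """Quantize a block to signed integers sharing one power-of-two scale."""
    qmax = 2 ** (bits - 1) - 1
    amax = max(abs(v) for v in values)
    scale = 2.0 ** math.ceil(math.log2(amax / qmax)) if amax else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale

def dequantize_block(q, scale):
    return [x * scale for x in q]

block = [0.12, -1.5, 0.03, 0.9]
q, s = quantize_block(block)
approx = dequantize_block(q, s)
print(q, s)  # -> [8, -96, 2, 58] 0.015625
```

Because the scale is shared across the block, only one exponent is stored per block instead of one per element, which is what makes such formats attractive for AI accelerators: narrower datapaths and less memory traffic at a bounded accuracy cost.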
With this foundation established, we next addressed the challenge of scaling and widespread adoption by developing a pioneering liquid-cooled server architecture. Sharing this expertise allows datacenters worldwide to capitalize on its benefits, enriching the industry and facilitating broad adoption.
Finally, we recognized that traditional Ethernet was not designed with AI scalability in mind. Through our work in the Ultra Ethernet Consortium (UEC), we have helped extend Ethernet's capabilities to deliver the efficiency, scalability, and reliability that AI workloads require.
Through these efforts, Microsoft fosters innovation and contributes to the wider AI and datacenter community, ensuring that our advancements benefit the entire industry.
Attendees of this year's OCP Global Summit are invited to visit Microsoft at booth #B35, where we will showcase our latest cloud hardware innovations and collaborate with partners from the OCP community through interactive demonstrations.
Join Microsoft at the 2024 OCP Global Summit to propel innovation forward.
1. Rani Borkar. October 18, 2022.
2. Zaid Kahn. November 9, 2021.
3. Rani Borkar and Reynold D'Sa. October 17, 2023.