Broadcom and AMD Collaborate to Enhance AI Infrastructure

January 29, 2024
The recent announcement that Broadcom's next-generation PCIe switches will support AMD's XGMI/Infinity Fabric marks a significant development in AI infrastructure. The collaboration targets a key scaling challenge in AI systems: connecting more than eight GPUs in a single node without sacrificing efficiency. The prospect of XGMI-connected NICs also points to a new approach to communication in AI training clusters. As AMD expands its ecosystem of PCIe/XGMI-based solutions, this partnership with Broadcom could significantly strengthen AMD's competitiveness in the AI market, particularly in scalability, efficiency, and performance.

In "Next-Gen Broadcom PCIe Switches to Support AMD Infinity Fabric XGMI to Counter NVIDIA NVLink", Patrick Kennedy of ServeTheHome writes: “One of the really neat capabilities of AMD Infinity Fabric/ XGMI controllers is that they can serve multiple functions. AMD’s I/O controllers can do things like handle package-to-package connectivity as Infinity Fabric, PCIe Gen5 for cards, and CXL.”

Scaling Beyond the 8-GPU Server

The launch of AMD's Instinct MI300X GPU illustrates the company's push beyond the limits of conventional AI hardware. The launch introduced powerful GPUs and APUs. It also emphasized that these technologies must scale across server clusters for efficient AI training. AMD's decision to back PCIe/XGMI as its scaling method marks an important industry shift, and other AI processor developers may adopt it as a standard.

Broadcom's Pivotal Role in PCIe Switch Development

Broadcom's next-generation PCIe switches with XGMI/Infinity Fabric support are essential for extending AMD's Infinity Fabric so that GPU servers can scale seamlessly. Jas Tremblay, Vice President and General Manager of the Data Center Solutions Group at Broadcom, highlighted the significance of this development. The switches are expected to allow more than eight GPUs per node without sacrificing efficiency, a vital advancement for AMD in the competitive AI sector.

Future Possibilities with XGMI-Connected NICs

XGMI support could extend beyond the PCIe switches themselves to NICs. Envision NICs communicating over XGMI/Infinity Fabric, on the same coherent fabric as the CPUs and GPUs. This could significantly streamline communication within AI training clusters and offer an efficient path for RDMA transfers between GPUs and NICs over a PCIe/XGMI-based fabric.

Looking Ahead: AI's Next Frontier

These advancements reflect Broadcom's forward-looking vision. As the industry works to increase GPU density while maintaining efficiency, Broadcom's PCIe/XGMI-based solutions are poised to play a pivotal role. The collaboration, together with the upcoming release of next-generation PCIe switches with Infinity Fabric support, could be transformative for the AI market.

Liqid’s Strategic Advantage in Composable Infrastructure

Liqid is exceptionally well placed to take advantage of Broadcom's next-generation PCIe switches supporting AMD's XGMI/Infinity Fabric. This collaboration is a significant leap in AI infrastructure because it addresses the challenge of scaling AI systems beyond conventional limits. Liqid's expertise in composable infrastructure lets resources be dynamically allocated and scaled. That capability aligns closely with AMD's focus on PCIe/XGMI for AI server scaling and with Broadcom's support for XGMI/Infinity Fabric in its PCIe switches. As the industry anticipates the impact of these developments, Liqid's composable infrastructure solutions position us at the forefront of PCIe/XGMI-based solutions, making us a key player in the evolving AI infrastructure landscape.

Sumit Puri
