Composing NVMe and GPU-oF for NVIDIA A100 GPUs

Posted on
May 28, 2020
In its recent earnings report, GPU pioneer NVIDIA announced that its data center business is up a whopping 80 percent year-over-year, with strength across a variety of workloads, including HPC and AI training and inference. The company also noted that the recently announced NVIDIA A100 Tensor Core GPU contributed “meaningfully” to its first-quarter revenue, with deployments at Alibaba, AWS, Baidu, Dell, GCP, HPE, Azure, and other cloud and supercomputing customers.

The company is positioning the A100 as the “universal accelerator” for AI training and inference, data analytics, scientific computing, and cloud graphics. It is easy to see why IT departments around the world are clamoring to make use of the mind-bending performance that the A100 delivers. Based on the company’s new NVIDIA® Ampere GPU Architecture, the A100 is designed for performance in AI-driven data centers and, according to NVIDIA, is up to 20x faster than its predecessor, which the company describes as its most significant generation-to-generation performance improvement to date. These powerful A100 devices also allow for software-defined GPU resource pooling to better address data workload requirements at every step of the AI process.

The A100 features NVIDIA’s elastic computing technology, which enables a single A100 GPU to be partitioned into as many as seven smaller GPU instances, or multiple A100s to be used together as one large GPU. Elastic computing technology introduces massive scalability for GPUs that were previously locked into the hardware configuration chosen at the point of purchase. The capability becomes particularly powerful when used with Liqid’s composable infrastructure solutions.
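To make the partitioning idea concrete, here is a minimal sketch in Python. It is a toy model only: the class names, the `carve` method, and the "slice" accounting are hypothetical and are not NVIDIA's or Liqid's API. It assumes a GPU exposes seven equal slices (matching the "as many as seven" figure above) and ignores the placement and memory rules of real hardware.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: seven equal slices per GPU, taken from the article's
# "as many as seven smaller GPU instances". Real hardware has more rules.
MAX_SLICES = 7


@dataclass
class GpuInstance:
    """A hypothetical smaller GPU carved out of a physical GPU."""
    parent: str
    slices: int


@dataclass
class PartitionableGpu:
    """Toy model of one physical GPU that can be split into instances."""
    name: str
    instances: List[GpuInstance] = field(default_factory=list)

    def free_slices(self) -> int:
        """Slices not yet assigned to any instance."""
        return MAX_SLICES - sum(i.slices for i in self.instances)

    def carve(self, slices: int) -> GpuInstance:
        """Create one instance of the requested size, or raise if it does not fit."""
        if slices < 1:
            raise ValueError("an instance needs at least one slice")
        if slices > self.free_slices():
            raise ValueError(
                f"{self.name}: requested {slices} slices, "
                f"only {self.free_slices()} free"
            )
        instance = GpuInstance(parent=self.name, slices=slices)
        self.instances.append(instance)
        return instance


if __name__ == "__main__":
    gpu = PartitionableGpu("a100-0")
    for _ in range(MAX_SLICES):
        gpu.carve(1)  # seven single-slice instances
    print(len(gpu.instances), "instances,", gpu.free_slices(), "slices free")
```

In this model an eighth request fails with a `ValueError`, which mirrors the fixed upper bound on how finely one GPU can be divided.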

Elastic computing technology enables the ultra-fast performance and flexibility of the A100 to be more widely composed as a core data center resource. True elastic computing provides ecosystem flexibility: GPUs can be composed in tandem with CPU, NVMe, FPGA, and networking devices so that the server platform best fits the application being deployed.

Liqid composable software and its GPU-oF (GPU-over-Fabric) technology enable the A100 to be composed across the data center over low-latency networking (Ethernet or InfiniBand), alongside other core data center resources, including storage and compute. The Liqid GPU-oF software also enables each of the seven smaller GPU instances (MIG instances) to be composed across the data center as an individual GPU resource, again over low-latency networking, for even higher resource utilization. Composability even allows older servers already deployed in the data center to be integrated on demand with newer high-performance accelerators, drawing on pools of composable resources across the data center.
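The composition workflow described above can be pictured with a small Python sketch. This is an illustration under stated assumptions, not Liqid's software or API: the `Resource` and `Pool` classes, the `compose` and `release` methods, and the host and resource names are all hypothetical. It shows only the bookkeeping idea of a shared pool from which GPU instances and NVMe devices are assigned to hosts and later returned.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Resource:
    """A hypothetical pooled device reachable over some fabric."""
    rid: str
    kind: str    # e.g. "mig" (GPU instance) or "nvme"
    fabric: str  # e.g. "pcie", "ethernet", "infiniband"
    host: Optional[str] = None  # None means the device is still in the pool


class Pool:
    """Toy resource pool: assign free devices to hosts, then release them."""

    def __init__(self, resources: List[Resource]) -> None:
        self._resources: Dict[str, Resource] = {r.rid: r for r in resources}

    def free(self, kind: str) -> List[Resource]:
        """All unassigned devices of the given kind."""
        return [r for r in self._resources.values()
                if r.kind == kind and r.host is None]

    def compose(self, host: str, kind: str, count: int) -> List[Resource]:
        """Attach `count` free devices of `kind` to `host`, or raise if short."""
        candidates = self.free(kind)
        if len(candidates) < count:
            raise RuntimeError(
                f"need {count} free {kind} devices, have {len(candidates)}"
            )
        chosen = candidates[:count]
        for resource in chosen:
            resource.host = host
        return chosen

    def release(self, host: str) -> int:
        """Return every device held by `host` to the pool; report how many."""
        released = 0
        for resource in self._resources.values():
            if resource.host == host:
                resource.host = None
                released += 1
        return released


if __name__ == "__main__":
    # Seven GPU instances from one partitioned GPU plus two NVMe drives.
    devices = [Resource(f"mig-{i}", "mig", "ethernet") for i in range(7)]
    devices += [Resource(f"nvme-{i}", "nvme", "pcie") for i in range(2)]
    pool = Pool(devices)

    pool.compose("older-server", "mig", 3)   # an already deployed server
    pool.compose("newer-server", "mig", 4)
    pool.compose("newer-server", "nvme", 2)
    print("free GPU instances:", len(pool.free("mig")))

    pool.release("older-server")             # workload done, devices return
    print("free GPU instances:", len(pool.free("mig")))
```

The design point is that devices belong to the pool rather than to a server, so utilization improves whenever a finished workload hands its devices back for reuse.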

Liqid’s ability to enable and accelerate GPU-oF operations through composable software means the A100 can now be deployed via PCIe for direct connect performance, or shared across distance via Ethernet or InfiniBand, widely distributing the scalability of A100 GPU technology.

Newly announced high-speed networking from NVIDIA’s recent acquisition of Mellanox, the ConnectX-6 Lx SmartNIC, delivers a highly secure 25/50 Gb/s Ethernet smart network interface controller (SmartNIC) specifically designed to enable greater disaggregated composability in the data center. By designing highly composable systems based on Liqid composable infrastructure, running on the A100 and cutting-edge Mellanox networking, IT users can create data centers that offer highly adaptive, high-performance architectures for data-hungry AI tasks. In addition, a wide variety of hardware components can be utilized across multiple fabrics, which would make a Liqid system based on the A100 and Mellanox networking one of the industry’s most comprehensive solutions as well.

Combined with the reductions in capital and operational expenditures that come with a composable infrastructure environment, the A100 provides significant benefits in both existing and new data center environments. Composability shortens time to solution for a wide variety of goods and services that require compute-intensive operations for marketplace entry. We are excited to test the new A100 in our own composable testing regimen and will report back as new results become available. Given the massive growth in data center deployments reflected in NVIDIA’s earnings, we expect we’ll have plenty to report on.

To learn more about Liqid’s ability to pool and massively scale composable GPU-driven data operations, read Liqid’s recently published paper, “Building One of the World’s Fastest Off-the-Shelf GPU Supercomputers.” It details how Liqid worked with Orange Silicon Valley and Dell Technologies OEM | Embedded & Edge Solutions to deliver a system capable of pooling and composing up to 20 NVIDIA Quadro RTX 8000 GPUs on a single node.

Learn more about Liqid’s technology and contact us to discuss your specific data center needs with a representative.

