Next-generation workloads like artificial intelligence and machine learning (AI/ML) are being deployed to accelerate time-to-value in a growing number of areas, including scientific research, data analytics, financial services, military, healthcare, and media & entertainment. As these new and exciting workloads increase in popularity, their supporting IT infrastructure must be optimized to deliver answers to the world’s most important questions, now.
Two key components determine a successful deployment in this new world: storage and GPU. And of the two, storage bears the brunt of the responsibility. We often use the term “feed the beast,” where the food is data and the beast is the GPU. The speed at which the GPU can process data is directly correlated to the ability of the storage media to deliver that data lightning fast.
The second challenge (one that can complicate the first) is moving data between servers during each phase of the AI/ML workflow. These massive amounts of data are known to expose bottlenecks that slow the process and, ultimately, delay critical answers.
Okay, so how do we feed the beast and eliminate bottlenecks in the datacenter? Well, it can’t be done via conventional methods.
Why No Status Quo?
Before donning my product marketing hat, I was an IT manager. And I’m old enough to remember way back in 1998 when deploying and managing servers for new workloads was painful. Start by planning for 3 to 5 years of life, then order a server, over-provisioned for good measure, and wait a month for it to arrive. Then rack, stack and deploy. If I needed more resources, I had surplus in the server itself, and if I didn’t use them, no worries, as waste was tolerable. Need to scale? Try to fit more devices into the chassis (or tape an HDD to the inner wall of a server like I did once). No room? Then buy a new server. Boy, I don’t miss those days. Oh wait! This is how we still deploy servers today (see what I did there)!
Yes, software innovations have improved resource utilization and simplified deployment (virtualization, HCI, CI), but they all still consume the same rigid servers as before.
The Computing Unit is No Longer the Server, it’s the Datacenter
A new technology called Composable Disaggregated Infrastructure (CDI) is changing the game for datacenters, allowing organizations to unlock cloud-like flexibility, agility, and radically improved resource utilization. And Liqid and Western Digital have partnered to deliver CDI solutions ready to take on today’s most challenging workloads, like AI/ML.
But before I get too far ahead of myself, you need to understand how composability can take your datacenter from static to dynamic. Put yourself in the frame of mind that the computing unit is no longer the server, but instead is your datacenter, where you can compose bare metal servers on demand to meet real-time business needs, all via software. Costly over-provisioning is eliminated because you deploy only what a workload needs today, via UI, API or CLI. When it’s time to scale resources up or down, do so in seconds, zero-touch, with utter disregard for whether there’s room in the server. When a workload is retired, quickly move its resources to a new or existing workload that needs them.
How is it done? You start by disaggregating resources like compute, GPU, Ultrastar® NVMe™ storage, Intel® Optane™ memory, FPGAs and NICs into pools, connect them all via a high-speed fabric like Ethernet and/or PCIe, and then use Liqid Matrix software to compose them into real-time IT solutions that quickly address today’s business challenges.
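To make the compose step concrete, here is a minimal sketch of what software-defined composition looks like from an automation point of view. Everything in it is illustrative: the function name, payload fields, and resource names are hypothetical placeholders, not the actual Liqid Matrix API schema, which you would take from Liqid’s own documentation.

```python
import json

def build_compose_request(machine_name, cpu_nodes, gpus, nvme_drives, nics):
    """Build a JSON payload describing a bare-metal server to compose from pools.

    The field names here are illustrative placeholders, not the real
    Liqid Matrix API schema.
    """
    payload = {
        "machine": machine_name,
        "resources": {
            "cpu": cpu_nodes,     # CPU nodes pulled from the compute pool
            "gpu": gpus,          # GPUs pulled from the GPU pool
            "nvme": nvme_drives,  # NVMe drives pulled from the storage pool
            "nic": nics,          # NICs pulled from the network pool
        },
    }
    return json.dumps(payload)

# Example: describe an AI training node with 4 GPUs and 8 NVMe drives.
request_body = build_compose_request(
    "ai-train-01", cpu_nodes=1, gpus=4, nvme_drives=8, nics=2
)
```

The point of the sketch is the shape of the workflow: scaling a workload up or down is a matter of submitting a new description like this, not of opening a chassis.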
Why would you consider composability for AI/ML workloads? The amounts of storage and GPU required will likely not fit in a single server. These workloads demand the flexibility and shareability of SAN/NAS storage with the performance of DAS. Composability offers exactly this, and NVMe over Fabrics (NVMe-oF™) is the answer for storage. You can feed the beast just as if the NVMe storage were in the box, even though it’s somewhere else in the datacenter. Need 20 GPUs attached to a single server over PCIe fabric? No sweat.
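To see how remote NVMe ends up looking local, consider how a Linux host attaches an NVMe-oF target. The standard `nvme-cli` tool discovers and connects to remote subsystems; once connected, the remote namespace shows up as an ordinary local block device. The sketch below just builds those command lines. The IP address and NQN are placeholder values, not real OpenFlex Data24 settings.

```python
def nvme_discover_cmd(transport, target_ip, port=4420):
    """Build an nvme-cli 'discover' command.

    4420 is the conventional NVMe-oF service port; transport is e.g.
    'tcp' or 'rdma' for Ethernet fabrics.
    """
    return ["nvme", "discover", "-t", transport, "-a", target_ip, "-s", str(port)]

def nvme_connect_cmd(transport, target_ip, port, nqn):
    """Build an nvme-cli 'connect' command for a remote NVMe subsystem.

    After this succeeds, the remote namespaces appear as local
    /dev/nvmeXnY block devices. The address and NQN below are placeholders.
    """
    return ["nvme", "connect", "-t", transport, "-a", target_ip,
            "-s", str(port), "-n", nqn]

# Hypothetical target address and subsystem NQN for illustration:
cmd = nvme_connect_cmd("tcp", "192.0.2.10", 4420,
                       "nqn.2021-01.example:data24-ns1")
```

In a real deployment you would run these commands (or let orchestration software run them) against the fabric-attached storage platform, which is exactly the plumbing that composability software automates on your behalf.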
Please the Beast with Liqid and Western Digital
Western Digital is leading the composable storage charge with the OpenFlex™ Data24 NVMe-oF Storage Platform, which delivers low-latency NVMe performance over Ethernet fabric, comparable to locally attached NVMe SSDs.
When combined with Liqid Matrix CDI software, the OpenFlex Data24 becomes a disaggregated resource that can easily satisfy the massive throughput demands of GPUs in AI/ML situations.
The NVMe-oF CDI Solution Kit is a quick and simple way to begin evolving your datacenter to support next-gen workloads while increasing agility and efficiency. Now IT can compose precise quantities of NVMe resources to servers via software, in seconds, through Liqid’s GUI or API. If applications require accelerator resources, Liqid can be used to compose resources like GPU and FPGA to servers as well. A key benefit of CDI is that composed resources are disaggregated from the servers, so running out of drive bays or PCIe slots is not a problem. In addition, Liqid simplifies support by providing single-point triage and management of the entire Solution Kit.
Liqid delivers industry-leading performance with the tightest possible physical footprint:
● Multi-protocol support for flexible deployment and performance requirements
● Share hardware resources via Liqid Matrix software without regard to physical limitations
● Address precise workload requirements at massive scale
● Perform GPU-over-Fabric (GPU-oF) operations as efficiently as those that take place locally in the hardware stack
Download the Solution Brief here.
Find out more about how Western Digital and Liqid are working together to pioneer fabric-based storage performance here.