As chief information officers race to adopt and deploy artificial intelligence, they eventually run into an uncomfortable truth: their IT infrastructure is not ready for AI. From widespread GPU shortages and latency-prone networks to rapidly spiking power demands, they encounter bottlenecks that undermine performance and drive up costs.
“An inefficient AI framework can greatly diminish the value of AI,” says Sid Nag, vice president of research at Gartner. Adds Teresa Tung, global data capability lead at Accenture: “The scarcity of high-end GPUs is an issue, but there are other factors, including power, cooling, and data center design and capacity, that affect outcomes.”
The takeaway? Demanding, resource-intensive AI workloads require IT leaders to rethink how they design networks, allocate resources, and manage power consumption. Those who ignore these challenges risk falling behind in the AI arms race and undercutting business performance.
Breaking Points
The most visible and widely reported problem is a shortage of the high-end GPUs required for inferencing and running AI models. For example, highly coveted Nvidia Blackwell GPUs, formally known as the GB200 NVL72, were nearly impossible to find for months as major companies like Amazon, Google, Meta, and Microsoft scooped them up. Yet even when a business can obtain these units, a fully configured server can cost around $3 million. A less expensive version, the NVL36 server, runs about $1.8 million.
While this may affect an enterprise directly, the shortage of GPUs also impacts major cloud providers like AWS, Google, and Microsoft, which increasingly ration resources and capacity, Nag says. For businesses, the repercussions are palpable. “Lacking the adequate hardware infrastructure required to build AI models, training a model can become slow and unfeasible. It can also lead to data bottlenecks that undermine performance,” he notes.
GPU shortages are only one piece of the overall puzzle, however. As organizations look to plug in AI tools for specialized purposes such as computer vision, robotics, or chatbots, they discover a need for fast, efficient infrastructure optimized for AI, Tung explains.
Network latency can prove particularly challenging. Even small delays in processing AI queries can trip up an initiative. GPU clusters require high-speed interconnects to communicate at maximum speed, yet many networks still rely on legacy copper, which significantly slows data transfers, according to Terry Thorn, vice president of commercial operations at Ayar Labs, a vendor that specializes in AI-optimized infrastructure.
Still another potential problem is data center space and energy consumption. AI workloads, particularly those running on high-density GPU clusters, draw enormous amounts of power. As deployments scale, CIOs may scramble to add servers, hardware, and advanced technologies like liquid cooling. Inefficient hardware, network infrastructure, and AI models exacerbate the problem, Nag says.
Making matters worse, upgrading power and cooling infrastructure is difficult and time-consuming. Nag points out that these upgrades can take a year or longer to complete, creating additional short-term bottlenecks.
Scaling Smart
Optimizing AI is inherently difficult because the technology touches areas as diverse as data management, computational resources, and user interfaces. Consequently, CIOs must decide how to approach each AI initiative based on the use case, AI model, and organizational requirements. This includes balancing on-premises GPU clusters, with different mixes of chips, against cloud-based AI services.
Organizations must consider how, when, and where cloud services and specialty AI providers make sense, Tung says. If building a GPU cluster internally is either undesirable or out of reach, then it's essential to find a suitable service provider. “You must understand the vendor's relationships with GPU suppliers, what types of other chips they offer, and what exactly you are getting access to,” she says.
In some cases, AWS, Google, or Microsoft may offer a solution through specific products and services. However, an array of niche and specialty AI service firms also exists, and some consulting firms, Accenture and Deloitte among them, have direct partnerships with Nvidia and other GPU vendors. “In some cases,” Tung says, “you can get data flowing through these custom models and frameworks. You can lean into these relationships to get the GPUs you need.”
For those running GPU clusters, maximizing network performance is paramount. As workloads scale, systems struggle with data transfer limitations. One of the main choke points is copper. Ayar Labs, for example, replaces these interconnects with high-speed optical interconnects that reduce latency, power consumption, and heat generation. The result is not only better GPU utilization but also more efficient model processing, particularly for large-scale deployments.
In fact, Ayar Labs claims 10x lower latency and up to 10x more bandwidth than traditional interconnects, along with a 4x to 8x reduction in power. No longer are chips “waiting for data rather than computing,” Thorn states. The problem can become particularly severe as organizations adopt complex large language models. “Increasing the size of the pipe boosts utilization and reduces CapEx,” he adds.
Still another piece of the puzzle is model efficiency and distillation. By specifically adapting a model for a laptop or smartphone, for example, it's often possible to use different combinations of GPUs and CPUs. The result can be a model that runs faster, better, and cheaper, Tung says.
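For readers curious what distillation actually involves, the core mechanism can be sketched in a few lines: a small "student" model is trained to match the softened output distribution of a larger "teacher." This is a minimal illustration only; the function names, logits, and temperature value are hypothetical, not details from the article or from any particular vendor's toolchain.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature softens the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between the teacher's softened outputs and the student's.

    This is the core training signal in knowledge distillation: the student
    is penalized for diverging from the teacher's output distribution.
    """
    p = softmax(teacher_logits, temperature)  # teacher "soft targets"
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# A student whose logits track the teacher's incurs a lower loss
# than one whose outputs diverge from the teacher's.
teacher = [3.0, 1.0, 0.2]
close_student = [2.8, 1.1, 0.3]
far_student = [0.2, 1.0, 3.0]
print(distillation_loss(teacher, close_student)
      < distillation_loss(teacher, far_student))  # True
```

In practice this loss is minimized over a training set (often blended with the ordinary hard-label loss), producing a compact model small enough to run on the laptops and smartphones Tung describes.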
Power Plays
Addressing AI's power requirements is also essential. An overarching energy strategy can help avoid short-term performance bottlenecks as well as long-term chokepoints. “Energy consumption is going to be a problem, if it isn't already a problem, for many companies,” Nag says. Without adequate supply, power can become a barrier to success. It can also undermine sustainability and invite greenwashing accusations. He suggests that CIOs view AI in a broad and holistic way, including identifying ways to reduce reliance on GPUs.
Establishing clear policies and a governance framework around the use of AI can reduce the risk of non-technical business users misusing tools or inadvertently creating bottlenecks. The risk is greater when these users turn to hyperscalers like AWS, Google, and Microsoft. “Without some guidance and direction, it can be like walking into a candy store and not knowing what to pick,” Nag points out.
In the end, an enterprise AI framework must bridge both strategy and IT infrastructure. The objective, Tung explains, is “ensuring your company controls its future in an AI-driven world.”