Palo Alto, CA – Generative AI company SambaNova announced last week that DeepSeek-R1 671B is running today on SambaNova Cloud at 198 tokens per second (t/s), “achieving speeds and efficiency that no other platform can match,” the company said.
DeepSeek-R1 has reduced AI training costs by 10X, but until now its widespread adoption has been hindered by high inference costs and inefficiencies, according to the company. “SambaNova has removed this barrier, unlocking real-time, cost-effective inference at scale for developers and enterprises,” the company said.
“Powered by the SN40L RDU chip, SambaNova is the fastest platform running DeepSeek at 198 tokens per second per user,” said Rodrigo Liang, CEO and co-founder of SambaNova. “This will increase to 5X faster than the latest GPU speed on a single rack, and by year end, we will offer 100X the capacity for DeepSeek-R1.”
“Being able to run the full DeepSeek-R1 671B model, not a distilled version, at SambaNova’s blazingly fast speed is a game changer for developers. Reasoning models like R1 need to generate a lot of reasoning tokens to come up with a superior output, which makes them take longer than traditional LLMs. This makes speeding them up especially important,” said Dr. Andrew Ng, Founder of DeepLearning.AI, Managing General Partner at AI Fund, and an Adjunct Professor at Stanford University’s Computer Science Department.
“Artificial Analysis has independently benchmarked SambaNova’s cloud deployment of the full 671 billion parameter DeepSeek-R1 Mixture of Experts model at over 195 output tokens/s, the fastest output speed we have ever measured for DeepSeek-R1. High output speeds are particularly important for reasoning models, as these models use reasoning output tokens to improve the quality of their responses. SambaNova’s high output speeds will support the use of reasoning models in latency-sensitive use cases,” said George Cameron, Co-Founder, Artificial Analysis.
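The link between output speed and response time for reasoning models can be illustrated with a rough back-of-the-envelope sketch in Python. The 198 t/s figure comes from the announcement; the 4,000-token response length and the 40 t/s comparison speed are hypothetical values chosen only for illustration, not measurements of any platform.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to stream a response, ignoring time-to-first-token and network latency."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


QUOTED_SPEED = 198.0         # t/s per user, as quoted in the announcement
HYPOTHETICAL_SLOW = 40.0     # assumption: an arbitrary slower speed for comparison
HYPOTHETICAL_TOKENS = 4000   # assumption: reasoning plus answer tokens for one reply

fast = generation_seconds(HYPOTHETICAL_TOKENS, QUOTED_SPEED)
slow = generation_seconds(HYPOTHETICAL_TOKENS, HYPOTHETICAL_SLOW)

print(f"{fast:.1f} s at {QUOTED_SPEED:.0f} t/s vs {slow:.1f} s at {HYPOTHETICAL_SLOW:.0f} t/s")
```

Under these assumed numbers, the same reply takes roughly 20 seconds at the quoted speed versus 100 seconds at the slower one, and the gap grows with the number of reasoning tokens a model emits.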
DeepSeek-R1 has revolutionized AI by collapsing training costs tenfold. However, widespread adoption has stalled because DeepSeek-R1’s reasoning capabilities require significantly more compute for inference, making AI production more expensive. In fact, the inefficiency of GPU-based inference has kept DeepSeek-R1 out of reach for most developers.
SambaNova has solved this problem. With a proprietary dataflow architecture and three-tier memory design, SambaNova’s SN40L Reconfigurable Dataflow Unit (RDU) chips collapse the hardware requirements to run DeepSeek-R1 671B efficiently from 40 racks (320 of the latest GPUs) down to 1 rack (16 RDUs), unlocking cost-effective inference at unmatched efficiency.
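The hardware figures quoted above imply some simple ratios, worked out in the short Python sketch below. It uses only the rack and chip counts stated by the company; the variable names are illustrative, and the sketch says nothing about per-chip cost, power, or performance.

```python
# Figures as quoted in the announcement.
GPU_RACKS, GPU_CHIPS = 40, 320
RDU_RACKS, RDU_CHIPS = 1, 16


def reduction_factor(before: int, after: int) -> float:
    """How many times smaller `after` is than `before`."""
    return before / after


rack_reduction = reduction_factor(GPU_RACKS, RDU_RACKS)   # racks: 40 -> 1
chip_reduction = reduction_factor(GPU_CHIPS, RDU_CHIPS)   # chips: 320 -> 16
gpus_per_rack = GPU_CHIPS // GPU_RACKS                    # implied GPU density per rack

print(f"{rack_reduction:.0f}x fewer racks, {chip_reduction:.0f}x fewer chips, "
      f"{gpus_per_rack} GPUs per rack vs {RDU_CHIPS} RDUs per rack")
```

In other words, the claim amounts to 40 times fewer racks and 20 times fewer chips, with the GPU configuration implying 8 GPUs per rack against 16 RDUs in the single SambaNova rack.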
“DeepSeek-R1 is one of the most advanced frontier AI models available, but its full potential has been limited by the inefficiency of GPUs,” said Rodrigo Liang, CEO of SambaNova. “That changes today. We’re bringing the next major breakthrough, collapsing inference costs and reducing hardware requirements from 40 racks to just one, to offer DeepSeek-R1 at the fastest speeds, efficiently.”
“More than 10 million users and engineering teams at Fortune 500 companies rely on Blackbox AI to transform how they write code and build products. Our partnership with SambaNova plays a critical role in accelerating our autonomous coding agent workflows. SambaNova’s chip capabilities are unmatched for serving the full DeepSeek-R1 671B model, which provides much better accuracy than any of the distilled versions. We couldn’t ask for a better partner to work with to serve millions of users,” said Robert Rizk, CEO of Blackbox AI.
Sumti Jairath, Chief Architect, SambaNova, explained: “DeepSeek-R1 is the perfect match for SambaNova’s three-tier memory architecture. With 671 billion parameters, R1 is the largest open source large language model released to date, which means it needs a lot of memory to run. GPUs are memory constrained, but SambaNova’s unique dataflow architecture means we can run the model efficiently to achieve 20,000 tokens/s of total rack throughput in the near future, which is unprecedented efficiency when compared to GPUs due to their inherent memory and data communication bottlenecks.”
SambaNova is rapidly scaling its capacity to meet anticipated demand, and by the end of the year will offer more than 100x the current global capacity for DeepSeek-R1. This makes its RDUs the most efficient enterprise solution for reasoning models.
The full DeepSeek-R1 671B model is available now for all users to try, and to select users via API, on SambaNova Cloud.