Google has unveiled a preview of Gemini 2.5 Flash-Lite, a reasoning model optimized for cost and speed, and announced that two other Gemini models, Gemini 2.5 Pro and Gemini 2.5 Flash, are now generally available.
Google made the announcements June 17. Gemini 2.5 models are thinking models, capable of reasoning through their thoughts before responding, resulting in enhanced performance and improved accuracy, Google said.
Gemini 2.5 Flash-Lite has the lowest cost and lowest latency in the Gemini 2.5 model family, Google said. Flash-Lite is a reasoning model that allows dynamic control of the thinking budget via an API parameter, but because Flash-Lite is optimized for low latency and low cost, thinking is turned off by default. This model is “great” for high-throughput tasks such as classification or summarization at scale, Google said. Built as an upgrade to the Gemini 1.5 Flash and 2.0 Flash models, Gemini 2.5 Flash-Lite offers better performance across most evals and a lower time to first token, while also achieving higher tokens-per-second decode, according to Google. Every Gemini 2.5 model has control over the thinking budget, giving developers the ability to choose when and how much the model thinks before generating a response.
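To illustrate what controlling the thinking budget through an API parameter might look like, here is a minimal sketch that only builds a request body and makes no network call. The field names (`generationConfig`, `thinkingConfig`, `thinkingBudget`) reflect the Gemini REST API as best understood at the time of writing and are not taken from Google's announcement, so they should be checked against the current API reference. The helper `build_request` is a hypothetical name introduced for this example.

```python
import json


def build_request(prompt: str, thinking_budget: int = 0) -> dict:
    """Build a JSON-serializable request body with an explicit thinking budget.

    Assumption: the budget is expressed in tokens and a value of 0 disables
    thinking, which matches the default behavior described for Flash-Lite.
    """
    return {
        # The user prompt, wrapped in the nested "contents/parts" structure.
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            # Assumed field names; verify against the current API reference.
            "thinkingConfig": {"thinkingBudget": thinking_budget},
        },
    }


if __name__ == "__main__":
    # High-throughput classification: leave thinking off to minimize latency.
    fast = build_request("Classify this support ticket: 'My invoice is wrong.'")
    # A harder task: allow the model a budget of thinking tokens.
    careful = build_request("Summarize the trade-offs in this design doc.", 1024)
    print(json.dumps(fast, indent=2))
    print(json.dumps(careful, indent=2))
```

In this sketch a developer would keep the budget at zero for bulk classification or summarization jobs and raise it only for requests where extra reasoning is worth the added latency and cost.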