Google unveils Gemini 2.5 Pro, a ‘thinking model’ that tops LMArena and reasserts frontier status

The first model in a new 2.5 line combines native reasoning with a 1 million token context window, putting Google back at the top of key benchmarks after two years of ceding the lead to OpenAI and Anthropic.

Draft — dates, figures and quotes not yet verified against sources

Google DeepMind today released Gemini 2.5 Pro Experimental, a thinking model that already claims the top spot on the LMArena leaderboard by a wide margin and marks the company’s most aggressive bid yet to reclaim the AI frontier.

The model is the first in the Gemini 2.5 family, described by CTO Koray Kavukcuoglu as “our most intelligent AI model.” It integrates chain-of-thought reasoning into the core architecture — what Google calls a “thinking model” — rather than relying on external scaffolding. The release includes a 1 million token context window, with a 2 million token version promised soon, and supports text, audio, images, video, and code repositories natively.

On benchmarks, Gemini 2.5 Pro leads without test-time majority voting: it posts state-of-the-art scores on GPQA and AIME 2025, and 18.8% on Humanity’s Last Exam, a dataset of expert-crafted questions. On SWE-Bench Verified, the standard for agentic coding, it reaches 63.8% with a custom agent setup. The model is available immediately in Google AI Studio and in the Gemini app for Gemini Advanced users, with pricing to follow in the coming weeks.

The release comes as Google works to close a perception gap after OpenAI’s GPT-4 and Anthropic’s Claude 3 set the pace in 2023 and 2024. In AI community forums this week, developers are testing the model against GPT-4o and Claude 3.5 Sonnet, with early results showing strong coding and reasoning performance that some call the first credible Google challenge in two years.

The record

The room reacts, as it happened

Koray Kavukcuoglu

Announced the model as Google DeepMind's CTO, stating it is designed to tackle increasingly complex problems with enhanced reasoning and code capabilities.

One year later — open only if you can handle spoilers

Gemini 2.5 proved to be a turning point: subsequent 2.5-family models consistently led LMArena rankings through late 2025, and Google’s inference-time thinking approach was widely adopted by competitors. The 2 million token context arrived in April 2025 and became a key differentiator for enterprise contracts.
