Which benchmarks show Gemini’s biggest leaps?

The Gemini 2.5 models show significant improvements on coding benchmarks such as LiveCodeBench, Aider Polyglot, and SWE-bench Verified^[1]. For example, LiveCodeBench performance rose from 30.5% for Gemini 1.5 Pro to 69.0% for Gemini 2.5 Pro, and Aider Polyglot performance rose from 16.9% to 82.2%^[1].

Beyond coding, Gemini 2.5 models are markedly better at math and reasoning tasks than Gemini 1.5 models^[1]. On AIME 2025, Gemini 2.5 Pro scores 88.0% compared with 17.5% for Gemini 1.5 Pro, and on GPQA (diamond) the score rose from 58.1% for Gemini 1.5 Pro to 86.4% for Gemini 2.5 Pro^[1]. Image understanding has also improved significantly^[1].
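
To make "biggest leaps" concrete, the quoted scores can be ranked by their gain in percentage points. The snippet below is only an illustrative sketch: the `scores` table and the `point_gains` helper are names introduced here, and the table holds only the four benchmarks with figures quoted in this answer (SWE-bench Verified and image understanding are omitted because no numbers are given for them).

```python
# Scores (percent) quoted in this answer: (Gemini 1.5 Pro, Gemini 2.5 Pro).
# The table name and helper below are illustrative, not from the cited source.
scores = {
    "LiveCodeBench": (30.5, 69.0),
    "Aider Polyglot": (16.9, 82.2),
    "AIME 2025": (17.5, 88.0),
    "GPQA (diamond)": (58.1, 86.4),
}


def point_gains(table):
    """Return (benchmark, gain in percentage points), largest gain first."""
    gains = [(name, round(new - old, 1)) for name, (old, new) in table.items()]
    return sorted(gains, key=lambda item: item[1], reverse=True)


ranked = point_gains(scores)
for name, gain in ranked:
    print(f"{name}: +{gain} points")
```

Measured this way, AIME 2025 shows the largest jump among the quoted figures (70.5 points), followed by Aider Polyglot (65.3 points), LiveCodeBench (38.5 points), and GPQA (diamond) (28.3 points).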
