What is the LMArena score for Gemini 2.5 Pro?

Figure 11 | Results on the Research Engineer Benchmark (RE-Bench), in which the model must complete simple ML research tasks. Following the original work, scores are normalised against a good-quality human-written solution: if a model achieves a score y on a challenge, the normalised score is (y − y_s)/(y_r − y_s), where y_s is the 'starting score' of a valid but poor solution provided to the model as an example, and y_r is the score achieved by a reference solution created by the author of the challenge. Figures for Claude 3.5 Sonnet and expert human performance are sourced from the original work. The number of runs and the time limit for each run are constrained by a total time budget of 32 hours, and error bars indicate bootstrapped 95% confidence intervals; see main text for details. Gemini 2.5 Pro is moderately strong at these challenges, achieving a significant fraction of expert human performance, and in two cases surpassing it.
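
A minimal sketch of the normalisation described in the caption, assuming the scores are plain floating-point values (the helper name and example numbers below are illustrative, not from the source):

```python
def normalised_score(y: float, y_s: float, y_r: float) -> float:
    """Normalise a raw challenge score y against the starting score y_s
    (a valid but poor example solution) and the reference score y_r
    (the challenge author's solution), per the caption's formula."""
    return (y - y_s) / (y_r - y_s)

# A raw score halfway between the starting and reference solutions
# normalises to 0.5; matching the reference gives 1.0, and exceeding
# it (as Gemini 2.5 Pro does on two challenges) gives a value above 1.0.
print(normalised_score(y=0.75, y_s=0.5, y_r=1.0))  # 0.5
```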

Gemini 2.5 Pro has an LMArena score more than 120 points higher than that of Gemini 1.5 Pro[1]. The cost-performance plot shows Gemini 2.5 Pro as a marked improvement over Gemini 1.5 Pro[1].

The source for this information is LMArena, imported on 2025-06-16[1]. The cost is a weighted average of input and output token pricing per million tokens[1].
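
A sketch of how such a blended cost could be computed, assuming a simple weighted average; the weights and prices below are hypothetical, since the source does not state the actual values used:

```python
def blended_cost_per_million(input_price: float, output_price: float,
                             input_weight: float, output_weight: float) -> float:
    """Weighted average of input and output token pricing, in USD per
    million tokens. The weights used for the cost axis are not given
    in the source, so callers must supply their own."""
    total = input_weight + output_weight
    return (input_price * input_weight + output_price * output_weight) / total

# Hypothetical example: $1.25 per 1M input tokens, $10.00 per 1M output
# tokens, weighted 3:1 towards input tokens.
print(blended_cost_per_million(1.25, 10.00, input_weight=3, output_weight=1))  # 3.4375
```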