What key infrastructure and techniques were used for Gemini training?

[Figure 8 | List of Gemini models and their performance on a selection of external multiple-choice question benchmarks for biology and chemistry. To control for inherent model stochasticity and position bias in the selection of correct answers, the answer choices were shuffled over 100 runs for each benchmark, and the mean solve rate is reported.]

The Gemini 2.5 family is the first Gemini model family trained on the TPUv5p architecture[1]. Synchronous data-parallel training was employed to parallelise over multiple 8960-chip pods of Google’s TPUv5p accelerators, distributed across multiple datacenters[1].
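To make the synchronous data-parallel pattern concrete, below is a minimal JAX sketch of a replicated training step that all-reduces gradients across devices before every parameter update, so all replicas stay in lockstep. This is an illustrative toy under invented assumptions (the linear loss, shapes, and learning rate are made up), not Gemini's actual training code; pod-scale training additionally involves model sharding, pipelining, and cross-datacenter orchestration not shown here.

```python
from functools import partial

import jax
import jax.numpy as jnp


def loss_fn(params, batch):
    # Toy linear-regression loss; stands in for the real model.
    preds = batch["x"] @ params["w"]
    return jnp.mean((preds - batch["y"]) ** 2)


@partial(jax.pmap, axis_name="devices")  # replicate the step on every local device
def train_step(params, batch):
    grads = jax.grad(loss_fn)(params, batch)
    # The synchronous part: average gradients across all devices,
    # so every replica applies the identical update.
    grads = jax.lax.pmean(grads, axis_name="devices")
    return jax.tree_util.tree_map(lambda p, g: p - 1e-3 * g, params, grads)


n = jax.local_device_count()
params = jax.device_put_replicated({"w": jnp.zeros((4, 1))}, jax.local_devices())
batch = {  # one shard of the global batch per device
    "x": jnp.ones((n, 8, 4)),
    "y": jnp.ones((n, 8, 1)),
}
params = train_step(params, batch)
```

Newer JAX code often expresses the same pattern with `jax.jit` plus explicit array sharding, but `pmap` keeps the gradient all-reduce visible, which is the essence of synchronous data parallelism.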

Since the initial announcement of Gemini 1.5, significant advancements have been made in post-training methodologies, driven by a sustained focus on data quality across the Supervised Fine-Tuning (SFT), Reward Modeling (RM), and Reinforcement Learning (RL) stages[1]. A key approach has been leveraging the model itself to assist in these processes, enabling more efficient and more nuanced quality control[1].
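The report does not detail how the model assists with quality control, but a common pattern this could take is model-as-judge filtering, where the model (or a reward model) scores candidate SFT examples and low-scoring ones are dropped. The sketch below assumes a hypothetical `score_example` scoring function and a 0-to-1 quality scale; it illustrates the filtering loop, not Gemini's actual pipeline.

```python
from typing import Callable


def filter_sft_examples(
    examples: list[dict],
    score_example: Callable[[str, str], float],  # hypothetical model-as-judge scorer
    threshold: float = 0.8,
) -> list[dict]:
    """Keep only examples the judge rates at or above `threshold`.

    `score_example(prompt, response)` is assumed to return a quality
    score in [0, 1]; any LLM-as-judge or reward model could back it.
    """
    return [
        ex
        for ex in examples
        if score_example(ex["prompt"], ex["response"]) >= threshold
    ]
```

The same scoring hook can serve double duty: below the keep threshold for SFT, the scores can still rank response pairs when building preference data for the RM and RL stages.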