Gemini 3.1 Pro and the Downfall of Benchmarks: Welcome to the Vibe Era of AI
About this listen
Do we have a new best AI model, or are we seeing the downfall of benchmarks in general as a way of capturing machine intelligence? Full breakdown of Gemini 3.1 Pro, guest-starring the new Sonnet 4.6, plus analysis from 7 papers/posts that will give you much-needed context. Oh, and a new record on Simple Bench!
https://epoch.ai/ai-explained-datacenters
Check out my fast-growing (!) app, free to use, and code INSIDER15 for Pro: https://lmcouncil.ai
AI Insiders ($9!): https://www.patreon.com/AIExplained
Chapters:
00:00 - Introduction
00:30 - Post-training Dominance
04:00 - ARC-AGI 2 Caveat
05:54 - Simple Bench Record
08:22 - Hallucination Caveat
10:05 - Model Card
11:12 - Exponential Coming
12:20 - Amodei on Generalizing
15:10 - One True Benchmark?
17:02 - Other Metrics…
Gemini 3.1 Model Card: https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-1-Pro-Model-Card.pdf
Release: https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-pro/
Where are Agents deployed?: https://www.anthropic.com/research/measuring-agent-autonomy
Newsletter Post: https://signaltonoise.beehiiv.com/p/4-ai-numbers-that-surprised-me-this-week
Hallucination AA: https://artificialanalysis.ai/evaluations/omniscience
Melanie Mitchell: https://x.com/MelMitchell1/status/2022738363548340526
ARC-AGI-2: https://x.com/arcprize/status/2024522812728496470/photo/1
Chollet on Agentic Coding and ML: https://x.com/fchollet/status/2024519439140737442
METR Caveat: https://metr.org/notes/2026-01-22-time-horizon-limitations/
Taalas Fast: https://chatjimmy.ai/
Amodei Interview Continual learning: https://www.dwarkesh.com/p/dario-amodei-2?open=false#%C2%A7002942-is-continual-learning-necessary-how-will-it-be-solved
Metaculus FutureEval: https://www.metaculus.com/futureeval/
Next Vid to Watch: https://www.patreon.com/posts/what-you-need-to-150647292
Non-hype Newsletter: https://signaltonoise.beehiiv.com/
Podcast: https://aiexplainedopodcast.buzzsprout.com/