Gemini 3.1 Pro and the Downfall of Benchmarks: Welcome to the Vibe Era of AI cover art

Gemini 3.1 Pro and the Downfall of Benchmarks: Welcome to the Vibe Era of AI

Gemini 3.1 Pro and the Downfall of Benchmarks: Welcome to the Vibe Era of AI

Listen for free

View show details

About this listen

Do we have a new best AI model, or do we have the downfall of benchmarks in general, as a way of capturing machine intelligence? Full breakdown of Gemini 3.1 Pro, guest-starring the new Sonnet 4.6, plus analysis from 7 papers/posts that will give you much needed context. Oh, and a new record on Simple Bench!

https://epoch.ai/ai-explained-datacenters


Check out my fast-growing (!) app, free to use, and code INSIDER15 for Pro: https://lmcouncil.ai

AI Insiders ($9!): https://www.patreon.com/AIExplained


Chapters:
00:00 - Introduction
00:30 - Post-training Dominance
04:00 - ARC-AGI 2 Caveat
05:54 - Simple Bench Record
08:22 - Hallucination Caveat
10:05 - Model Card
11:12 - Exponential Coming
12:20 - Amodei on Generalizing
15:10 - One True Benchmark?
17:02 - Other Metrics…

Gemini 3.1 Model Card: https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-1-Pro-Model-Card.pdf

Release: https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-pro/

Where are Agents deployed?: https://www.anthropic.com/research/measuring-agent-autonomy

Newsletter Post: https://signaltonoise.beehiiv.com/p/4-ai-numbers-that-surprised-me-this-week

Hallucination AA: https://artificialanalysis.ai/evaluations/omniscience

Melanie Mitchell: https://x.com/MelMitchell1/status/2022738363548340526
ARC-AGI-2: https://x.com/arcprize/status/2024522812728496470/photo/1

Chollet on Agentic Coding and ML: https://x.com/fchollet/status/2024519439140737442

METR Caveat: https://metr.org/notes/2026-01-22-time-horizon-limitations/

Talaas Fast: https://chatjimmy.ai/

Amodei Interview Continual learning: https://www.dwarkesh.com/p/dario-amodei-2?open=false#%C2%A7002942-is-continual-learning-necessary-how-will-it-be-solved

Metaculus FutureEval: https://www.metaculus.com/futureeval/

Next Vid to Watch: https://www.patreon.com/posts/what-you-need-to-150647292



Non-hype Newsletter: https://signaltonoise.beehiiv.com/

Podcast: https://aiexplainedopodcast.buzzsprout.com/

No reviews yet
In the spirit of reconciliation, Audible acknowledges the Traditional Custodians of country throughout Australia and their connections to land, sea and community. We pay our respect to their elders past and present and extend that respect to all Aboriginal and Torres Strait Islander peoples today.