• Gemini 3.1 Pro and the Downfall of Benchmarks: Welcome to the Vibe Era of AI
    Feb 20 2026

    Do we have a new best AI model, or do we have the downfall of benchmarks in general as a way of capturing machine intelligence? Full breakdown of Gemini 3.1 Pro, guest-starring the new Sonnet 4.6, plus analysis from 7 papers/posts that will give you much-needed context. Oh, and a new record on Simple Bench!

    https://epoch.ai/ai-explained-datacenters


    Check out my fast-growing (!) app, free to use, and code INSIDER15 for Pro: https://lmcouncil.ai

    AI Insiders ($9!): https://www.patreon.com/AIExplained


    Chapters:
    00:00 - Introduction
    00:30 - Post-training Dominance
    04:00 - ARC-AGI 2 Caveat
    05:54 - Simple Bench Record
    08:22 - Hallucination Caveat
    10:05 - Model Card
    11:12 - Exponential Coming
    12:20 - Amodei on Generalizing
    15:10 - One True Benchmark?
    17:02 - Other Metrics…

    Gemini 3.1 Model Card: https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-1-Pro-Model-Card.pdf

    Release: https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-pro/

    Where are Agents deployed?: https://www.anthropic.com/research/measuring-agent-autonomy

    Newsletter Post: https://signaltonoise.beehiiv.com/p/4-ai-numbers-that-surprised-me-this-week

    Hallucination AA: https://artificialanalysis.ai/evaluations/omniscience

    Melanie Mitchell: https://x.com/MelMitchell1/status/2022738363548340526
    ARC-AGI-2: https://x.com/arcprize/status/2024522812728496470/photo/1

    Chollet on Agentic Coding and ML: https://x.com/fchollet/status/2024519439140737442

    METR Caveat: https://metr.org/notes/2026-01-22-time-horizon-limitations/

    Taalas Fast: https://chatjimmy.ai/

    Amodei Interview Continual learning: https://www.dwarkesh.com/p/dario-amodei-2?open=false#%C2%A7002942-is-continual-learning-necessary-how-will-it-be-solved

    Metaculus FutureEval: https://www.metaculus.com/futureeval/

    Next Vid to Watch: https://www.patreon.com/posts/what-you-need-to-150647292



    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    Podcast: https://aiexplainedopodcast.buzzsprout.com/

    19 mins
  • The Two Best AI Models/Enemies Just Got Released Simultaneously
    Feb 6 2026

    The two models that you will hear discussed for at least the next two months - Claude Opus 4.6 and GPT 5.3 Codex - just got released within 26 mins of each other. The full breakdown of around 250 pages of reports, with just the most interesting moments, from the battle of which is best, Claude personhood, the surprising misbehaviour of Opus 4.6, and much more.

    https://assemblyai.com/aiexplained

    Check out my fast-growing (!) app, free to use, and code INSIDER15 for Pro: https://lmcouncil.ai

    AI Insiders ($9): https://www.patreon.com/AIExplained

    Chapters:
    00:00 - Introduction
    00:54 - Self-improvement?
    02:44 - Knowledge Work
    05:30 - Overly agentic behaviour
    09:12 - Who Shouldn’t Use Claude Opus
    11:39 - Step-change?
    15:09 - Claude’s ‘Personhood’

    Hassabis Roadmap: https://www.patreon.com/posts/hassabis-roadmap-149750869

    Release of Opus 4.6: https://www.anthropic.com/news/claude-opus-4-6
    212 Page System Card: https://www-cdn.anthropic.com/0dd865075ad3132672ee0ab40b05a53f14cf5288.pdf
    Claude Code Tip: https://x.com/bcherny/status/2019475897691124107


    GPT Codex 5.3: https://openai.com/index/introducing-gpt-5-3-codex/

    System Card: https://openai.com/index/gpt-5-3-codex-system-card/

    Browse Comp: https://arxiv.org/pdf/2504.12516v1
    Finance Agent: https://www.vals.ai/benchmarks/finance_agent
    Terminal Bench 2: https://arxiv.org/pdf/2601.11868
    Vending Bench: https://andonlabs.com/blog/opus-4-6-vending-bench

    My X post: https://x.com/AIExplainedYT/status/2016851303436095647

    Anthropic Apology: https://x.com/ch402/status/2014066134194995256/photo/1

    Altman rebuttal: https://x.com/sama/status/2019139174339928189
    https://x.com/sama/status/2019140276246442089

    4% of GitHub: https://x.com/dylan522p/status/2019490550911766763



    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    Podcast: https://aiexplainedopodcast.buzzsprout.com/

    20 mins
  • Claude AI Co-founder Publishes 4 Big Claims about Near Future: Breakdown
    Jan 28 2026

    Anthropic's CEO, who has consistently predicted transformative AI will arrive before 2030, recently published a nearly 20,000-word essay outlining his vision of where AI is heading. The video gives you the highlights. The essay argues that scaling and recursion will advance AI from coding automation to full engineering automation, while warning of economic displacement within 1-2 years and China's trajectory toward AI-enabled totalitarianism. Additionally, Dario Amodei predicts that AI models will increasingly be understood as collections of distinct personas rather than monolithic systems.

    80,000 Hours: https://www.youtube.com/watch?v=B54EQiuO1UU



    Check out my fast-growing (!) app, free to use, and code INSIDER15 for Pro: https://lmcouncil.ai

    AI Insiders ($9!): https://www.patreon.com/AIExplained


    Chapters:
    00:00 - Introduction
    01:10 - Scaling to software engineers
    06:11 - Permanent Underclass
    10:18 - Totalitarian Nightmares
    16:38 - Collection of Personas

    Essay: https://www.darioamodei.com/essay/the-adolescence-of-technology

    Physics Prediction: https://www.quantamagazine.org/is-particle-physics-dead-dying-or-just-hard-20260126/

    Axios: https://www.axios.com/2025/05/28/ai-jobs-white-collar-unemployment-anthropic

    World GDP: https://data.worldbank.org/indicator/NY.GDP.MKTP.KD.ZG?end=2024&start=1961&view=chart

    Demis Hassabis Counter: https://www.youtube.com/watch?v=q6fq4_uP7aM

    Karpathy 80%: https://x.com/karpathy/status/2015883857489522876

    Machines of Loving Grace: https://www.darioamodei.com/essay/machines-of-loving-grace

    Anthropic LessWrong: https://www.lesswrong.com/posts/5aKRshJzhojqfbRyo/unless-its-governance-changes-anthropic-is-untrustworthy#1__In_private__Dario_frequently_said_he_won_t_push_the_frontier_of_AI_capabilities__later__Anthropic_pushed_the_frontier

    Original Constitution: https://www.anthropic.com/news/claudes-constitution

    New Constitution: https://www.anthropic.com/constitution

    Kimi K2.5: https://x.com/Kimi_Moonshot/status/2016024049869324599

    Societies of Thought, Google DeepMind Paper: https://arxiv.org/pdf/2601.10825

    https://lmcouncil.ai/benchmarks

    https://www.patreon.com/posts/our-new-age-of-133960279



    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    Podcast: https://aiexplainedopodcast.buzzsprout.com/

    22 mins
  • Anthropic: Our AI just created a tool that can ‘automate all white collar work’, Me:
    Jan 14 2026

    A new tool, with code written by an AI model, has gone omega-viral: Claude Cowork. But is the hype justified? What do the stats say on productivity? Where is the truth in a sea of noise? What is truth? Can we handle the truth? Where's Nemo?

    https://matsprogram.org/s26-aie


    Check out my new app! https://lmcouncil.ai

    AI Insiders ($9!): https://www.patreon.com/AIExplained

    Chapters:
    00:00 - Introduction
    01:12 - Claude Cowork
    06:48 - Productivity Speed-up + jobs
    09:33 - Comparing Models
    12:00 - Brittle AI Paper

    Cowork Intro: https://x.com/claudeai/thread/2010805682434666759

    'All of it': https://x.com/bcherny/status/2010813886052581538

    'AGI' Claims: https://x.com/deepfates/status/2004994698335879383

    Douglas Interview: https://www.youtube.com/watch?v=TOsNrV3bXtQ&t=2313s

    Job Stats: https://www.oxfordeconomics.com/wp-content/uploads/2026/01/Evidence-of-an-AI-driven-shakeup-of-job-markets-is-patchy.pdf
    Amodei Prediction: https://fortune.com/2025/05/28/anthropic-ceo-warning-ai-job-loss/

    GenAI Traffic: https://x.com/demishassabis/status/2009075877347512545

    Illusion of Insight: https://arxiv.org/pdf/2601.00514
    Entropy Exploration: https://arxiv.org/pdf/2506.14758
    ProRL: https://arxiv.org/pdf/2505.24864

    Genesis Mission: https://www.whitehouse.gov/presidential-actions/2025/11/launching-the-genesis-mission/
    https://deepmind.google/blog/how-were-supporting-better-tropical-cyclone-prediction-with-ai/


    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    Podcast: https://aiexplainedopodcast.buzzsprout.com/

    18 mins
  • What the Freakiness of 2025 in AI Tells Us About 2026
    Dec 23 2025

    It’s probably not possible to satisfactorily condense 12 months’ worth of weird progress in AI, as well as predictions for the year to come, into one video. But I’m gonna try anyway because it has been a very strange time.

    http://matsprogram.org/s26-aie


    My new app! https://lmcouncil.ai


    Patreon Interview: https://www.patreon.com/posts/robot-in-your-27-146376094

    Chapters:
    00:00 - Introduction
    00:34 - Reasoning Models … and limits
    02:54 - A playable world
    03:36 - Realism
    03:50 - AI Slop gone mainstream
    05:03 - DolphinGemma
    05:39 - Public Mood
    07:34 - AI Enlisted
    08:30 - GPT-5
    11:05 - Open Weight not out
    13:00 - METR Breakout
    17:30 - VASA-1
    18:28 - Lateral Productivity
    20:15 - 1 or 1000 benchmarks needed?
    24:54 - Continual Learning + Altman on Superintelligence
    28:08 - Automated Information Discovery ft AlphaEvolve


    Hassabis on Generality: https://x.com/demishassabis/status/2003097405026193809
    https://www.youtube.com/watch?v=PqVbypvxDto

    Gemini 3: https://storage.googleapis.com/gweb-uniblog-publish-prod/original_images/gemini_3_table_final_HLE_Tools_on.gif
    Reasoning Trade-offs: https://arxiv.org/pdf/2504.13837

    DolphinGemma: https://blog.google/technology/ai/dolphingemma/?s=09

    Genie 3: https://deepmind.google/blog/genie-3-a-new-frontier-for-world-models/

    METR Time Horizon: https://arxiv.org/pdf/2503.14499
    https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
    Flaws: https://x.com/ShashwatGoel7/status/2002369517499105443
    https://shash42.substack.com/p/how-to-game-the-metr-plot
    https://x.com/METR_Evals/status/2002203627377574113

    GPT-5 - Altman PhD in everything: https://edition.cnn.com/2025/08/14/business/chatgpt-rollout-problems

    https://simple-bench.com/

    AI Slop: https://www.youtube.com/watch?v=I_3vxoJDD9k
    https://www.theguardian.com/technology/2025/dec/16/boost-for-artists-in-ai-copyright-battle-as-only-3-per-cent-back-uk-active-opt-out-plan

    Survey: https://x.com/SearchlightInst/status/2001057144842387920/photo/1

    Nvidia Nemotron: https://x.com/percyliang/status/2000608134205985169

    OpenAI Compute Flywheel: https://x.com/OpenAI/status/2001363007209914399/photo/1
    Altman Interview: https://www.youtube.com/watch?v=2P27Ef-LLuQ

    AI in Govt: https://x.com/jdcmedlock/status/1939814516503847259

    Benchmark Gaming: https://techcrunch.com/2025/04/07/meta-exec-denies-the-company-artificially-boosted-llama-4s-benchmark-scores/

    AlphaEvolve: https://deepmind.google/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/
    https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/AlphaEvolve.pdf?utm_source=deepmind.google&utm_medium=referral&utm_campaign=gdm&utm_content=
    Continual Learning: https://abehrouz.github.io/files/NL.pdf

    Job Risk: https://archive.ph/20250708204527/https://www.axios.com/2025/05/28/ai-jobs-white-collar-unemployment-anthropic

    GPT4o: https://x.com/AISafetyMemes/status/1916889492172013989

    Vasa-1: https://www.microsoft.com/en-us/research/project/vasa-1/

    Three Views: https://www.lesswrong.com/posts/K2D45BNxnZjdpSX2j/ai-timelines
    Turing Test: https://x.com/tunguz/status/1907185471211422147

    Karpathy Year in Review: https://karpathy.bearblog.dev/year-in-review-2025/

    LLM Brainrot: https://arxiv.org/pdf/2510.13928

    Lateral Productivity: https://www.aisi.gov.uk/frontier-ai-trends-report

    Emotional Quotient: https://arxiv.org/pdf/2511.08394

    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    Podcast: https://aiexplainedopodcast.buzzsprout.com/


    AI Insiders ($9!): https://www.patreon.com/AIExplained

    33 mins
  • Gemini Exponential, Demis Hassabis' ‘Proto-AGI’ coming, but …
    Dec 19 2025

    The condensed highlights of hours of AI lab leader interviews, model releases, Gemini 3 Flash insights (plus its hidden flaw), Hassabis’ ‘proto-AGI’ and much more…

    https://matsprogram.org/apply?utm_source=ai-explained&utm_medium=youtube&utm_campaign=s26

    Also, do check out my new app: https://lmcouncil.ai

    Chapters:
    00:00 - Introduction
    00:50 - Results
    02:44 - But… the Flaw
    04:49 - So Benchmarks are fake? No
    07:37 - Spatial Reasoning + Hassabis
    10:06 - Proto-AGI
    12:07 - Minimal AGI
    15:07 - Compute Slowdown
    17:56 - New Data Paradigm

    Gemini 3 Flash: https://deepmind.google/models/gemini/flash/

    Hassabis Interview: https://www.youtube.com/watch?v=PqVbypvxDto
    Legg Interview: https://www.youtube.com/watch?v=l3u_FAv33G0
    Pre-training Lead Interview: https://www.youtube.com/watch?v=cNGDAqFXvew
    Altman Interview: https://www.youtube.com/watch?v=2P27Ef-LLuQ
    Brockman Video: https://x.com/OpenAI/status/2001336514786017417
    Post-Training Reveal: https://x.com/OfficialLoganK/status/2001742530472534442

    Hallucinations Paper: https://cdn.openai.com/pdf/d04913be-3f6f-4d2b-b283-ff432ef4aaa5/why-language-models-hallucinate.pdf
    Patreon Hallucinations Vid: https://www.patreon.com/posts/blockers-to-and-139264812
    AA-Omniscience Benchmark: https://artificialanalysis.ai/evaluations/omniscience
    https://arxiv.org/pdf/2511.13029


    lmcouncil.ai/benchmarks
    https://simple-bench.com/
    https://x.com/scaling01/status/1999620587744813205

    5.2 Codex Drop: https://cdn.openai.com/pdf/ac7c37ae-7f4c-4442-b741-2eabdeaf77e0/oai_5_2_Codex.pdf

    OpenAI Compute Trend: https://www.theinformation.com/articles/openais-350-billion-computing-cost-problem?rc=sy0ihq

    Cramer Tweet/Response: https://x.com/BorisMPower/status/2001440650210976018

    OpenAI Valuation: https://www.theinformation.com/articles/openai-discussed-raising-tens-billions-valuation-around-750-billion?rc=sy0ihq

    Indian Data: https://www.reuters.com/world/india/with-freebies-openai-google-vie-indian-users-training-data-2025-12-17/

    TheInformation Data: https://x.com/theinformation/status/2001421225751351778

    Genie 3: https://deepmind.google/blog/genie-3-a-new-frontier-for-world-models/
    Sima 2: https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/
    Veo 3.1: https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/

    METR: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/


    AI Insiders ($9!): https://www.patreon.com/AIExplained


    Non-hype Newsletter: https://signaltonoise.beehiiv.com/

    20 mins
  • GPT 5.2: OpenAI Strikes Back
    Dec 12 2025

    Full GPT-5.2 breakdown - did OpenAI reclaim the crown? A story of tokens, time and cost, plus 9 details you wouldn’t get just from reading the headlines.

    https://www.youtube.com/@eightythousandhours



    AI Insiders ($9!): https://www.patreon.com/AIExplained
    https://lmcouncil.ai

    Chapters:
    00:00 - Introduction
    00:55 - Better than Human @ Professional Tasks?
    04:42 - Test time Compute
    07:05 - Benchmark Selection
    09:32 - Simple Results + council comparison
    13:01 - Long Context
    13:52 - Self-Improvement
    15:00 - 10 Years + New Models

    Release Page: https://openai.com/index/introducing-gpt-5-2/

    GPT 5.2 Benchmark Comparison: https://www.reddit.com/r/singularity/comments/1pka1y9/gpt52_all_20_benchmarks_rankings_and_pricing/
    https://storage.googleapis.com/gweb-uniblog-publish-prod/original_images/gemini_3_table_final_HLE_Tools_on.gif
    https://lmcouncil.ai/benchmarks

    Charxiv: https://charxiv.github.io/#leaderboard

    GDPval: https://arxiv.org/pdf/2510.04374
    My vid: https://www.youtube.com/watch?v=oK5LxMaROSA

    Kilpatrick: https://x.com/OfficialLoganK/status/1999270402712023158/photo/1

    Noam Brown: https://x.com/polynoamial/status/1999189845164667132

    New Model in New Year: https://www.theinformation.com/articles/openai-developing-garlic-model-counter-googles-recent-gains?rc=sy0ihq

    10 Years of OpenAI: https://openai.com/index/ten-years/

    GPQA: https://x.com/idavidrein/status/1841265634170278063

    ARC-AGI 1-2: https://arcprize.org/arc-agi/2/

    Sunday Robotics: https://x.com/tonyzzhao/status/1991204839578300813


    Non-hype Newsletter: https://signaltonoise.beehiiv.com/


    https://lmcouncil.ai

    18 mins
  • You Are Being Told Contradictory Things About AI: 8 examples
    Dec 5 2025

    With headlines of an imminent job apocalypse, code red for ChatGPT and recursive self-improvement, at the same time as Anthropic's CEO yesterday saying we know how to scale to AGI, and Gemini 3 DeepThink out today, it is easy to get lost among the narratives and counter-narratives. So here are both, plus the facts behind them, for you to decide.


    https://epoch.ai/data/data-centers

    Epoch AI is the sponsor of today’s video, and my views, and those expressed in this video, do not necessarily reflect Epoch AI’s views in any way.


    Chapters:
    00:00 - Introduction
    00:42 - Job Apocalypse?
    01:45 - Scaling to AGI
    04:15 - Recursive Self-Improvement Needed, or Not
    09:57 - OpenAI Code Red vs Gemini 3 DeepThink vs Claude Opus 4.5
    13:27 - DeepSeek Speciale vs Mistral Large v3
    16:45 - Claude Soul Document

    https://lmcouncil.ai/

    AI Insiders ($9!): https://www.patreon.com/AIExplained



    Guardian Interview: https://www.theguardian.com/technology/ng-interactive/2025/dec/02/jared-kaplan-artificial-intelligence-train-itself

    MIT Study on Jobs/Tasks: https://iceberg.mit.edu/report.pdf
    vs https://www.cnbc.com/2025/11/26/mit-study-finds-ai-can-already-replace-11point7percent-of-us-workforce.html

    Amodei on Scaling: https://www.youtube.com/watch?v=FEj7wAjwQIk
    Claude Soul Document: https://www.lesswrong.com/posts/vpNG99GhbBoLov9og/claude-4-5-opus-soul-document

    Capabilities Original Stance: https://www.anthropic.com/news/core-views-on-ai-safety

    Ilya Interview: https://www.dwarkesh.com/p/ilya-sutskever-2

    Ricursive Intelligence: https://x.com/RicursiveAI/status/1995932204703346946

    Economist Worker Usage of GenAI: https://www.economist.com/finance-and-economics/2025/11/26/investors-expect-ai-use-to-soar-thats-not-happening#selection-1409.94-1413.42

    Mistral v3 Large: https://docs.mistral.ai/models/mistral-large-3-25-12

    Compute Slowdown Paper: https://joel-becker.com/images/publications/forecasting_time_horizon_under_compute_slowdown.pdf
    https://x.com/joel_bkr/status/1993023436541903155

    METR Chart: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/

    https://www.theinformation.com/articles/openais-350-billion-computing-cost-problem?rc=sy0ihq

    OpenAI Code Red: https://www.anthropic.com/news/core-views-on-ai-safety
    Rocket Company: https://www.independent.co.uk/news/world/americas/sam-altman-rocket-elon-musk-spacex-b2878351.html

    DeepSeek Paper: https://arxiv.org/html/2512.02556v1

    DeepSeek Crowdstrike CCP: https://www.crowdstrike.com/en-us/blog/crowdstrike-researchers-identify-hidden-vulnerabilities-ai-coded-software/

    https://simple-bench.com/

    Patreon Post: https://www.patreon.com/c/aiexplained/posts

    Robot: https://x.com/jloganolson/status/1985850115379351799

    20 mins