Episodes

  • Conditional Intelligence: Inside the Mixture of Experts architecture
    Oct 7 2025

    What if not every part of an AI model needed to think at once? In this episode, we unpack Mixture of Experts, the architecture behind efficient large language models like Mixtral. From conditional computation and sparse activation to routing, load balancing, and the fight against router collapse, we explore how MoE breaks the old link between size and compute. As scaling hits physical and economic limits, could selective intelligence be the next leap toward general intelligence? A short top-k routing sketch in code follows these notes.

    Sources:

    • What is mixture of experts? (IBM)
    • Applying Mixture of Experts in LLM Architectures (Nvidia)
    • A 2025 Guide to Mixture-of-Experts for Lean LLMs
    • A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications
    14 mins
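    To make the routing idea concrete, here is a minimal top-k routing sketch in plain NumPy. It is not Mixtral's implementation; the expert shapes, the router weights, and the moe_layer function are assumptions chosen only to illustrate conditional computation and sparse activation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, top_k, d_model = 8, 2, 16

# Each "expert" is just a tiny feed-forward weight matrix in this sketch.
experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]
router_w = rng.standard_normal((d_model, n_experts)) * 0.02  # router ("gating") weights

def moe_layer(x):
    """Send each token through only its top-k experts and mix their outputs."""
    logits = x @ router_w                          # (tokens, n_experts) router scores
    top = np.argsort(logits, axis=-1)[:, -top_k:]  # the k best experts per token
    out = np.zeros_like(x)
    for t, token in enumerate(x):
        chosen = top[t]
        gate = np.exp(logits[t, chosen] - logits[t, chosen].max())
        gate /= gate.sum()                         # softmax over the selected experts only
        for g, e in zip(gate, chosen):
            out[t] += g * (token @ experts[e])     # only k of n_experts run per token
    return out

tokens = rng.standard_normal((4, d_model))
print(moe_layer(tokens).shape)  # (4, 16): same output width, roughly k/n of the expert compute
```

    In real MoE training, an auxiliary load-balancing loss keeps the router from collapsing onto a few favorite experts; that part is omitted in this sketch.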
  • AI at Work, AI at Home: How do we really use LLMs each day?
    Sep 21 2025

    How are people really using AI at home, at work, and across the globe? In this episode of The Second Brain AI Podcast, we dive into two reports from OpenAI and Anthropic that reveal the surprising split between consumer and enterprise use.

    From billions in hidden consumer surplus to the rise of automation vs augmentation, and from emerging markets skipping skill gaps to enterprises wrestling with “context bottlenecks,” we explore what these usage patterns mean for productivity, global inequality, and the future of knowledge work.

    Sources:

    • Anthropic Economic Index report: Uneven geographic and enterprise AI adoption
    • How people are using ChatGPT
    • Building more helpful ChatGPT experiences for everyone
    16 mins
  • Deterministic by Design: Why "Temp=0" Still Drifts and How to Fix It
    Sep 15 2025

    Why do LLMs still give different answers even with temperature set to zero? In this episode of The Second Brain AI Podcast, we unpack new research from Thinking Machines Lab on defeating nondeterminism in LLM inference. We cover the surprising role of floating-point math, the real system-level culprit (a lack of batch invariance), and how redesigned kernels can finally deliver bit-identical outputs. We also explore the trade-offs, real-world implications for testing and reliability, and how this breakthrough enables reproducible research and true on-policy reinforcement learning. A small floating-point illustration follows these notes.

    Sources:

    • Defeating Nondeterminism in LLM Inference
    • Non-Determinism of “Deterministic” LLM Settings
    25 mins
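    The batch-invariance point rests on a basic fact: floating-point addition is not associative, so a kernel that groups a reduction differently depending on how requests are batched can shift a result by a few bits, and those bits can flip a greedy (temperature 0) token choice. A tiny self-contained illustration of that order-sensitivity (ours, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000).astype(np.float32)

# One accumulation order: a plain serial sum, one value at a time.
serial = np.float32(0.0)
for v in x:
    serial += v

# The same values in a different grouping, as a batched or parallel kernel might reduce them.
chunked = np.sum(x.reshape(100, 100), axis=1).sum()

print(serial, chunked, serial == chunked)  # the two sums usually differ in the last bits
```

    Batch-invariant kernels commit to one reduction order regardless of batch size, which is what makes bit-identical outputs possible.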
  • The SLM Advantage: Rethinking Agent Design with SLMs
    Jun 29 2025

    In this episode, we explore why Small Language Models (SLMs) are emerging as powerful tools for building agentic AI. From lower costs to smarter design choices, we unpack what makes SLMs uniquely suited for the future of AI agents.

    Source:

    • "Small Language Models are the Future of Agentic AI" by NVIDIA Research
    20 mins
  • Getting to Know LLMs: Generative Models Fundamentals (Part 1)
    Jun 23 2025

    In this episode, we introduce large language models (LLMs): what they are, how they work at a high level, and why prompting is key to using them effectively. You’ll learn about different types of prompts, how to structure them, and what makes an LLM respond the way it does. A short example of a structured prompt follows these notes.

    Source:

    • "Foundations of Large Language Models" by Tong Xiao and Jingbo Zhu
    22 mins
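    As a small companion to the prompting discussion, here is one common way to structure a prompt: a system instruction, a few-shot example, and the actual question. The role/content message format is only an illustration; the exact fields depend on the model and API you use.

```python
# A sketch of a structured prompt: instruction + few-shot example + query.
messages = [
    {"role": "system",    "content": "You are a concise tutor. Answer in one sentence."},
    {"role": "user",      "content": "What is a token?"},
    {"role": "assistant", "content": "A token is a small piece of text, such as a word fragment, that the model reads and generates."},
    {"role": "user",      "content": "What does the temperature setting control?"},
]

# Flattened into a single string for models that take raw text instead of chat turns.
prompt = "\n".join(f"{m['role'].upper()}: {m['content']}" for m in messages)
print(prompt)
```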