📆 ThursdAI - Dec 18 - Gemini 3 Flash, Grok Voice, ChatGPT Appstore, Image 1.5 & GPT 5.2 Codex, Meta Sam Audio & more AI news cover art

📆 ThursdAI - Dec 18 - Gemini 3 Flash, Grok Voice, ChatGPT Appstore, Image 1.5 & GPT 5.2 Codex, Meta Sam Audio & more AI news

📆 ThursdAI - Dec 18 - Gemini 3 Flash, Grok Voice, ChatGPT Appstore, Image 1.5 & GPT 5.2 Codex, Meta Sam Audio & more AI news

Listen for free

View show details

About this listen

Hey folks 👋 Alex here, dressed as 🎅 for our pre X-mas episode!We’re wrapping up 2025, and the AI labs decided they absolutely could NOT let the year end quietly. This week was an absolute banger—we had Gemini 3 Flash dropping with frontier intelligence at flash prices, OpenAI firing off GPT 5.2 Codex as breaking news DURING our show, ChatGPT Images 1.5, Nvidia going all-in on open source with Nemotron 3 Nano, and the voice AI space heating up with Grok Voice and Chatterbox Turbo. Oh, and Google dropped FunctionGemma for all your toaster-to-fridge communication needs (yes, really).Today’s show was over three and a half hours long because we tried to cover both this week AND the entire year of 2025 (that yearly recap is coming next week—it’s a banger, we went month by month and you’ll really feel the acceleration). For now, let’s dive into just the insanity that was THIS week.00:00 Introduction and Overview00:39 Weekly AI News Highlights01:40 Open Source AI Developments01:44 Nvidia's Nemotron Series09:09 Google's Gemini 3 Flash19:26 OpenAI's GPT Image 1.520:33 Infographic and GPT Image 1.5 Discussion20:53 Nano Banana vs GPT Image 1.521:23 Testing and Comparisons of Image Models23:39 Voice and Audio Innovations24:22 Grok Voice and Tesla Integration26:01 Open Source Robotics and Voice Agents29:44 Meta's SAM Audio Release32:14 Breaking News: Google Function Gemma33:23 Weights & Biases Announcement35:19 Breaking News: OpenAI Codex 5.2 MaxTo receive new posts and support my work, consider becoming a free or paid subscriber.Big Companies LLM updatesGoogle’s Gemini 3 Flash: The High-Speed Intelligence KingIf we had to title 2025, as Ryan Carson mentioned on the show, it might just be “The Year of Google’s Comeback.” Remember at the start of the year when we were asking “Where is Google?” Well, they are here. Everywhere.This week they launched Gemini 3 Flash, and it is rightfully turning heads. This is a frontier-class model—meaning it boasts Pro-level intelligence—but it runs at Flash-level speeds and, most importantly, Flash-level pricing. We are talking $0.50 per 1 million input tokens. That is not a typo. The price-to-intelligence ratio here is simply off the charts.I’ve been using Gemini 2.5 Flash in production for a while because it was good enough, but Gemini 3 Flash is a different beast. It scores 71 on the Artificial Analysis Intelligence Index (a 13-point jump from the previous Flash), and it achieves 78% on SWE-bench Verified. That actually beats the bigger Gemini 3 Pro on some agentic coding tasks!What impressed me most, and something Kwindla pointed out, is the tool calling. Previous Gemini models sometimes struggled with complex tool use compared to OpenAI, but Gemini 3 Flash can handle up to 100 simultaneous function calls. It’s fast, it’s smart, and it’s integrated immediately across the entire Google stack—Workspace, Android, Chrome. Google isn’t just releasing models anymore; they are deploying them instantly to billions of users.For anyone building agents, this combination of speed, low latency, and 1 million context window (at this price!) makes it the new default workhorse.Google’s FunctionGemma Open Source releaseWe also got a smaller, quirkier release from Google: FunctionGemma. This is a tiny 270M parameter model. Yes, millions, not billions.It’s purpose-built for function calling on edge devices. It requires only 500MB of RAM, meaning it can run on your phone, in your browser, or even on a Raspberry Pi. As Nisten joked on the show, this is finally the model that lets your toaster talk to your fridge.Is it going to write a novel? No. But after fine-tuning, it jumped from 58% to 85% accuracy on mobile action tasks. This represents a future where privacy-first agents live entirely on your device, handling your calendar and apps without ever pinging a cloud server.OpenAI Image 1.5, GPT 5.2 Codex and ChatGPT AppstoreOpenAI had a busy week, starting with the release of GPT Image 1.5. It’s available now in ChatGPT and the API. The headline here is speed and control—it’s 4x faster than the previous model and 20% cheaper. It also tops the LMSYS Image Arena leaderboards.However, I have to give a balanced take here. We’ve been spoiled recently by Google’s “Nano Banana Pro” image generation (which powers Gemini). When we looked at side-by-side comparisons, especially with typography and infographic generation, Gemini often looked sharper and more coherent. This is what we call “hedonistic adaptation”—GPT Image 1.5 is great, but the bar has moved so fast that it doesn’t feel like the quantum leap DALL-E 3 was back in the day. Still, for production workflows where you need to edit specific parts of an image without ruining the rest, this is a massive upgrade.🚨 BREAKING: GPT 5.2 CodexJust as we were nearing the end of the show, OpenAI decided to drop some breaking news: GPT 5.2 Codex.This is a specialized model optimized ...
No reviews yet
In the spirit of reconciliation, Audible acknowledges the Traditional Custodians of country throughout Australia and their connections to land, sea and community. We pay our respect to their elders past and present and extend that respect to all Aboriginal and Torres Strait Islander peoples today.