ThursdAI - The top AI news from the past week

By: From Weights & Biases Join AI Evangelist Alex Volkov and a panel of experts to cover everything important that happened in the world of AI from the past week
  • Summary

  • Every ThursdAI, Alex Volkov hosts a panel of experts, ai engineers, data scientists and prompt spellcasters on twitter spaces, as we discuss everything major and important that happened in the world of AI for the past week. Topics include LLMs, Open source, New capabilities, OpenAI, competitors in AI space, new LLM models, AI art and diffusion aspects and much more.

    sub.thursdai.news
    Alex Volkov
    Show More Show Less
activate_mytile_page_redirect_t1
Episodes
  • ThursdAI - May 8th - new Gemini pro, Mistral Medium, OpenAI restructuring, HeyGen Realistic Avatars & more AI news
    May 9 2025
    Hey folks, Alex here (yes, real me, not my AI avatar, yet)Compared to previous weeks, this week was pretty "chill" in the world of AI, though we did get a pretty significant Gemini 2.5 Pro update, it basically beat itself on the Arena. With Mistral releasing a new medium model (not OSS) and Nvidia finally dropping Nemotron Ultra (both ignoring Qwen 3 performance) there was also a few open source updates. To me the highlight of this week was a breakthrough in AI Avatars, with Heygen's new IV model, Beating ByteDance's OmniHuman (our coverage) and Hedra labs, they've set an absolute SOTA benchmark for 1 photo to animated realistic avatar. Hell, Iet me record all this real quick and show you how good it is! How good is that?? I'm still kind of blown away. I have managed to get a free month promo code for you guys, look for it in the TL;DR section at the end of the newsletter. Of course, if you’re rather watch than listen or read, here’s our live recording on YTOpenSource AINVIDIA's Nemotron Ultra V1: Refining the Best with a Reasoning Toggle 🧠NVIDIA also threw their hat further into the ring with the release of Nemotron Ultra V1, alongside updated Super and Nano versions. We've talked about Nemotron before – these are NVIDIA's pruned and distilled versions of Llama 3.1, and they've been impressive. The Ultra version is the flagship, a 253 billion parameter dense model (distilled and pruned from Llama 3.1 405B), and it's packed with interesting features.One of the coolest things is the dynamic reasoning toggle. You can literally tell the model "detailed thinking on" or "detailed thinking off" via a system prompt during inference. This is something Qwen also supports, and it looks like the industry is converging on this idea of letting users control the "depth" of thought, which is super neat.Nemotron Ultra boasts a 128K context window and, impressively, can fit on a single 8xH100 node thanks to Neural Architecture Search (NAS) and FFN-Fusion. And performance-wise, it actually outperforms the Llama 3 405B model it was distilled from, which is a big deal. NVIDIA shared a chart from Artificial Analysis (dated April 2025, notably before Qwen3's latest surge) showing Nemotron Ultra standing strong among models like Gemini 2.5 Flash and Opus 3 Mini.What's also great is NVIDIA's commitment to openness here: they've released the models under a commercially permissive NVIDIA Open Model License, the complete post-training dataset (Llama-Nemotron-Post-Training-Dataset), and their training codebases (NeMo, NeMo-Aligner, Megatron-LM). This allows for reproducibility and further community development. Yam Peleg pointed out the cool stuff they did with Neural Architecture Search to optimally reduce parameters without losing performance.Absolute Zero: AI Learning to Learn, Zero (curated) Data Required! (Arxiv)LDJ brought up a fascinating paper that ties into this theme of self-improvement and reinforcement learning: "Absolute Zero: Reinforced Self-play Reasoning with Zero Data" from Andrew Zhao (Tsinghua University) and a few othersThe core idea here is a system that self-evolves its training curriculum and reasoning ability. Instead of needing a pre-curated dataset of problems, the model creates the problems itself (e.g., code reasoning tasks) and then uses something like a Code Executor to validate its proposed solutions, serving as a unified source of verifiable reward. It's open-ended yet grounded learning.By having a verifiable environment (code either works or it doesn't), the model can essentially teach itself to code without external human-curated data.The paper shows fine-tunes of Qwen models (like Qwen Coder) achieving state-of-the-art results on benchmarks like MBBP and AIME (Math Olympiad) with no pre-existing data for those problems. The model hallucinates questions, creates its own rewards, learns, and improves. This is a step beyond synthetic data, where humans are still largely in charge of generation. It's wild, and it points towards a future where AI systems could become increasingly autonomous in their learning.Big Companies & APIsGoogle dropped another update to their Gemini 2.5 Pro, this time the "IO edition" preview, specifically touting enhanced coding performance. This new version jumped to the #1 spot on WebDev Arena (a benchmark where human evaluators choose between two side-by-side code generations in VS Code), with a +147 Elo point gain, surpassing Claude 3.7 Sonnet. It also showed improvements on benchmarks like LiveCodeBench (up 7.39%) and Aider Polyglot (up ~3-6%). Google also highlighted its state-of-the-art video understanding (84.8% on VideoMME) with examples like generating code from a video of an app. Which essentially lets you record a drawing of how your app interaction will happen, and the model will use that video instructions! It's pretty cool. Though, not everyone was as impressed, folks noted that while gaining in a few evals, this model also regressed in several others ...
    Show More Show Less
    1 hr and 34 mins
  • 📆 ThursdAI - May 1- Qwen 3, Phi-4, OpenAI glazegate, RIP GPT4, LlamaCon, LMArena in hot water & more AI news
    May 1 2025
    Hey everyone, Alex here 👋Welcome back to ThursdAI! And wow, what a week. Seriously, strap in, because the AI landscape just went through some seismic shifts. We're talking about a monumental open-source release from Alibaba with Qwen 3 that has everyone buzzing (including us!), Microsoft dropping Phi-4 with Reasoning, a rather poignant farewell to a legend (RIP GPT-4 – we'll get to the wake shortly), major drama around ChatGPT's "glazing" incident and the subsequent rollback, updates from LlamaCon, a critical look at Chatbot Arena, and a fantastic deep dive into the world of AI evaluations with two absolute experts, Hamel Husain and Shreya Shankar.This week felt like a whirlwind, with open source absolutely dominating the headlines. Qwen 3 didn't just release a model; they dropped an entire ecosystem, setting a potential new benchmark for open-weight releases. And while we pour one out for GPT-4, we also have to grapple with the real-world impact of models like ChatGPT, highlighted by the "glazing" fiasco. Plus, video consistency takes a leap forward with Runway, and we got breaking news live on the show from Claude!So grab your coffee (or beverage of choice), settle in, and let's unpack this incredibly eventful week in AI.Open-Source LLMsQwen 3 — “Hybrid Thinking” on TapAlibaba open-weighted the entire Qwen 3 family this week, releasing two MoE titans (up to 235 B total / 22 B active) and six dense siblings all the way down to 0 .6 B, all under Apache 2.0. Day-one support landed in LM Studio, Ollama, vLLM, MLX and llama.cpp.The headline trick is a runtime thinking toggle—drop “/think” to expand chain-of-thought or “/no_think” to sprint. On my Mac, the 30 B-A3B model hit 57 tokens/s when paired with speculative decoding (drafted by the 0 .6 B sibling).Other goodies:* 36 T pre-training tokens (2 × Qwen 2.5)* 128 K context on ≥ 8 B variants (32 K on the tinies)* 119-language coverage, widest in open source* Built-in MCP schema so you can pair with Qwen-Agent* The dense 4 B model actually beats Qwen 2.5-72B-Instruct on several evals—at Raspberry-Pi footprintIn short: more parameters when you need them, fewer when you don’t, and the lawyers stay asleep. Read the full drop on the Qwen blog or pull weights from the HF collection.Performance & Efficiency: "Sonnet at Home"?The benchmarks are where things get really exciting.* The 235B MoE rivals or surpasses models like DeepSeek-R1 (which rocked the boat just months ago!), O1, O3-mini, and even Gemini 2.5 Pro on coding and math.* The 4B dense model incredibly beats the previous generation's 72B Instruct model (Qwen 2.5) on multiple benchmarks! 🤯* The 30B MoE (with only 3B active parameters) is perhaps the star. Nisten pointed out people are getting 100+ tokens/sec on MacBooks. Wolfram achieved an 80% MMLU Pro score locally with a quantized version. The efficiency math is crazy – hitting Qwen 2.5 performance with only ~10% of the active parameters.Nisten dubbed the larger model "Sonnet 3.5 at home," and while acknowledging Sonnet still has an edge in complex "vibe coding," the performance, especially in reasoning and tool use, is remarkably close for an open model you can run yourself.I ran the 30B MoE (3B active) locally using LLM Studio (shoutout for day-one support!) through my Weave evaluation dashboard (Link). On a set of 20 hard reasoning questions, it scored 43%, beating GPT 4.1 mini and nano, and getting close to 4.1 – impressive for a 3B active parameter model running locally!Phi-4-Reasoning — 14B That Punches at 70B+Microsoft’s Phi team layered 1.4 M chain-of-thought traces plus a dash of RL onto Phi-4 to finally ship a resoning Phi and shipped two MIT-licensed checkpoints:* Phi-4-Reasoning (SFT)* Phi-4-Reasoning-Plus (SFT + RL)Phi-4-R-Plus clocks 78 % on AIME 25, edging DeepSeek-R1-Distill-70B, with 32 K context (stable to 64 K via RoPE). Scratch-pads hide in tags. Full details live in Microsoft’s tech report and HF weights.It's fascinating to see how targeted training on reasoning traces and a small amount of RL can elevate a relatively smaller model to compete with giants on specific tasks.Other Open Source Updates* MiMo-7B: Xiaomi entered the ring with a 7B parameter, MIT-licensed model family, trained on 25T tokens and featuring rule-verifiable RL. (HF model hub)* Helium-1 2B: KyutAI (known for their Mochi voice model) released Helium-1, a 2B parameter model distilled from Gemma-2-9B, focused on European languages, and licensed under CC-BY 4.0. They also open-sourced 'dactory', their data processing pipeline. (Blog, Model (2 B), Dactory pipeline)* Qwen 2.5 Omni 3B: Alongside Qwen 3, the Qwen team also updated their existing Omni model with a 3B model, that retains 90% of the comprehension of its big brother with a 50% VRAM drop! (HF)* JetBrains open sources Mellum: Trained on over 4 trillion tokens with a context window of 8192 tokens across multiple programming languages, they haven't released any ...
    Show More Show Less
    1 hr and 30 mins
  • ThursdAI - Apr 23rd - GPT Image & Grok APIs Drop, OpenAI ❤️ OS? Dia's Wild TTS & Building Better Agents!
    Apr 24 2025
    Hey everyone, Alex here 👋Welcome back to ThursdAI! After what felt like ages of non-stop, massive model drops (looking at you, O3 and GPT-4!), we finally got that "chill week" we've been dreaming of since maybe... forever? It seems the big labs are taking a breather, probably gearing up for even bigger things next week (maybe some open source 👀).But "chill" doesn't mean empty! This week was packed with fascinating developments, especially in the open source world and with long-awaited API releases. We actually had time to dive deeper into things, which was a refreshing change. We had a fantastic lineup of guests joining us too: Kwindla Kramer (@kwindla), our resident voice expert, dropped in to talk about some mind-blowing TTS and her own open-source VAD release. Maziyar Panahi (@MaziyarPanahi) gave us the inside scoop on OpenAI's recent meeting with the open source community. And Dex Horthy (@dexhorthy) from HumanLayer shared some invaluable insights on building robust AI agents that actually work in the real world. It was great having them alongside the usual ThursdAI crew: LDJ, Yam, Wolfram, and Nisten!So, instead of rushing through a million headlines, we took a more relaxed pace. We explored NVIDIA's cool new Describe Anything model, dug into Google's Quantization Aware Training for Gemma, celebrated the much-anticipated API release for OpenAI's GPT Image generation (finally!), checked out the new Grok API, got absolutely blown away by a tiny, open-source TTS model from Korea called Dia, and debated the principles of building better AI agents. Plus, a surprise drop from Send AI with a powerful video model!Let's dive in!Open Source AI Highlights: Community, Vision, and EfficiencyEven with the big players quieter on the model release front, the open source scene was buzzing. It feels like this "chill" period gave everyone a chance to focus on refining tools, releasing datasets, and engaging with the community.OpenAI Inches Closer to Open Source? Insights from the Community MeetingPerhaps the biggest non-release news of the week was OpenAI actively engaging with the open source community. Friend of the show Maziyar Panahi was actually in the room (well, the Zoom room) and joined us to share what went down It sounds like OpenAI came prepared, with Sam Altman himself spending significant time answering questions . Maziyar gave us the inside scoop, mentioning that OpenAI's looking to offload some GPU pressure by embracing open source – a win-win where they help the community, and the community helps lighten their load. He painted a picture of a company genuinely trying to listen and figure out how to best contribute. It felt less like a checkbox exercise and more like genuine engagement, which is awesome to see.What did the community ask for? Based on Maziyar's recap, there was a strong consensus on several key points:* Model Size: The sweet spot seemed to be not tiny, but not astronomically huge either. Something in the 70B-200B parameter range that could run reasonably on, say, 4 GPUs, leaving room for other models. People want power they can actually use without needing a supercomputer.* Capabilities: A strong desire for reliable structured output. Surprisingly, there was less emphasis on complex, built-in reasoning, or at least the ability to toggle reasoning off. This likely stems from practical concerns about cost and latency in production environments. The community seems to value control and efficiency for specific tasks.* Multilingual: Good support for European languages (at least 20) was a major request, reflecting the global nature of the open source community. Needs to be as good as English support.* Base Models: A huge ask was for OpenAI to release base models. The reasoning? Empower the community to handle fine-tuning for specific tasks like coding, roleplay, or supporting underrepresented languages . Let the experts in those niches build on a solid foundation.* Focus: Usefulness over chasing leaderboard glory. The community urged OpenAI to provide a solid, practical model rather than aiming for a temporary #1 spot that gets outdated in days or weeks . Stability, reliability, and long-term utility were prized over fleeting benchmark wins.* Safety: A preference for separate guardrail models (similar to LlamaGuard or GemmaGuard) rather than overly aligning the main model, which often hurts performance and flexibility . Give users the tools to implement safety layers as needed, rather than baking in limitations that might stifle creativity or utility.Perhaps most excitingly, Maziyar mentioned OpenAI seemed committed to regular open model releases, not just a one-off thin=! This, combined with recent moves like approving a community Pull Request to make their open-source Codex agent work with non-OpenAI models (as Yam Peleg excitedly pointed out!), suggests a potentially significant shift. Remember, it's been a long time since GPT-2 and Whisper were OpenAI's main open contributions! ...
    Show More Show Less
    1 hr and 37 mins

What listeners say about ThursdAI - The top AI news from the past week

Average Customer Ratings

Reviews - Please select the tabs below to change the source of reviews.

In the spirit of reconciliation, Audible acknowledges the Traditional Custodians of country throughout Australia and their connections to land, sea and community. We pay our respect to their elders past and present and extend that respect to all Aboriginal and Torres Strait Islander peoples today.