Episodes

  • Everything Hard About Building AI Agents Today
    Jun 13 2025

    Willem Pienaar and Shreya Shankar discuss the challenge of evaluating agents in production where "ground truth" is ambiguous and subjective user feedback isn't enough to improve performance.


    The discussion breaks down the three "gulfs" of human-AI interaction—Specification, Generalization, and Comprehension—and their impact on agent success.


    Willem and Shreya cover the necessity of moving the human "out of the loop" for feedback, creating faster learning cycles through implicit signals rather than direct, manual review.

    The conversation details practical evaluation techniques, including analyzing task failures with heat maps and the trade-offs of using simulated environments for testing.


    Willem and Shreya address the reality of a "performance ceiling" for AI and the importance of categorizing the problems your agent can solve, can learn to solve, or will likely never solve.


    // Bio



    Shreya Shankar

    PhD student in data management for machine learning.


    Willem Pienaar


    Willem Pienaar, CTO of Cleric, is a builder with a focus on LLM agents, MLOps, and open source tooling. He is the creator of Feast, an open source feature store, and contributed to the creation of both the feature store and MLOps categories.


    Before starting Cleric, Willem led the open source engineering team at Tecton and established the ML platform team at Gojek, where he built high scale ML systems for the Southeast Asian decacorn.


    // Related Links



    Website: https://cleric.ai/



    ~~~~~~~~ ✌️Connect With Us ✌️ ~~~~~~~



    Catch all episodes, blogs, newsletters, and more: https://go.mlops.community/TYExplore

    MLOps Swag/Merch: [https://shop.mlops.community/]

    Connect with Demetrios on LinkedIn: /dpbrinkm

    Connect with Shreya on LinkedIn: /shrshnk

    Connect with Willem on LinkedIn: /willempienaar


    Timestamps:



    [00:00] Trust Issues in AI Data

    [04:49] Cloud Clarity Meets Retrieval

    [09:37] Why Fast AI Is Hard

    [11:10] Fixing AI Communication Gaps

    [14:53] Smarter Feedback for Prompts

    [19:23] Creativity Through Data Exploration

    [23:46] Helping Engineers Solve Faster

    [26:03] The Three Gaps in AI

    [28:08] Alerts Without the Noise

    [33:22] Custom vs General AI

    [34:14] Sharpening Agent Skills

    [40:01] Catching Repeat Failures

    [43:38] Rise of Self-Healing Software

    [44:12] The Chaos of Monitoring AI

    47 mins
  • Tricks to Fine Tuning // Prithviraj Ammanabrolu // #318
    Jun 11 2025

    Tricks to Fine Tuning // MLOps Podcast #318 with Prithviraj Ammanabrolu, Research Scientist at Databricks.

    Join the Community: https://go.mlops.community/YTJoinIn

    Get the newsletter: https://go.mlops.community/YTNewsletter

    // Abstract



    Prithviraj Ammanabrolu drops by to break down Tao fine-tuning—a clever way to train models without labeled data. Using reinforcement learning and synthetic data, Tao teaches models to evaluate and improve themselves. Raj explains how this works, where it shines (think small models punching above their weight), and why it could be a game-changer for efficient deployment.



    // Bio



    Raj is an Assistant Professor of Computer Science at the University of California, San Diego, leading the PEARLS Lab in the Department of Computer Science and Engineering (CSE). He is also a Research Scientist at Mosaic AI, Databricks, where his team is actively recruiting research scientists and engineers with expertise in reinforcement learning and distributed systems.



    Previously, he was part of the Mosaic team at the Allen Institute for AI. He earned his PhD in Computer Science from the School of Interactive Computing at Georgia Tech, advised by Professor Mark Riedl in the Entertainment Intelligence Lab.



    // Related Links



    Website: https://www.databricks.com/



    ~~~~~~~~ ✌️Connect With Us ✌️ ~~~~~~~



    Catch all episodes, blogs, newsletters, and more: https://go.mlops.community/TYExplore

    Join our Slack community [https://go.mlops.community/slack]

    Follow us on X/Twitter [@mlopscommunity](https://x.com/mlopscommunity) or [LinkedIn](https://go.mlops.community/linkedin)

    Sign up for the next meetup: [https://go.mlops.community/register]

    MLOps Swag/Merch: [https://shop.mlops.community/]

    Connect with Demetrios on LinkedIn: /dpbrinkm

    Connect with Raj on LinkedIn: /rajammanabrolu



    Timestamps:



    [00:00] Raj's preferred coffee

    [00:36] Takeaways

    [01:02] Tao Naming Decision

    [04:19] No Labels Machine Learning

    [08:09] Tao and TAO breakdown

    [13:20] Reward Model Fine-Tuning

    [18:15] Training vs Inference Compute

    [22:32] Retraining and Model Drift

    [29:06] Prompt Tuning vs Fine-Tuning

    [34:32] Small Model Optimization Strategies

    [37:10] Small Model Potential

    [43:08] Fine-tuning Model Differences

    [46:02] Mistral Model Freedom

    [53:46] Wrap up

    54 mins
  • Packaging MLOps Tech Neatly for Engineers and Non-engineers // Jukka Remes // #322
    Jun 10 2025

    Packaging MLOps Tech Neatly for Engineers and Non-engineers // MLOps Podcast #322 with Jukka Remes, Senior Lecturer (SW dev & AI), AI Architect at Haaga-Helia UAS, Founder & CTO at 8wave AI.



    Join the Community: https://go.mlops.community/YTJoinIn


    Get the newsletter: https://go.mlops.community/YTNewsletter


    // Abstract



    AI is already complex—adding the need for deep engineering expertise to use MLOps tools only makes it harder, especially for SMEs and research teams with limited resources. Yet, good MLOps is essential for managing experiments, sharing GPU compute, tracking models, and meeting AI regulations.


    While cloud providers offer MLOps tools, many organizations need flexible, open-source setups that work anywhere—from laptops to supercomputers. Shared setups can boost collaboration, productivity, and compute efficiency.

    In this session, Jukka introduces an open-source MLOps platform from Silo AI, now packaged for easy deployment across environments. With Git-based workflows and CI/CD automation, users can focus on building models while the platform handles the MLOps.

    // Bio

    Founder & CTO, 8wave AI | Senior Lecturer, Haaga-Helia University of Applied Sciences

    Jukka Remes has 28+ years of experience in software, machine learning, and infrastructure. Starting with SW dev in the late 1990s and analytics pipelines for fMRI research in the early 2000s, he's worked across deep learning (Nokia Technologies), GPU and cloud infrastructure (IBM), and AI consulting (Silo AI), where he also led MLOps platform development.


    Now a senior lecturer at Haaga-Helia, Jukka continues evolving that open-source MLOps platform with partners like the University of Helsinki. He leads R&D on GenAI and AI-enabled software, and is the founder of 8wave AI, which develops AI Business Operations software for next-gen AI enablement, including regulatory compliance of AI.


    // Related Links



    Open source-based MLOps k8s platform setup originally developed by Jukka's team at Silo AI - free for any use and installable in any environment, from laptops to supercomputers: https://github.com/OSS-MLOPS-PLATFORM/oss-mlops-platform


    Jukka's new company: https://8wave.ai


    ~~~~~~~~ ✌️Connect With Us ✌️ ~~~~~~~


    Catch all episodes, blogs, newsletters, and more: https://go.mlops.community/TYExplore

    Join our Slack community [https://go.mlops.community/slack]

    Follow us on X/Twitter [@mlopscommunity](https://x.com/mlopscommunity) or [LinkedIn](https://go.mlops.community/linkedin)

    Sign up for the next meetup: [https://go.mlops.community/register]

    MLOps Swag/Merch: [https://shop.mlops.community/]

    Connect with Demetrios on LinkedIn: /dpbrinkm

    Connect with Jukka on LinkedIn: /jukka-remes


    Timestamps:

    [00:00] Jukka's preferred coffee

    [00:39] Open-Source Platform Benefits

    [01:56] Silo MLOps Platform Explanation

    [05:18] AI Model Production Processes

    [10:42] AI Platform Use Cases

    [16:54] Reproducibility in Research Models

    [26:51] Pipeline setup automation

    [33:26] MLOps Adoption Journey

    [38:31] EU AI Act and Open Source

    [41:38] MLOps and 8wave AI

    [45:46] Optimizing Cross-Stakeholder Collaboration

    [52:15] Open Source ML Platform

    [55:06] Wrap up

    56 mins
  • Hard Learned Lessons from Over a Decade in AI
    Jun 6 2025

    Tecton Founder and CEO Mike Del Balso talks about the ML/AI use cases that have become core components generating millions in revenue. Demetrios and Mike trace the maturity curve that predictive machine learning use cases have followed over the past five years, and why a feature store is a primary component of an ML stack.


    // Bio


    Mike Del Balso is the CEO and co-founder of Tecton, where he’s building the industry’s first feature platform for real-time ML. Before Tecton, Mike co-created the Uber Michelangelo ML platform. He was also a product manager at Google where he managed the core ML systems that power Google’s Search Ads business. He studied Applied Science, Electrical & Computer Engineering at the University of Toronto.


    // Related Links


    Website: www.tecton.ai


    ~~~~~~~~ ✌️Connect With Us ✌️ ~~~~~~~


    Catch all episodes, blogs, newsletters, and more: https://go.mlops.community/TYExplore

    MLOps Swag/Merch: [https://shop.mlops.community/]

    Connect with Demetrios on LinkedIn: /dpbrinkm

    Connect with Mike on LinkedIn: /michaeldelbalso


    Timestamps:


    [00:00] Smarter decisions, less manual work

    [03:52] Data pipelines: pain and fixes

    [08:45] Why Tecton was born

    [11:30] ML use cases shift

    [14:14] Models for big bets

    [18:39] Build or buy drama

    [20:20] Fintech's data playbook

    [23:52] What really needs real-time

    [28:07] Speeding up ML delivery

    [32:09] Valuing ML is tricky

    [35:29] Simplifying ML toolkits

    [37:18] AI copilots in action

    [42:13] AI that fights fraud

    [45:07] Teaming up across coasts

    [46:43] Tecton + Generative AI?

    49 mins
  • Product Metrics are LLM Evals // Raza Habib CEO of Humanloop // #320
    Jun 3 2025

    Raza Habib, the CEO of LLM eval platform Humanloop, talks to us about how to make your AI products more accurate and reliable by shortening the feedback loop of your evals and quickly iterating on prompts to test what works. He also shares some of his favorite quotes from Dario Amodei of Anthropic.


    // Bio

    Raza is the CEO and Co-founder at Humanloop. He has a PhD in Machine Learning from UCL, was the founding engineer of Monolith AI, and has built speech systems at Google. For the last 4 years, he has led Humanloop and supported leading technology companies such as Duolingo, Vanta, and Gusto to build products with large language models. Raza was featured in the Forbes 30 Under 30 technology list in 2022, and Sifted recently named him one of the most influential Gen AI founders in Europe.


    // Related Links

    Website: https://humanloop.com


    ~~~~~~~~ ✌️Connect With Us ✌️ ~~~~~~~

    Catch all episodes, blogs, newsletters, and more: https://go.mlops.community/TYExplore

    MLOps Swag/Merch: [https://shop.mlops.community/]

    Connect with Demetrios on LinkedIn: /dpbrinkm

    Connect with Raza on LinkedIn: /humanloop-raza


    Timestamps:


    [00:00] Cracking Open System Failures and How We Fix Them

    [05:44] LLMs in the Wild — First Steps and Growing Pains

    [08:28] Building the Backbone of Tracing and Observability

    [13:02] Tuning the Dials for Peak Model Performance

    [13:51] From Growing Pains to Glowing Gains in AI Systems

    [17:26] Where Prompts Meet Psychology and Code

    [22:40] Why Data Experts Deserve a Seat at the Table

    [24:59] Humanloop and the Art of Configuration Taming

    [28:23] What Actually Matters in Customer-Facing AI

    [33:43] Starting Fresh with Private Models That Deliver

    [34:58] How LLM Agents Are Changing the Way We Talk

    [39:23] The Secret Lives of Prompts Inside Frameworks

    [42:58] Streaming Showdowns — Creativity vs. Convenience

    [46:26] Meet Our Auto-Tuning AI Prototype

    [49:25] Building the Blueprint for Smarter AI

    [51:24] Feedback Isn’t Optional — It’s Everything

    53 mins
  • Getting AI Apps Past the Demo // Vaibhav Gupta // #319
    May 30 2025

    Getting AI Apps Past the Demo // MLOps Podcast #319 with Vaibhav Gupta, CEO of BoundaryML.


    Join the Community: https://go.mlops.community/YTJoinIn

    Get the newsletter: https://go.mlops.community/YTNewsletter


    // Abstract

    It's been two years, and we still seem to see AI disproportionately more in demos than production features. Why? And how can we apply engineering practices we've all learned in the past decades to our advantage here?


    // Bio

    Vaibhav is one of the creators of BAML and a YC alum. He spent 10 years in AI performance optimization at places like Google, Microsoft, and D.E. Shaw. He loves diving deep and chatting about anything related to Gen AI and Computer Vision!


    // Related Links

    Website: https://www.boundaryml.com/


    ~~~~~~~~ ✌️Connect With Us ✌️ ~~~~~~~

    Catch all episodes, blogs, newsletters, and more: https://go.mlops.community/TYExplore

    Join our Slack community [https://go.mlops.community/slack]

    Follow us on X/Twitter [@mlopscommunity](https://x.com/mlopscommunity) or [LinkedIn](https://go.mlops.community/linkedin)

    Sign up for the next meetup: [https://go.mlops.community/register]

    MLOps Swag/Merch: [https://shop.mlops.community/]

    Connect with Demetrios on LinkedIn: /dpbrinkm

    Connect with Vaibhav on LinkedIn: /vaigup


    Timestamps:

    [00:00] Vaibhav's preferred coffee

    [00:38] What is BAML

    [03:07] LangChain Overengineering Issues

    [06:46] Verifiable English Explained

    [11:45] Python AI Integration Challenges

    [15:16] Strings as First-Class Code

    [21:45] Platform Gap in Development

    [30:06] Workflow Efficiency Tools

    [33:10] Surprising BAML Insights

    [40:43] BAML Cool Projects

    [45:54] BAML Developer Conversations

    [48:39] Wrap up

    50 mins
  • Building Out GPU Clouds // Mohan Atreya // #317
    May 23 2025

    Demetrios and Mohan Atreya break down the GPU madness behind AI — from supply headaches and sky-high prices to the rise of nimble GPU clouds trying to outsmart the giants. They cover power-hungry hardware, failed experiments, and how new cloud models are shaking things up with smarter provisioning, tokenized access, and a whole lotta hustle. It's a wild ride through the guts of AI infrastructure — fun, fast, and full of sparks!


    Big thanks to the folks at Rafay for backing this episode — appreciate the support in making these conversations happen!


    // Bio

    Mohan is a seasoned and innovative product leader currently serving as the Chief Product Officer at Rafay Systems. He has led multi-site teams and driven product strategy at companies like Okta, Neustar, and McAfee.


    // Related Links

    Website: https://rafay.co/


    ~~~~~~~~ ✌️Connect With Us ✌️ ~~~~~~~

    Catch all episodes, blogs, newsletters, and more: https://go.mlops.community/TYExplore

    MLOps Swag/Merch: [https://shop.mlops.community/]

    Connect with Demetrios on LinkedIn: /dpbrinkm

    Connect with Mohan on LinkedIn: /mohanatreya


    Timestamps:

    [00:00] AI/ML Customer Challenges

    [04:21] Dependency on Microsoft for Revenue

    [09:08] Challenges of Hypothesis in AI/ML

    [12:17] Neo Cloud Onboarding Challenges

    [15:02] Elastic GPU Cloud Automation

    [19:11] Dynamic GPU Inventory Management

    [20:25] Terraform Lacks Inventory Awareness

    [26:42] Onboarding and End-User Experience Strategies

    [29:30] Optimizing Storage for Data Efficiency

    [33:38] Pizza Analogy: User Preferences

    [35:18] Token-Based GPU Cloud Monetization

    [39:01] Empowering Citizen Scientists with AI

    [42:31] Innovative CFO Chatbot Solutions

    [47:09] Cloud Services Need Spectrum

    48 mins
  • A Candid Conversation Around MCP and A2A // Rahul Parundekar and Sam Partee // #316 SF Live
    May 21 2025

    Demetrios, Sam Partee, and Rahul Parundekar unpack the chaos of AI agent tools and the evolving world of MCP (Model Context Protocol). With sharp insights and plenty of laughs, they dig into tool permissions, security quirks, agent memory, and the messy path to making agents actually useful.


    // Bio

    Sam Partee

    Sam Partee is the CTO and Co-Founder of Arcade AI. Previously a Principal Engineer leading the Applied AI team at Redis, Sam led the effort in creating the ecosystem around Redis as a vector database. He is a contributor to multiple OSS projects, including LangChain, DeterminedAI, LlamaIndex, and Chapel, amongst others. While at Cray/HPE, he created the SmartSim AI framework, which is now used at national labs around the country to integrate HPC simulations like climate models with AI.


    Rahul Parundekar

    Rahul Parundekar is the founder of AI Hero. He graduated with a Master's in Computer Science from USC Los Angeles in 2010 and embarked on a career focused on Artificial Intelligence. From 2010 to 2017, he worked as a Senior Researcher at Toyota ITC, working on agent autonomy within vehicles. His journey continued as the Director of Data Science at FigureEight (later acquired by Appen), where he and his team developed an architecture supporting over 36 ML models and managing over a million predictions daily. Since 2021, he has been working on AI Hero, aiming to democratize AI access, while also consulting on LLMOps (Large Language Model Operations) and AI system scalability. Beyond his full-time role as a founder, he is also passionate about community engagement: he actively organizes MLOps events in SF and contributes educational content on RAG and LLMOps at learn.mlops.community.


    // Related Links

    Websites:

    arcade.dev

    aihero.studio

    ~~~~~~~~ ✌️Connect With Us ✌️ ~~~~~~~


    Catch all episodes, blogs, newsletters, and more: https://go.mlops.community/TYExplore

    MLOps Swag/Merch: [https://shop.mlops.community/]

    Connect with Demetrios on LinkedIn: /dpbrinkm

    Connect with Rahul on LinkedIn: /rparundekar

    Connect with Sam on LinkedIn: /sampartee

    Timestamps:

    [00:00] Agents & Tools, Explained (Without Melting Your Brain)

    [09:51] MVP Servers: Why Everything’s on Fire (and How to Fix It)

    [13:18] Can We Actually Trust the Protocol?

    [18:13] KYC, But Make It AI (and Less Painful)

    [25:25] Web Automation Tests: The Bugs Strike Back

    [28:18] MCP Dev: What Went Wrong (and What Saved Us)

    [33:53] Social Login: One Button to Rule Them All

    [39:33] What Even Is an AI-Native Developer?

    [42:21] Betting Big on Smarter Models (High Risk, High Reward)

    [51:40] Harrison’s Bold New Tactic (With Real-Life Magic Tricks)

    [55:31] Async Task Handoffs: Herding Cats, But Digitally

    [1:00:37] Getting AI to Actually Help Your Workflow

    [1:03:53] The Infamous Varma System Error (And How We Dodge It)

    1 hr and 5 mins