• (LLM RAG-Google) On the Theoretical Limitations of Embedding-Based Retrieval
    Sep 2 2025

    Welcome to our podcast! Today, we delve into groundbreaking research from Google DeepMind and Johns Hopkins University titled "On the Theoretical Limitations of Embedding-Based Retrieval". This paper uncovers a fundamental flaw in the widely used single-vector embedding paradigm: the number of unique top-k document combinations an embedding model can represent is inherently limited by its dimension.
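
    To make the dimension cap concrete, here is a toy illustration (ours, not from the paper): with 1-dimensional embeddings, a query's ranking depends only on the sign of the query scalar, so sweeping many queries over 4 documents reaches just 2 of the 6 possible top-2 subsets.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# 4 documents embedded in d=1: score = q * doc, so the ranking depends
# only on the sign of the query scalar q.
docs = np.array([[0.1], [0.5], [-0.3], [0.9]])

def top_k_set(query, docs, k=2):
    scores = docs @ query
    return frozenset(np.argsort(-scores)[:k].tolist())

# Sweep many random 1-D queries and collect the distinct top-2 sets.
seen = {top_k_set(np.array([q]), docs) for q in rng.normal(size=1000)}

n_possible = len(list(itertools.combinations(range(4), 2)))  # C(4,2) = 6
print(len(seen), "distinct top-2 sets reachable, out of", n_possible)
```

    Higher dimensions admit more combinations, but the count is still capped by the embedding dimension — which is the paper's core point.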

    Despite the common belief that better training or larger models can overcome these issues, the researchers demonstrate these theoretical limits in surprisingly simple, realistic settings. They introduce LIMIT, a novel dataset that exposes how even state-of-the-art embedding models severely struggle with straightforward tasks, in some cases scoring below 20% recall@100, because of these theoretical underpinnings. This suggests that existing academic benchmarks may be inadvertently hiding these limitations by testing only a minute fraction of the possible query-relevance combinations.

    This work calls for a re-evaluation of how we approach information retrieval. While single-vector embeddings are powerful, their capacity for handling diverse, instruction-following queries with complex relevance definitions is fundamentally capped. The paper suggests exploring alternative architectures like cross-encoders, multi-vector models, or sparse models to address these limitations. Tune in to understand why pushing the boundaries of current embedding models requires a shift beyond the single-vector paradigm.

    Find the full paper at: https://arxiv.org/pdf/2508.21038

    13 mins
  • (FM-Pinterest) ItemSage: Learning Product Embeddings for Shopping Recommendations at Pinterest
    Sep 2 2025

    Welcome to our podcast, where we delve into cutting-edge AI in e-commerce! Today, we're exploring ItemSage, Pinterest's innovative product embedding system for shopping recommendations. Developed by engineers at Pinterest, ItemSage revolutionises how users discover products across Home, Closeup, and Search surfaces.

    A key novelty is its transformer-based architecture, combining both text and image modalities to create rich product representations, significantly outperforming single-modality approaches. ItemSage also leverages multi-task learning to optimise for diverse engagement objectives, including purchases and add-to-cart actions, making the recommendation funnel more efficient, particularly for sparse labels. This unified embedding system, compatible with existing PinSage and SearchSage embeddings, cuts infrastructure and maintenance costs roughly threefold across different recommendation verticals.
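
    The multi-task idea can be sketched in a few lines of numpy — with toy shapes and illustrative task names standing in for ItemSage's transformer tower and real engagement labels: one shared product embedding is trained with a separate in-batch softmax term per task.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n = 8, 4  # embedding size and in-batch examples (toy values)

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def softmax_loss(queries, items):
    """In-batch sampled softmax: item i is the positive for query i,
    the other items in the batch act as negatives."""
    logits = queries @ items.T
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.diag(log_p).mean()

# One product embedding shared by several engagement tasks; each task
# contributes its own softmax term over its own query batch.
products = normalize(rng.normal(size=(n, dim)))
task_queries = {t: normalize(rng.normal(size=(n, dim)))
                for t in ("click", "add_to_cart", "purchase")}
total_loss = sum(softmax_loss(q, products) for q in task_queries.values())
print(round(float(total_loss), 3))
```

    Because all tasks backpropagate into the same embedding, sparse-label tasks like purchases benefit from the gradient signal of denser ones like clicks.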

    While ItemSage has delivered substantial gains—up to +7% Gross Merchandise Value per user and +11% click volume in online A/B experiments—future work aims to enhance text feature modeling with pre-trained Transformers. Join us to understand this powerful system transforming shopping at Pinterest!

    Paper link: https://arxiv.org/pdf/2205.11728

    17 mins
  • (LLM Multiagent UCB) Why Multi-Agent LLM Systems Fail: A Taxonomy
    Aug 18 2025

    Ever wondered why Multi-Agent LLM Systems (MAS) often fall short despite their promise? Researchers at UC Berkeley introduce MAST (Multi-Agent System Failure Taxonomy), the first empirically grounded taxonomy to systematically analyse MAS failures.

    Uncover 14 unique failure modes, organised into three crucial categories: specification issues (system design), inter-agent misalignment (agent coordination), and task verification (quality control). Developed through rigorous human annotation and validated with a scalable LLM-as-a-Judge pipeline, MAST offers a structured framework for diagnosing and understanding these challenges.

    Our findings reveal that most failures stem from fundamental system design challenges and agent coordination issues, rather than just individual LLM limitations, requiring more complex solutions than superficial fixes. MAST provides actionable insights for debugging and development, enabling systematic diagnosis and guiding interventions towards building more robust systems. While currently focused on task correctness, future work will explore critical aspects like efficiency, cost, and security.
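
    As a data structure, the taxonomy is easy to picture — a toy sketch in which the three categories come from the paper, while the keyword-rule "judge" is a stand-in for the actual LLM-as-a-Judge pipeline:

```python
from enum import Enum

# The three MAST categories are from the paper; the keyword rules below
# are illustrative placeholders, not the paper's 14 failure-mode labels.
class FailureCategory(Enum):
    SPECIFICATION = "specification issues (system design)"
    INTER_AGENT_MISALIGNMENT = "inter-agent misalignment (agent coordination)"
    TASK_VERIFICATION = "task verification (quality control)"

def judge(trace: str) -> FailureCategory:
    """Toy LLM-as-a-Judge stand-in: keyword rules instead of a model call."""
    if "ignored instructions" in trace:
        return FailureCategory.SPECIFICATION
    if "agents disagree" in trace:
        return FailureCategory.INTER_AGENT_MISALIGNMENT
    return FailureCategory.TASK_VERIFICATION

print(judge("agents disagree on the plan").name)
```
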

    Learn how MAST can help build more reliable and effective multi-agent systems.

    Find the paper here: https://arxiv.org/pdf/2503.13657

    12 mins
  • (LLM Application-GOOGLE) Toward Sensor-In-the-Loop LLM Agent: Benchmarks and Implications
    Aug 5 2025

    Tune into our podcast to explore groundbreaking advancements in AI personal agents! In this episode, we delve into WellMax, a novel sensor-in-the-loop Large Language Model (LLM) agent developed by researchers from the University of Pittsburgh, University of Illinois Urbana-Champaign, and Google.

    WellMax uniquely enhances AI responses by integrating real-time physiological and physical data from wearables, allowing personal agents to understand your context implicitly and automatically. This results in more empathetic and contextually relevant advice compared to non-sensor-informed agents. Imagine an AI tailoring your exercise routine based on your actual activity levels or suggesting stress-reducing activities after a demanding day.
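
    The core mechanism — injecting wearable readings into the model's context so the agent "knows" your state without being told — can be sketched as follows; the field names and prompt format are our assumptions, not WellMax's actual interface:

```python
# Hypothetical sketch of the sensor-in-the-loop idea: summarise recent
# wearable readings and prepend them to the user's question, so the LLM
# sees physiological context implicitly. Field names are assumptions.
def build_prompt(question: str, sensors: dict) -> str:
    context = (
        f"[sensor context] resting HR: {sensors['resting_hr']} bpm, "
        f"steps today: {sensors['steps']}, sleep: {sensors['sleep_hours']} h"
    )
    return f"{context}\nUser: {question}"

prompt = build_prompt(
    "Suggest an evening workout.",
    {"resting_hr": 72, "steps": 3200, "sleep_hours": 5.5},
)
print(prompt)
```
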

    However, the journey isn't without its challenges. We discuss the difficulties LLMs face in interpreting raw sensor data, the balance between detailed advice and user choice, and the privacy implications of cloud-based LLMs versus the performance trade-offs with smaller, on-device models like Gemma-2. WellMax paves the way for future AI agents that adapt dynamically to your shifting needs, offering holistic support beyond mere question-answering.

    Learn more about this research in "Toward Sensor-In-the-Loop LLM Agent: Benchmarks and Implications": https://doi.org/10.1145/3715014.3722082

    15 mins
  • (Counterfactual-AirBnB) Harnessing the Power of Interleaving and Counterfactual Evaluation for Airbnb Search Ranking
    Aug 5 2025

    Tune into our podcast as we explore Airbnb's groundbreaking advancements in search ranking evaluation. Traditional A/B testing for significant purchases like accommodation bookings faces challenges: it's time-consuming, with low traffic and delayed feedback. Offline evaluations, while quick, often lack accuracy due to issues like selection bias and disconnect from online metrics.

    To overcome this, Airbnb developed and implemented two novel online evaluation methods: interleaving and counterfactual evaluation. Airbnb's competitive pair-based interleaving method offers an impressive 50X speedup in experimentation velocity compared to traditional A/B tests. For even greater generalizability and sensitivity, their online counterfactual evaluation achieves an astonishing 100X speedup. These methods allow for rapid identification of promising candidates for full A/B tests, significantly streamlining the experimental process.
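
    To show the flavour of interleaving, here is a sketch of classic team-draft interleaving — a well-known algorithm we use as a stand-in for Airbnb's pair-based variant, whose exact mechanics the episode does not detail: two rankers alternately draft their best unused item, and clicks are credited to whichever ranker contributed the clicked item.

```python
import random

def team_draft_interleave(ranking_a, ranking_b, seed=0):
    """Classic team-draft interleaving: rankers alternately pick their
    best unused item; each shown item remembers its contributing ranker."""
    rng = random.Random(seed)
    interleaved, team, used = [], [], set()
    a_it, b_it = iter(ranking_a), iter(ranking_b)
    count_a = count_b = 0
    while len(used) < len(set(ranking_a) | set(ranking_b)):
        # The ranker with fewer picks drafts next (random tie-break).
        pick_a = count_a < count_b or (count_a == count_b and rng.random() < 0.5)
        it = a_it if pick_a else b_it
        for item in it:
            if item not in used:
                used.add(item)
                interleaved.append(item)
                team.append("A" if pick_a else "B")
                if pick_a:
                    count_a += 1
                else:
                    count_b += 1
                break
        else:
            break  # this ranker is exhausted
    return interleaved, team

def credit(team, clicked_positions):
    """Winner = ranker whose items received more clicks."""
    a = sum(1 for i in clicked_positions if team[i] == "A")
    b = sum(1 for i in clicked_positions if team[i] == "B")
    return "A" if a > b else "B" if b > a else "tie"

shown, team = team_draft_interleave([1, 2, 3, 4], [3, 1, 4, 2])
print(shown, team, credit(team, [0]))
```

    Because every impression compares both rankers head-to-head, far fewer sessions are needed to detect a winner than in a traffic-split A/B test — the source of the speedups quoted above.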

    While interleaving may face limitations with rankers using set-level optimization that can disrupt user experience, counterfactual evaluation provides greater robustness in such scenarios. These innovative techniques are not only proven effective at Airbnb, leading to increased capacity to test new ideas and higher success rates in A/B testing, but are also easily generalizable to other online platforms, especially those with sparse conversion events.

    Paper Link: https://doi.org/10.1145/3711896.3737232

    21 mins
  • (LLM Optimization-MSFT) COLLABLLM: From Passive Responders to Active Collaborators
    Jul 23 2025

    Tune into our podcast to explore COLLABLLM, a groundbreaking framework redefining human-LLM interactions! Traditional Large Language Models often fall short in complex, open-ended tasks by passively responding and failing to grasp long-term user intent.

    Developed by researchers from Stanford University, Microsoft, and Georgia Tech, COLLABLLM addresses this by incorporating Multiturn-aware Rewards (MR). This innovative approach uses collaborative simulation to estimate the long-term impact of responses, moving beyond immediate rewards to foster active collaboration.
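
    The Multiturn-aware Reward idea can be sketched like this — a hedged toy version in which the user simulator and reward function are deterministic stand-ins for COLLABLLM's collaborative simulation: instead of scoring a response on the current turn alone, roll out a few future turns and average the eventual task reward.

```python
import statistics

def task_reward(state):
    # Toy reward: longer shared context counts as more task progress.
    return min(len(state) / 50.0, 1.0)

def simulate_future(response, user_model, turns=3):
    """Roll out simulated future turns starting from a candidate response."""
    state, rewards = response, []
    for _ in range(turns):
        state = user_model(state)  # simulated user reply + model turn
        rewards.append(task_reward(state))
    return rewards

def multiturn_aware_reward(response, user_model, rollouts=4):
    """Average task reward over several simulated futures (the MR idea)."""
    samples = []
    for _ in range(rollouts):
        samples.extend(simulate_future(response, user_model))
    return statistics.mean(samples)

# Deterministic toy user model: each turn appends a fixed-length reply.
user_model = lambda s: s + " [user follow-up]"
short = multiturn_aware_reward("Ok.", user_model)
long_ = multiturn_aware_reward("Here is a draft outline with sections...", user_model)
print(short < long_)
```

    The point of the sketch: a terse reply scores lower than a substantive one once future turns are taken into account, which is exactly the signal a single-turn reward misses.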

    COLLABLLM excels in various applications, including:

    • Document creation
    • Code generation
    • Multiturn mathematics problem-solving

    It significantly improves task performance, conversational efficiency, and interactivity, leading to higher user satisfaction and reduced time spent on tasks. While effective overall, some users noted COLLABLLM can occasionally feel bland, lack up-to-date information, and require more effort for personalisation.

    Discover how COLLABLLM transforms LLMs from passive responders into active collaborators, paving the way for more human-centred AI.

    Read the full paper here: http://arxiv.org/pdf/2502.00640

    16 mins
  • [RAG-GOOGLE] MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings
    Jul 20 2025

    Welcome to our podcast! Today, we're diving into MUVERA (Multi-Vector Retrieval Algorithm), a groundbreaking development from researchers at Google Research, UMD, and Google DeepMind. While neural embedding models are fundamental to modern information retrieval (IR), multi-vector models, though superior, are computationally expensive. MUVERA addresses this by ingeniously reducing complex multi-vector similarity search to efficient single-vector search, allowing the use of highly-optimised MIPS (Maximum Inner Product Search) solvers.

    The core innovation is Fixed Dimensional Encodings (FDEs), single-vector proxies for multi-vector similarity that offer the first theoretical guarantees (ε-approximations). Empirically, MUVERA significantly outperforms prior state-of-the-art implementations like PLAID, achieving an average of 10% higher recall with 90% lower latency across diverse BEIR retrieval datasets. It also incorporates product quantization for 32x memory compression of FDEs with minimal quality loss.
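
    Here is a simplified sketch of the FDE idea — one SimHash partition rather than the paper's full randomised construction, so treat it as an illustration, not MUVERA's implementation: random hyperplanes bucket the vectors, query vectors are summed per bucket and document vectors averaged, and a single dot product between the concatenated encodings then stands in for multi-vector (Chamfer) similarity.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, BITS = 16, 3                      # 2**BITS SimHash buckets

planes = rng.normal(size=(BITS, DIM))  # shared random hyperplanes

def bucket_ids(vecs):
    """SimHash bucket for each vector: sign pattern against the planes."""
    bits = (vecs @ planes.T) > 0
    return bits.astype(int) @ (1 << np.arange(BITS))

def fde(vecs, reduce):
    """Fixed Dimensional Encoding: aggregate vectors per bucket, concat."""
    out = np.zeros((2 ** BITS, DIM))
    ids = bucket_ids(vecs)
    for b in range(2 ** BITS):
        group = vecs[ids == b]
        if len(group):
            out[b] = group.sum(0) if reduce == "sum" else group.mean(0)
    return out.ravel()

def chamfer(Q, D):
    """Chamfer similarity: each query vector matches its best doc vector."""
    return (Q @ D.T).max(axis=1).sum()

Q = rng.normal(size=(4, DIM))                    # multi-vector query
docs = [rng.normal(size=(6, DIM)) for _ in range(20)]

q_fde = fde(Q, "sum")                            # queries: sum per bucket
scores = [q_fde @ fde(D, "mean") for D in docs]  # docs: mean per bucket
print("FDE dimension:", q_fde.shape[0])          # fixed, regardless of doc size
```

    The payoff is that the encodings have a fixed dimension regardless of how many vectors a document has, so any off-the-shelf MIPS index can serve the first-stage retrieval, with exact multi-vector scoring reserved for re-ranking.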

    A current limitation is that MUVERA did not outperform PLAID on the MS MARCO dataset, possibly due to PLAID's extensive tuning for that specific benchmark. Additionally, the effect of the average number of embeddings per document on FDE retrieval quality remains an area for future study. MUVERA's applications primarily lie in enhancing modern IR pipelines, potentially improving the efficiency of components within LLMs.

    Learn more: https://arxiv.org/pdf/2405.19504

    14 mins
  • (LLM Code-Salesforce) CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models
    Jul 5 2025

    Welcome to our podcast! Today, we're exploring CodeTree, a groundbreaking framework developed by researchers at The University of Texas at Austin and Salesforce Research. CodeTree revolutionises code generation by enabling Large Language Models (LLMs) to efficiently navigate the vast coding search space through an agent-guided tree search. This innovative approach employs a unified tree structure for explicitly exploring coding strategies, generating solutions, and refining them.

    At its core, CodeTree leverages dedicated LLM agents: the Thinker for strategy generation, the Solver for initial code implementation, and the Debugger for solution improvement. Crucially, a Critic Agent dynamically guides the exploration by evaluating nodes, verifying solutions, and making crucial decisions like refining, aborting, or accepting a solution. This multi-agent collaboration, combined with environmental and AI-generated feedback, has led to significant performance gains across diverse coding benchmarks, including HumanEval, MBPP, CodeContests, and SWEBench.
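
    The control flow of that agent ensemble can be sketched as follows — stub functions play the four roles, whereas the real system prompts an LLM for each and uses test execution for feedback, so this is shape only, not CodeTree's implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    strategy: str
    code: str = ""
    children: list = field(default_factory=list)

def thinker(task):        # proposes candidate strategies
    return [f"{task}: iterative", f"{task}: recursive"]

def solver(strategy):     # drafts code for a strategy
    return f"def solve(): pass  # via {strategy}"

def critic(code):         # scores a solution and decides the next action
    score = 1.0 if "iterative" in code else 0.4
    return score, ("accept" if score > 0.9 else "refine")

def debugger(code):       # revises a rejected solution
    return code + "  # refined"

def codetree(task, max_refines=2):
    root, best = Node("root"), (0.0, None)
    for strat in thinker(task):
        node = Node(strat, solver(strat))
        root.children.append(node)
        code = node.code
        for _ in range(max_refines):
            score, action = critic(code)
            best = max(best, (score, code), key=lambda t: t[0])
            if action == "accept":
                return best[1]     # Critic accepts: stop the search early
            code = debugger(code)  # otherwise refine and re-evaluate
    return best[1]

print(codetree("sum a list"))
```
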

    However, CodeTree's effectiveness hinges on LLMs with strong reasoning abilities; smaller models may struggle with its complex instruction-following roles, potentially leading to misleading feedback. The framework currently prioritises functional correctness, leaving aspects like code readability or efficiency for future enhancements. Despite these limitations, CodeTree offers a powerful paradigm for automated code generation, demonstrating remarkable search efficiency, even with limited generation budgets.

    Paper link: https://arxiv.org/pdf/2411.04329

    19 mins