Episodes

  • The Future of AI Systems: Open Models and Infrastructure Challenges
    Jun 1 2025
    SummaryIn this episode of the AI Engineering Podcast Jamie De Guerre, founding SVP of product at Together.ai, explores the role of open models in the AI economy. As a veteran of the AI industry, including his time leading product marketing for AI and machine learning at Apple, Jamie shares insights on the challenges and opportunities of operating open models at speed and scale. He delves into the importance of open source in AI, the evolution of the open model ecosystem, and how Together.ai's AI acceleration cloud is contributing to this movement with a focus on performance and efficiency.AnnouncementsHello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systemsYour host is Tobias Macey and today I'm interviewing Jamie de Guerre about the role of open models in the AI economy and how to operate them at speed and at scaleInterviewIntroductionHow did you get involved in machine learning?Can you describe what Together AI is and the story behind it?What are the key goals of the company?The initial rounds of open models were largely driven by massive tech companies. How would you characterize the current state of the ecosystem that is driving the creation and evolution of open models?There was also a lot of argument about what "open source" and "open" means in the context of ML/AI models, and the different variations of licenses being attached to them (e.g. the Meta license for Llama models). What is the current state of the language used and understanding of the restrictions/freedoms afforded?What are the phases of organizational/technical evolution from initial use of open models through fine-tuning, to custom model development?Can you outline the technical challenges companies face when trying to train or run inference on large open models themselves?What factors should a company consider when deciding whether to fine-tune an existing open model versus attempting to train a specialized one from scratch?While Transformers dominate the LLM landscape, there's ongoing research into alternative architectures. Are you seeing significant interest or adoption of non-Transformer architectures for specific use cases? When might those other architectures be a better choice?While open models offer tremendous advantages like transparency, control, and cost-effectiveness, are there scenarios where relying solely on them might be disadvantageous?When might proprietary models or a hybrid approach still be the better choice for a specific problem?Building and scaling AI infrastructure is notoriously complex. What are the most significant technical or strategic challenges you've encountered at Together AI while enabling scalable access to open models for your users?What are the most interesting, innovative, or unexpected ways that you have seen open models/the TogetherAI platform used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on powering AI model training and inference?Where do you see the open model space heading in the next 1-2 years? Any specific trends or breakthroughs you anticipate?Contact InfoLinkedInParting QuestionFrom your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?Closing AnnouncementsThank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email hosts@aiengineeringpodcast.com with your story.To help other people find the show please leave a review on iTunes and tell your friends and co-workers.LinksTogether AIFine TuningPost-TrainingSalesforce ResearchMistralAgentforceLlama ModelsRLHF == Reinforcement Learning from Human FeedbackRLVR == Reinforcement Learning from Verifiable RewardsTest Time ComputeHuggingFaceRAG == Retrieval Augmented GenerationPodcast EpisodeGoogle GemmaLlama 4 MaverickPrompt EngineeringvLLMSGLangHazy Research labState Space ModelsHyena ModelMamba ArchitectureDiffusion Model ArchitectureStable DiffusionBlack Forest Labs Flux ModelNvidia BlackwellPyTorchRustDeepseek R1GGUFPika Text To VideoThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0
    Show More Show Less
    51 mins
  • The Rise of Agentic AI: Transforming Business Operations
    May 21 2025
    SummaryIn this episode of the AI Engineering Podcast, host Tobias Macey sits down with Ben Wilde, Head of Innovation at Georgian, to explore the transformative impact of agentic AI on business operations and the SaaS industry. From his early days working with vintage AI systems to his current focus on product strategy and innovation in AI, Ben shares his expertise on what he calls the "continuum" of agentic AI - from simple function calls to complex autonomous systems. Join them as they discuss the challenges and opportunities of integrating agentic AI into business systems, including organizational alignment, technical competence, and the need for standardization. They also dive into emerging protocols and the evolving landscape of AI-driven products and services, including usage-based pricing models and advancements in AI infrastructure and reliability.AnnouncementsHello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systemsYour host is Tobias Macey and today I'm interviewing Ben Wilde about the impact of agentic AI on business operations and SaaS as we know itInterviewIntroductionHow did you get involved in machine learning?Can you start by sharing your definition of what constitutes "agentic AI"?There have been several generations of automation for business and product use cases. In your estimation, what are the substantive differences between agentic AI and e.g. RPA (Robotic Process Automation)?How do the inherent risks and operational overhead impact the calculus of whether and where to apply agentic capabilities?For teams that are aiming for agentic capabilities, what are the stepping stones along that path?Beyond the technical capacity, there are numerous elements of organizational alignment that are required to make full use of the capabilities of agentic processes. What are some of the strategic investments that are necessary to get the whole business pointed in the same direction for adopting and benefitting from AI agents?The most recent splash in the space of agentic AI is the introduction of the Model Context Protocol, and various responses to it. What do you see as the near and medium term impact of this effort on the ecosystem of AI agents and their architecture?Software products have gone through several major evolutions since the days of CD-ROMs in the 90s. The current era has largely been oriented around the model of subscription-based software delivered via browser or mobile-based UIs over the internet. How does the pending age of AI agents upend that model?What are the most interesting, innovative, or unexpected ways that you have seen agentic AI used for business and product capabilities?What are the most interesting, unexpected, or challenging lessons that you have learned while working with businesses adopting agentic AI capabilities?When is agentic AI the wrong choice?What are the ongoing developments in agentic capabilities that you are monitoring?Contact InfoEmailLinkedInParting QuestionFrom your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?Closing AnnouncementsThank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email hosts@aiengineeringpodcast.com with your story.To help other people find the show please leave a review on iTunes and tell your friends and co-workers.LinksGeorgianAgentic Platforms And ApplicationsDifferential PrivacyAgentic AILanguage ModelReasoning ModelRobotic Process AutomationOFACOpenAI Deep ResearchModel Context ProtocolGeorgian AI Adoption SurveyGoogle Agent to Agent ProtocolGraphQLTPU == Tensor Processing UnitChris LattnerCUDANeuroSymbolic AIPrologThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0
    Show More Show Less
    1 hr and 2 mins
  • Protecting AI Systems: Understanding Vulnerabilities and Attack Surfaces
    May 3 2025
    SummaryIn this episode of the AI Engineering Podcast Kasimir Schulz, Director of Security Research at HiddenLayer, talks about the complexities and security challenges in AI and machine learning models. Kasimir explains the concept of shadow genes and shadow logic, which involve identifying common subgraphs within neural networks to understand model ancestry and potential vulnerabilities, and emphasizes the importance of understanding the attack surface in AI integrations, scanning models for security threats, and evolving awareness in AI security practices to mitigate risks in deploying AI systems.AnnouncementsHello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systemsYour host is Tobias Macey and today I'm interviewing Kasimir Schulz about the relationships between the various models on the market and how that information helps with selecting and protecting models for your applicationsInterviewIntroductionHow did you get involved in machine learning?Can you start by outlining the current state of the threat landscape for ML and AI systems?What are the main areas of overlap in risk profiles between prediction/classification and generative models? (primarily from an attack surface/methodology perspective)What are the significant points of divergence?What are some of the categories of potential damages that can be created through the deployment of compromised models?How does the landscape of foundation models introduce new challenges around supply chain security for organizations building with AI?You recently published your findings on the potential to inject subgraphs into model architectures that are invisible during normal operation of the model. Along with that you wrote about the subgraphs that are shared between different classes of models. What are the key learnings that you would like to highlight from that research?What action items can organizations and engineering teams take in light of that information?Platforms like HuggingFace offer numerous variations of popular models with variations around quantization, various levels of finetuning, model distillation, etc. That is obviously a benefit to knowledge sharing and ease of access, but how does that exacerbate the potential threat in the face of backdoored models?Beyond explicit backdoors in model architectures, there are numerous attack vectors to generative models in the form of prompt injection, "jailbreaking" of system prompts, etc. How does the knowledge of model ancestry help with identifying and mitigating risks from that class of threat?A common response to that threat is the introduction of model guardrails with pre- and post-filtering of prompts and responses. How can that approach help to address the potential threat of backdoored models as well?For a malicious actor that develops one of these attacks, what is the vector for introducing the compromised model into an organization?Once that model is in use, what are the possible means by which the malicious actor can detect its presence for purposes of exploitation?What are the most interesting, innovative, or unexpected ways that you have seen the information about model ancestry used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on ShadowLogic/ShadowGenes?What are some of the other means by which the operation of ML and AI systems introduce attack vectors to organizations running them?Contact InfoLinkedInParting QuestionFrom your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?Closing AnnouncementsThank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email hosts@aiengineeringpodcast.com with your story.To help other people find the show please leave a review on iTunes and tell your friends and co-workers.LinksHiddenLayerZero-Day VulnerabilityMCP Blog PostPython Pickle Object SerializationSafeTensorsDeepseekHuggingface TransformersKROP == Knowledge Return Oriented PromptingXKCD "Little Bobby Tables"OWASP Top 10 For LLMsCVE AI Systems Working GroupRefusal Vector AblationFoundation ModelShadowLogicShadowGenesBytecodeResNet == Resideual Neural NetworkYOLO == You Only Look OnceNetronBERTRoBERTAShodanCTF == Capture The FlagTitan Bedrock Image GeneratorThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0
    Show More Show Less
    52 mins
  • Understanding The Operational And Organizational Challenges Of Agentic AI
    Apr 21 2025
    SummaryIn this episode of the AI Engineering podcast Julian LaNeve, CTO of Astronomer, talks about transitioning from simple LLM applications to more complex agentic AI systems. Julian shares insights into the challenges and considerations of this evolution, emphasizing the importance of starting with simpler applications to build operational knowledge and intuition. He discusses the parallels between microservices and agentic AI, highlighting the need for careful orchestration and observability to manage complexity and ensure reliability, and explores the technical requirements for deploying AI systems, including data infrastructure, orchestration tools like Apache Airflow, and understanding the probabilistic nature of AI models.AnnouncementsHello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systemsSeamless data integration into AI applications often falls short, leading many to adopt RAG methods, which come with high costs, complexity, and limited scalability. Cognee offers a better solution with its open-source semantic memory engine that automates data ingestion and storage, creating dynamic knowledge graphs from your data. Cognee enables AI agents to understand the meaning of your data, resulting in accurate responses at a lower cost. Take full control of your data in LLM apps without unnecessary overhead. Visit aiengineeringpodcast.com/cognee to learn more and elevate your AI apps and agents.Your host is Tobias Macey and today I'm interviewing Julian LaNeve about how to avoid putting the cart before the horse with AI applications. When do you move from "simple" LLM apps to agentic AI and what's the path to get there?InterviewIntroductionHow did you get involved in machine learning?How do you technically distinguish "agentic AI" (e.g., involving planning, tool use, memory) from "simpler LLM workflows" (e.g., stateless transformations, RAG)? What are the key differences in operational complexity and potential failure modes?What specific technical challenges (e.g., state management, observability, non-determinism, prompt fragility, cost explosion) are often underestimated when teams jump directly into building stateful, autonomous agents?What are the pre-requisites from a data and infrastructure perspective before going to production with agentic applications?How does that differ from the chat-based systems that companies might be experimenting with?Technically, where do you most often see ambitious agent projects break down during development or early deployment?Beyond generic data quality, what specific data engineering practices become critical when building reliable LLM applications? (e.g., Designing data pipelines for efficient RAG chunking/embedding, versioning prompts alongside data, caching strategies for LLM calls, managing vector database ETL).From an implementation complexity standpoint, what characterizes tasks well-suited for initial LLM workflow adoption versus those genuinely requiring agentic capabilities?Can you share examples (anonymized if necessary) highlighting how organizations successfully engineered these simpler LLM workflows? What specific technical designs, tooling choices, or MLOps practices were key to their reliability and scalability?What are some hard-won technical or operational lessons from deploying and scaling LLM workflows in production environments? Any surprising performance bottlenecks, cost issues, or monitoring challenges engineers should anticipate?What technical maturity signals (e.g., robust CI/CD for ML, established monitoring/alerting for pipelines, automated evaluation frameworks, cost tracking mechanisms) suggest an engineering team might be ready to tackle the challenges of building and operating agentic systems?How does the technical stack and engineering process need to evolve when moving from orchestrated LLM workflows towards more complex agents involving memory, planning, and dynamic tool use? What new components and failure modes must be engineered for?How do you foresee orchestration platforms evolving to better serve the needs of AI engineers building LLM apps? What are the most interesting, innovative, or unexpected ways that you have seen organizations build toward advanced AI use cases?What are the most interesting, unexpected, or challenging lessons that you have learned while working on supporting AI services?When is AI the wrong choice?What is the single most critical piece of engineering advice you would give to fellow AI engineers who are tasked with integrating LLMs into production systems right now?Contact InfoLinkedInGitHubParting QuestionFrom your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?LinksAstronomerAirflowAnthropicBuilding Effective Agents post from AnthropicAirflow 3.0MicroservicesPydantic AILangchainLlamaIndexLLM As A JudgeSWE (SoftWare Engineer) BenchCursorWindsurfOpenTelemetryDAG == Directed ...
    Show More Show Less
    1 hr and 12 mins
  • The Power of Community in AI Development with Oumi
    Mar 16 2025
    SummaryIn this episode of the AI Engineering Podcast Emmanouil (Manos) Koukoumidis, CEO of Oumi, about his vision for an open platform for building, evaluating, and deploying AI foundation models. Manos shares his journey from working on natural language AI services at Google Cloud to founding Oumi with a mission to advance open-source AI, emphasizing the importance of community collaboration and accessibility. He discusses the need for open-source models that are not constrained by proprietary APIs, highlights the role of Oumi in facilitating open collaboration, and touches on the complexities of model development, open data, and community-driven advancements in AI. He also explains how Oumi can be used throughout the entire lifecycle of AI model development, post-training, and deployment.AnnouncementsHello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systemsYour host is Tobias Macey and today I'm interviewing Manos Koukoumidis about Oumi, an all-in-one production-ready open platform to build, evaluate, and deploy AI modelsInterviewIntroductionHow did you get involved in machine learning?Can you describe what Oumi is and the story behind it?There are numerous projects, both full suites and point solutions, focused on every aspect of "AI" development. What is the unique value that Oumi provides in this ecosystem?You have stated the desire for Oumi to become the Linux of AI development. That is an ambitious goal and one that Linux itself didn't start with. What do you see as the biggest challenges that need addressing to reach a critical mass of adoption?In the vein of "open source" AI, the most notable project that I'm aware of that fits the proper definition is the OLMO models from AI2. What lessons have you learned from their efforts that influence the ways that you think about your work on Oumi?On the community building front, HuggingFace has been the main player. What do you see as the benefits and shortcomings of that platform in the context of your vision for open and collaborative AI?Can you describe the overall design and architecture of Oumi?How did you approach the selection process for the different components that you are building on top of?What are the extension points that you have incorporated to allow for customization/evolution?Some of the biggest barriers to entry for building foundation models are the cost and availability of hardware used for training, and the ability to collect and curate the data needed. How does Oumi help with addressing those challenges?For someone who wants to build or contribute to an open source model, what does that process look like?How do you envision the community building/collaboration process?Your overall goal is to build a foundation for the growth and well-being of truly open AI. How are you thinking about the sustainability of the project and the funding needed to grow and support the community?What are the most interesting, innovative, or unexpected ways that you have seen Oumi used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on Oumi?When is Oumi the wrong choice?What do you have planned for the future of Oumi?Contact InfoLinkedInParting QuestionFrom your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?Closing AnnouncementsThank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email hosts@aiengineeringpodcast.com with your story.To help other people find the show please leave a review on iTunes and tell your friends and co-workers.LinksOumiCloud PaLMGoogle GeminiDeepMindLSTM == Long Short-Term MemoryTransfomers)ChatGPTPartial Differential EquationOLMOOSI AI definitionMLFlowMetaflowSkyPilotLlamaRAGPodcast EpisodeSynthetic DataPodcast EpisodeLLM As JudgeSGLangvLLMFunction Calling LeaderboardDeepseekThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0
    Show More Show Less
    56 mins
  • Arch Gateway: Add AI To Your Apps Without Custom Development
    Feb 26 2025
    SummaryIn this episode of the AI Engineering Podcast Adil Hafiz talks about the Arch project, a gateway designed to simplify the integration of AI agents into business systems. He discusses how the gateway uses Rust and Envoy to provide a unified interface for handling prompts and integrating large language models (LLMs), allowing developers to focus on core business logic rather than AI complexities. The conversation also touches on the target audience, challenges, and future directions for the project, including plans to develop a leading planning LLM and enhance agent interoperability.AnnouncementsHello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systemsYour host is Tobias Macey and today I'm interviewing Adil Hafeez about the Arch project, a gateway for your AI agentsInterviewIntroductionHow did you get involved in machine learning?Can you describe what Arch is and the story behind it?How do you think about the target audience for Arch and the types of problems/projects that they are responsible for?The general category of LLM gateways is largely oriented toward abstracting the specific model provider being called. What are the areas of overlap and differentiation in Arch?Many of the features in Arch are also available in AI frameworks (e.g. LangChain, LlamaIndex, etc.), such as request routing, guardrails, and tool calling. How do you think about the architectural tradeoffs of having that functionality in a gateway service?What is the workflow for someone building an application with Arch?Can you describe the architecture and components of the Arch gateway?With the pace of change in the AI/LLM ecosystem, how have you designed the Arch project to allow for rapid evolution and extensibility?What are the most interesting, innovative, or unexpected ways that you have seen Arch used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on Arch?When is Arch the wrong choice?What do you have planned for the future of Arch?Contact InfoLinkedInGitHubParting QuestionFrom your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?Closing AnnouncementsThank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email hosts@aiengineeringpodcast.com with your story.To help other people find the show please leave a review on iTunes and tell your friends and co-workers.LinksArch GatewayGradient BoostingEnvoyLLM GatewayHuggingfaceKatanemo ModelsQwen2.5Rust ClippyThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0
    Show More Show Less
    31 mins
  • The Role Of Synthetic Data In Building Better AI Applications
    Feb 16 2025
    SummaryIn this episode of the AI Engineering Podcast Ali Golshan, co-founder and CEO of Gretel.ai, talks about the transformative role of synthetic data in AI systems. Ali explains how synthetic data can be purpose-built for AI use cases, emphasizing privacy, quality, and structural stability. He highlights the shift from traditional methods to using language models, which offer enhanced capabilities in understanding data's deep structure and generating high-quality datasets. The conversation explores the challenges and techniques of integrating synthetic data into AI systems, particularly in production environments, and concludes with insights into the future of synthetic data, including its application in various industries, the importance of privacy regulations, and the ongoing evolution of AI systems.AnnouncementsHello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systemsSeamless data integration into AI applications often falls short, leading many to adopt RAG methods, which come with high costs, complexity, and limited scalability. Cognee offers a better solution with its open-source semantic memory engine that automates data ingestion and storage, creating dynamic knowledge graphs from your data. Cognee enables AI agents to understand the meaning of your data, resulting in accurate responses at a lower cost. Take full control of your data in LLM apps without unnecessary overhead. Visit aiengineeringpodcast.com/cognee to learn more and elevate your AI apps and agents.Your host is Tobias Macey and today I'm interviewing Ali Golshan about the role of synthetic data in building, scaling, and improving AI systemsInterviewIntroductionHow did you get involved in machine learning?Can you start by summarizing what you mean by synthetic data in the context of this conversation?How have the capabilities around the generation and integration of synthetic data changed across the pre- and post-LLM timelines?What are the motivating factors that would lead a team or organization to invest in synthetic data generation capacity?What are the main methods used for generation of synthetic data sets?How does that differ across open-source and commercial offerings?From a surface level it seems like synthetic data generation is a straight-forward exercise that can be owned by an engineering team. What are the main "gotchas" that crop up as you move along the adoption curve?What are the scaling characteristics of synthetic data generation as you go from prototype to production scale?domains/data types that are inappropriate for synthetic use cases (e.g. scientific or educational content)managing appropriate distribution of values in the generation processBeyond just producing large volumes of semi-random data (structured or otherwise), what are the other processes involved in the workflow of synthetic data and its integration into the different systems that consume it?What are the most interesting, innovative, or unexpected ways that you have seen synthetic data generation used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on synthetic data generation?When is synthetic data the wrong choice?What do you have planned for the future of synthetic data capabilities at Gretel?Contact InfoLinkedInParting QuestionFrom your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?Closing AnnouncementsThank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email hosts@aiengineeringpodcast.com with your story.To help other people find the show please leave a review on iTunes and tell your friends and co-workers.LinksGretelHadoopLSTM == Long Short-Term MemoryGAN == Generative Adversarial NetworkTextbooks are all you need MSFT paperIlluminaThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0
    Show More Show Less
    54 mins
  • Optimize Your AI Applications Automatically With The TensorZero LLM Gateway
    Jan 22 2025
    SummaryIn this episode of the AI Engineering podcast Viraj Mehta, CTO and co-founder of TensorZero, talks about the use of LLM gateways for managing interactions between client-side applications and various AI models. He highlights the benefits of using such a gateway, including standardized communication, credential management, and potential features like request-response caching and audit logging. The conversation also explores TensorZero's architecture and functionality in optimizing AI applications by managing structured data inputs and outputs, as well as the challenges and opportunities in automating prompt generation and maintaining interaction history for optimization purposes.AnnouncementsHello and welcome to the AI Engineering Podcast, your guide to the fast-moving world of building scalable and maintainable AI systemsSeamless data integration into AI applications often falls short, leading many to adopt RAG methods, which come with high costs, complexity, and limited scalability. Cognee offers a better solution with its open-source semantic memory engine that automates data ingestion and storage, creating dynamic knowledge graphs from your data. Cognee enables AI agents to understand the meaning of your data, resulting in accurate responses at a lower cost. Take full control of your data in LLM apps without unnecessary overhead. Visit aiengineeringpodcast.com/cognee to learn more and elevate your AI apps and agents. Your host is Tobias Macey and today I'm interviewing Viraj Mehta about the purpose of an LLM gateway and his work on TensorZeroInterviewIntroductionHow did you get involved in machine learning?What is an LLM gateway?What purpose does it serve in an AI application architecture?What are some of the different features and capabilities that an LLM gateway might be expected to provide?Can you describe what TensorZero is and the story behind it?What are the core problems that you are trying to address with Tensor0 and for whom?One of the core features that you are offering is management of interaction history. How does this compare to the "memory" functionality offered by e.g. LangChain, Cognee, Mem0, etc.?How does the presence of TensorZero in an application architecture change the ways that an AI engineer might approach the logic and control flows in a chat-based or agent-oriented project?Can you describe the workflow of building with Tensor0 and some specific examples of how it feeds back into the performance/behavior of an LLM?What are some of the ways in which the addition of Tensor0 or another LLM gateway might have a negative effect on the design or operation of an AI application?What are the most interesting, innovative, or unexpected ways that you have seen TensorZero used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on TensorZero?When is TensorZero the wrong choice?What do you have planned for the future of TensorZero?Contact InfoLinkedInParting QuestionFrom your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?Closing AnnouncementsThank you for listening! Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email hosts@aiengineeringpodcast.com with your story.To help other people find the show please leave a review on iTunes and tell your friends and co-workers.LinksTensorZeroLLM GatewayLiteLLMOpenAIGoogle VertexAnthropicReinforcement LearningTokamak ReactorViraj RLHF PaperContextual Dueling BanditsDirect Preference OptimizationPartially Observable Markov Decision ProcessDSPyPyTorchCogneeMem0LangGraphDouglas HofstadterOpenAI GymOpenAI o1OpenAI o3Chain Of ThoughtThe intro and outro music is from Hitman's Lovesong feat. Paola Graziano by The Freak Fandango Orchestra/CC BY-SA 3.0
    Show More Show Less
    1 hr and 3 mins