Till Döhmen | DuckDQ: A Python library for data quality checks in ML pipelines

About this listen

In this episode, we kick off our DuckDB in Research series with Till Döhmen, a software engineer at MotherDuck, where he leads AI efforts. Till shares insights into DuckDQ, a Python library for efficient data quality validation in machine learning pipelines that builds on DuckDB’s high-performance querying capabilities.


We discuss the challenges of ensuring data integrity in ML workflows, the inefficiencies of existing solutions, and how DuckDQ provides a lightweight, drop-in replacement that seamlessly integrates with scikit-learn. Till also reflects on his research journey, the impact of DuckDB’s optimizations, and the future potential of data quality tooling. Plus, we explore how AI tools like ChatGPT are reshaping research and productivity. Tune in for a deep dive into the intersection of databases, machine learning, and data validation!


Resources:

  • GitHub
  • Paper
  • Slides
  • Till's Homepage
  • datasketches extension (released by a DuckDB community member 2 weeks after we recorded!)

Hosted on Acast. See acast.com/privacy for more information.
