Disseminate: The Computer Science Research Podcast

Episodes

Haralampos Gavriilidis | SheetReader: Efficient spreadsheet parsing

Apr 17 2025

In this episode of the DuckDB in Research series, Harry Gavriilidis (PhD student at TU Berlin) joins us to discuss SheetReader, a high-performance spreadsheet parser that dramatically outpaces traditional tools in both speed and memory efficiency. By taking advantage of the standardized structure of spreadsheet files and bypassing generic XML parsers, SheetReader delivers fast and lightweight parsing, even on large files. Now available as a DuckDB extension, it enables users to query spreadsheets directly with SQL and integrate them seamlessly into broader analytical workflows.

Harry shares insights into the development process, performance benchmarks, and the surprisingly complex world of spreadsheet parsing. He also discusses community feedback, feature requests (like detecting multiple tables or parsing colored rows), and future plans — including tighter integration with DuckDB and support for Arrow. The conversation wraps up with a look at Harry’s broader research on composable database systems and data interoperability, highlighting how tools like DuckDB are reshaping modern data analysis.
Hosted on Acast. See acast.com/privacy for more information.

41 mins

Arjen P. de Vries | faiss: An extension for vector data & search

Apr 10 2025

In this episode of the DuckDB in Research series, we’re joined by Arjen de Vries, Professor of Data Science at Radboud University. Arjen dives into his team’s development of a DuckDB extension for FAISS, a library originally developed at Facebook for efficient similarity search and vector operations.

We explore the growing importance of embeddings and dense retrieval in modern information retrieval systems, and how DuckDB’s zero-copy architecture and tight integration with the Python ecosystem make it a compelling choice for managing large-scale vector data. Arjen shares insights into the technical challenges and architectural decisions behind the extension, comparisons with DuckDB’s native VSS (vector similarity search) solution, and the broader vision of integrating vector search more deeply into relational databases.

Along the way, we also touch on DuckDB's extension ecosystem, its potential for future research, and why tools like this are reshaping how we build and query modern AI-enabled systems.

46 mins

David Justen | POLAR: Adaptive and non-invasive join order selection via plans of least resistance

Apr 3 2025
In this episode, we sit down with David Justen to discuss his work on POLAR: Adaptive and Non-invasive Join Order Selection via Plans of Least Resistance, which was implemented in DuckDB. David shares his journey in the database space, insights into performance optimization, and the challenges of working with modern analytical workloads. We dive into the intricacies of query compilation, vectorized execution, and how DuckDB is shaping the future of in-memory databases. Tune in for a deep dive into database internals, industry trends, and what’s next for high-performance data processing!

Links:
VLDB 2024 Paper
David's Homepage
51 mins

Daniël ten Wolde | DuckPGQ: A graph extension supporting SQL/PGQ

Mar 20 2025
In this episode, we sit down with Daniël ten Wolde, a PhD researcher at CWI’s Database Architectures Group, to explore DuckPGQ, an extension to DuckDB that brings powerful graph querying capabilities to relational databases. Daniël shares his journey into database research, the motivations behind DuckPGQ, and how it simplifies working with graph data. We also dive into the technical challenges of implementing SQL Property Graph Queries (SQL/PGQ) in DuckDB, discuss performance benchmarks, and explore the future of DuckPGQ in graph analytics and machine learning. Tune in to learn how this cutting-edge extension is bridging the gap between research and industry!

Links:
DuckPGQ homepage
Community extension
Daniël's homepage
49 mins

Till Döhmen | DuckDQ: A Python library for data quality checks in ML pipelines

Mar 13 2025
In this episode, we kick off our DuckDB in Research series with Till Döhmen, a software engineer at MotherDuck, where he leads AI efforts. Till shares insights into DuckDQ, a Python library designed for efficient data quality validation in machine learning pipelines, leveraging DuckDB’s high-performance querying capabilities.

We discuss the challenges of ensuring data integrity in ML workflows, the inefficiencies of existing solutions, and how DuckDQ provides a lightweight, drop-in replacement that seamlessly integrates with scikit-learn. Till also reflects on his research journey, the impact of DuckDB’s optimizations, and the future potential of data quality tooling. Plus, we explore how AI tools like ChatGPT are reshaping research and productivity. Tune in for a deep dive into the intersection of databases, machine learning, and data validation!

Resources:
GitHub
Paper
Slides
Till's Homepage
datasketches extension (released by a DuckDB community member 2 weeks after we recorded!)
58 mins

Disseminate x DuckDB Coming Soon...

Mar 6 2025

Hey folks!

We have been collaborating with DuckDB, everyone's favourite in-process SQL OLAP database management system, to bring you a new podcast series: the DuckDB in Research series!

At Disseminate, our mission is to bridge the gap between research and industry by exploring research that has a real-world impact. DuckDB embodies this synergy: decades of research underpin its design, and now it’s making waves in the research community as a platform for others to build on. That is what this series will focus on!

Join us as we kick off the series with:
📌 Daniël ten Wolde – DuckPGQ, a graph workload extension for DuckDB supporting SQL/PGQ
📌 David Justen – POLAR: Adaptive, non-invasive join order selection
📌 Till Döhmen – DuckDQ: A Python library for data quality checks in ML pipelines
📌 Arjen de Vries – FAISS extension for vector similarity search in DuckDB
📌 Harry Gavriilidis – SheetReader: Efficient spreadsheet parsing

Whether you're a researcher, engineer, or just curious about the intersection of databases and innovation, we are sure you will love this series.

Subscribe now and stay tuned for our first episode! 🚀

3 mins

High Impact in Databases with... Anastasia Ailamaki

Mar 3 2025
In this High Impact in Databases episode, we talk to Anastasia Ailamaki.

Anastasia is a Professor of Computer and Communication Sciences at the École Polytechnique Fédérale de Lausanne (EPFL). Tune in to hear Anastasia's story!

The podcast is proudly sponsored by Pometry, the developers behind Raphtory, the open-source temporal graph analytics engine for Python and Rust.

You can find Anastasia on:
Homepage
Google Scholar
LinkedIn

46 mins

Anastasiia Kozar | Fault Tolerance Placement in the Internet of Things | #61

Dec 16 2024
In this episode, we chat with Anastasiia Kozar about her research on fault tolerance in resource-constrained environments. As IoT applications leverage sensors, edge devices, and cloud infrastructure, ensuring system reliability at the edge poses unique challenges. Unlike the cloud, edge devices operate without persistent backups or high availability standards, leading to increased vulnerability to failures. Anastasiia explains how traditional methods fall short, as they fail to align resource allocation with fault tolerance needs, often resulting in system underperformance.

To address this, Anastasiia introduces a novel resource-aware approach that combines operator placement and fault tolerance into a unified process. By optimizing where and how data is backed up, her solution significantly improves system reliability, especially for low-end edge devices with limited resources. The result? Up to a tenfold increase in throughput compared to existing methods. Tune in to learn more!

Links:
Fault Tolerance Placement in the Internet of Things [SIGMOD'24]
The NebulaStream Platform: Data and Application Management for the Internet of Things [CIDR'20]
nebula.stream

49 mins
