DuckDB, Apache Arrow, & the Future of Data Engineering w/ Rusty Conover | S2E3

About this listen

In this episode of The Hedgineer Podcast, host Michael Watson is joined by special guest Rusty Conover, the world's most prolific DuckDB extension builder, for a masterclass on building the next generation of real-time, large-scale data systems.


Rusty, whose extensive data engineering career includes work at multi-manager hedge funds, pulls back the curtain on what makes DuckDB so revolutionary for developers and data engineers. They explore how its blazingly fast, in-process, C++-based architecture is challenging the big data status quo. The conversation is a deep dive into the ecosystem growing around DuckDB, from the Apache Arrow columnar format to the evolving landscape of open table formats like Iceberg, Delta Lake, and the new DuckLake.


Join them for a detailed discussion on the nitty-gritty of modern data infrastructure, whether you're building enterprise data platforms or looking for the most efficient tools for your analytics workload.


In this episode, you will learn about:


The DuckDB Revolution: What makes this "blazingly fast" in-process database a game-changer that can simplify and replace entire ETL stacks.

A Tour of DuckDB Extensions: A look inside some of the 15 extensions Rusty has built, from Airport for integrating with Apache Arrow, to Crypto, ShellFS, and TextPlot.


Diving into Apache Arrow: An explanation of the columnar in-memory data format, zero-copy operations, and the Arrow Flight RPC mechanism for efficiently moving data.


The Battle of Open Table Formats: A comparison of Iceberg, Delta Lake, and the new database-centric approach of DuckLake.


DuckDB vs. The World: How DuckDB stacks up against KDB for financial data and ClickHouse for analytics, and where it fits alongside large-scale compute engines like Apache Spark.


Parquet Deep Dive: The key differences between Parquet V1 and V2 and the importance of modern compression strategies and encodings.


The Future of DuckDB: A sneak peek at powerful upcoming features like time travel and the MERGE INTO statement for simplifying change data capture (CDC) pipelines.


Hosted by Michael Watson, The Hedgineer Podcast dives into AI technology and data in the hedge fund, asset management, and prop trading space.


Follow The Hedgineer Podcast:

YouTube: https://www.youtube.com/@hedgineer

LinkedIn: https://www.linkedin.com/company/90976838

Twitter: https://x.com/hedgineering

Instagram: https://www.instagram.com/hedgineer/


Don't forget to like, subscribe, and hit the notification bell to stay updated on our latest episodes!


Hedgineer.io

Hosted on Acast. See acast.com/privacy for more information.
