When Clean Data Is Actually Dirty

About this listen

“Cleaning” data is often treated as a harmless preprocessing step.

Delete missing rows.

Fill gaps with the mean.

Move forward.

But cleaning is not neutral.

It is a modeling decision that can change:

  • The estimand
  • The sampling mechanism
  • The bias–variance trade-off

In this episode, we examine the statistical dangers of deletion and simple imputation — and why naïve cleaning can quietly corrupt inference.
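As a rough illustration of the point, here is a minimal simulation sketch. It is not taken from the episode: the variable names, the missingness probabilities, and the linear relationship between `x` and `y` are all assumptions chosen for the example. It simulates an outcome that is more likely to be missing when a covariate is large, then compares the full-data mean with the estimate after deleting missing rows, and the full-data spread with the spread after mean imputation.

```python
import random
import statistics

random.seed(0)

# Hypothetical data: y depends on x (assumed linear relationship).
n = 20_000
x = [random.gauss(0, 1) for _ in range(n)]
y = [2.0 * xi + random.gauss(0, 1) for xi in x]

# Assumed missingness mechanism: y is missing with probability 0.8
# when x > 0 and 0.1 otherwise, so missingness depends on x.
observed = [random.random() > (0.8 if xi > 0 else 0.1) for xi in x]

# "Delete missing rows": keep only the observed values of y.
y_obs = [yi for yi, seen in zip(y, observed) if seen]
true_mean = statistics.fmean(y)
deleted_mean = statistics.fmean(y_obs)

# "Fill gaps with the mean": replace each missing y with the observed mean.
y_imputed = [yi if seen else deleted_mean for yi, seen in zip(y, observed)]
imputed_mean = statistics.fmean(y_imputed)

true_sd = statistics.pstdev(y)
imputed_sd = statistics.pstdev(y_imputed)

print(f"full-data mean:      {true_mean:.3f}")
print(f"mean after deletion: {deleted_mean:.3f}")
print(f"mean after imputing: {imputed_mean:.3f}")
print(f"full-data sd:        {true_sd:.3f}")
print(f"sd after imputing:   {imputed_sd:.3f}")
```

Under these assumptions, deletion shifts the estimated mean because the retained rows over-represent small `x`. Mean imputation inherits that shifted mean and also shrinks the spread, since every filled value sits exactly at the center.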
