When Clean Data Is Actually Dirty
About this listen
We often treat data cleaning as a neutral step.
Delete missing rows. Fill gaps with the mean. Move on.
But cleaning is not neutral. It is a modeling decision.
In this episode, we unpack the statistical consequences of deletion and simple imputation, and why the steps that make data look “clean” can fundamentally alter your estimand, distort variance, and bias inference.
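A minimal numpy simulation makes these consequences concrete. It assumes a toy linear model (true slope 1.5) and a MAR mechanism in which the outcome y is more likely to be missing when the fully observed predictor x is large. The seed, sample size, and coefficients are illustrative choices, not taken from the episode.

```python
import numpy as np

rng = np.random.default_rng(2026)
n = 100_000

# Hypothetical full data: predictor x and outcome y with a true slope of 1.5.
x = rng.normal(0.0, 1.0, n)
y = 2.0 + 1.5 * x + rng.normal(0.0, 1.0, n)

# Missingness indicator r (True = y is missing). MAR: P(missing) depends only
# on the fully observed x, through an assumed logistic curve.
p_miss = 1.0 / (1.0 + np.exp(-x))
r = rng.random(n) < p_miss

# "Delete missing rows": complete-case estimate of E[Y].
cc_mean = y[~r].mean()

# "Fill gaps with the mean": mean imputation of the outcome.
y_imp = np.where(r, cc_mean, y)


def ols_slope(a, b):
    """Least-squares slope from regressing b on a (single predictor)."""
    return np.cov(a, b)[0, 1] / np.var(a, ddof=1)


full_mean = y.mean()
var_full = y.var(ddof=1)
var_imp = y_imp.var(ddof=1)
slope_full = ols_slope(x, y)
slope_cc = ols_slope(x[~r], y[~r])
slope_imp = ols_slope(x, y_imp)

print(f"E[Y]:   full={full_mean:.2f}  complete-case={cc_mean:.2f}")
print(f"Var(Y): full={var_full:.2f}  mean-imputed={var_imp:.2f}")
print(f"slope:  full={slope_full:.2f}  complete-case={slope_cc:.2f}  "
      f"mean-imputed={slope_imp:.2f}")
```

Under these settings the complete-case mean lands well below the full-data mean, the mean-imputed outcome has a much smaller variance, and the slope fitted to the mean-imputed outcome is pulled toward zero. The complete-case slope stays close to 1.5 here because missingness depends only on x: whether deletion hurts depends on which estimand you are after.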
We walk through:
- The formal role of the missingness indicator
- The difference between MCAR, MAR, and MNAR
- Why complete-case analysis is rarely as safe as it seems
- How mean imputation collapses variance and attenuates regression slopes
- When multiple imputation and inverse probability weighting are appropriate
- Why sensitivity analysis becomes essential under MNAR
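Two of the remedies in this list can be sketched together. The sketch assumes a fully observed, hypothetical three-level stratum g (say, a clinic) that drives both the outcome and the chance that it is observed. It estimates each unit's observation probability from its stratum's observed share and applies normalized (Hájek) inverse probability weighting. It then runs a simple delta-adjustment sensitivity analysis, one common way to probe MNAR: missing values in each stratum are assumed to sit delta above or below the observed stratum mean. The strata, rates, and delta grid are illustrative assumptions, and multiple imputation is not shown.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 60_000

# ---- Part 1: inverse probability weighting under MAR ----------------------
# Hypothetical fully observed stratum g (0, 1, 2); y's level differs by stratum.
g = rng.integers(0, 3, n)
y = np.array([10.0, 20.0, 30.0])[g] + rng.normal(0.0, 2.0, n)

# MAR: the chance that y is OBSERVED depends only on g (assumed rates).
observed = rng.random(n) < np.array([0.9, 0.6, 0.3])[g]

full_mean = y.mean()          # the estimand (unknowable in practice)
cc_mean = y[observed].mean()  # complete-case estimate

# Estimate P(observed | g) by each stratum's observed share.
pi_hat = np.array([observed[g == k].mean() for k in range(3)])[g]

# Normalized (Hajek) IPW: each observed unit stands in for 1 / pi_hat units;
# unobserved units get weight 0.
w = observed / pi_hat
ipw_mean = np.sum(w * y) / np.sum(w)


# ---- Part 2: delta-adjustment sensitivity analysis for MNAR ---------------
def delta_adjusted_mean(delta):
    """Estimate E[Y] assuming missing y in each stratum average
    (observed stratum mean + delta). delta = 0 reproduces the MAR answer."""
    total = 0.0
    for k in range(3):
        in_k = g == k
        obs_k = y[in_k & observed]
        n_mis_k = np.sum(in_k & ~observed)
        total += obs_k.sum() + n_mis_k * (obs_k.mean() + delta)
    return total / n


print(f"full={full_mean:.2f}  complete-case={cc_mean:.2f}  IPW={ipw_mean:.2f}")
for delta in (-4.0, -2.0, 0.0, 2.0, 4.0):
    print(f"delta={delta:+.1f}  adjusted mean={delta_adjusted_mean(delta):.2f}")
```

With a stratum-level propensity, normalized IPW is equivalent to post-stratifying on g, which is why it recovers the full-data mean here while the complete-case mean does not. In a real analysis the propensity model would include every covariate that plausibly drives missingness, and the delta grid would come from subject-matter judgment; the delta at which a conclusion flips is often reported as a tipping point.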
If you cannot defend MCAR, deletion is a high-risk default. Mean imputation is risky even when MCAR holds: it still shrinks variances and correlations.
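The standard (Rubin-style) definitions, written in terms of the missingness indicator, make this precise. The notation is assumed for illustration: Y = (Y_obs, Y_mis) is the full data, R = 1 marks a missing value, and φ parameterizes the missingness mechanism.

```latex
% Assumed notation: Y = (Y_obs, Y_mis) is the full data,
% R is the missingness indicator (R = 1 where a value is missing),
% \phi parameterizes the missingness mechanism.
\begin{aligned}
\text{MCAR:}\quad & P(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}; \phi) = P(R; \phi) \\
\text{MAR:}\quad  & P(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}; \phi) = P(R \mid Y_{\mathrm{obs}}; \phi) \\
\text{MNAR:}\quad & P(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}; \phi) \ \text{still depends on } Y_{\mathrm{mis}} \\[6pt]
\text{MCAR} \;\Rightarrow\; & \mathbb{E}[Y \mid R = 0] = \mathbb{E}[Y]
  \quad \text{(complete cases target the intended mean)} \\
\text{mean imputation:}\quad & s^2_{\mathrm{imp}} = \frac{n_{\mathrm{obs}} - 1}{n - 1}\, s^2_{\mathrm{obs}}
  \quad \text{(under any mechanism)}
\end{aligned}
```

The last line is why mean imputation is risky even under MCAR: filling a fraction p of values with the observed mean shrinks the sample variance by roughly a factor of 1 - p, and standard errors computed from the filled-in data inherit that overconfidence.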
Cleaning is not preprocessing.
Cleaning is inference.
This episode is for data scientists, statisticians, epidemiologists, and analysts who want to bring rigor back to real-world data.
StatHarbor Analytics
Feb 16, 2026 · 6 mins