06. Bootstrapping
The sample serves as a proxy for the unknown population. Resampling from that proxy (i.e., from the sample itself, with replacement) generates an approximate sampling distribution. That is the bootstrap.
Agenda
October 7, 2024
- Review: logic of confidence intervals
- Logic of bootstrapping (resample from the sample with replacement)
- Bootstrap standard error (BS SE) of a statistic
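The resampling logic in the agenda items above can be sketched in a few lines. This is a minimal illustration, not the course's own code: the function name `bootstrap_se`, the number of resamples, and the data values are all assumptions made up for the example. It draws resamples of the same size as the sample, with replacement, and takes the standard deviation of the resampled statistics as the bootstrap SE.

```python
import random
import statistics

def bootstrap_se(sample, statistic, n_boot=2000, seed=0):
    """Estimate the standard error of `statistic` by resampling with replacement.

    `bootstrap_se` is a hypothetical helper written for this sketch.
    """
    rng = random.Random(seed)
    n = len(sample)
    boot_stats = []
    for _ in range(n_boot):
        # One bootstrap resample: n draws from the sample, with replacement.
        resample = rng.choices(sample, k=n)
        boot_stats.append(statistic(resample))
    # The standard deviation of the bootstrap statistics is the bootstrap SE.
    return statistics.stdev(boot_stats), boot_stats

# Hypothetical data, invented for illustration only.
data = [12.1, 9.8, 11.4, 10.3, 13.7, 8.9, 10.9, 12.5, 9.4, 11.8]

se_mean, boot_means = bootstrap_se(data, statistics.mean)
print(f"bootstrap SE of the mean: {se_mean:.3f}")
print(f"formula SE (s / sqrt(n)): {statistics.stdev(data) / len(data) ** 0.5:.3f}")
```

For the sample mean the two printed values should be close, which is a useful sanity check; the appeal of the bootstrap is that the same function works for statistics such as the median, where no simple SE formula is available.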
October 9, 2024
- Normal CI using BS SE
- Bootstrap-t (studentized) CIs
- Percentile CIs
- Properties, advantages, and disadvantages of each interval
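The three intervals listed above can be compared side by side in a short sketch. Everything named here (`bootstrap_cis`, `quantile`, the resample counts, the data) is an assumption for illustration rather than the course's implementation. The bootstrap-t interval uses a second, inner bootstrap on each resample to estimate that resample's SE, which is why the loop is nested.

```python
import random
import statistics

def quantile(values, p):
    """Linear-interpolation quantile of `values` at probability `p`."""
    s = sorted(values)
    pos = p * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

def bootstrap_cis(xs, statistic, level=0.95, n_outer=1000, n_inner=50, seed=0):
    """Return normal, percentile, and bootstrap-t intervals (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(xs)
    theta_hat = statistic(xs)
    boot_stats, t_stats = [], []
    for _ in range(n_outer):
        # Outer bootstrap: resample from the original sample.
        xs_star = rng.choices(xs, k=n)
        theta_star = statistic(xs_star)
        boot_stats.append(theta_star)
        # Inner bootstrap: resample from the resample to estimate its SE.
        inner = [statistic(rng.choices(xs_star, k=n)) for _ in range(n_inner)]
        se_star = statistics.stdev(inner)
        if se_star > 0:  # skip degenerate resamples with no variability
            t_stats.append((theta_star - theta_hat) / se_star)

    se_hat = statistics.stdev(boot_stats)  # bootstrap SE of the statistic
    alpha = 1 - level
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    return {
        # Normal CI: statistic +/- z* times the bootstrap SE.
        "normal": (theta_hat - z * se_hat, theta_hat + z * se_hat),
        # Percentile CI: middle `level` of the bootstrap statistics.
        "percentile": (quantile(boot_stats, alpha / 2),
                       quantile(boot_stats, 1 - alpha / 2)),
        # Bootstrap-t CI: replace z* with quantiles of the studentized statistics.
        "bootstrap-t": (theta_hat - quantile(t_stats, 1 - alpha / 2) * se_hat,
                        theta_hat - quantile(t_stats, alpha / 2) * se_hat),
    }

# Hypothetical data, invented for illustration only.
data = [12.1, 9.8, 11.4, 10.3, 13.7, 8.9, 10.9, 12.5, 9.4, 11.8]
cis = bootstrap_cis(data, statistics.mean)
for name, (lower, upper) in cis.items():
    print(f"{name:12s} ({lower:.2f}, {upper:.2f})")
```

Note that the upper quantile of the studentized statistics sets the lower endpoint of the bootstrap-t interval, and vice versa; that reversal is what lets the interval be asymmetric around the statistic.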
Readings
Class notes: Bootstrapping
Baumer, Horton, and Kaplan (2021), The bootstrap (Chp 9.3) in Modern Data Science for R.
James, Witten, Hastie, and Tibshirani (2021), The Bootstrap (section 5.2) in An Introduction to Statistical Learning.
Reflection questions
Why would anyone ever want to bootstrap?
What are the differences between a normal CI with BS SE, a Bootstrap-t CI, and a percentile CI?
Why do we need to bootstrap twice for the Bootstrap-t CI?
What makes a confidence interval procedure good?
Ethics considerations
Why isn’t the bootstrap method a solution for the situation of small sample sizes?
Why isn’t the bootstrap method a solution for the situation with biased / unrepresentative data?
Consider a population with a maximum value (the parameter of interest). Will the sample max have a sampling distribution which is centered on the true maximum? Why or why not? [Quintessential example of how a statistic can be biased for the parameter.]
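The sample-maximum question can be explored with a quick simulation. This is only a sketch under assumed conditions: the population (the integers 1 to 1000), the sample size of 30, and the number of repetitions are all made up for illustration.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population; its true maximum is 1000.
population = list(range(1, 1001))
true_max = max(population)

# Draw many samples of size 30 (without replacement) and record each sample max.
sample_maxes = [max(rng.sample(population, 30)) for _ in range(5000)]

print(f"true maximum:            {true_max}")
print(f"mean of the sample maxes: {statistics.mean(sample_maxes):.1f}")
print(f"largest sample max seen:  {max(sample_maxes)}")
```

Because a sample max can equal the population max but never exceed it, the simulated sampling distribution sits at or below the true value rather than being centered on it.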
Slides
In-class slides for both 10/7/24 and 10/9/24.
Additional Resources
StatKey applets which demonstrate bootstrapping.
Confidence interval logic from the Rossman & Chance applets.
The Role of Statistical Learning in Applied Statistics: Daniela Witten talks to Rafa Irizarry, June 15, 2020.
Five ways to fix statistics, Nature, Nov 28, 2017.