06. Bootstrapping

The bootstrap treats the sample as a proxy for the unknown population. Resampling from that proxy population (i.e., the sample), with replacement, generates an approximate sampling distribution.

Published

October 7, 2024

Artwork by @allison_horst.

Agenda

October 7, 2024

Review: logic of confidence intervals
Logic of bootstrapping (resample from the sample with replacement)
Bootstrap SE of a statistic
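
The resampling logic in this agenda can be sketched in a few lines. This is only an illustration, not the course's own code: the function name `bootstrap_se`, the made-up `data` values, and the choice of 2000 resamples are all assumptions.

```python
import random
import statistics


def bootstrap_se(sample, stat=statistics.mean, n_boot=2000, seed=0):
    """Estimate the SE of `stat` by resampling `sample` with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    boot_stats = []
    for _ in range(n_boot):
        # rng.choices draws WITH replacement; each resample has the original size n
        resample = rng.choices(sample, k=n)
        boot_stats.append(stat(resample))
    # The SD of the bootstrapped statistics is the bootstrap SE
    return statistics.stdev(boot_stats)


# Hypothetical data, made up for illustration only
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 4.4]

se_boot = bootstrap_se(data)
se_formula = statistics.stdev(data) / len(data) ** 0.5  # usual s / sqrt(n) for the mean
print(f"bootstrap SE: {se_boot:.3f}, formula SE: {se_formula:.3f}")
```

For the sample mean the two numbers should be close, which is a useful sanity check. The bootstrap SE earns its keep for statistics (a median, a ratio, a correlation) where no simple formula is available; passing a different `stat` function is all that changes.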

October 9, 2024

Normal CI using bootstrap SE
Bootstrap-t (studentized) CIs
Percentile CIs
Properties / advantages / disadvantages
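
The three interval types listed above can be compared side by side in a rough sketch. Everything here is an assumption made for illustration: the function names, the made-up `data`, the resample counts, the 95% level, and the simple index-based quantile. The bootstrap-t interval is written in one common form, where each outer resample gets its own SE from a second, inner bootstrap.

```python
import random
import statistics


def quantile(sorted_vals, p):
    """Crude empirical quantile: pick by index, no interpolation."""
    idx = min(len(sorted_vals) - 1, max(0, int(p * len(sorted_vals))))
    return sorted_vals[idx]


def bootstrap_cis(sample, stat=statistics.mean, n_outer=500, n_inner=50, seed=0):
    """Return 95% normal, percentile, and bootstrap-t intervals for `stat`."""
    rng = random.Random(seed)
    n = len(sample)
    theta_hat = stat(sample)

    theta_stars, t_stars = [], []
    for _ in range(n_outer):
        resample = rng.choices(sample, k=n)  # first-level resample
        theta_star = stat(resample)
        theta_stars.append(theta_star)
        # Second-level bootstrap: SE of the statistic within this resample
        inner = [stat(rng.choices(resample, k=n)) for _ in range(n_inner)]
        se_star = statistics.stdev(inner)
        if se_star > 0:
            t_stars.append((theta_star - theta_hat) / se_star)

    se_boot = statistics.stdev(theta_stars)
    theta_stars.sort()
    t_stars.sort()

    z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
    return {
        # estimate +/- z * bootstrap SE
        "normal": (theta_hat - z * se_boot, theta_hat + z * se_boot),
        # middle 95% of the bootstrapped statistics
        "percentile": (quantile(theta_stars, 0.025), quantile(theta_stars, 0.975)),
        # quantiles of the studentized statistic replace z; note the flipped order
        "bootstrap_t": (theta_hat - quantile(t_stars, 0.975) * se_boot,
                        theta_hat - quantile(t_stars, 0.025) * se_boot),
    }


# Hypothetical data, made up for illustration only
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 4.4]
cis = bootstrap_cis(data)
for name, (lo, hi) in cis.items():
    print(f"{name:12s} ({lo:.2f}, {hi:.2f})")
```

The inner loop is where the second bootstrap comes in: the studentized statistic needs an SE for every resample, and without a formula for that SE the only way to get it is to resample again.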

Readings

Class notes: Bootstrapping
Baumer, Kaplan, and Horton (2021), The bootstrap (Chp 9.3) in Modern Data Science with R.
James, Witten, Hastie, and Tibshirani (2021), The Bootstrap (section 5.2) in An Introduction to Statistical Learning.

Reflection questions

Why would anyone ever want to bootstrap?
What are the differences between a normal CI with bootstrap SE, a bootstrap-t CI, and a percentile CI?
Why do we need to bootstrap twice for the Bootstrap-t CI?
What makes a confidence interval procedure good?

Ethics considerations

Why isn’t the bootstrap method a solution for the situation of small sample sizes?
Why isn’t the bootstrap method a solution for the situation with biased / unrepresentative data?
Consider a population with a maximum value (the parameter of interest). Will the sample max have a sampling distribution which is centered on the true maximum? Why or why not? [Quintessential example of how a statistic can be biased for the parameter.]
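
A small simulation can make the sample-max question concrete. This sketch assumes a Uniform(0, 100) population, which is not specified above; the function name, sample size, and number of simulations are likewise illustrative choices.

```python
import random
import statistics


def simulate_sample_max(true_max=100.0, n=20, n_sims=5000, seed=0):
    """Draw many samples of size n from Uniform(0, true_max); return each sample's max."""
    rng = random.Random(seed)
    return [max(rng.uniform(0, true_max) for _ in range(n)) for _ in range(n_sims)]


true_max, n = 100.0, 20
maxes = simulate_sample_max(true_max=true_max, n=n)
mean_max = statistics.mean(maxes)

# For Uniform(0, theta), theory gives E[sample max] = n / (n + 1) * theta
expected = n / (n + 1) * true_max
print(f"true max: {true_max}, mean of sample maxes: {mean_max:.2f}, theory: {expected:.2f}")
```

Every sample max sits at or below the true maximum, so the sampling distribution cannot be centered on it. The same issue carries over to bootstrapping: a resample can never contain a value larger than the observed sample max.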

Slides

In-class slides for both 10/7/24 and 10/9/24.
WU10 - standard errors
WU11 - bootstrap t

Additional Resources

StatKey applets which demonstrate bootstrapping.
Confidence interval logic from the Rossman & Chance applets.
The Role of Statistical Learning in Applied Statistics: Daniela Witten talks to Rafa Irizarry, June 15, 2020.
Five ways to fix statistics, Nature, Nov 28, 2017.

