Bootstrapping

October 1 + 6 + 8 + 15, 2025

Jo Hardin

Agenda 10/1/25

  1. Review: logic of SE
  2. Logic of bootstrapping (resample from the sample with replacement)
  3. BS SE of a statistic

Why bootstrap?

Motivation:

to estimate the variability and distribution of a statistic in repeated samples of size \(n\) (not dependent on \(H_0\) being true).

Variability

  • standard deviation of the data: \(s = \sqrt{\frac{\sum_{i=1}^n(X_i - \overline{X})^2}{n-1}}\)

  • standard error of the statistic: depends…

Intuitive understanding

See the applets for an intuitive understanding of both confidence intervals and bootstrapping.

Basic Notation

(n.b., we don’t ever do what is on this slide)

Let \(\theta\) be the parameter of interest, and let \(\hat{\theta}\) be the estimate of \(\theta\). If we could, we’d take many samples of size \(n\) from the population to create a sampling distribution for \(\hat{\theta}\). Consider taking \(B\) random samples from the population.

\[\begin{align} \hat{\theta}(\cdot) = \frac{1}{B} \sum_{i=1}^B \hat{\theta}_i \end{align}\] is the best guess for \(\theta\). If \(\hat{\theta}(\cdot)\) is very different from \(\theta\), we would call \(\hat{\theta}\) biased. \[\begin{align} SE(\hat{\theta}) &= \bigg[ \frac{1}{B-1} \sum_{i=1}^B(\hat{\theta}_i - \hat{\theta}(\cdot))^2 \bigg]^{1/2} \end{align}\]
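A minimal sketch of this never-in-practice procedure, assuming a hypothetical known population (here, an exponential population) and using the sample mean as \(\hat{\theta}\):

# hypothetical: we pretend to know the population (we never do)
set.seed(47)
pop <- rexp(1e6, rate = 1/100)

B <- 1000
n <- 50
theta_hat <- replicate(B, mean(sample(pop, size = n)))  # B samples from the population

mean(theta_hat)  # theta-hat(dot): center of the sampling distribution
sd(theta_hat)    # SE(theta-hat): sd() uses the B - 1 denominator, as above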

Ideally

(we never do part (a))

From Hesterberg et al., Chapter 16 of Introduction to the Practice of Statistics by Moore, McCabe, and Craig

Bootstrap Procedure

  1. Resample data with replacement from the original sample.
  2. Calculate the statistic of interest for each resample.
  3. Repeat steps 1 and 2 \(B\) times.
  4. Use the bootstrap distribution for inference.
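A minimal base R sketch of steps 1-4, using the times variable from the heroin data (introduced below) and the median as the statistic of interest:

B <- 1000
boot_med <- replicate(B, {
  resample <- sample(heroin$times, replace = TRUE)  # step 1: n values, with replacement
  median(resample)                                  # step 2: statistic of interest
})                                                  # step 3: repeated B times
# step 4: boot_med holds the bootstrap distribution, used below for inference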

Bootstrap Notation

(n.b., bootstrapping is the process on this slide)

Take many (\(B\)) resamples of size \(n\) from the sample to create a bootstrap distribution for \(\hat{\theta}^*\) (instead of the sampling distribution for \(\hat{\theta}\)).

Let \(\hat{\theta}^*(b)\) be the calculated statistic of interest for the \(b^{th}\) bootstrap sample. The best guess for \(\theta\) is: \[\begin{align} \hat{\theta}^* = \frac{1}{B} \sum_{b=1}^B \hat{\theta}^*(b) \end{align}\] (if \(\hat{\theta}^*\) is very different from \(\hat{\theta}\), we call it biased.) And the estimated value for the standard error of the estimate is \[\begin{align} \hat{SE}^* = \bigg[ \frac{1}{B-1} \sum_{b=1}^B ( \hat{\theta}^*(b) - \hat{\theta}^*)^2 \bigg]^{1/2} \end{align}\]
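In code, \(\hat{\theta}^*\) and \(\hat{SE}^*\) are just the mean and standard deviation of the \(B\) bootstrap statistics (using the boot_med vector from the sketch above):

theta_star <- mean(boot_med)       # theta-hat-star: the best guess for theta
se_star <- sd(boot_med)            # SE-hat-star: sd() uses the B - 1 denominator
theta_star - median(heroin$times)  # estimated bias of the sample median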

What do we get?

Just like repeatedly taking samples from the population, taking resamples from the sample allows us to characterize the bootstrap distribution which approximates the sampling distribution.

The bootstrap distribution approximates the shape, spread, & bias of the true sampling distribution.

From Hesterberg et al., Chapter 16 of Introduction to the Practice of Statistics by Moore, McCabe, and Craig. The left image represents the mean with n=50. The center image represents the mean with n=9. The right image represents the median with n=15.


Example

  • Everitt and Rabe-Hesketh (2006) report on a study by Caplehorn and Bell (1991) that investigated the time (days) spent in a clinic for methadone maintenance treatment for people addicted to heroin.

  • The data include the amount of time that the subjects stayed in the facility until treatment was terminated.

  • For about 37% of the subjects, the study ended while they were still in the clinic (status = 0).

  • Their survival times are right-censored. For this reason we might not want to estimate the mean survival time, but rather some other measure of typical survival time. Below we explore using the median as well as the 25% trimmed mean. (From Chance and Rossman (2018), Investigation 4.5.3)

The data

# A tibble: 238 × 5
      id clinic status times  dose
   <dbl>  <dbl>  <dbl> <dbl> <dbl>
 1     1      1      1   428    50
 2     2      1      1   275    55
 3     3      1      1   262    55
 4     4      1      1   183    30
 5     5      1      1   259    65
 6     6      1      1   714    55
 7     7      1      1   438    65
 8     8      1      0   796    60
 9     9      1      1   892    50
10    10      1      1   393    65
# ℹ 228 more rows

Observed Statistic(s)

heroin |>
  summarize(obs_med = median(times), 
            obs_tr_mean = mean(times, trim = 0.25))
# A tibble: 1 × 2
  obs_med obs_tr_mean
    <dbl>       <dbl>
1    368.        378.

Bootstrapped data!

set.seed(4747)

heroin |> 
  sample_frac(size = 1, replace = TRUE) |>   # one resample: all n rows, with replacement
  summarize(boot_med = median(times), 
            boot_tr_mean = mean(times, trim = 0.25))
# A tibble: 1 × 2
  boot_med boot_tr_mean
     <dbl>        <dbl>
1      368         372.

Bootstrapping with map()

n_rep1 <- 100
set.seed(4747)
heroin
# A tibble: 238 × 6
      id clinic status times prison  dose
   <dbl>  <dbl>  <dbl> <dbl>  <dbl> <dbl>
 1     1      1      1   428      0    50
 2     2      1      1   275      1    55
 3     3      1      1   262      0    55
 4     4      1      1   183      0    30
 5     5      1      1   259      1    65
 6     6      1      1   714      0    55
 7     7      1      1   438      1    65
 8     8      1      0   796      1    60
 9     9      1      1   892      0    50
10    10      1      1   393      1    65
# ℹ 228 more rows
boot_stat_func <- function(df){ 
    df |> 
    mutate(obs_med = median(times),               # record the observed statistics
           obs_tr_mean = mean(times, trim = 0.25)) |>
    sample_frac(size = 1, replace = TRUE) |>      # then resample n rows with replacement
    summarize(boot_med = median(times), 
              boot_tr_mean = mean(times, trim = 0.25),
              obs_med = mean(obs_med),            # obs_* columns are constant, so mean()
              obs_tr_mean = mean(obs_tr_mean))}   # just carries them into the output
map(1:n_rep1, ~boot_stat_func(df = heroin)) |> 
  list_rbind()
# A tibble: 100 × 4
   boot_med boot_tr_mean obs_med obs_tr_mean
      <dbl>        <dbl>   <dbl>       <dbl>
 1     368          372.    368.        378.
 2     358          363.    368.        378.
 3     431          421.    368.        378.
 4     332.         350.    368.        378.
 5     310.         331.    368.        378.
 6     376          382.    368.        378.
 7     366          365.    368.        378.
 8     378.         382.    368.        378.
 9     394          386.    368.        378.
10     392.         402.    368.        378.
# ℹ 90 more rows

Data distributions

Sampling distributions

Both the median and the trimmed mean are reasonably symmetric and bell-shaped.

Agenda 10/6 + 10/8/25

  1. Logic of CI
  2. Normal CI using BS SE
  3. Bootstrap-t (studentized) CIs

Technical derivations

See the in-class notes on bootstrapping for the technical details on how to construct the different bootstrap intervals.

Bootstrap condition:

The bootstrap distribution we generate from resamples asymptotically approaches the true sampling distribution of \(\hat{\theta}\) as the sample size of the data grows, \(n \rightarrow \infty\):

\[\begin{align} \hat{F}\Big(\frac{\hat{\theta}^*(b) - \hat{\theta}}{\hat{SE}^*(b)} \Big) \rightarrow F\Big(\frac{\hat{\theta} - \theta}{SE(\hat{\theta})}\Big) \end{align}\]

Normal CI using bootstrap SE

Building on the CI derivation from introductory statistics, we can use the bootstrap standard error in place of the usual standard error.

CI for \(\theta\)

If: \(\hat{\theta} \sim N\)

\[\begin{align} P\bigg(z_{(\alpha/2)} \leq \frac{\hat{\theta} - \theta}{SE(\hat{\theta})} \leq z_{(1-\alpha/2)}\bigg)&= 1 - \alpha\\ P\bigg(\hat{\theta} - z_{(1-\alpha/2)} SE(\hat{\theta}) \leq \theta \leq \hat{\theta} - z_{(\alpha/2)} SE(\hat{\theta})\bigg) &= 1 - \alpha\\ \end{align}\]

CI for \(\theta\) with bootstrap SE

A 95% CI for \(\theta\) would then be: \[\hat{\theta} \pm z_{(1-\alpha/2)} \hat{SE}^*\]

Bootstrapping with map()

n_rep1 <- 100
set.seed(4747)
boot_stat_func <- function(df){ 
    df |> 
    mutate(obs_med = median(times),
           obs_tr_mean = mean(times, trim = 0.25)) |>
    sample_frac(size=1, replace=TRUE) |>
    summarize(boot_med = median(times), 
              boot_tr_mean = mean(times, trim = 0.25),
              obs_med = mean(obs_med),
              obs_tr_mean = mean(obs_tr_mean))}
boot_stats <- map(1:n_rep1, ~boot_stat_func(df = heroin)) |> 
  list_rbind()

boot_stats
# A tibble: 100 × 4
   boot_med boot_tr_mean obs_med obs_tr_mean
      <dbl>        <dbl>   <dbl>       <dbl>
 1     368          372.    368.        378.
 2     358          363.    368.        378.
 3     431          421.    368.        378.
 4     332.         350.    368.        378.
 5     310.         331.    368.        378.
 6     376          382.    368.        378.
 7     366          365.    368.        378.
 8     378.         382.    368.        378.
 9     394          386.    368.        378.
10     392.         402.    368.        378.
# ℹ 90 more rows

95% normal CI with BS SE

boot_stats |>
  summarize(
    # sd(boot_*) is the bootstrap SE; qnorm() supplies the normal multipliers
    low_med = mean(obs_med) + qnorm(0.025) * sd(boot_med),
    up_med = mean(obs_med) + qnorm(0.975) * sd(boot_med),
    low_tr_mean = mean(obs_tr_mean) + qnorm(0.025) * sd(boot_tr_mean),
    up_tr_mean = mean(obs_tr_mean) + qnorm(0.975) * sd(boot_tr_mean))
# A tibble: 1 × 4
  low_med up_med low_tr_mean up_tr_mean
    <dbl>  <dbl>       <dbl>      <dbl>
1    310.   425.        337.       420.

Bootstrap-t CI

What if we don’t believe that \(\hat{\theta} \sim N\)?

Double bootstrap to find the multiplier

\[\begin{align} T^*(b) &= \frac{\hat{\theta}^*(b) - \hat{\theta}}{\hat{SE}^*(b)} \end{align}\]

(different \(\hat{SE}^*(b)\) for every bootstrap resample!)

CI multiplier

Find \(\hat{t}^*_{\alpha/2}\) such that

\[\begin{align} \frac{\# \{T^*(b) \leq \hat{t}^*_{\alpha/2} \} }{B} = \alpha/2 \end{align}\]

Bootstrap-t CI

\((\hat{\theta} - \hat{t}^*_{1-\alpha/2}\hat{SE}^*, \hat{\theta} - \hat{t}^*_{\alpha/2}\hat{SE}^*)\)
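As a sketch with hypothetical objects (theta_hat the observed statistic, se_star its bootstrap SE, and t_star the vector of \(B\) values \(T^*(b)\)), the interval is assembled with the quantiles flipped, so the lower limit pairs with \(\hat{t}^*_{1-\alpha/2}\):

q_t <- quantile(t_star, c(0.975, 0.025))  # upper quantile first
theta_hat - q_t * se_star                 # (lower, upper) bootstrap-t CI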

Double bootstrapping with map()

n_rep1 <- 100
n_rep2 <- 20
set.seed(4747)
boot_1_func <- function(df){
  df |> 
    sample_frac(size = 1, replace = TRUE)   # one first-level resample
}
boot_2_func <- function(df, reps){
  # for one first-level resample: record its statistics, then take `reps`
  # second-level resamples to estimate the SE of those statistics
  resample2 <- 1:reps
  df |>
    summarize(boot_med = median(times), boot_tr_mean = mean(times, trim = 0.25)) |>
    cbind(resample2, map(resample2, ~df |> 
            sample_frac(size = 1, replace = TRUE) |>
            summarize(boot_2_med = median(times), 
                      boot_2_tr_mean = mean(times, trim = 0.25))) |>
                list_rbind()) |> 
    select(resample2, everything())
}
boot_2_stats <- data.frame(resample1 = 1:n_rep1) |>
  mutate(first_boot = map(1:n_rep1, ~boot_1_func(df = heroin))) |>
  mutate(second_boot = map(first_boot, boot_2_func, reps = n_rep2)) 

Summarizing the double bootstrap

boot_2_stats |>
  unnest(second_boot) |>
  unnest(first_boot) 
# A tibble: 476,000 × 12
   resample1    id clinic status times prison  dose resample2 boot_med
       <int> <dbl>  <dbl>  <dbl> <dbl>  <dbl> <dbl>     <int>    <dbl>
 1         1   257      1      1   204      0    50         1      368
 2         1   230      1      0    28      0    50         1      368
 3         1   229      1      1   216      0    50         1      368
 4         1   186      2      0   683      0   100         1      368
 5         1   119      2      0   684      0    65         1      368
 6         1    73      1      0   405      0    80         1      368
 7         1    41      1      1   550      1    60         1      368
 8         1    75      1      0   905      0    80         1      368
 9         1    68      1      0   439      0    80         1      368
10         1   224      1      1   546      1    50         1      368
# ℹ 475,990 more rows
# ℹ 3 more variables: boot_tr_mean <dbl>, boot_2_med <dbl>,
#   boot_2_tr_mean <dbl>
boot_2_stats |>
  unnest(second_boot) |>
  unnest(first_boot) |>
  filter(resample1 == 1) |>
  # (assumption: the summary below was produced with skimr; a sketch of that call)
  skimr::skim(boot_med, boot_tr_mean, boot_2_med, boot_2_tr_mean) |>
  select(skim_variable, numeric.mean, numeric.sd, numeric.p50)
# A tibble: 4 × 4
  skim_variable  numeric.mean numeric.sd numeric.p50
  <chr>                 <dbl>      <dbl>       <dbl>
1 boot_med               368         0          368 
2 boot_tr_mean           372.        0          372.
3 boot_2_med             365.       32.5        367.
4 boot_2_tr_mean         368.       21.5        367.
boot_t_stats <- boot_2_stats |>
  unnest(second_boot) |>
  unnest(first_boot) |>
  group_by(resample1) |>
  summarize(boot_sd_med = sd(boot_2_med),
            boot_sd_tr_mean = sd(boot_2_tr_mean),
            boot_med = mean(boot_med),  # doesn't do anything, just copies over
            boot_tr_mean = mean(boot_tr_mean))  |> # the variables into the output
  mutate(boot_t_med = (boot_med - mean(boot_med)) / boot_sd_med,                     # T*(b): studentize each resample, with
         boot_t_tr_mean = (boot_tr_mean - mean(boot_tr_mean)) / boot_sd_tr_mean)     # mean(boot_*) standing in for theta-hat

  
boot_t_stats
# A tibble: 100 × 7
   resample1 boot_sd_med boot_sd_tr_mean boot_med boot_tr_mean
       <int>       <dbl>           <dbl>    <dbl>        <dbl>
 1         1        32.5            21.5     368          372.
 2         2        24.2            18.8     358          363.
 3         3        32.0            21.1     431          421.
 4         4        49.1            34.0     332.         350.
 5         5        22.7            13.4     310.         331.
 6         6        20.3            19.9     376          382.
 7         7        35.3            22.1     366          365.
 8         8        15.0            16.4     378.         382.
 9         9        27.6            20.9     394          386.
10        10        38.5            19.6     392.         402.
# ℹ 90 more rows
# ℹ 2 more variables: boot_t_med <dbl>, boot_t_tr_mean <dbl>

95% Bootstrap-t CI

Note that computing the t-values requires a different SE for each bootstrap resample (hence the double bootstrap).

boot_t_stats |>
  select(boot_t_med, boot_t_tr_mean)
# A tibble: 100 × 2
   boot_t_med boot_t_tr_mean
        <dbl>          <dbl>
 1     -0.154         -0.249
 2     -0.619         -0.790
 3      1.81           2.04 
 4     -0.845         -0.824
 5     -2.75          -3.50 
 6      0.148          0.235
 7     -0.198         -0.570
 8      0.367          0.261
 9      0.761          0.406
10      0.481          1.26 
# ℹ 90 more rows
boot_q <- boot_t_stats |>
  select(boot_t_med, boot_t_tr_mean) |>
  summarize(q_t_med = quantile(boot_t_med, c(0.025, 0.975)), 
            q_t_tr_mean = quantile(boot_t_tr_mean, c(0.025, 0.975)),
            q = c(0.025, 0.975))

boot_q
# A tibble: 2 × 3
  q_t_med q_t_tr_mean     q
    <dbl>       <dbl> <dbl>
1   -2.17       -2.20 0.025
2    1.78        2.06 0.975
boot_q_med <- boot_q |> select(q_t_med) |> pull()
boot_q_med
     2.5%     97.5% 
-2.170931  1.782708 
boot_q_tr_mean <- boot_q |> select(q_t_tr_mean) |> pull()
boot_q_tr_mean
     2.5%     97.5% 
-2.204115  2.059893 
# n.b., the bootstrap-t formula pairs the lower limit with t*_{1-alpha/2}
# (i.e., mean(boot_med) - rev(boot_q_med) * sd(boot_med)); applying the
# quantiles unflipped, as here, matches only when T* is roughly symmetric
boot_t_stats |>
  summarize(boot_t_CI_med = mean(boot_med) + boot_q_med*sd(boot_med),
            boot_t_CI_tr_mean = mean(boot_tr_mean) + boot_q_tr_mean * sd(boot_tr_mean),
            q = c(0.025, 0.975))
# A tibble: 2 × 3
  boot_t_CI_med boot_t_CI_tr_mean     q
          <dbl>             <dbl> <dbl>
1          309.              331. 0.025
2          425.              421. 0.975


Agenda 10/15/25

  1. Percentile CIs
  2. Properties / advantages / disadvantages

95% Percentile CI

Theoretically more sophisticated but computationally more straightforward, the percentile interval uses the percentiles of the bootstrap distribution directly as the CI endpoints.

Calculating the Percentile CI

boot_stats |>
  # quantile() returns both endpoints, so the summary has two rows
  summarize(perc_CI_med = quantile(boot_med, c(0.025, 0.975)), 
            perc_CI_tr_mean = quantile(boot_tr_mean, c(0.025, 0.975)), 
            q = c(0.025, 0.975))
# A tibble: 2 × 3
  perc_CI_med perc_CI_tr_mean     q
        <dbl>           <dbl> <dbl>
1        321             335. 0.025
2        435.            420. 0.975

Comparison of intervals

The first three columns give the CI for the true median of the survival times; the last three give the CI for the true trimmed mean of the survival times.

CI           Lower     Obs Med   Upper     Lower     Obs Tr Mean   Upper
Percentile   321.00    367.50    434.58    334.86    378.30        419.77
w/ BS SE     309.99    367.50    425.01    336.87    378.30        419.73
BS-t         309.30    367.50    425.31    331.03    378.30        421.17

(Can’t know what the Truth is…)

What makes a confidence interval procedure good?

Most Important

  1. That it captures the true parameter in \((1-\alpha) \cdot\) 100% of the datasets.

  2. That it produces narrow intervals.

What makes a confidence interval procedure good?

  • Symmetry (??): the interval is symmetric, pivotal around some value. Not necessarily a good thing. Maybe a bad thing to force?
  • Resistant: BS-t is particularly not resistant to outliers or skewed sampling distributions of the statistic (can make it more robust with a variance stabilizing transformation)
  • Range preserving: the CI always contains only values that fall within an allowable range (\(p, \rho\),…)
  • Transformation respecting: for any monotone transformation \(\phi = m(\theta)\), the interval for \(\theta\) maps directly through \(m\). If \([\hat{\theta}_{(lo)},\hat{\theta}_{(hi)}]\) is a \((1-\alpha)100\)% interval for \(\theta\), then

\[[\hat{\phi}_{(lo)},\hat{\phi}_{(hi)}] = [m(\hat{\theta}_{(lo)}),m(\hat{\theta}_{(hi)})]\] is exactly the \((1-\alpha)100\)% interval for \(\phi\) (see the numerical check after this list).

  • Level of confidence: A central (not symmetric) confidence interval, \([\hat{\theta}_{(lo)},\hat{\theta}_{(hi)}]\) should have probability \(\alpha/2\) of not covering \(\theta\) from above or below:

\[\begin{align} P(\theta < \hat{\theta}_{(lo)})&=\alpha/2\\ P(\theta > \hat{\theta}_{(hi)})&=\alpha/2\\ \end{align}\]

  • Note: all of the intervals are approximate. We judge them based on how accurately they cover \(\theta\).

    • A CI is first order accurate if: \[\begin{align} P(\theta < \hat{\theta}_{(lo)})&=\alpha/2 + \frac{const_{lo}}{\sqrt{n}}\\ P(\theta > \hat{\theta}_{(hi)})&=\alpha/2+ \frac{const_{hi}}{\sqrt{n}}\\ \end{align}\]

    • A CI is second order accurate if: \[\begin{align} P(\theta < \hat{\theta}_{(lo)})&=\alpha/2 + \frac{const_{lo}}{n}\\ P(\theta > \hat{\theta}_{(hi)})&=\alpha/2+ \frac{const_{hi}}{n}\\ \end{align}\]
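The transformation-respecting property from the list above can be checked numerically for the percentile interval. A quick sketch with simulated data and a log transformation (type = 1 picks exact order statistics, so the two lines agree exactly):

set.seed(47)
x <- rexp(100, rate = 1/100)
med_star <- replicate(1000, median(sample(x, replace = TRUE)))

quantile(log(med_star), c(0.025, 0.975), type = 1)  # percentile CI for log(median)
log(quantile(med_star, c(0.025, 0.975), type = 1))  # log of the CI for the median: identical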

What else about intervals?

CI        Symmetric   Range Resp   Trans Resp   Accuracy    Normal Samp Dist?   Other
Boot SE   Yes         No           No           1st order   Yes                 Parametric assumptions, \(F(\hat{\theta})\)
Boot-t    No          No           No           2nd order   No                  Computationally intensive
perc      No          Yes          Yes          1st order   No                  Small \(n \rightarrow\) low accuracy
BCa       No          Yes          Yes          2nd order   No                  Limited parametric assumptions

Advantages and Disadvantages

  • Normal Approximation
    • Advantages: similar to the familiar parametric approach; useful with a normally distributed \(\hat{\theta}\); requires the least computation (\(B=50-200\))
    • Disadvantages: fails to use the entire \(\hat{F}^*(\hat{\theta}^*)\); only works if \(\hat{\theta}\) is reasonably normal to start with
  • Bootstrap-t Confidence Interval
    • Advantages: highly accurate CI in many cases; handles a skewed \(F(\hat{\theta})\) better than the percentile method
    • Disadvantages: not invariant to transformations; computationally expensive with the double bootstrap; coverage probabilities are best if the distribution of \(\hat{\theta}\) is nice (e.g., normal)
  • Percentile
    • Advantages: uses the entire \(\hat{F}^*(\hat{\theta}^*)\); allows \(F(\hat{\theta})\) to be asymmetric; invariant to transformations; range respecting; simple to execute
    • Disadvantages: small samples may result in low accuracy (because of the dependence on tail behavior); assumes \(\hat{F}^*(\hat{\theta}^*)\) is unbiased
  • BCa
    • Advantages: all of those of the percentile method; allows for bias in \(\hat{F}^*(\hat{\theta}^*)\); \(z_0\) can be calculated easily from \(\hat{F}^*(\hat{\theta}^*)\)
    • Disadvantages: requires a limited parametric assumption; more computationally intensive than the other intervals

References

Caplehorn, JR, and J Bell. 1991. “Methadone Dosage and Retention of Patients in Maintenance Treatment.” The Medical Journal of Australia 154 (3): 195–99.
Chance, Beth, and Allan Rossman. 2018. Investigating Statistical Concepts, Applications, and Methods. 3rd ed. http://www.rossmanchance.com/iscam3/.
Everitt, Brian S., and Sophia Rabe-Hesketh. 2006. Handbook of Statistical Analyses Using Stata. Chapman & Hall.