Simulating

September 8 - 17, 2025

Jo Hardin

Agenda 9/8/25

  1. Simulating to understand bias-variance trade-off
  2. map()

Bias-Variance Trade-off

What is variance? What is bias?

Variance refers to the amount by which \(\hat{f}\) would change if we estimated it using a different training set.

Bias refers to the error that is introduced by approximating the “truth” by a model which is too simple.

Mathematically

We can measure how well a model does by looking at how close the fit of the model (\(\hat{f}\)) is to the observed data (\(y\)).

\[MSE = E\bigg[(Y - \hat{f}(x))^2\bigg]\]

Because \(Y = f(x) + \epsilon\), we know that:

\[\begin{eqnarray*} MSE &=& E\bigg[(Y - \hat{f}(x))^2\bigg]\\ &=& E\bigg[(f(x) + \epsilon - \hat{f}(x))^2\bigg]\\ &=& E\bigg[(f(x) - \hat{f}(x))^2\bigg] + 2 E\bigg[(f(x) - \hat{f}(x)) \cdot \epsilon \bigg] + E[\epsilon^2] \end{eqnarray*}\]

Term 3: The third term is \(\sigma^2\), the irreducible error (the part that even the perfect model can’t predict).

Term 2: Turns out to be zero because the error \(\epsilon\) of the new observation is independent of \(\hat{f}(x)\) and has mean zero, so the expectation factors into a product of expectations and the term vanishes.

Term 1: Takes a little bit of work to expand (something you’d do in, say, Math 152).

Simplified Term 1

\[E\bigg[(f(x) - \hat{f}(x))^2\bigg] = \bigg( f(x) - E[\hat{f}(x)] \bigg)^2 + E\bigg[ (E[\hat{f}(x)] - \hat{f}(x))^2 \bigg]\]

Where:

\(\big[ \mbox{ Bias of } \hat{f}(x) \big]^2 = \bigg( f(x) - E[\hat{f}(x)] \bigg)^2\)

\(\mbox{Variance of } \hat{f}(x) = E\bigg[ (E[\hat{f}(x)] - \hat{f}(x))^2 \bigg]\)
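
For reference, the expansion adds and subtracts \(E[\hat{f}(x)]\):

\[\begin{eqnarray*} E\bigg[(f(x) - \hat{f}(x))^2\bigg] &=& E\bigg[\Big( \big(f(x) - E[\hat{f}(x)]\big) + \big(E[\hat{f}(x)] - \hat{f}(x)\big) \Big)^2\bigg]\\ &=& \big(f(x) - E[\hat{f}(x)]\big)^2 + 2\big(f(x) - E[\hat{f}(x)]\big)\, E\big[E[\hat{f}(x)] - \hat{f}(x)\big] + E\bigg[\big(E[\hat{f}(x)] - \hat{f}(x)\big)^2\bigg]\\ &=& \big(f(x) - E[\hat{f}(x)]\big)^2 + E\bigg[\big(E[\hat{f}(x)] - \hat{f}(x)\big)^2\bigg] \end{eqnarray*}\]

where the cross term vanishes because \(f(x) - E[\hat{f}(x)]\) is a constant and \(E\big[E[\hat{f}(x)] - \hat{f}(x)\big] = 0\).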

MSE

\[\begin{eqnarray*} MSE &=& E\bigg[(Y - \hat{f}(x))^2\bigg]\\ &=& \bigg[ \mbox{ Bias of } \hat{f}(x) \bigg]^2 + \mbox{ Variance of } \hat{f}(x) + \sigma^2 \end{eqnarray*}\]

Simulation

Let’s consider a model: \(Y = f(x) + \epsilon\)

library(tidyverse)   # tibble, dplyr, tidyr, purrr, ggplot2

num_x <- length(seq(0, 10, by = 0.5))
set.seed(4774)
bias_var_data <- tibble(ex = seq(0, 10, by = 0.5),
                        eps = rnorm(num_x, mean = 0, sd = 5),
                        why = ex^2 + eps) |> 
  mutate(underfit = lm(why ~ ex)$fitted,
         `good fit` = lm(why ~ ex + I(ex^2))$fitted,
         overfit = lm(why ~ ex + I(ex^2) + I(ex^3) + I(ex^4) + I(ex^5) + I(ex^6) + I(ex^7) + I(ex^8) + I(ex^9) + I(ex^10) )$fitted) |> 
  pivot_longer(cols = underfit:overfit, 
               names_to = "fit", 
               values_to = "prediction") |> 
  mutate(fit = factor(fit, 
                        levels = c("underfit", "good fit", "overfit") ))

bias_var_data |> 
  ggplot(aes(x = ex, y = why)) + 
  geom_point() + 
  geom_line(aes(y = prediction)) + 
  facet_grid(~ fit)

Errors

Which is worse? Underfitting or overfitting? How can we measure the error?

bias_var_data |> 
  group_by(fit) |> 
  summarize(MSE_OG = mean((why - prediction)^2))
# A tibble: 3 × 2
  fit      MSE_OG
  <fct>     <dbl>
1 underfit   74.8
2 good fit   21.9
3 overfit    10.5

Fit to new data

WAIT! We don’t want the error on the data that built the model, we want the error on the population.

set.seed(470)
new_data <- tibble(ex = seq(0, 10, by = 0.5),
                   eps = rnorm(num_x, mean = 0, sd = 5),
                   why = ex^2 + eps)

bias_var_data |> 
  ggplot(aes(x = ex, y = why)) + 
  geom_point(data = new_data, aes(x = ex, y = why)) + 
  geom_line(aes(y = prediction)) + 
  facet_grid(~ fit)

lm1 <- bias_var_data |> 
  filter(fit == "underfit") |> 
  lm(why ~ ex, data = _)

lm2 <- bias_var_data |> 
  filter(fit == "good fit") |> 
  lm(why ~ ex + I(ex^2), data = _)

lm3 <- bias_var_data |> 
  filter(fit == "overfit") |> 
  lm(why ~ ex + I(ex^2) + I(ex^3) + I(ex^4) + I(ex^5) + 
       I(ex^6) + I(ex^7) + I(ex^8) + I(ex^9) + I(ex^10), data = _)

new_data |> 
  mutate(underfit = predict(lm1, newdata = new_data),
         `good fit` = predict(lm2, newdata = new_data),
         overfit = predict(lm3, newdata = new_data)) |> 
  pivot_longer(cols = underfit:overfit, 
               names_to = "fit", 
               values_to = "prediction") |> 
  mutate(fit = factor(fit, 
                        levels = c("underfit", "good fit", "overfit") )) |> 
  group_by(fit) |> 
  summarize(MSE_newdata = mean((why - prediction)^2))
# A tibble: 3 × 2
  fit      MSE_newdata
  <fct>          <dbl>
1 underfit        98.6
2 good fit        27.3
3 overfit         35.8

Back to bias and variance

set.seed(47)
datasets <- 1:9 |> 
  map_dfr(\(i) {
    tibble(
      ex  = seq(0, 10, by = 0.5),
      eps = rnorm(num_x, mean = 0, sd = 5),
      why = ex^2 + eps,
      dataset = i
    )
  }) |> 
  group_by(dataset) |> 
  mutate(underfit = lm(why ~ ex)$fitted,
         `good fit` = lm(why ~ ex + I(ex^2))$fitted,
         overfit = lm(why ~ ex + I(ex^2) + I(ex^3) + I(ex^4) + I(ex^5) + I(ex^6) + I(ex^7) + I(ex^8) + I(ex^9) + I(ex^10) )$fitted) |> 
  pivot_longer(cols = underfit:overfit, 
               names_to = "fit", 
               values_to = "prediction") |> 
  mutate(fit = factor(fit, 
                        levels = c("underfit", "good fit", "overfit") ))

datasets
datasets |> 
  filter(fit == "underfit") |> 
  ggplot(aes(x = ex, y = why)) + 
  geom_point(size = 0.25) + 
  geom_line(aes(y = prediction)) + 
  facet_wrap(~ dataset)
datasets |> 
  filter(fit == "good fit") |> 
  ggplot(aes(x = ex, y = why)) + 
  geom_point(size = 0.25) + 
  geom_line(aes(y = prediction)) + 
  facet_wrap(~ dataset)
datasets |> 
  filter(fit == "overfit") |> 
  ggplot(aes(x = ex, y = why)) + 
  geom_point(size = 0.25) + 
  geom_line(aes(y = prediction)) + 
  facet_wrap(~ dataset)

What we know about bias-variance trade-off

  • The underfit model looks the same for every dataset! That is, the model has low variance.

  • The underfit model doesn’t fit the data very well. The model has high bias.

  • The overfit model is very flexible, and looks like it fits the data very well. It has low bias.

  • The fitted models look quite different across the different datasets. The overfit model has high variance. (See the sketch below for a numerical check of these claims.)
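
To put numbers on these claims, here is a minimal sketch (not from the original slides; the names n_sims and bv_sims are new) that estimates the squared bias and the variance of each fit at every value of \(x\) by simulating many training sets from the same model:

set.seed(47)
n_sims <- 500   # number of simulated training sets

bv_sims <- 1:n_sims |> 
  map_dfr(\(i) {
    tibble(
      ex  = seq(0, 10, by = 0.5),
      eps = rnorm(num_x, mean = 0, sd = 5),
      why = ex^2 + eps,
      dataset = i
    )
  }) |> 
  group_by(dataset) |> 
  mutate(underfit   = lm(why ~ ex)$fitted,
         `good fit` = lm(why ~ poly(ex, 2))$fitted,    # same fit as why ~ ex + I(ex^2)
         overfit    = lm(why ~ poly(ex, 10))$fitted) |>  # degree-10 polynomial (orthogonal basis), analogous to the overfit model above
  ungroup() |> 
  pivot_longer(cols = underfit:overfit, 
               names_to = "fit", 
               values_to = "prediction")

# squared bias and variance of f-hat at each x, then averaged over x
bv_sims |> 
  group_by(fit, ex) |> 
  summarize(bias2    = (mean(prediction) - unique(ex)^2)^2,   # (E[f-hat(x)] - f(x))^2
            variance = var(prediction),
            .groups = "drop") |> 
  group_by(fit) |> 
  summarize(avg_bias2 = mean(bias2), avg_variance = mean(variance))

The underfit model should show a large average squared bias with a small variance, and the pattern should reverse for the overfit model.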

Agenda 9/15/25

  1. Monte Carlo sampling
  2. Simulating to estimate probabilities
  3. Simulating to consider statistical inference with fewer conditions.

Simulating: the big picture.

Why?

Three simulation methods are used for different purposes:

  1. Monte Carlo methods - use repeated sampling from a population with known characteristics.

  2. Randomization / Permutation methods - use shuffling (sampling without replacement from a sample) to test hypotheses of “no effect”.

  3. Bootstrap methods - use resampling (sampling with replacement from a sample) to establish confidence intervals.

Monte Carlo sample from a population to…

  • …simulate to understand complicated models.
  • …simulate as easier than calculus for estimating probabilities.
  • …simulate to consider statistical inference with fewer conditions.

Goals of simulating complicated models

The goal of simulating a complicated model is not only to create a program which will provide the desired results. We also hope to be able to write code such that:

  1. The problem is broken down into small pieces.
  2. The problem has checks in it to see what works (run the lines inside the functions!).
  3. The code stays simple (as often as possible).

Examples

  • bias-variance trade-off
  • see class notes for more examples

Aside: a few R functions (ifelse())

data.frame(value = c(-2:2)) |>
  mutate(abs_value = ifelse(value >=0, value, -value))  # abs val
  value abs_value
1    -2         2
2    -1         1
3     0         0
4     1         1
5     2         2

Aside: a few R functions (case_when())

set.seed(4747)
diamonds |> select(carat, cut, color, price) |>
  sample_n(20) |>
  mutate(price_cat = case_when(
    price > 10000 ~ "expensive",
    price > 1500 ~ "medium", 
    TRUE ~ "inexpensive"))
# A tibble: 20 × 5
   carat cut       color price price_cat  
   <dbl> <ord>     <ord> <int> <chr>      
 1  1.23 Very Good F     10276 expensive  
 2  0.35 Premium   H       706 inexpensive
 3  0.7  Good      E      2782 medium     
 4  0.4  Ideal     D      1637 medium     
 5  0.53 Ideal     G      1255 inexpensive
 6  2.22 Ideal     G     14637 expensive  
 7  0.3  Ideal     G       878 inexpensive
 8  1.05 Ideal     H      4223 medium     
 9  0.53 Premium   E      1654 medium     
10  1.7  Ideal     H      7068 medium     
11  0.31 Good      E       698 inexpensive
12  0.31 Ideal     F       840 inexpensive
13  1.03 Ideal     H      4900 medium     
14  0.31 Premium   G       698 inexpensive
15  1.56 Premium   G      8858 medium     
16  1.71 Premium   G     11032 expensive  
17  1    Good      E      5345 medium     
18  1.86 Ideal     J     10312 expensive  
19  1.08 Very Good E      3726 medium     
20  0.31 Premium   E       698 inexpensive

Aside: a few R functions (sample_n())

We took a random sample of 20 rows from the diamonds dataset.

set.seed(4747)
diamonds |> select(carat, cut, color, price) |>
  sample_n(20) |>
  mutate(price_cat = case_when(
    price > 10000 ~ "expensive",
    price > 1500 ~ "medium", 
    TRUE ~ "inexpensive"))
# A tibble: 20 × 5
   carat cut       color price price_cat  
   <dbl> <ord>     <ord> <int> <chr>      
 1  1.23 Very Good F     10276 expensive  
 2  0.35 Premium   H       706 inexpensive
 3  0.7  Good      E      2782 medium     
 4  0.4  Ideal     D      1637 medium     
 5  0.53 Ideal     G      1255 inexpensive
 6  2.22 Ideal     G     14637 expensive  
 7  0.3  Ideal     G       878 inexpensive
 8  1.05 Ideal     H      4223 medium     
 9  0.53 Premium   E      1654 medium     
10  1.7  Ideal     H      7068 medium     
11  0.31 Good      E       698 inexpensive
12  0.31 Ideal     F       840 inexpensive
13  1.03 Ideal     H      4900 medium     
14  0.31 Premium   G       698 inexpensive
15  1.56 Premium   G      8858 medium     
16  1.71 Premium   G     11032 expensive  
17  1    Good      E      5345 medium     
18  1.86 Ideal     J     10312 expensive  
19  1.08 Very Good E      3726 medium     
20  0.31 Premium   E       698 inexpensive

Aside: a few R functions (sample())

sampling, shuffling, and resampling: sample()

set.seed(47)
alph <- letters[1:10]
alph
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
sample(alph, 5, replace = FALSE) # sample without replacement (from a population)
[1] "i" "b" "g" "d" "a"
sample(alph, 5, replace = TRUE) # sample with replacement (from a population)
[1] "j" "g" "f" "i" "f"
sample(alph, 10, replace = FALSE)  # shuffle
 [1] "f" "h" "i" "e" "g" "d" "c" "j" "b" "a"
sample(alph, 10, replace = TRUE)  # resample
 [1] "e" "j" "e" "b" "e" "c" "f" "a" "e" "a"

Aside: a few R functions (set.seed())

What if we want to be able to generate the same random numbers (here on the interval from 0 to 1) over and over?

set.seed(4747)
runif(4, 0, 1) # random uniform numbers
[1] 0.1949071 0.3390270 0.5147919 0.4516470
set.seed(123)
runif(4, 0, 1) # random uniform numbers
[1] 0.2875775 0.7883051 0.4089769 0.8830174
set.seed(4747)
runif(4, 0, 1) # random uniform numbers
[1] 0.1949071 0.3390270 0.5147919 0.4516470

Aside: a few R functions (n() and n_distinct())

n() counts the number of rows. n_distinct() counts the number of distinct values (here, the number of distinct destinations).

library(nycflights13)

flights |> 
  select(origin, dest) |> 
  group_by(origin) |> 
  summarize(num_flt = n(), num_dest =  n_distinct(dest))
# A tibble: 3 × 3
  origin num_flt num_dest
  <chr>    <int>    <int>
1 EWR     120835       86
2 JFK     111279       70
3 LGA     104662       68

Monte Carlo sample from a population to…

  • …simulate to understand complicated models.
  • …simulate as easier than calculus for estimating probabilities.
  • …simulate to consider statistical inference with fewer conditions.

Small simulation example to calculate a probability

Sally & Joan

Consider a situation where Sally and Joan plan to meet to study in their college campus center (Mosteller 1987; Baumer, Kaplan, and Horton 2021). They are both impatient people who will wait only 10 minutes for the other before leaving.

But their planning was incomplete. Sally said, “Meet me between 7 and 8 tonight at the student center.” When should Joan plan to arrive at the campus center? And what is the probability that they actually meet?

Simulate their meeting

Assume that Sally and Joan are both equally likely to arrive at the campus center anywhere between 7pm and 8pm.

library(tictoc)
n <- 10000

tic()
meet_vect <- tibble(
  sally = runif(n, min = 0, max = 60),
  joan = runif(n, min = 0, max = 60),
  result = ifelse(
    abs(sally - joan) <= 10, "They meet", "They do not"
  )
)
toc()
0.002 sec elapsed
meet_func <- function(nada){  # the argument is ignored; map() supplies the iteration index
  tibble(
    sally = runif(1, min = 0, max = 60),
    joan = runif(1, min = 0, max = 60),
    result = ifelse(
    abs(sally - joan) <= 10, "They meet", "They do not"
  )
  )
}
tic()
meet_map <- 1:n |> 
  map(meet_func) |> 
  list_rbind()
toc()
2.208 sec elapsed
tic()
meet_for <- data.frame()
for(i in 1:n){
  meet_for <- meet_for |> rbind(
    tibble(
      sally = runif(1, min = 0, max = 60),
      joan = runif(1, min = 0, max = 60),
      result = ifelse(
    abs(sally - joan) <= 10, "They meet", "They do not"
  )))
}
toc()
3.749 sec elapsed

Results

The three approaches give equivalent results; the values differ only because of the randomness in each simulation. The vectorized version is fastest because it generates all of the draws at once, while the for loop is slowest because it grows the data frame one row at a time.

meet_vect |> 
  group_by(result) |> 
  summarize(number = n()) |> 
  mutate(proportion = number / sum(number))
# A tibble: 2 × 3
  result      number proportion
  <chr>        <int>      <dbl>
1 They do not   6966      0.697
2 They meet     3034      0.303
meet_map |> 
  group_by(result) |> 
  summarize(number = n())|> 
  mutate(proportion = number / sum(number))
# A tibble: 2 × 3
  result      number proportion
  <chr>        <int>      <dbl>
1 They do not   7087      0.709
2 They meet     2913      0.291
meet_for |> 
  group_by(result) |> 
  summarize(number = n())|> 
  mutate(proportion = number / sum(number))
# A tibble: 2 × 3
  result      number proportion
  <chr>        <int>      <dbl>
1 They do not   6942      0.694
2 They meet     3058      0.306

Visualizing the meet up

meet_map |> 
  ggplot(aes(x = joan, y = sally, color = result)) +
  geom_point(alpha = 0.3) + 
  geom_abline(intercept = 10, slope = 1) + 
  geom_abline(intercept = -10, slope = 1) + 
  scale_color_brewer(palette = "Paired")
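
For comparison, the exact probability comes from the area between the two lines in the plot: inside the \(60 \times 60\) square, the region where \(|\mbox{sally} - \mbox{joan}| \le 10\) has area \(60^2 - 50^2 = 1100\), so the probability is \(1100/3600 = 11/36 \approx 0.306\), in line with the simulated proportions above.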

Simulation best practices

  • no magic numbers
  • comment your code
  • use informative names
  • set a seed for reproducibility
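
As a small illustration (a sketch, not from the original slides), here is the meeting simulation rewritten with named constants instead of magic numbers, a seed, and informative names:

set.seed(4747)                    # for reproducibility
n_sims       <- 10000             # number of simulated evenings
window_start <- 0                 # 7pm, measured in minutes after 7
window_end   <- 60                # 8pm
wait_minutes <- 10                # how long each person will wait

meetings <- tibble(
  sally_arrival = runif(n_sims, min = window_start, max = window_end),
  joan_arrival  = runif(n_sims, min = window_start, max = window_end),
  they_meet     = abs(sally_arrival - joan_arrival) <= wait_minutes
)

mean(meetings$they_meet)          # estimated probability that they meet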

Approach to simulating

  • break down the problem into small steps
  • check to see what works
  • simple is better

Examples

  • see class notes for more probability examples
    • pass the pigs
    • blackjack
    • roulette

Agenda 9/17/25

  1. Simulating to consider statistical inference – changes in technical conditions / assumptions.

Sensitivity of inferential conditions

  • Simulation to understand complicated models.
  • Simulation as easier than calculus for estimating probabilities.
  • Simulation to consider statistical inference with fewer conditions.

t-test with exponential data

The t-test says that if the null hypothesis is true, we’d expect to reject it \(\alpha \cdot 100\) percent of the time. That is, if \(\alpha = 0.05\), we’d reject true null hypotheses 5% of the time.

The guarantee of 5% (across many tests) happens only when the data are normal (or reasonably normal).

Non-normal data

data.frame(exp_data = rexp(n = 100000, rate = 20)) |> 
  ggplot(aes(x = exp_data)) +
  geom_histogram()

t-test function

t.test.pval <- function(data1, data2){
  t.test(data1, data2) |> # does the t-test
    broom::tidy() |>      # modifies the output to be a data frame
    select(p.value)       # selects only the p-value from the output
}

t.test.pval(rexp(n = 10, rate = 20), rexp(n = 7, rate = 20))
# A tibble: 1 × 1
  p.value
    <dbl>
1   0.160

t-test 10 times

map(1:10, ~t.test.pval(rexp(n = 10, rate = 20),
                       rexp(n = 7, rate = 20))) |> 
          list_rbind()
# A tibble: 10 × 1
   p.value
     <dbl>
 1  0.145 
 2  0.324 
 3  0.414 
 4  0.584 
 5  0.385 
 6  0.782 
 7  0.127 
 8  0.0762
 9  0.733 
10  0.305 

t-test many times

set.seed(4747)

results <- map(1:10000, 
               ~t.test.pval(rexp(n = 10, rate = 20),
                            rexp(n = 7, rate = 20))) |> 
  list_rbind() 
results |> 
  ggplot(aes(x = p.value)) + 
  geom_histogram(breaks = seq(0, 1, by=0.05)) + 
  labs(x = "p-values")

results |> 
  summarize(type1 = mean(p.value < 0.05))
# A tibble: 1 × 1
   type1
   <dbl>
1 0.0377
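
For a baseline comparison (a sketch, not in the original slides), running the same simulation with normal data should give a rejection rate close to the nominal 5%:

set.seed(4747)
results_normal <- map(1:10000, 
                      ~t.test.pval(rnorm(n = 10, mean = 0, sd = 1),
                                   rnorm(n = 7, mean = 0, sd = 1))) |> 
  list_rbind()

results_normal |> 
  summarize(type1 = mean(p.value < 0.05))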

Takeaway ideas

  • small takeaway: we shouldn’t necessarily use the t-test without checking the technical conditions (note here that there were pretty small sample sizes, which is also important).

  • bigger takeaway: we can use simulation to understand the error rates for hypothesis tests under particular initial conditions.

Sensitivity of CI to tech conditions

Consider the following set up:

set.seed(474)
beta0 <- -1; beta1 <- 0.5; beta2 <- 1.5
n_obs <- 100; reps <- 1000


x1 <- rep(c(0,1), each=n_obs/2)
x2 <- runif(n_obs, min=-1, max=1)
y <- beta0 + beta1*x1 + beta2*x2 + rnorm(n_obs, mean=0, sd = 1)

Plot of data

Capture the slope parameter?

  • We know the truth (!!!!!)
  • We can create a single CI for the parameter given a single data set
  • Is the parameter in the interval?
  • Should it always be in the interval? How often should it be?
  • Repeat the process to see whether the confidence interval captures the parameter at the claimed rate.

See: Rossman Chance regression applet

Running a linear model

library(broom)

CI <- lm(y ~ x1 + x2) |> tidy(conf.int = TRUE)
CI
# A tibble: 3 × 7
  term        estimate std.error statistic  p.value conf.low conf.high
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
1 (Intercept)   -0.929     0.142     -6.52 3.10e- 9  -1.21      -0.647
2 x1             0.330     0.202      1.64 1.05e- 1  -0.0698     0.731
3 x2             1.44      0.192      7.53 2.66e-11   1.06       1.83 
CI |>
  filter(term == "x2") |>
  select(term, estimate, conf.low, conf.high) |>
  mutate(inside = between(beta2, conf.low, conf.high))
# A tibble: 1 × 5
  term  estimate conf.low conf.high inside
  <chr>    <dbl>    <dbl>     <dbl> <lgl> 
1 x2        1.44     1.06      1.83 TRUE  

What happens to the CI of coefficients in repeated samples? (eq var)

eqvar_data <- data.frame(row_id = seq(1, n_obs, 1)) |>
  slice(rep(row_id, each = reps)) |>
  mutate(
    sim_id = rep(1:reps, n_obs),
    x1 = rep(c(0,1), each = n()/2),
    x2 = runif(n(), min = -1, max = 1),
    y = beta0 + beta1*x1 + beta2*x2 + rnorm(n(), mean = 0, sd = 1)
  ) |>
  arrange(sim_id, row_id) |>
  group_by(sim_id) |>
  nest()

eqvar_data
# A tibble: 1,000 × 2
# Groups:   sim_id [1,000]
   sim_id data              
    <int> <list>            
 1      1 <tibble [100 × 4]>
 2      2 <tibble [100 × 4]>
 3      3 <tibble [100 × 4]>
 4      4 <tibble [100 × 4]>
 5      5 <tibble [100 × 4]>
 6      6 <tibble [100 × 4]>
 7      7 <tibble [100 × 4]>
 8      8 <tibble [100 × 4]>
 9      9 <tibble [100 × 4]>
10     10 <tibble [100 × 4]>
# ℹ 990 more rows
beta_coef <- function(df){
  lm(y ~ x1 + x2, data = df) |>
    tidy(conf.int = TRUE) |>
    filter(term == "x2") |>
    select(estimate, conf.low, conf.high, p.value) 
}
eqvar_data |> 
  mutate(b2_vals = map(data, beta_coef)) |>
  select(b2_vals) |> 
  unnest(b2_vals) |>
  summarize(capture = between(beta2, conf.low, conf.high)) |>
  summarize(capture_rate = sum(capture)/reps)
# A tibble: 1 × 1
  capture_rate
         <dbl>
1        0.953
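
With 1000 replications, the Monte Carlo standard error of an estimated 95% capture rate is about \(\sqrt{0.95 \times 0.05 / 1000} \approx 0.007\), so the simulated 0.953 is consistent with the claimed 95% coverage.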

Sensitivity of CI to tech conditions

Consider the following set up (note the difference in variability):

set.seed(470)
beta0 <- -1; beta1 <- 0.5; beta2 <- 1.5
n_obs <- 100; reps <- 1000

  x1 <- rep(c(0,1), each=n_obs/2)
  x2 <- runif(n_obs, min=-1, max=1)
  y <- beta0 + beta1*x1 + beta2*x2 + 
             rnorm(n_obs, mean=0, sd = 1 + x1 + 10*abs(x2))

Plot of data

What happens to the CI of coefficients in repeated samples? (uneq var)

uneqvar_data <- data.frame(row_id = seq(1, n_obs, 1)) |>
  slice(rep(row_id, each = reps)) |>
  mutate(sim_id = rep(1:reps, n_obs), 
         x1 = rep(c(0,1), each = n()/2),
         x2 = runif(n(), min = -1, max = 1),
         y = beta0 + beta1*x1 + beta2*x2 + 
              rnorm(n(), mean = 0, sd = 1 + x1 + 10*abs(x2))  ) |>
  arrange(sim_id, row_id) |>
  group_by(sim_id) |> nest() 
uneqvar_data |> 
  mutate(b2_vals = map(data, beta_coef)) |>
  select(b2_vals) |> unnest(b2_vals) |>
  summarize(capture = between(beta2, conf.low, conf.high)) |>
  summarize(capture_rate = sum(capture)/reps)
# A tibble: 1 × 1
  capture_rate
         <dbl>
1        0.877

Equal variance: type I error?

Here is another way to think about type I error rates. With equal variance, the simulated type I error rate is close to what we’d expect: 5%.

t_test_pval <- function(df){
  t.test(y ~ x1, data = df, var.equal = TRUE) |>
    tidy() |>
    select(estimate, p.value) 
}
set.seed(470)
reps <- 1000
n_obs <- 20
null_data_equal <- 
  data.frame(row_id = seq(1, n_obs, 1)) |>
  slice(rep(row_id, each = reps)) |>
  mutate(
    sim_id = rep(1:reps, n_obs),
    x1 = rep(c("group1", "group2"), each = n()/2),
    y = rnorm(n(), mean = 10, 
               sd = rep(c(1,1), each = n()/2))
  ) |>
  arrange(sim_id, row_id) |>
  group_by(sim_id) |>
  nest()
null_data_equal |> 
  filter(sim_id == 47) |> 
  unnest(cols = c(data)) |> 
  ggplot(aes(x = x1, y = y)) + 
  geom_boxplot() + 
  geom_jitter()

null_data_equal |> 
  mutate(t_vals = map(data,t_test_pval)) |>
  select(t_vals) |> 
  unnest(t_vals) |>
  ungroup(sim_id) |>
  summarize(type1error_rate = sum(p.value < 0.05)/reps)
# A tibble: 1 × 1
  type1error_rate
            <dbl>
1           0.045

Takeaway ideas

  • small takeaway: violating the equal-variance condition of the linear model changes the actual capture rate of a confidence interval.

  • bigger takeaway: we can use simulations to understand whether the confidence interval procedure is capturing the parameter at the rate we want.

Unequal variance: type I error?

With unequal variance, the type I error rate is much higher than we set. We set the type I error rate to be 5%, but the simulated rate was 6.8%.

set.seed(47)
reps <- 1000
n_obs <- 20
null_data_unequal <- 
  data.frame(row_id = seq(1, n_obs, 1)) |>
  slice(rep(row_id, each = reps)) |>
  mutate(
    sim_id = rep(1:reps, n_obs),
    x1 = rep(c("group1", "group2"), each = n()/2),
    y = rnorm(n(), mean = 10, 
               sd = rep(c(1,100), each = n()/2))
  ) |>
  arrange(sim_id, row_id) |>
  group_by(sim_id) |>
  nest()
null_data_unequal |> 
  filter(sim_id == 47) |> 
  unnest(cols = c(data)) |> 
  ggplot(aes(x = x1, y = y)) + 
  geom_boxplot() + 
  geom_jitter()

null_data_unequal |> 
  mutate(t_vals = map(data,t_test_pval)) |>
  select(t_vals) |> 
  unnest(t_vals) |>
  ungroup(sim_id) |>
  summarize(type1error_rate = sum(p.value < 0.05)/reps)
# A tibble: 1 × 1
  type1error_rate
            <dbl>
1           0.068

Takeaway ideas

  • small takeaway: we shouldn’t use the t-test without checking the technical conditions, here the problem was unequal variance (again, small sample sizes were part of the problem).

  • bigger takeaway: we can use simulation to understand the error rates for hypothesis tests under particular initial conditions.

References

Baumer, Ben, Daniel Kaplan, and Nicholas Horton. 2021. Modern Data Science with R. CRC Press. https://mdsr-book.github.io/mdsr2e/.
Mosteller, Frederick. 1987. Fifty Challenging Problems in Probability with Solutions. Mineola, NY: Dover Publications.