Permutation Tests

class: right, top, my-title, title-slide

# Permutation Tests
### Jo Hardin
### September 28 & 30, 2021

---

# Agenda  9/28/21

1. Review: logic of hypothesis testing
2. Logic of permutation tests
3. Examples - 2 samples and beyond

---

## Logic of hypothesis tests

1. Choose a statistic that measures the effect

2. Construct the sampling distribution under `$H_0$`

3. Locate the observed statistic in the null sampling distribution

4. **p-value**  is the probability of the observed data or more extreme if the null hypothesis is true

---

## Logic of permutation tests

1. Choose a test statistic

2. Shuffle the data (force the null hypothesis to be true)

3. Create a null sampling distribution of the test statistic (under `$H_0$`)

4. Find the observed test statistic on the null sampling distribution and compute the p-value (observed data or more extreme).  The p-value can be one or two-sided.

---

##  Consider the NHANES dataset.

- Income 
    * (HHIncomeMid - Numerical version of HHIncome derived from the middle income in each category)
- Health 
    * (HealthGen - Self-reported rating of participant's health in general Reported for participants aged 12 years or older. One of Excellent, Vgood, Good, Fair, or Poor.)

---

## Summary of the variables of interest

```r
NHANES %>% select(HealthGen) %>% table()
```

```
## .
## Excellent     Vgood      Good      Fair      Poor 
##       878      2508      2956      1010       187
```

```r
NHANES %>% select(HHIncomeMid) %>% summary()
```

```
##   HHIncomeMid    
##  Min.   :  2500  
##  1st Qu.: 30000  
##  Median : 50000  
##  Mean   : 57206  
##  3rd Qu.: 87500  
##  Max.   :100000  
##  NA's   :811
```

---

## Mean Income broken down by Health

```r
NH.means <- NHANES %>% 
 filter(!is.na(HealthGen) & !is.na(HHIncomeMid)) %>% 
 group_by(HealthGen) %>% 
 summarize(IncMean = mean(HHIncomeMid), count=n())
NH.means
```

```
## # A tibble: 5 × 3
## HealthGen IncMean count
## <fct> <dbl> <int>
## 1 Excellent 69354. 817
## 2 Vgood 65011. 2342
## 3 Good 55662. 2744
## 4 Fair 44194. 899
## 5 Poor 37027. 164
```

Are the differences in means simply due to random chance??

---

## Income and Health

```r
NHANES %>% filter(!is.na(HealthGen)& !is.na(HHIncomeMid)) %>% 
ggplot(aes(x=HealthGen, y=HHIncomeMid)) + geom_boxplot()
```

---

## Differences in Income ($)

```
##           Excellent  Vgood   Good  Fair  Poor
## Excellent         0   4344  13692 25161 32327
## Vgood         -4344      0   9348 20817 27983
## Good         -13692  -9348      0 11469 18635
## Fair         -25161 -20817 -11469     0  7166
## Poor         -32327 -27983 -18635 -7166     0
```

---

## Overall difference

We can measure the overall differences as the amount of variability between each of the means and the overall mean:

`$$F = \frac{\text{between-group variability}}{\text{within-group variability}}$$`
`$$F = \frac{\sum_i n_i(\overline{X}_{i\cdot} - \overline{X})^2/(K-1)}{\sum_{ij} (X_{ij}-\overline{X}_{i\cdot})^2/(N-K)}$$`
`$$SumSqBtwn = \sum_i n_i(\overline{X}_{i\cdot} - \overline{X})^2$$`

---

## Creating a test statistic

```r
NHANES %>% select(HHIncomeMid, HealthGen) %>% 
  filter(!is.na(HealthGen)& !is.na(HHIncomeMid))
```

```
## # A tibble: 6,966 × 2
## HHIncomeMid HealthGen
## <int> <fct> 
## 1 30000 Good 
## 2 30000 Good 
## 3 30000 Good 
## 4 40000 Good 
## 5 87500 Vgood 
## 6 87500 Vgood 
## 7 87500 Vgood 
## 8 30000 Vgood 
## 9 100000 Vgood 
## 10 70000 Fair 
## # … with 6,956 more rows
```

---

## Creating a test statistic

```r
GM <- NHANES %>% summarize(mean(HHIncomeMid, na.rm=TRUE)) %>% pull()
GM
```

```
## [1] 57206
```

```r
NH.means
```

---

## Creating a test statistic

```r
NH.means %>% select(IncMean) %>% pull() - GM
```

```
## [1]  12148   7805  -1544 -13013 -20179
```

```r
(NH.means %>% select(IncMean) %>% pull() - GM)^2
```

```
## [1] 1.48e+08 6.09e+07 2.38e+06 1.69e+08 4.07e+08
```

```r
NH.means %>% select(count) %>% pull()
```

```
## [1]  817 2342 2744  899  164
```

```r
NH.means %>% select(count) %>% pull() * 
  (NH.means %>% select(IncMean) %>% pull() - GM)^2
```

```
## [1] 1.21e+11 1.43e+11 6.54e+09 1.52e+11 6.68e+10
```

---

## Creating a test statistic

`$$SumSqBtwn = \sum_i n_i(\overline{X}_{i\cdot} - \overline{X})^2$$`

```r
sum(NH.means %>% select(count) %>% pull() * 
      (NH.means %>% select(IncMean) %>% pull() - GM)^2)
```

```
## [1] 4.89e+11
```

---

##  Permuting the data

```r
NHANES %>% 
  filter(!is.na(HealthGen)& !is.na(HHIncomeMid)) %>%
  mutate(IncomePerm = sample(HHIncomeMid, replace=FALSE)) %>%
  select(HealthGen, HHIncomeMid, IncomePerm) 
```

```
## # A tibble: 6,966 × 3
## HealthGen HHIncomeMid IncomePerm
## <fct> <int> <int>
## 1 Good 30000 60000
## 2 Good 30000 60000
## 3 Good 30000 87500
## 4 Good 40000 100000
## 5 Vgood 87500 50000
## 6 Vgood 87500 40000
## 7 Vgood 87500 87500
## 8 Vgood 30000 12500
## 9 Vgood 100000 40000
## 10 Fair 70000 87500
## # … with 6,956 more rows
```

---

##  Permuting the data & a new test statistic

```r
NHANES %>% 
  filter(!is.na(HealthGen)& !is.na(HHIncomeMid)) %>%
  mutate(IncomePerm = sample(HHIncomeMid, replace=FALSE)) %>%
  group_by(HealthGen) %>% 
  summarize(IncMeanP = mean(IncomePerm), count=n()) %>%
  summarize(teststat = sum(count*(IncMeanP - GM)^2))
```

```
## # A tibble: 1 × 1
## teststat
## <dbl>
## 1 18325300437.
```

---

## Lots of times...

```r
reps <- 1000

SSB_perm_func <- function(.x){
 NHANES %>% 
 filter(!is.na(HealthGen)& !is.na(HHIncomeMid)) %>%
 mutate(IncomePerm = sample(HHIncomeMid, replace=FALSE)) %>%
 group_by(HealthGen) %>% 
 summarize(IncMeanP = mean(IncomePerm), count=n()) %>%
 summarize(teststat = sum(count*(IncMeanP - GM)^2)) 
}

SSB_perm_val <- map_dfr(1:reps, SSB_perm_func)

SSB_perm_val
```

```
## # A tibble: 1,000 × 1
## teststat
## <dbl>
## 1 15558101722.
## 2 16043464541.
## 3 13479105192.
## 4 12751834446.
## 5 12969481403.
## 6 12574155002.
## 7 17025427397.
## 8 15218082491.
## 9 17741088638.
## 10 14550341129.
## # … with 990 more rows
```

---

## Compared to the real data

```r
SSB_obs <- NHANES %>%
 filter(!is.na(HealthGen) & !is.na(HHIncomeMid)) %>% 
 group_by(HealthGen) %>% 
 summarize(IncMean = mean(HHIncomeMid), count=n()) %>%
 summarize(obs_teststat = sum(count*(IncMean - GM)^2))

SSB_obs 
```

```
## # A tibble: 1 × 1
## obs_teststat
## <dbl>
## 1 488767088754.
```

```r
sum(SSB_perm_val %>% pull() > SSB_obs %>% pull() ) / reps
```

```
## [1] 0
```

---

## Compared to the observed test statistic

---

# Agenda  9/30/21

1. Conditions, exchangeability, random structure
2. Different statistics within the permutation test
3. Power
4. Permutation vs. Randomization tests (Binomial)

---

## Exchangeability

> *If the null hypothesis is true, the labels assigning groups are interchangeable with respect to the probability distribution.*

typically,

`$$H_0: F_1(x) = F_2(x)$$`

(there are no distributional or parametric conditions)

---

## Probability as measured by what?

* **Random Sample** The concept of a p-value usually comes from the idea of taking a sample from a population and comparing it to a sampling distribution (from many many random samples).

* **Randomized Experiment** The p-value represents the observed data compared to the treatment variable being allocated to the groups "by chance."

---

## Gender bias in teaching evaluations

*The Economist*, Sep 21, 2017

---

## Gender bias in teaching evaluations

---

## Gender bias in teaching evaluations

<img src="../images/genderbias3a.png" width="100%" style="display: block; margin: auto;" />
*Innovative Higher Education*, **40**, pages 291–303 (2015).
---

## Gender bias in teaching evaluations

---

## Gender bias: MacNell data

---

##  Analysis goal

Want to know if the population average score for the **perceived** gender is different.

`$$H_0:  \mu_{ID.Female} = \mu_{ID.Male}$$`

> Although for the permutation test, under the null hypothesis not only are the means of the population distributions the same, but the variance and all other aspects of the distributions across perceived gender.

---

## MacNell Data without permutation

```r
macnell %>%
  select(overall, tagender, taidgender) 
```

```
##    overall tagender taidgender
## 1        4        0          1
## 2        4        0          1
## 3        5        0          1
## 4        5        0          1
## 5        5        0          1
## 6        4        0          1
## 7        4        0          1
## 8        5        0          1
## 9        4        0          1
## 10       3        0          1
## 11       5        0          1
## 12       4        0          1
## 13       5        1          1
## 14       5        1          1
## 15       4        1          1
## 16       4        1          1
## 17       4        1          1
## 18       3        1          1
## 19       4        1          1
## 20       4        1          1
## 21       3        1          1
## 22       4        1          1
## 23       4        1          1
## 24       1        0          0
## 25       1        0          0
## 26       5        0          0
## 27       4        0          0
## 28       4        0          0
## 29       5        0          0
## 30       4        0          0
## 31       5        0          0
## 32     NaN        0          0
## 33     NaN        0          0
## 34     NaN        0          0
## 35       4        1          0
## 36       5        1          0
## 37       5        1          0
## 38       4        1          0
## 39       4        1          0
## 40       4        1          0
## 41       3        1          0
## 42       4        1          0
## 43       1        1          0
## 44       4        1          0
## 45       4        1          0
## 46       3        1          0
## 47     NaN        1          0
```

---

## Permuting MacNell data

Conceptually, there are two levels of randomization:

1. `$N_m$` students are randomly assigned to the male instructor and `$N_f$` are assigned to the female instructor.

2.  Of the `$N_j$` assigned to instructor `$j$`, `$N_{jm}$` are told that the instructor is male, and `$N_{jf}$` are told that the instructor is female for `$j=m,f$`.

```r
macnell %>%
  group_by(tagender, taidgender) %>%
  summarize(n())
```

```
## # A tibble: 4 × 3
## # Groups: tagender [2]
## tagender taidgender `n()`
## <int> <int> <int>
## 1 0 0 11
## 2 0 1 12
## 3 1 0 13
## 4 1 1 11
```
---

**Stratified two-sample test:**

* **For each instructor**, permute *perceived* gender assignments.
* Use difference in mean ratings for female-identified vs male-identified instructors.

---

## MacNell Data with permutation

```

## # A tibble: 47 × 4
## # Groups: tagender [2]
## overall tagender taidgender permTAID
## <dbl> <int> <int> <int>
## 1 4 0 1 0
## 2 4 0 1 1
## 3 5 0 1 1
## 4 5 0 1 0
## 5 5 0 1 1
## 6 4 0 1 1
## 7 4 0 1 0
## 8 5 0 1 0
## 9 4 0 1 1
## 10 3 0 1 1
## # … with 37 more rows

```

---

## MacNell Data with permutation

<code class ='r hljs remark-code'>macnell %>% &nbsp;&nbsp;group_by(tagender) %>% &nbsp;&nbsp;mutate(permTAID = sample(taidgender, replace=FALSE)) %>% &nbsp;&nbsp;ungroup(tagender) %>% &nbsp;&nbsp;group_by(permTAID) %>% &nbsp;&nbsp;summarize(pmeans = mean(overall, na.rm=TRUE)) %>% &nbsp;&nbsp;summarize(diff(pmeans))</code>

```

## # A tibble: 1 × 1
## `diff(pmeans)`
## <dbl>
## 1 -0.294

```

---

## MacNell Data with permutation

```r
diff_means_func <- function(.x){
 macnell %>% group_by(tagender) %>%
 mutate(permTAID = sample(taidgender, replace=FALSE)) %>%
 ungroup(tagender) %>%
 group_by(permTAID) %>%
 summarize(pmeans = mean(overall, na.rm=TRUE)) %>%
 summarize(diff_mean = diff(pmeans))
 }

map_df(1:5, diff_means_func)
```

```
## # A tibble: 5 × 1
## diff_mean
## <dbl>
## 1 0.277 
## 2 -0.468 
## 3 0.180 
## 4 -0.00216
## 5 0.184
```

---

## Observed vs. Permuted statistic

```r
# observed
macnell %>% 
  group_by(taidgender) %>%
  summarize(pmeans = mean(overall, na.rm=TRUE)) %>%
  summarize(diff_mean = diff(pmeans))
```

```
## # A tibble: 1 × 1
## diff_mean
## <dbl>
## 1 0.474
```

```r
# permuted
set.seed(47)
reps = 1000
perm_diff_means <- map_df(1:reps, diff_means_func)
```

---

## MacNell Data with permutation

permutation sampling distribution:

.pull-left[
![](2021-09-28-permutations_files/figure-html/unnamed-chunk-27-1.png)
]

.pull-right[

```r
# permutation p-value
perm_diff_means %>%
  summarize(p_val = 
      sum(diff_mean > 0.474) / 
      reps)
```

```
## # A tibble: 1 × 1
## p_val
## <dbl>
## 1 0.048
```
]
---

## MacNell results