Clicker Questions

to go along with

Modern Data Science with R, 3rd edition by Baumer, Kaplan, and Horton

Introduction to Statistical Learning with Applications in R by James, Witten, Hastie, and Tibshirani


  1. The reason to take random samples is:1
  1. to make cause and effect conclusions
  2. to get as many variables as possible
  3. it’s easier to collect a large dataset
  4. so that the data are a good representation of the population
  5. I have no idea why one would take a random sample

  1. The reason to allocate/assign explanatory variables is:2
  1. to make cause and effect conclusions
  2. to get as many variables as possible
  3. it’s easier to collect a large dataset
  4. so that the data are a good representation of the population
  5. I have no idea what you mean by “allocate/assign” (or “explanatory variable” for that matter)

  1. Approximately how big is a tweet?3
    1. 0.01Kb
    2. 0.1Kb
    3. 1Kb
    4. 100Kb
    5. 1000Kb = 1Mb

  1. \(R^2\) measures:4
  1. the proportion of variability in vote margin as explained by tweet share.
  2. the proportion of variability in tweet share as explained by vote margin.
  3. how appropriate the linear part of the linear model is.
  4. whether or not particular variables should be included in the model.

  1. R / RStudio / Quarto5
    1. all good
    2. started, progress is slow and steady
    3. started, very stuck
    4. haven’t started yet
    5. what do you mean by “R”?

  1. Git / GitHub6
    1. all good
    2. started, progress is slow and steady
    3. started, very stuck
    4. haven’t started yet
    5. what do you mean by “Git”?

  1. Which of the following includes talking to the remote version of GitHub?7
    1. changing your name (updating the YAML)
    2. committing the file(s)
    3. pushing the file(s)
    4. some of the above
    5. all of the above

  1. What is the error?8
    1. poor assignment operator
    2. unmatched quotes
    3. improper syntax for function argument
    4. invalid object name
    5. no mistake
shup2 <-- "Hello to you!"

  1. What is the error?9
    1. poor assignment operator
    2. unmatched quotes
    3. improper syntax for function argument
    4. invalid object name
    5. no mistake
3shup <-  "Hello to you!"

  1. What is the error?10
    1. poor assignment operator
    2. unmatched quotes
    3. improper syntax for function argument
    4. invalid object name
    5. no mistake
shup4 <-  "Hello to you!

  1. What is the error?11
    1. poor assignment operator
    2. unmatched quotes
    3. improper syntax for function argument
    4. invalid object name
    5. no mistake
shup5 <-  date()

  1. What is the error?12
    1. poor assignment operator
    2. unmatched quotes
    3. improper syntax for function argument
    4. invalid object name
    5. no mistake
shup6 <-  sqrt 10

  1. Do you keep a calendar / schedule / planner?13
    1. Yes
    2. No

  1. Do you keep a calendar / schedule / planner? If you answered “Yes” …14
    1. Yes, on Google Calendar
    2. Yes, on Calendar for macOS
    3. Yes, on Outlook for Windows
    4. Yes, in some other app
    5. Yes, by hand

  1. Where should I put things I’ve created for the HW (e.g., data, .ics file, etc.)?15
    1. Upload into remote GitHub directory
    2. In the local folder which also has the R project
    3. In my Downloads
    4. Somewhere on my Desktop
    5. In my Home directory

  1. The goal of making a figure is…16
    1. To draw attention to your work.
    2. To facilitate comparisons.
    3. To provide as much information as possible.

  1. A good reason to make a particular choice of a graph is:17
    1. Because the journal / field has particular expectations for how the data are presented.
    2. Because some variables naturally fit better on some graphs (e.g., numbers on scatter plots).
    3. Because that graphic displays the message you want as optimally as possible.

  1. Why are the points orange?18
    1. R translates “navy” into orange.
    2. color must be specified in geom_point()
    3. color must be specified outside the aes() function
    4. the default plot color is orange

ggplot(data = Births78, 
       aes(x = date, y = births, color = "navy")) + 
  geom_point() +          
  labs(title = "US Births in 1978")

  1. Why are the dots blue and the lines colored?19
    1. dot color is given as “navy”, line color is given as wday.
    2. both colors are specified in the ggplot() function.
    3. dot coloring takes precedence over line coloring.
    4. line coloring takes precedence over dot coloring.


  1. Setting vs. Mapping. If I want information to be passed to all data points (not variable):20
    1. map the information inside the aes() function.
    2. set the information outside the aes() function

  1. The Snow figure was most successful at:21
    1. making the data stand out
    2. facilitating comparison
    3. putting the work in context
    4. simplifying the story

  1. The Challenger figure(s) was(were) least successful at:22
    1. making the data stand out
    2. facilitating comparison
    3. putting the work in context
    4. simplifying the story

  1. The biggest difference between Snow and the Challenger was:23
    1. The amount of information portrayed.
    2. One was better at displaying cause.
    3. One showed the relevant comparison better.
    4. One was more artistic.

  1. Caffeine and Calories. What was the biggest concern over the average value axes?24
    1. It isn’t at the origin.
    2. They should have used all the data possible to find averages.
    3. There wasn’t a random sample.
    4. There wasn’t a label explaining why the axes were where they were.

  1. What is wrong with the following code?25
    1. should only be one =
    2. Sydney should be lower case
    3. name should not be in quotes
    4. use mutate instead of filter
    5. babynames in wrong place
Result <- |> filter(babynames,
        name== "Sydney")

  1. Which data represents the ideal format for ggplot2 and dplyr?26
table a
year  Algeria  Brazil  Colombia
2000        7      12        16
2001        9      14        18
table b
country   Y2000  Y2001
Algeria       7      9
Brazil       12     14
Colombia     16     18
table c
country   year  value
Algeria   2000      7
Algeria   2001      9
Brazil    2000     12
Brazil    2001     14
Colombia  2000     16
Colombia  2001     18

  1. Each of the statements except one will accomplish the same calculation. Which one does not match?27
#(a) 
babynames |> 
  group_by(year, sex) |> 
  summarize(totalBirths = sum(num))

#(b) 
group_by(babynames, year, sex) |> 
  summarize(totalBirths = sum(num))

#(c)
group_by(babynames, year, sex) |> 
  summarize(totalBirths = mean(num))

#(d)
temp <- group_by(babynames, year, sex)

summarize(temp, totalBirths = sum(num))

#(e)
summarize(group_by(babynames, year, sex), 
          totalBirths = sum(num))

  1. Fill in Q1.28
    1. filter()
    2. arrange()
    3. select()
    4. mutate()
    5. group_by()
result <- babynames |>
  Q1(name %in% c("Jane", "Mary")) |> 
  # just the Janes and Marys
  group_by(Q2, Q2) |> 
  summarize(total = Q3)

  1. Fill in Q2.29
    1. (year, sex)
    2. (year, name)
    3. (year, num)
    4. (sex, name)
    5. (sex, num)
result <- babynames |>
  Q1(name %in% c("Jane", "Mary")) |> 
  group_by(Q2, Q2) |> 
  # for each year for each name
  summarize(total = Q3)

  1. Fill in Q3.30
    1. n_distinct(name)
    2. n_distinct(n)
    3. sum(name)
    4. sum(num)
    5. mean(num)
result <- babynames |>
  Q1(name %in% c("Jane", "Mary")) |> 
  group_by(Q2, Q2) |> 
  summarize(total = Q3)
  # number of babies (each year, each name)

  1. Running the code.31
babynames <- babynames::babynames |> 
  rename(num = n)

babynames |>
  filter(name %in% c("Jane", "Mary")) |> 
  # just the Janes and Marys
  group_by(name, year) |> 
  # for each year for each name
  summarize(total = sum(num))
# A tibble: 276 × 3
# Groups:   name [2]
   name   year total
   <chr> <dbl> <int>
 1 Jane   1880   215
 2 Jane   1881   216
 3 Jane   1882   254
 4 Jane   1883   247
 5 Jane   1884   295
 6 Jane   1885   330
 7 Jane   1886   306
 8 Jane   1887   288
 9 Jane   1888   446
10 Jane   1889   374
# ℹ 266 more rows
babynames |>
  filter(name %in% c("Jane", "Mary")) |> 
  group_by(name, year) |> 
  summarize(number = sum(num))
# A tibble: 276 × 3
# Groups:   name [2]
   name   year number
   <chr> <dbl>  <int>
 1 Jane   1880    215
 2 Jane   1881    216
 3 Jane   1882    254
 4 Jane   1883    247
 5 Jane   1884    295
 6 Jane   1885    330
 7 Jane   1886    306
 8 Jane   1887    288
 9 Jane   1888    446
10 Jane   1889    374
# ℹ 266 more rows
babynames |>
  filter(name %in% c("Jane", "Mary")) |> 
  group_by(name, year) |> 
  summarize(n_distinct(name))
# A tibble: 276 × 3
# Groups:   name [2]
   name   year `n_distinct(name)`
   <chr> <dbl>              <int>
 1 Jane   1880                  1
 2 Jane   1881                  1
 3 Jane   1882                  1
 4 Jane   1883                  1
 5 Jane   1884                  1
 6 Jane   1885                  1
 7 Jane   1886                  1
 8 Jane   1887                  1
 9 Jane   1888                  1
10 Jane   1889                  1
# ℹ 266 more rows
babynames |>
  filter(name %in% c("Jane", "Mary")) |> 
  group_by(name, year) |> 
  summarize(n_distinct(num))
# A tibble: 276 × 3
# Groups:   name [2]
   name   year `n_distinct(num)`
   <chr> <dbl>             <int>
 1 Jane   1880                 1
 2 Jane   1881                 1
 3 Jane   1882                 1
 4 Jane   1883                 1
 5 Jane   1884                 1
 6 Jane   1885                 1
 7 Jane   1886                 1
 8 Jane   1887                 1
 9 Jane   1888                 1
10 Jane   1889                 1
# ℹ 266 more rows
babynames |>
  filter(name %in% c("Jane", "Mary")) |> 
  group_by(name, year) |> 
  summarize(sum(name))
Error in `summarize()`:
ℹ In argument: `sum(name)`.
ℹ In group 1: `name = "Jane"` and `year = 1880`.
Caused by error in `base::sum()`:
! invalid 'type' (character) of argument
babynames |>
  filter(name %in% c("Jane", "Mary")) |> 
  group_by(name, year) |> 
  summarize(mean(num))
# A tibble: 276 × 3
# Groups:   name [2]
   name   year `mean(num)`
   <chr> <dbl>       <dbl>
 1 Jane   1880         215
 2 Jane   1881         216
 3 Jane   1882         254
 4 Jane   1883         247
 5 Jane   1884         295
 6 Jane   1885         330
 7 Jane   1886         306
 8 Jane   1887         288
 9 Jane   1888         446
10 Jane   1889         374
# ℹ 266 more rows
babynames |>
  filter(name %in% c("Jane", "Mary")) |> 
  group_by(name, year) |> 
  summarize(median(num))
# A tibble: 276 × 3
# Groups:   name [2]
   name   year `median(num)`
   <chr> <dbl>         <dbl>
 1 Jane   1880           215
 2 Jane   1881           216
 3 Jane   1882           254
 4 Jane   1883           247
 5 Jane   1884           295
 6 Jane   1885           330
 7 Jane   1886           306
 8 Jane   1887           288
 9 Jane   1888           446
10 Jane   1889           374
# ℹ 266 more rows

  1. Fill in Q1.32
    1. gdp
    2. year
    3. gdpval
    4. country
    5. -country
GDP |>  
  select(country = starts_with("Income"), everything()) |> 
       pivot_longer(cols = Q1, 
                    names_to = Q2, 
                    values_to = Q3)

  1. Fill in Q2.33
    1. gdp
    2. year
    3. gdpval
    4. country
    5. -country
GDP |>  
  select(country = starts_with("Income"), everything()) |> 
       pivot_longer(cols = Q1, 
                    names_to = Q2, 
                    values_to = Q3)

  1. Fill in Q3.34
    1. gdp
    2. year
    3. gdpval
    4. country
    5. -country
GDP |>  
  select(country = starts_with("Income"), everything()) |> 
       pivot_longer(cols = Q1, 
                    names_to = Q2, 
                    values_to = Q3)

  1. Response to stimulus (in ms) after only 3 hrs of sleep for 9 days. You want to make a plot with the subject’s reaction time (y-axis) vs the number of days of sleep restriction (x-axis) using the following ggplot() code. Which data frame should you use?35
    1. use raw data
    2. use pivot_wider() on raw data
    3. use pivot_longer() on raw data
ggplot(___, aes(x = ___, y = ___, color = ___)) + 
  geom_line()
# A tibble: 18 × 11
   Subject day_0 day_1 day_2 day_3 day_4 day_5 day_6 day_7 day_8 day_9
     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1     308  250.  259.  251.  321.  357.  415.  382.  290.  431.  466.
 2     309  223.  205.  203.  205.  208.  216.  214.  218.  224.  237.
 3     310  199.  194.  234.  233.  229.  220.  235.  256.  261.  248.
 4     330  322.  300.  284.  285.  286.  298.  280.  318.  305.  354.
 5     331  288.  285   302.  320.  316.  293.  290.  335.  294.  372.
 6     332  235.  243.  273.  310.  317.  310   454.  347.  330.  254.
 7     333  284.  290.  277.  300.  297.  338.  332.  349.  333.  362.
 8     334  265.  276.  243.  255.  279.  284.  306.  332.  336.  377.
 9     335  242.  274.  254.  271.  251.  255.  245.  235.  236.  237.
10     337  312.  314.  292.  346.  366.  392.  404.  417.  456.  459.
11     349  236.  230.  239.  255.  251.  270.  282.  308.  336.  352.
12     350  256.  243.  256.  256.  269.  330.  379.  363.  394.  389.
13     351  251.  300.  270.  281.  272.  305.  288.  267.  322.  348.
14     352  222.  298.  327.  347.  349.  353.  354.  360.  376.  389.
15     369  272.  268.  257.  278.  315.  317.  298.  348.  340.  367.
16     370  225.  235.  239.  240.  268.  344.  281.  348.  365.  372.
17     371  270.  272.  278.  282.  279.  285.  259.  305.  351.  369.
18     372  269.  273.  298.  311.  287.  330.  334.  343.  369.  364.

sleep_long <- sleep_wide |>
  pivot_longer(cols = -Subject,
               names_to = "day",
               names_prefix = "day_",
               values_to = "reaction_time")

sleep_long
# A tibble: 180 × 3
   Subject day   reaction_time
     <dbl> <chr>         <dbl>
 1     308 0              250.
 2     308 1              259.
 3     308 2              251.
 4     308 3              321.
 5     308 4              357.
 6     308 5              415.
 7     308 6              382.
 8     308 7              290.
 9     308 8              431.
10     308 9              466.
# ℹ 170 more rows

  1. Consider band members from the Beatles and the Rolling Stones. Who is removed in a right_join()?36
  1. Mick
  2. John
  3. Paul
  4. Keith
  5. Impossible to know
band_members |> 
  right_join(band_instruments, by = "name")
band_members
# A tibble: 3 × 2
  name  band   
  <chr> <chr>  
1 Mick  Stones 
2 John  Beatles
3 Paul  Beatles
band_instruments
# A tibble: 3 × 2
  name  plays 
  <chr> <chr> 
1 John  guitar
2 Paul  bass  
3 Keith guitar

  1. Consider band members from the Beatles and the Rolling Stones. Which variables are removed in a right_join()?37
  1. name
  2. band
  3. plays
  4. none of them
band_members |> 
  right_join(band_instruments, by = "name")
band_members
# A tibble: 3 × 2
  name  band   
  <chr> <chr>  
1 Mick  Stones 
2 John  Beatles
3 Paul  Beatles
band_instruments
# A tibble: 3 × 2
  name  plays 
  <chr> <chr> 
1 John  guitar
2 Paul  bass  
3 Keith guitar

  1. What happens to Mick’s plays variable in a full_join()?38
  1. Mick is removed
  2. changes to guitar
  3. changes to bass
  4. NA
  5. NULL
band_members |> 
  full_join(band_instruments, by = "name")
band_members
# A tibble: 3 × 2
  name  band   
  <chr> <chr>  
1 Mick  Stones 
2 John  Beatles
3 Paul  Beatles
band_instruments
# A tibble: 3 × 2
  name  plays 
  <chr> <chr> 
1 John  guitar
2 Paul  bass  
3 Keith guitar

  1. Consider the addTen() function. The following output is a result of which map_*() call?39
  1. map(c(1,4,7), addTen)
  2. map_dbl(c(1,4,7), addTen)
  3. map_chr(c(1,4,7), addTen)
  4. map_lgl(c(1,4,7), addTen)
addTen <- function(wow) {
  return(wow + 10)
}
[1] "11.000000" "14.000000" "17.000000"

  1. Which of the following inputs is allowed?40
    1. map(c(1, 4, 7), addTen)
    2. map(list(1, 4, 7), addTen)
    3. map(data.frame(a=1, b=4, c=7), addTen)
    4. some of the above
    5. all of the above

  1. Which of the following produces a different output?41
    1. map(c(1, 4, 7), addTen)
    2. map(c(1, 4, 7), ~addTen(.x))
    3. map(c(1, 4, 7), ~addTen)
    4. map(c(1, 4, 7), function(hi) (hi + 10))
    5. map(c(1, 4, 7), ~(.x + 10))

  1. What will the following code output?42
    1. 3 random normals
    2. 6 random normals
    3. 18 random normals
input
# A tibble: 3 × 3
      n  mean    sd
  <dbl> <dbl> <dbl>
1     1     1     3
2     2     3     1
3     3    47    10
input |> 
  pmap(rnorm)

  1. In R the ifelse() function takes the arguments:43
  1. question, yes, no
  2. question, no, yes
  3. statement, yes, no
  4. statement, no, yes
  5. option1, option2, option3

  1. What is the output of the following:44
    1. “cat”, 30, “cat”, “cat”, 6
    2. “cat”, “30”, “cat”, “cat”, “6”
    3. 1, “cat”, 5, “cat”, “cat”
    4. 1, “cat”, 5, NA, “cat”
    5. “1”, “cat”, “5”, NA, “cat”
data <- c(1, 30, 5, NA, 6)

ifelse(data > 5, "cat", data)

  1. In R, the set.seed() function45
  1. makes your computations go faster
  2. keeps track of your computation time
  3. provides an important parameter
  4. repeats the function
  5. makes your results reproducible

  1. If I run a hypothesis test with a type I error cut off of \(\alpha = 0.05\) and the null hypothesis is true, what is the probability of rejecting \(H_0\)?46
  1. 0.01
  2. 0.05
  3. 0.1
  4. I don’t know.
  5. No one knows.

  1. If I run a hypothesis test with a type I error cut off of \(\alpha = 0.05\), the null hypothesis is true, and the technical conditions do not hold, what is the probability of rejecting \(H_0\)?47
  1. 0.01
  2. 0.05
  3. 0.1
  4. I don’t know.
  5. No one knows.

  1. If I run a hypothesis test with a type I error cut off of \(\alpha = 0.05\) and the null hypothesis is false, what is the probability of rejecting \(H_0\)?48
  1. 0.01
  2. 0.05
  3. 0.1
  4. I don’t know.
  5. No one knows.

  1. If I aim to create a 95% confidence interval, and the technical conditions hold, what is the probability that the CI will contain the true value of the parameter?49
  1. 0.90
  2. 0.95
  3. 0.99
  4. I don’t know.
  5. No one knows.

  1. If I aim to create a 95% confidence interval, and the technical conditions do not hold, what is the probability that the CI will contain the true value of the parameter?50
  1. 0.90
  2. 0.95
  3. 0.99
  4. I don’t know.
  5. No one knows.

  1. We typically compare means (across two groups) instead of medians because:51
  1. we don’t know the SE of the difference of medians
  2. means are inherently more interesting than medians
  3. permutation tests don’t work with medians
  4. the Central Limit Theorem doesn’t apply for medians.

  1. What are the technical assumptions for a t-test?52
  1. none
  2. normal data
  3. \(n \geq 30\)
  4. random sampling / random allocation for appropriate conclusions

  1. What are the technical conditions for permutation tests?53
  1. none
  2. normal data
  3. \(n \geq 30\)
  4. random sampling / random allocation for appropriate conclusions

  1. Follow up to permutation test: the assumptions change based on whether the statistic used is the mean, median, proportion, etc.54
  1. TRUE
  2. FALSE

  1. Why care about the distribution of the test statistic?55
  1. Better estimator
  2. So we can find rejection region
  3. So we can control power
  4. Because we love the CLT

  1. Given statistic T = r(X), how do we find a (sensible) test?56
  1. Maximize power
  2. Minimize type I error
  3. Control type I error
  4. Minimize type II error
  5. Control type II error

  1. Type I error is57
  1. We give him a raise when he deserves it.
  2. We don’t give him a raise when he deserves it.
  3. We give him a raise when he doesn’t deserve it.
  4. We don’t give him a raise when he doesn’t deserve it.

  1. Type II error is58
  1. We give him a raise when he deserves it.
  2. We don’t give him a raise when he deserves it.
  3. We give him a raise when he doesn’t deserve it.
  4. We don’t give him a raise when he doesn’t deserve it.

  1. Power is the probability that:59
  1. We give him a raise when he deserves it.
  2. We don’t give him a raise when he deserves it.
  3. We give him a raise when he doesn’t deserve it.
  4. We don’t give him a raise when he doesn’t deserve it.

  1. Why don’t we always reject \(H_0\)?60
    1. type I error too high
    2. type II error too high
    3. level of sig too high
    4. power too high

  1. The player is more worried about61
    1. A type I error
    2. A type II error

  1. The coach is more worried about62
    1. A type I error
    2. A type II error

  1. Increasing your sample size63
    1. Increases your power
    2. Decreases your power

  1. Making your significance level more stringent (\(\alpha\) smaller)64
  1. Increases your power
  2. Decreases your power

  1. A more extreme alternative65
    1. Increases your power
    2. Decreases your power

  1. What is the primary reason to use a permutation test (instead of a test built on calculus)?66
  1. more power
  2. lower type I error
  3. more resistant to outliers
  4. can be done on statistics with unknown sampling distributions

  1. What is the primary reason to bootstrap a CI (instead of creating a CI from calculus)?67
  1. larger coverage probabilities
  2. narrower intervals
  3. more resistant to outliers
  4. can be done on statistics with unknown sampling distributions

  1. Which of the following could not possibly be a bootstrap sample from the vector: c(4, 10, 8, 1, 2, 4)68
    1. c(4, 4, 4, 4, 4, 4)
    2. c(4, 10, 8, 1, 2, 4)
    3. c(1, 2, 2, 4, 4, 2)
    4. c(10, 8, 1, 1, 8, 10)
    5. c(1, 2, 4, 3, 4, 10)

  1. You have a sample of size n = 50. You sample with replacement 1000 times to get 1000 bootstrap samples. What is the sample size of each bootstrap sample?69
  1. 50
  2. 1000

  1. You have a sample of size n = 50. You sample with replacement 1000 times to get 1000 bootstrap samples. How many bootstrap statistics will you have?70
  1. 50
  2. 1000

  1. The bootstrap distribution is centered around the71
  1. population parameter
  2. sample statistic
  3. bootstrap statistic
  4. bootstrap parameter

(NEED TO ADD A PLOT FROM STATKEY HERE)

  1. 95% CI for the difference in proportions:72
  1. (0.39, 0.43)
  2. (0.37, 0.45)
  3. (0.77, 0.81)
  4. (0.75, 0.85)

  1. Suppose a 95% bootstrap CI for the difference in means was (3, 9). Would you reject \(H_0\)?73 (uh… What is the null hypothesis here???)
  1. yes
  2. no
  3. not enough information to know

  1. Given the situation where \(H_a\) is TRUE. Consider 100 CIs (for true difference in means, where each of the 100 CIs is created using a different dataset). The power of the test can be approximated by:74
  1. The proportion that contain the true difference in means.
  2. The proportion that do not contain the true difference in means.
  3. The proportion that contain zero.
  4. The proportion that do not contain zero.


Footnotes

    1. so that the data are a good representation of the population
    ↩︎
    1. to make cause and effect conclusions
    ↩︎
    1. about 0.1Kb. Turns out that 3.5 billion tweets * 0.1Kb = 350Gb (0.35 Tb). My laptop is pretty good, and it has 36 Gb of memory (RAM) and 4 Tb of storage. It would not be able to work with 3.5 billion tweets.
    ↩︎
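
The arithmetic in the tweet answer can be checked directly in R. This is a minimal sketch, assuming decimal units (1 Gb = 1,000,000 Kb) and the 0.1Kb-per-tweet figure quoted above; the variable names are hypothetical and only for illustration.

```r
# Assumed inputs, taken from the tweet-size answer
n_tweets     <- 3.5e9   # 3.5 billion tweets
kb_per_tweet <- 0.1     # approximate size of one tweet, in Kb

total_kb <- n_tweets * kb_per_tweet
total_gb <- total_kb / 1e6    # decimal units: 1 Gb = 1e6 Kb
total_tb <- total_gb / 1e3    # 1 Tb = 1e3 Gb

ram_gb      <- 36             # laptop memory quoted in the answer
fits_in_ram <- total_gb <= ram_gb
```

Under these assumptions `fits_in_ram` is FALSE, which is the point of the answer: the full set of tweets cannot be held in memory at once.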
    1. the proportion of variability in vote margin as explained by tweet share.
    ↩︎
  1. wherever you are, make sure you are communicating with me when you have questions!↩︎

  2. wherever you are, make sure you are communicating with me when you have questions!↩︎

    1. pushing the file(s)
    ↩︎
    1. poor assignment operator
    ↩︎
    1. invalid object name
    ↩︎
    1. unmatched quotes
    ↩︎
    1. no mistake
    ↩︎
    1. improper syntax for a function argument
    ↩︎
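
For reference, here is one way the five "What is the error?" assignments could be rewritten so that each runs. It is a sketch of possible fixes, not the only valid ones; in particular, the replacement name shup3 is a hypothetical choice.

```r
shup2 <- "Hello to you!"   # single `<-`; `<--` is read as `<-` followed by a unary minus
shup3 <- "Hello to you!"   # object names cannot start with a digit, so 3shup is renamed
shup4 <- "Hello to you!"   # closing quote added
shup5 <- date()            # the original line had no mistake
shup6 <- sqrt(10)          # function arguments go inside parentheses
```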
    1. I mean, the right answer has to be Yes, right!??!
    ↩︎
  3. no right answer here!↩︎

    1. In the local folder which also has the R project. It could be on the Desktop or the Home directory, but it must be in the same place as the R project. Do not upload files to the remote GitHub directory or you will find yourself with two different copies of the files.
    ↩︎
  4. Yes! All the responses are reasons to make a figure.↩︎

    1. Because that graphic displays the message you want as optimally as possible.
    ↩︎
    1. color must be specified outside the aes() function
    ↩︎
    1. dot color is specified as “navy”, line color is specified as wday.
    ↩︎
    1. set the information outside the aes() function
    ↩︎
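
The difference between mapping and setting can be seen side by side. The sketch below assumes ggplot2 is installed and uses a tiny made-up data frame (toy_births) as a hypothetical stand-in for Births78.

```r
library(ggplot2)

# Hypothetical stand-in for Births78: five dates with made-up counts
toy_births <- data.frame(
  date   = as.Date("1978-01-01") + 0:4,
  births = c(10, 12, 9, 11, 13)
)

# Mapping: "navy" inside aes() is treated as a one-level variable,
# so ggplot2 picks a colour from its default palette and adds a legend
p_mapped <- ggplot(toy_births, aes(x = date, y = births, color = "navy")) +
  geom_point()

# Setting: color outside aes() is applied as a fixed value to every point
p_set <- ggplot(toy_births, aes(x = date, y = births)) +
  geom_point(color = "navy")
```

Printing `p_mapped` and `p_set` shows the contrast: only the second plot draws navy points.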
  5. answers may vary. I’d say c. putting the work in context. Others might say b. facilitating comparison or d. simplifying the story. However, I don’t think a correct answer is a. making the data stand out.↩︎

    1. making the data stand out
    ↩︎
    1. One showed the relevant comparison better.
    ↩︎
    1. It isn’t at the origin, in combination with d. There wasn’t a label explaining why the axes were where they were. The story associated with the average value axes is not clear to the reader.
    ↩︎
    1. babynames in wrong place
    ↩︎
    1. Table c is best because the columns allow us to work with each of the variables separately.
    ↩︎
    1. (c) does something different because it takes the mean() (average) instead of the sum(). The other commands compute the total number of births broken down by year and sex.
    ↩︎
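
The equivalence of the piped and nested forms, and the odd one out, can be confirmed on a small example. This sketch assumes dplyr is installed; toy is a hypothetical miniature of babynames with made-up counts.

```r
library(dplyr)

# Hypothetical miniature of the babynames data (made-up counts)
toy <- tibble(
  year = c(2000, 2000, 2000, 2001, 2001),
  sex  = c("F", "F", "M", "F", "M"),
  num  = c(10, 30, 20, 40, 50)
)

# Piped and nested calls are the same computation written two ways
piped  <- toy |>
  group_by(year, sex) |>
  summarize(totalBirths = sum(num), .groups = "drop")
nested <- summarize(group_by(toy, year, sex),
                    totalBirths = sum(num), .groups = "drop")

# Swapping sum() for mean() changes the result whenever a group has more than one row
averaged <- toy |>
  group_by(year, sex) |>
  summarize(totalBirths = mean(num), .groups = "drop")
```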
    1. filter()
    ↩︎
    1. (year, name)
    ↩︎
    1. sum(num)
    ↩︎
  6. running the different code chunks with relevant output.↩︎

    1. -country
    ↩︎
    1. year
    ↩︎
    1. gdpval (if possible, good idea to name variables something different from the name of the data frame)
    ↩︎
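
With the three blanks filled in, the pivot might look like the sketch below. It assumes tidyr is installed, uses a hypothetical two-country table with made-up values in place of GDP, and skips the select() renaming step because the toy table already has a country column.

```r
library(tidyr)

# Hypothetical wide table: one row per country, one column per year (made-up values)
gdp_wide <- data.frame(
  country = c("A", "B"),
  `2000`  = c(1.0, 2.0),
  `2001`  = c(1.5, 2.5),
  check.names = FALSE
)

gdp_long <- gdp_wide |>
  pivot_longer(cols = -country,       # Q1: every column except country
               names_to = "year",     # Q2: old column names become the year variable
               values_to = "gdpval")  # Q3: cell values go into gdpval
```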
    1. use pivot_longer() on raw data. The reference to the study is: Gregory Belenky, Nancy J. Wesensten, David R. Thorne, Maria L. Thomas, Helen C. Sing, Daniel P. Redmond, Michael B. Russo and Thomas J. Balkin (2003) Patterns of performance degradation and restoration during sleep restriction and subsequent recovery: a sleep dose-response study. Journal of Sleep Research 12, 1–12.
    ↩︎
    1. Mick
    ↩︎
    1. none of them (the default is to retain all the variables)
    ↩︎
    1. NA (it would be NULL in SQL)
    ↩︎
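
The three join answers can be checked with the band_members and band_instruments tibbles that ship with dplyr (assuming dplyr is installed). The object names right and full are hypothetical.

```r
library(dplyr)

# right_join keeps every row of band_instruments, so Mick (not in that table) drops out
right <- band_members |>
  right_join(band_instruments, by = "name")

# full_join keeps every row of both tables; unmatched cells are filled with NA
full <- band_members |>
  full_join(band_instruments, by = "name")

mick_plays <- full$plays[full$name == "Mick"]
```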
    1. map_chr(c(1,4,7), addTen) because the output is in quotes: the values are strings, not numbers.
    ↩︎
    1. all of the above. The map() function allows vectors, lists, and data frames as input.
    ↩︎
    1. map(c(1, 4, 7), ~addTen). The ~ builds an anonymous function from an expression, so ~addTen(.x) calls addTen() on each element and matches the other options. ~addTen contains no call, so it simply returns the addTen function itself for each element instead of applying it. A function passed by name, like addTen all alone, does not need a ~.
    ↩︎
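
A quick way to see why ~addTen differs is to look at what each call returns. This sketch assumes purrr is installed; called and uncalled are hypothetical names.

```r
library(purrr)

addTen <- function(wow) {
  return(wow + 10)
}

called   <- map(c(1, 4, 7), ~ addTen(.x))  # calls addTen on each element
uncalled <- map(c(1, 4, 7), ~ addTen)      # returns the function itself each time

same_as_by_name <- identical(called, map(c(1, 4, 7), addTen))
all_functions   <- all(map_lgl(uncalled, is.function))
```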
    1. 6 random normals (1 with mean 1, sd 3; 2 with mean 3, sd 1; 3 with mean 47, sd 10)
    ↩︎
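
The count of six draws can be verified by checking the length of each element that pmap() returns. A minimal sketch, assuming purrr is installed; a plain data frame stands in for the tibble shown in the question, and the seed is an arbitrary choice.

```r
library(purrr)

input <- data.frame(n = c(1, 2, 3), mean = c(1, 3, 47), sd = c(3, 1, 10))

set.seed(47)                  # arbitrary seed so the draws are repeatable
draws <- pmap(input, rnorm)   # each row supplies n, mean, and sd to one rnorm() call

draws_per_row <- lengths(draws)
total_draws   <- sum(draws_per_row)
```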
    1. question, yes, no
    ↩︎
    1. “1”, “cat”, “5”, NA, “cat” (Note that the numbers were converted to character strings!)
    ↩︎
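
Running the ifelse() call with its arguments named makes both the argument order and the type conversion visible. The formal argument names in base R are test, yes, and no.

```r
data <- c(1, 30, 5, NA, 6)

# test = the question, yes = value when TRUE, no = value when FALSE
result <- ifelse(test = data > 5, yes = "cat", no = data)

# Mixing "cat" with numbers forces the whole result to character;
# the NA stays NA because the test itself is NA there
result_type <- class(result)
```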
    1. makes your results reproducible
    ↩︎
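
A small check of what "reproducible" means here: resetting the seed to the same value gives the same random draws. The seed value 47 is an arbitrary choice.

```r
set.seed(47)
first_run <- rnorm(3)

set.seed(47)          # same seed, so the generator restarts from the same state
second_run <- rnorm(3)

third_run <- rnorm(3) # no reset, so these draws continue the sequence

reproducible <- identical(first_run, second_run)
```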
    1. 0.05. If the null hypothesis is true and the technical conditions hold, then we should reject the null hypothesis \(\alpha \cdot 100\)% of the time.
    ↩︎
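
The claim about rejecting \(\alpha \cdot 100\)% of the time can be illustrated by simulation. This is a sketch under stated assumptions: two groups of 25 drawn from the same normal population (so the null is true and the t-test conditions hold), 2000 simulated datasets, and an arbitrary seed.

```r
set.seed(47)
alpha  <- 0.05
n_sims <- 2000

# Each simulated dataset has two groups from the SAME normal population,
# so the null hypothesis of equal means is true
p_values <- replicate(n_sims, {
  group1 <- rnorm(25)
  group2 <- rnorm(25)
  t.test(group1, group2)$p.value
})

# Proportion of simulated tests that (wrongly) reject the null
type1_rate <- mean(p_values < alpha)
```

The value of `type1_rate` should land close to 0.05, up to simulation noise.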
    1. No one knows. It totally depends on how and how much the technical conditions are violated and how robust the test is to violations of those conditions.
    ↩︎
    1. No one knows. It totally depends on the degree to which the null hypothesis is false.
    ↩︎
    1. 0.95. If the technical conditions hold, 95% of all confidence intervals should contain the true parameter.
    ↩︎
    1. No one knows. If the technical conditions do not hold, the CI may or may not contain the true value of the parameter at the given confidence level (i.e., 95%).
    ↩︎
    1. the Central Limit Theorem doesn’t apply for medians.
    ↩︎
  7. we always need d. random sampling / random allocation for appropriate conclusions. The theory is derived from b. normal data. If c. \(n \geq 30\), then the theory holds really well, regardless of whether the data are normal.↩︎

    1. random sampling / random allocation for appropriate conclusions
    ↩︎
    1. FALSE
    ↩︎
    1. So we can find rejection region
    ↩︎
    1. Control type I error
    ↩︎
    1. We give him a raise when he doesn’t deserve it.
    ↩︎
    1. We don’t give him a raise when he deserves it.
    ↩︎
    1. We give him a raise when he deserves it.
    ↩︎
    1. type I error too high
    ↩︎
    1. A type II error
    ↩︎
    1. A type I error
    ↩︎
    1. Increases your power
    ↩︎
    1. Decreases your power (with a smaller \(\alpha\), it is harder to reject \(H_0\), even when \(H_0\) is false)
    ↩︎
    1. Increases your power
    ↩︎
    1. can be done on statistics with unknown sampling distributions
    ↩︎
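
As an illustration of testing a statistic whose sampling distribution is not readily available, here is a sketch of a permutation test for a difference in medians. The data are simulated (made up), and the group sizes, number of shuffles, and helper name diff_medians are all hypothetical choices.

```r
set.seed(47)

# Hypothetical data: two groups of 15 with simulated skewed values
group <- rep(c("A", "B"), each = 15)
y     <- c(rexp(15, rate = 1), rexp(15, rate = 0.5))

diff_medians <- function(values, labels) {
  median(values[labels == "B"]) - median(values[labels == "A"])
}

observed <- diff_medians(y, group)

# Shuffle the group labels to build the null distribution of the statistic
perm_stats <- replicate(2000, diff_medians(y, sample(group)))

# Two-sided p-value: share of shuffles at least as extreme as the observed difference
p_value <- mean(abs(perm_stats) >= abs(observed))
```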
    1. can be done on statistics with unknown sampling distributions
    ↩︎
    1. c(1, 2, 4, 3, 4, 10) because there is no 3 in the original dataset.
    ↩︎
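
The same reasoning can be applied to all five candidates at once. In this sketch the labels a through e are hypothetical names for the answer choices in the order they are listed.

```r
original <- c(4, 10, 8, 1, 2, 4)

candidates <- list(
  a = c(4, 4, 4, 4, 4, 4),
  b = c(4, 10, 8, 1, 2, 4),
  c = c(1, 2, 2, 4, 4, 2),
  d = c(10, 8, 1, 1, 8, 10),
  e = c(1, 2, 4, 3, 4, 10)
)

# A bootstrap sample has the same length as the original and contains
# only values that appear in the original (repeats are allowed)
possible <- sapply(candidates, function(s) {
  length(s) == length(original) && all(s %in% original)
})
```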
    1. 50
    ↩︎
    1. 1000
    ↩︎
    1. sample statistic
    ↩︎
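
The last three bootstrap answers (resample size, number of statistics, and centering) can be seen in one short simulation. The sample x is simulated, so its mean and standard deviation are assumptions made only for illustration.

```r
set.seed(47)
x <- rnorm(50, mean = 10, sd = 2)   # hypothetical sample of size n = 50

boot_means <- replicate(1000, {
  boot_sample <- sample(x, size = length(x), replace = TRUE)  # each resample has 50 values
  mean(boot_sample)                                           # one statistic per resample
})

n_statistics <- length(boot_means)
# The bootstrap distribution is centered near the sample statistic, not the population mean
center_gap <- abs(mean(boot_means) - mean(x))
```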
  8. depends on the plot↩︎

    1. yes (because the interval for the true difference in population means does not overlap zero.)
    ↩︎
    1. The proportion that do not contain zero.
    ↩︎
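
The link between power and confidence intervals can be sketched by simulation. The assumptions here are mine, not the source's: normal data, 20 observations per group, a true difference in means of 1 standard deviation, and t-based 95% intervals from t.test().

```r
set.seed(47)
true_diff <- 1      # assumed true difference in means, so H_a is true
n_per_grp <- 20

# For each of 100 datasets, record whether the 95% CI excludes zero
excludes_zero <- replicate(100, {
  g1 <- rnorm(n_per_grp, mean = 0, sd = 1)
  g2 <- rnorm(n_per_grp, mean = true_diff, sd = 1)
  ci <- t.test(g2, g1, conf.level = 0.95)$conf.int
  ci[1] > 0 || ci[2] < 0
})

# Proportion of intervals that do not contain zero approximates the power
approx_power <- mean(excludes_zero)
```

Each interval that excludes zero corresponds to a test that rejects \(H_0\) at the 0.05 level, which is why this proportion estimates power.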
