01. Start with R + Git

The importance of reproducibility. Ideas of computational statistics, data science, and machine learning. Some resources for starting with R + RStudio + Git + GitHub.

Published

August 26, 2024

Artwork by @allison_horst.

Agenda

August 26, 2024

  1. Syllabus & Course Outline
  2. Stitch Fix Algorithm
  3. Can Twitter predict election results?

Before Wednesday, listen to the full Not So Standard Deviations episode “Compromised Shoe Situation.”

August 28, 2024

  1. Reproducibility & GitHub
  2. Design Challenge (Not So Standard Deviations)

Before next Thursday, read: Tufte. 1997. Visual and Statistical Thinking: Displays of Evidence for Making Decisions. (Use Google to find it.)

Readings

Reflection questions

  • What can statistics & data science do? How do they do that?

  • What can’t statistics & data science do? Why not?

  • What choices were made to collect the Twitter data?

  • What choices were made to model the Twitter data?

  • What are the advantages and disadvantages of high touch versus low touch data?

Ethics considerations

  • Why is it problematic if the analysis isn’t reproducible?

  • Is every analysis worth doing (e.g., time to get to work, predicting presidential election results)? Can the act of doing the analysis itself be ethically questionable?

Slides

Additional Resources
