01. Start with R + Git

Why reproducibility matters. Core ideas of computational statistics, data science, and machine learning. Resources for getting started with R + RStudio + Git + GitHub.

Published

August 25, 2025

Judge monster confirming that the RStudio monster has reproducible work.

Artwork by @allison_horst.

Agenda

August 25, 2025

  1. Syllabus & Course Outline
  2. Reproducibility & GitHub

Readings

Reflection questions

  • What can statistics & data science do? How do they do that?

  • What can’t statistics & data science do? Why not?

  • What choices were made to collect the Twitter data?

  • What choices were made to model the Twitter data?

  • What are the advantages and disadvantages of high-touch versus low-touch data?

Ethics considerations

  • Why is it problematic if the analysis isn’t reproducible?

  • Is every analysis worth doing (e.g., time to get to work, predicting presidential election results)? Can the act of doing the analysis itself be ethically questionable?

Slides

Additional Resources