1. Start with R + Git

The importance of reproducibility. Ideas of computational statistics, data science, and machine learning. Some resources for starting with R + RStudio + Git + GitHub.

Jo Hardin https://m154-comp-stats.netlify.app/
08-31-2021

Judge monster confirming that the RStudio monster has reproducible work.

Figure 1: Artwork by @allison_horst.

Agenda

August 31, 2021

Questionnaire
Syllabus & Course Outline
Stitch Fix Algorithm
College Rankings
Can Twitter predict election results?

Before next Thursday, listen to the full conversation of Not So Standard Deviations - Compromised Shoe Situation.

September 2, 2021

Reproducibility & GitHub
Design Challenge (Not So Standard Deviations)

Before next Tuesday, read: Tufte. 1997. Visual and Statistical Thinking: Displays of Evidence for Making Decisions. (Use Google to find it.)

Readings

The syllabus
Modern Data Science with R Prologue
Class notes: Introduction
Why Git? + monsters

Reflection questions

What can statistics & data science do? How do they do that?
What can’t statistics & data science do? Why not?
What choices were made to collect the Twitter data?
What choices were made to model the Twitter data?
What are the advantages and disadvantages of high touch versus low touch data?

Ethics considerations

Why is it problematic if the analysis isn’t reproducible?
Is every analysis worth doing (e.g., time to get to work, predicting presidential election results)? Can the act of doing the analysis be ethically questionable?

Slides

In-class slides for both 8/31/21 and 9/2/21.
Twitter activity
WU1 - working with R

Additional Resources

Great algorithm for the whole process
Design Challenge (Not So Standard Deviations): listen to the full conversation.
Video (less than 2 min) on the strengths of reproducible research
R vs. Python? (My personal opinion is that neither of the languages is “best”.)
2017 Kaggle user survey and 2019 Stack Overflow Developer Survey
PNAS paper retracted due to problems with figure and reproducibility (April 2016)
Analysis of Trump’s tweets, with evidence that someone else tweets from his account using an iPhone: part 1 and part 2

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/hardin47/m154-comp-stats, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".