The importance of reproducibility. Ideas of computational statistics, data science, and machine learning. Some resources for starting with R + RStudio + Git + GitHub.
Before next Thursday, listen to the full Not So Standard Deviations episode "Compromised Shoe Situation."
Before next Tuesday, read Tufte (1997), Visual and Statistical Thinking: Displays of Evidence for Making Decisions. (Use Google to find it.)
What can statistics & data science do? How do they do that?
What can’t statistics & data science do? Why not?
What choices were made to collect the Twitter data?
What choices were made to model the Twitter data?
What are the advantages and disadvantages of high-touch versus low-touch data?
Why is it problematic if the analysis isn’t reproducible?
Is every analysis worth doing (e.g., time to get to work, predicting presidential results)? Can the act of doing the analysis itself be ethically questionable?
In-class slides for 8/31/21 and 9/2/21.
Design Challenge (Not So Standard Deviations): listen to the full conversation.
Video (under 2 minutes) on the strengths of reproducible research
R vs. Python? (My personal opinion is that neither language is “best.”)
The 2017 Kaggle user survey and the 2019 Stack Overflow Developer Survey
PNAS paper retracted because of problems with a figure and with reproducibility (April 2016)
Analysis of Trump’s tweets, with evidence that someone else tweets from his account using an iPhone (part 1 and part 2)
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/hardin47/m154-comp-stats, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".