Computational Statistics

1. Start with R + Git

The importance of reproducibility. Ideas of computational statistics, data science, and machine learning. Some resources for starting with R + RStudio + Git + GitHub.

2. Data Viz

Examples, good and bad. The theory behind what makes a viz good or bad. Tools to implement viz tasks.

3. Wrangling

Data wrangling skills are among the most important to hone.

4. Simulating

Simulating scenarios, simulating datasets, simulating random variables.

5. Permutation Tests

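Shuffling group labels to simulate the null hypothesis. The shuffled datasets give the null distribution of a test statistic, against which the observed statistic is compared.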
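
A permutation test for a difference in means can be sketched as follows. This is a minimal illustration in Python using only the standard library (the course itself works in R); the function name `permutation_test`, the group data, and the number of permutations are all assumptions made for the example, not course material.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_perm=5000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)

    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_perm):
        # Shuffling the pooled values is equivalent to shuffling group labels.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1

    # Adding 1 to numerator and denominator keeps the p-value above zero.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value


# Hypothetical measurements for two groups.
treatment = [12.1, 14.3, 13.8, 15.2, 14.9, 13.5]
control = [10.2, 11.1, 9.8, 10.9, 11.5, 10.4]

observed, p_value = permutation_test(treatment, control)
print(f"observed difference: {observed:.2f}, p-value: {p_value:.4f}")
```

The shuffled differences play the role of the null distribution: if group membership did not matter, the observed difference should look like a typical shuffled one.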

6. Bootstrapping

The sample serves as a proxy for the unknown population. Resampling with replacement from that proxy population (i.e., the sample) generates an approximate sampling distribution. This is the bootstrap.
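
The idea of resampling from the sample can be sketched in a few lines. This is a minimal illustration in Python using only the standard library (the course itself works in R); the helper names `bootstrap_means` and `percentile_interval`, the data, and the simple percentile interval are assumptions chosen for the example.

```python
import random
from statistics import mean


def bootstrap_means(sample, n_boot=2000, seed=0):
    """Resample the data with replacement and record each resample's mean."""
    rng = random.Random(seed)
    n = len(sample)
    return [mean(rng.choices(sample, k=n)) for _ in range(n_boot)]


def percentile_interval(values, level=0.95):
    """Simple percentile interval from a list of bootstrap statistics."""
    ordered = sorted(values)
    alpha = (1 - level) / 2
    lower = ordered[int(alpha * len(ordered))]
    upper = ordered[int((1 - alpha) * len(ordered)) - 1]
    return lower, upper


# Hypothetical sample of ten measurements.
sample = [4.1, 5.6, 3.9, 6.2, 5.0, 4.8, 5.3, 4.4, 6.0, 5.1]

boot = bootstrap_means(sample)
lower, upper = percentile_interval(boot)
print(f"sample mean: {mean(sample):.2f}")
print(f"95% percentile interval: ({lower:.2f}, {upper:.2f})")
```

The list `boot` approximates the sampling distribution of the mean. Other statistics (a median, a correlation) could be substituted for `mean` in the same loop.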

7. First half review

Resources for review of the first half of the semester.

8. Recipes

An old adage says: garbage in, garbage out. Here we avoid garbage in.

9. k-NN + trees

k-Nearest Neighbors is a classification algorithm based on the premise that points which are close to one another (in some predictor space) are likely to be similar with respect to the outcome variable. Trees are a set of methods in which prediction is based on a majority vote or an average outcome within each region of a partition of the predictor space.
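
The k-Nearest Neighbors premise can be sketched directly: find the k closest training points and take a majority vote. This is a minimal illustration in Python using only the standard library (the course itself works in R); the function name `knn_predict`, the Euclidean distance, and the toy data are assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: two predictors, two classes.
points = [(1.0, 1.2), (0.8, 0.9), (1.3, 1.0), (4.0, 4.2), (4.3, 3.9), (3.8, 4.1)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.1, 1.0)))  # prints "a"
print(knn_predict(points, labels, (4.1, 4.0)))  # prints "b"
```

In practice the predictors are usually rescaled first, since the distance is otherwise dominated by whichever variable has the largest units.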

10. Random Forests

Many trees make a forest. Bagging leaves some observations out of each bootstrap sample, and these out-of-bag observations give a FREE independent model assessment or parameter tuning. Random Forests have a fantastic bias-variance trade-off.

11. Support Vector Machines

Here, support vector machines are used only for binary classification, where each object belongs to one of exactly two classes. As with other classification and regression methods, support vector machines can be used more generally, but we work to understand the mathematical derivation of the binary-classification SVM.

12. Clustering

A quick dive into unsupervised methods. We cover two clustering methods: partitioning (k-means and k-medoids) and hierarchical.
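
As one instance of a partitioning method, k-means can be sketched as alternating between assigning points to the nearest centroid and recomputing each centroid as the mean of its points. This is a minimal illustration in Python using only the standard library (the course itself works in R); the function name `kmeans`, the random initialization from the data, and the toy points are assumptions made for the example.

```python
import math
import random


def kmeans(points, k, n_iter=50, seed=0):
    """Basic k-means: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    # Assumption: start from k distinct data points chosen at random.
    centroids = rng.sample(points, k)
    assignments = []

    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # No point changed cluster, so the algorithm has converged.
        assignments = new_assignments

        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # An empty cluster keeps its previous centroid.
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))

    return centroids, assignments


# Hypothetical data: two loose groups of points in the plane.
points = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]

centroids, assignments = kmeans(points, k=2)
print(centroids)
print(assignments)
```

The result can depend on the starting centroids, so it is common to run the algorithm from several random starts and keep the partition with the smallest within-cluster distances.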

13. Second half review

Resources for review of the second half of the semester.

14. Awesome extensions

So many topics, so little time. A short survey of some of the many, many topics we could have covered but didn't.

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/hardin47/m154-comp-stats, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".