The importance of reproducibility. Ideas of computational statistics, data science, and machine learning. Some resources for starting with R + RStudio + Git + GitHub.
Examples, good and bad. Theory underlying what makes a viz good and bad. Tools to implement viz tasks.
Data wrangling skills are among the most important to hone.
Simulating scenarios, simulating datasets, simulating random variables.
The sample as a proxy for the unknown population. Sample from said proxy population (i.e., the sample) to generate a sampling distribution. Bootstrap.
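As a minimal sketch of the resampling idea above: the observed sample is treated as the population, and repeated draws with replacement give an approximate sampling distribution of the mean. The data values, the function names `bootstrap_means` and `percentile_interval`, the seed, and the number of resamples are all illustrative assumptions, not taken from the course materials.

```python
import random
import statistics


def bootstrap_means(sample, n_boot=2000, seed=47):
    """Approximate the sampling distribution of the mean by resampling the sample."""
    rng = random.Random(seed)
    n = len(sample)
    means = []
    for _ in range(n_boot):
        # Draw n observations WITH replacement from the observed sample,
        # which stands in for the unknown population.
        resample = rng.choices(sample, k=n)
        means.append(statistics.fmean(resample))
    return means


def percentile_interval(values, level=0.95):
    """Simple percentile interval taken from the sorted bootstrap statistics."""
    ordered = sorted(values)
    alpha = (1 - level) / 2
    lower = ordered[int(alpha * len(ordered))]
    upper = ordered[int((1 - alpha) * len(ordered)) - 1]
    return lower, upper


# Hypothetical data, used only to make the sketch runnable.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 3.0]
boot = bootstrap_means(sample)
ci = percentile_interval(boot)
print("observed mean:", statistics.fmean(sample))
print("95% percentile interval:", ci)
```

The percentile interval is only one of several bootstrap interval constructions; it is used here because it is the shortest to write down.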
Resources for review of the first half of the semester.
An old adage says: garbage in, garbage out. Here we avoid garbage in.
k-Nearest Neighbors is a classification algorithm based on the premise that points which are close to one another (in some predictor space) are likely to be similar with respect to the outcome variable. Trees represent a set of methods in which prediction is based on a majority vote or an average outcome over a partition of the predictor space.
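The nearest-neighbor premise can be sketched in a few lines: a new point takes the majority label of its k closest training points. This is a rough illustration that assumes Euclidean distance and a tiny made-up training set; the function name `knn_predict` and the data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, new_point, k=3):
    """Classify new_point by majority vote among its k nearest training points."""
    # Pair each training label with its Euclidean distance to the new point.
    distances = [
        (math.dist(point, new_point), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest points and count their labels.
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: two well-separated groups in a 2-D predictor space.
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.3), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
train_labels = ["a", "a", "a", "b", "b", "b"]

pred_near_a = knn_predict(train_points, train_labels, (1.1, 1.0))
pred_near_b = knn_predict(train_points, train_labels, (4.0, 4.0))
print(pred_near_a, pred_near_b)
```

In practice the predictors are usually put on a common scale first, since the distance is otherwise dominated by whichever variable has the largest units.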
Many trees make a forest. Bagging gives FREE independent model assessment or parameter tuning through the out-of-bag observations. Random Forests have a fantastic bias-variance trade-off.
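One way to see why bagging's assessment comes for free: each bootstrap resample leaves out some of the observations, and those left-out points can be used to test the tree grown on that resample. The sketch below only counts the left-out points and fits no model; the helper `oob_indices`, the seed, and the sizes are illustrative assumptions.

```python
import random


def oob_indices(n, rng):
    """Return the indices NOT drawn in one bootstrap resample of size n."""
    # Draw n indices with replacement; duplicates collapse in the set.
    drawn = {rng.randrange(n) for _ in range(n)}
    return [i for i in range(n) if i not in drawn]


rng = random.Random(154)
n = 1000

# Fraction of observations left out of each of 200 bootstrap resamples.
fractions = [len(oob_indices(n, rng)) / n for _ in range(200)]
avg_oob_fraction = sum(fractions) / len(fractions)

# Theory: each observation is missed with probability (1 - 1/n)^n, about 0.368.
print("average out-of-bag fraction:", round(avg_oob_fraction, 3))
print("theoretical value:", round((1 - 1 / n) ** n, 3))
```

Roughly a third of the data is therefore unseen by any given tree, and those points can play the role of a test set without a separate hold-out split.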
Here, support vector machines will be used only to classify objects which can be categorized into one of exactly two classes. As with other classification and regression methods, support vector machines as a method can be used more generally. However, we will work to understand the mathematical derivation of the binary-classification SVM.
A quick dive into unsupervised methods. We cover two clustering methods: partitioning (k-means and k-medoids) and hierarchical.
Resources for review of the second half of the semester.
So many topics, so little time. A short survey of some of the many, many topics we could have covered but didn't.
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/hardin47/m154-comp-stats, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".