Class notes: Unsupervised Learning

A quick dive into unsupervised methods. We cover two clustering approaches: partitioning (k-means and k-medoids) and hierarchical clustering.
James, Witten, Hastie, and Tibshirani (2021), Unsupervised Learning (Chapter 12), An Introduction to Statistical Learning.
Why does the plot of within-cluster sum of squares vs. \(k\) make an elbow shape (hint: think about \(k\) as it ranges from 1 to \(n\))? A sketch of this plot appears after these questions.
How are the centers of the clusters in \(k\)-means calculated? What about in \(k\)-medoids?
Will a different initialization of the cluster centers always produce the same cluster output? (See the initialization sketch after these questions.)
How do distance metrics change a hierarchical clustering?
How can you choose \(k\) with hierarchical clustering?
What is the difference between single, average, and complete linkage in hierarchical clustering? (See the linkage sketch after these questions.)
What is the difference between agglomerative and divisive hierarchical clustering?
What type of feature engineering is required for \(k\)-means / hierarchical clustering?
How do you (can you?) know if your clustering has uncovered any real patterns in the data?
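The questions above are prompts rather than worked examples, but a few small code sketches may make them concrete. First, the elbow plot: a minimal sketch using scikit-learn on synthetic data (the make_blobs settings and the range of \(k\) are illustrative choices, not from the notes). As \(k\) grows toward \(n\), each point gets closer to its own center, and the within-cluster sum of squares is forced down toward 0, so the curve always decreases and typically bends at a reasonable \(k\).

```python
# Sketch: within-cluster sum of squares (WSS) vs. k on synthetic blob data.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=47)

ks = range(1, 11)
wss = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=47).fit(X)
    wss.append(km.inertia_)  # inertia_ is the within-cluster sum of squares

plt.plot(ks, wss, marker="o")
plt.xlabel("k (number of clusters)")
plt.ylabel("within-cluster sum of squares")
plt.title("WSS always decreases in k; the bend suggests a choice of k")
plt.show()
```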
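Second, initialization: with a single random start per run, k-means can converge to different local optima, so different initializations do not always produce the same clustering. A hedged sketch (the seeds and data are arbitrary):

```python
# Sketch: k-means with one random initialization per run.
# Different seeds can land in different local optima (different final WSS).
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=5, cluster_std=2.5, random_state=47)

for seed in (0, 1, 2, 3):
    km = KMeans(n_clusters=5, init="random", n_init=1, random_state=seed).fit(X)
    print(f"seed={seed}  WSS={km.inertia_:.1f}")
# If the printed WSS values differ across seeds, the runs found different
# clusterings; this is why k-means is usually run with many restarts (n_init > 1).
```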
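Third, hierarchical clustering: the sketch below standardizes the features first (k-means and hierarchical clustering are both distance-based, so a feature measured on a large scale would otherwise dominate), builds agglomerative (bottom-up) trees under three linkages, and cuts each tree at a chosen \(k\). The scipy calls are standard; the data are again synthetic.

```python
# Sketch: agglomerative hierarchical clustering under different linkages,
# after standardizing the features (distance-based methods are scale-sensitive).
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=150, centers=3, random_state=47)
X_scaled = StandardScaler().fit_transform(X)  # mean 0, sd 1 for each feature

for method in ("single", "average", "complete"):
    Z = linkage(X_scaled, method=method)             # bottom-up merge tree
    labels = fcluster(Z, t=3, criterion="maxclust")  # cut the tree at k = 3
    sizes = sorted((labels == c).sum() for c in set(labels))
    print(f"{method:>8} linkage, cluster sizes: {sizes}")
# single: cluster distance = closest pair; average: mean pairwise distance;
# complete: farthest pair. Choosing k corresponds to where the dendrogram is cut.
```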
In-class slides on clustering for 11/18/21 and 11/23/21.
Fantastic k-means applet by Naftali Harris.
Network analysis – Bridging the divide: political books