11. Unsupervised Learning
A quick dive into unsupervised methods. We cover two types of clustering methods: hierarchical and partitioning (k-means and k-medoids). Additionally, we discuss latent Dirichlet allocation (LDA).
Agenda
November 20, 2024
- unsupervised methods
November 25, 2024
- distances
- hierarchical clustering
December 2, 2024
- k-means clustering
- k-medoids clustering
December 4, 2024
- latent Dirichlet allocation (LDA)
Readings
Class notes: Unsupervised Methods
James, Witten, Hastie, and Tibshirani (2021), Unsupervised Learning (Chapter 12), An Introduction to Statistical Learning.
Reflection questions
Why does the plot of within-cluster sum of squares vs. \(k\) make an elbow shape (hint: think about \(k\) as it ranges from 1 to \(n\))?
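A minimal sketch of the elbow plot's raw ingredients, assuming a hand-rolled version of Lloyd's k-means in plain Python 3.8+ (the helpers `kmeans` and `wss` and the toy points are illustrative, not from the course materials). At \(k = 1\) the within-cluster sum of squares equals the total sum of squares; at \(k = n\) every point is its own center, so it drops to 0. In between it typically falls steeply while genuine groups are being separated and then flattens, which is the elbow.

```python
import math
import random

def kmeans(points, k, seed=0, max_iter=100):
    """Minimal Lloyd's k-means; starts from k randomly chosen data points."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(max_iter):
        # Assignment step: label each point with its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its points (keep it if empty).
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centers.append(centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

def wss(points, labels, centers):
    """Within-cluster sum of squared Euclidean distances to the assigned centers."""
    return sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))

# Made-up data: three well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]

# Best WSS over a few random starts, for every k from 1 to n.
curve = {
    k: min(wss(points, *kmeans(points, k, seed=s)) for s in range(10))
    for k in range(1, len(points) + 1)
}
for k, value in curve.items():
    print(k, round(value, 2))
```

Several random starts are used for each \(k\) because a single run of Lloyd's algorithm can stop at a local optimum, which would make the curve bumpy.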
How are the centers of the clusters in \(k\)-means calculated? What about in \(k\)-medoids?
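The two center definitions can be compared directly in a small plain-Python sketch (the helper names are hypothetical). A k-means center is the coordinate-wise mean of its cluster and need not be an observation; a k-medoids center (the medoid) is the member with the smallest total distance to the other members, so it is always an actual data point and is pulled far less by the outlying point at (10, 0).

```python
import math

def kmeans_center(cluster):
    """k-means: the coordinate-wise mean of the cluster (need not be a data point)."""
    return tuple(sum(xs) / len(cluster) for xs in zip(*cluster))

def kmedoids_center(cluster, dist=math.dist):
    """k-medoids: the member with the smallest total distance to all members."""
    return min(cluster, key=lambda m: sum(dist(m, p) for p in cluster))

cluster = [(0, 0), (1, 0), (10, 0)]
print(kmeans_center(cluster))    # (3.6666666666666665, 0.0)
print(kmedoids_center(cluster))  # (1, 0)
```

Because the medoid only needs pairwise dissimilarities, the `dist` argument can be swapped for any other distance without changing the rest of the code.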
Will a different initialization of the cluster centers always produce the same cluster output?
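Not necessarily. A minimal sketch, assuming plain-Python Lloyd's iterations started from explicitly chosen centers (`lloyd` and the 1-D toy data are illustrative): two different starting points converge to two different stable partitions, and the second has a much larger within-cluster sum of squares (WSS). This is why k-means is usually run from several random starts, keeping the solution with the smallest WSS.

```python
import math

def lloyd(points, centers, max_iter=100):
    """Lloyd's k-means from the given starting centers; returns (labels, WSS)."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: nearest center for every point.
        labels = [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: mean of each cluster (an empty cluster keeps its old center).
        new_centers = []
        for c, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append(
                tuple(sum(xs) / len(members) for xs in zip(*members)) if members else old
            )
        if new_centers == centers:
            break
        centers = new_centers
    wss = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, wss

# Made-up 1-D data with three obvious pairs.
points = [(0,), (1,), (10,), (11,), (20,), (21,)]

print(lloyd(points, [(0,), (10,), (20,)]))  # ([0, 0, 1, 1, 2, 2], 1.5)
print(lloyd(points, [(0,), (1,), (15,)]))   # ([0, 1, 2, 2, 2, 2], 101.0)
```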
How do distance metrics change a hierarchical clustering?
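Agglomerative clustering begins by merging the closest pair of observations, so changing the distance can change the very first merge and everything built on top of it. In this small plain-Python sketch (`first_merge` and the three toy points are illustrative), Euclidean distance pairs points 0 and 1, while Manhattan distance pairs points 0 and 2.

```python
import math
from itertools import combinations

def euclidean(p, q):
    return math.dist(p, q)

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def first_merge(points, dist):
    """Indices of the closest pair, i.e. the first merge of agglomerative clustering."""
    return min(combinations(range(len(points)), 2),
               key=lambda ij: dist(points[ij[0]], points[ij[1]]))

points = [(0, 0), (2, 2), (0, -3)]
print(first_merge(points, euclidean))  # (0, 1)
print(first_merge(points, manhattan))  # (0, 2)
```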
How can you choose \(k\) with hierarchical clustering?
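One common approach is to cut the dendrogram: stop merging once \(k\) clusters remain, often placing the cut just below a large jump in merge height. A naive plain-Python sketch with complete linkage (the function `agglomerative` and the data are illustrative; the brute-force pair search is only sensible for tiny examples):

```python
import math
from itertools import combinations

def complete_linkage(a, b):
    """Largest Euclidean distance between a member of a and a member of b."""
    return max(math.dist(p, q) for p in a for q in b)

def agglomerative(points, k, linkage=complete_linkage):
    """Merge clusters bottom-up until k remain; also return each merge height."""
    clusters = [[p] for p in points]
    heights = []
    while len(clusters) > k:
        # Brute-force search for the closest pair of clusters.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        heights.append(linkage(clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters, heights

# Made-up 1-D data.
points = [(0,), (1,), (10,), (11,), (20,)]

_, heights = agglomerative(points, k=1)
print(heights)   # [1.0, 1.0, 10.0, 20.0]

clusters, _ = agglomerative(points, k=3)
print(clusters)  # [[(20,)], [(0,), (1,)], [(10,), (11,)]]
```

The merge heights jump from 1 to 10, so cutting the tree between those two heights leaves three clusters.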
What is the difference between single, average, and complete linkage in hierarchical clustering?
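The three linkages differ only in how they summarize the pairwise distances between two clusters: single linkage takes the minimum, complete linkage the maximum, and average linkage the mean. A plain-Python sketch with two made-up clusters (the helper names are hypothetical):

```python
import math
import statistics

def pairwise(a, b):
    """All Euclidean distances between a member of cluster a and a member of b."""
    return [math.dist(p, q) for p in a for q in b]

def single_linkage(a, b):
    return min(pairwise(a, b))  # closest pair

def complete_linkage(a, b):
    return max(pairwise(a, b))  # farthest pair

def average_linkage(a, b):
    return statistics.mean(pairwise(a, b))  # mean over all pairs

A = [(0, 0), (1, 0)]
B = [(4, 0), (6, 0)]
print(single_linkage(A, B), complete_linkage(A, B), average_linkage(A, B))  # 3.0 6.0 4.5
```

Since single linkage only needs one close pair to merge two clusters, it tends to produce long, chained clusters, whereas complete linkage tends to favor compact ones.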
What is the difference between agglomerative and divisive hierarchical clustering?
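Agglomerative clustering builds the tree bottom-up by merging; divisive clustering builds it top-down by splitting. The plain-Python sketch below uses a deliberately simplified, hypothetical splitting rule (repeatedly split the widest cluster around its two most distant points). It is an illustration of the top-down idea only, not a standard algorithm such as DIANA.

```python
import math
from itertools import combinations

def farthest_pair(cluster):
    """The two members of a cluster that are farthest apart."""
    return max(combinations(cluster, 2), key=lambda pq: math.dist(*pq))

def diameter(cluster):
    return math.dist(*farthest_pair(cluster)) if len(cluster) > 1 else 0.0

def divisive(points, k):
    """Top-down sketch: split the widest cluster until k clusters remain."""
    clusters = [list(points)]
    while len(clusters) < k:
        widest = max(clusters, key=diameter)
        if diameter(widest) == 0:
            break  # nothing left to split
        a, b = farthest_pair(widest)
        # Send each member to whichever of the two far-apart seeds is closer.
        left = [p for p in widest if math.dist(p, a) <= math.dist(p, b)]
        right = [p for p in widest if math.dist(p, a) > math.dist(p, b)]
        clusters.remove(widest)
        clusters.extend([left, right])
    return clusters

points = [(0,), (1,), (2,), (10,), (11,), (12,), (30,)]
print(divisive(points, 3))  # [[(30,)], [(0,), (1,), (2,)], [(10,), (11,), (12,)]]
```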
Ethics considerations
What type of feature engineering is required for \(k\)-means / hierarchical clustering?
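Because k-means and hierarchical clustering are driven by distances, a feature measured on a large scale can dominate them, which is why variables are usually standardized first. A small plain-Python sketch with made-up (age, income) rows (the helpers `standardize` and `nearest` are hypothetical): on the raw scale, row 0's nearest neighbor is row 1 because dollar differences swamp age differences; after z-scoring each column, its nearest neighbor becomes row 2.

```python
import math
import statistics

def standardize(rows):
    """z-score each column (population SD); assumes no column is constant."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]
    return [tuple((x - m) / s for x, m, s in zip(row, means, sds)) for row in rows]

def nearest(rows, i):
    """Index of the row closest to row i in Euclidean distance."""
    return min((j for j in range(len(rows)) if j != i),
               key=lambda j: math.dist(rows[i], rows[j]))

# Made-up (age in years, income in dollars) rows.
rows = [(30, 50000), (60, 50500), (31, 52000), (45, 80000), (45, 20000)]

print(nearest(rows, 0))               # 1
print(nearest(standardize(rows), 0))  # 2
```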
How do you (can you?) know if your clustering has uncovered any real patterns in the data?
Slides
In-class slides - clustering for 11/25/24 and 12/2/24.
In-class slides - latent Dirichlet allocation for 12/4/24.
Additional Resources
Fantastic k-means applet by Naftali Harris.