11. Unsupervised Learning

A quick dive into unsupervised methods. We cover two clustering approaches: hierarchical and partitioning (k-means and k-medoids). Additionally, we discuss latent Dirichlet allocation (LDA).

Published: November 19, 2025

Monsters as cluster centers moving around throughout the k-means algorithm.

Artwork by @allison_horst.

Agenda

November 19, 2025

  1. distances
  2. hierarchical clustering

November 24, 2025

  1. k-means clustering
  2. k-medoids clustering

December 1, 2025

  1. latent Dirichlet allocation (LDA)

Readings

Reflection questions

  • Why does the plot of within-cluster sum of squares vs. k have an elbow shape (hint: think about how the sum of squares behaves as k ranges from 1 to n)?

  • How are the centers of the clusters in k-means calculated? What about in k-medoids?

  • Will a different initialization of the cluster centers always produce the same cluster output?

  • How does the choice of distance metric change a hierarchical clustering?

  • How can you choose k with hierarchical clustering?

  • What is the difference between single, average, and complete linkage in hierarchical clustering?

  • What is the difference between agglomerative and divisive hierarchical clustering?
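To build intuition for the first few questions, here is a minimal from-scratch sketch of Lloyd's k-means algorithm with random restarts. The toy blobs, restart count, and iteration cap are illustrative assumptions, not part of the course material:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: three well-separated 2-D blobs of 50 points each.
X = np.vstack([
    rng.normal(loc=center, scale=0.3, size=(50, 2))
    for center in [(0, 0), (4, 0), (2, 3)]
])

def kmeans(X, k, n_iter=50, n_restarts=5):
    """Lloyd's algorithm with random restarts; returns labels, centers,
    and within-cluster sum of squares (WCSS) for the best restart."""
    best = None
    for seed in range(n_restarts):
        r = np.random.default_rng(seed)
        centers = X[r.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            # Assign each point to its nearest center ...
            dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # ... then move each center to the mean of its assigned points
            # (keeping the old center if a cluster went empty).
            centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
        wcss = ((X - centers[labels]) ** 2).sum()
        if best is None or wcss < best[2]:
            best = (labels, centers, wcss)
    return best

# WCSS falls steeply until k reaches the true number of clusters (3),
# then flattens out: the "elbow".
wcss_by_k = {k: kmeans(X, k)[2] for k in range(1, 7)}
```

Note that each center here is a cluster *mean*, which is exactly what distinguishes k-means from k-medoids, where each center must be an actual data point. The restarts also speak to the initialization question: different starting centers can converge to different local optima, which is why taking the best of several runs is standard practice.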

Ethics considerations

  • What type of feature engineering is required for k-means / hierarchical clustering?

  • How do you (can you?) know if your clustering has uncovered any real patterns in the data?
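On the feature-engineering question: because k-means and hierarchical clustering are distance-based, a feature measured on a large scale can dominate the Euclidean distance. A common preprocessing step is standardization; the income/age features below are made-up illustrations, not data from the course:

```python
import numpy as np

rng = np.random.default_rng(1)
# Two hypothetical features on very different scales:
# income in dollars vs. age in years.
income = rng.normal(50_000, 10_000, size=200)
age = rng.normal(40, 12, size=200)
X = np.column_stack([income, age])

# Raw Euclidean distances between rows are driven almost entirely by
# income; age barely contributes. Standardizing each column to mean 0
# and standard deviation 1 puts the features on comparable footing.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
```

After this transformation, each column has (approximately) mean 0 and standard deviation 1, so a one-standard-deviation difference in age counts the same as a one-standard-deviation difference in income when clustering.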

Slides

Additional Resources