11. Unsupervised Learning

A quick dive into unsupervised methods. We cover two clustering approaches: hierarchical clustering and partitioning methods (k-means and k-medoids). Additionally, we discuss latent Dirichlet allocation (LDA).

Published: November 18, 2024

[Figure: monsters as cluster centers moving around throughout the k-means algorithm. Artwork by @allison_horst.]

Agenda

November 20, 2024

  1. unsupervised methods

November 25, 2024

  1. distances
  2. hierarchical clustering

December 2, 2024

  1. k-means clustering
  2. k-medoids clustering

December 4, 2024

  1. latent Dirichlet allocation (LDA)

Readings

Reflection questions

  • Why does the plot of within-cluster sum of squares vs. \(k\) make an elbow shape (hint: think about \(k\) as it ranges from 1 to \(n\))? See the first sketch after this list.

  • How are the centers of the clusters in \(k\)-means calculated? What about in \(k\)-medoids? (See the center-computation sketch after this list.)

  • Will a different initialization of the cluster centers always produce the same cluster output? (See the initialization sketch after this list.)

  • How do distance metrics change a hierarchical clustering? (The hierarchical clustering sketch after this list bears on this and the next two questions.)

  • How can you choose \(k\) with hierarchical clustering?

  • What is the difference between single, average, and complete linkage in hierarchical clustering?

  • What is the difference between agglomerative and divisive hierarchical clustering?
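
To see where the elbow comes from, note that at \(k = 1\) the within-cluster sum of squares equals the total variability in the data, while at \(k = n\) every observation is its own cluster and the sum drops to zero; most of the decrease happens by the time \(k\) reaches the number of real groups. Here is a minimal sketch of the elbow plot in Python, assuming scikit-learn and matplotlib are installed; the data are simulated, not course data:

```python
# Elbow plot: within-cluster sum of squares (inertia) versus k.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# simulated data with four true groups (illustrative only)
X, _ = make_blobs(n_samples=300, centers=4, random_state=47)

ks = range(1, 11)
wss = [KMeans(n_clusters=k, n_init=10, random_state=47).fit(X).inertia_
       for k in ks]

plt.plot(ks, wss, marker="o")
plt.xlabel("number of clusters k")
plt.ylabel("within-cluster sum of squares")
plt.show()
```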
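
On the centers question: \(k\)-means uses the coordinate-wise mean of the points in each cluster (which need not be an observed point), while \(k\)-medoids uses the observation that minimizes the total distance to the other points in its cluster. A small center-computation sketch (the points are simulated, purely for illustration):

```python
# Contrast the k-means center (a coordinate-wise mean, possibly not an
# observed point) with the k-medoids center (an actual observation).
import numpy as np

rng = np.random.default_rng(47)
cluster = rng.normal(size=(10, 2))   # hypothetical points in one cluster

kmeans_center = cluster.mean(axis=0)

# the medoid minimizes the total distance to the other points in the cluster
pairwise = np.linalg.norm(cluster[:, None, :] - cluster[None, :, :], axis=2)
medoid = cluster[pairwise.sum(axis=1).argmin()]

print("k-means center:", kmeans_center)
print("k-medoid:      ", medoid)
```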
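
On initialization: run from a single random start, \(k\)-means can converge to different local optima for different seeds, so the cluster output is not always the same. An initialization sketch (the `init="random"` and `n_init=1` settings are chosen deliberately to expose the sensitivity; the data and seeds are arbitrary):

```python
# With a single random start (n_init=1), k-means can land in different
# local optima depending on the seed.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=5, cluster_std=2.0, random_state=47)

for seed in [0, 1, 2]:
    km = KMeans(n_clusters=5, init="random", n_init=1, random_state=seed).fit(X)
    print(f"seed {seed}: WSS = {km.inertia_:.1f}")
# unequal WSS values indicate different local optima (different clusterings)
```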
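
For the hierarchical clustering questions, one way to build intuition is to vary the pieces and watch the result change: the distance metric between points, the linkage between clusters, and where the dendrogram is cut to choose \(k\). A hierarchical clustering sketch using SciPy (simulated data; the particular metric and linkage choices are only examples):

```python
# Vary the distance metric (between points) and the linkage (between
# clusters), then cut the tree into k clusters.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(47)
X = rng.normal(size=(20, 2))

for metric in ["euclidean", "cityblock"]:
    for method in ["single", "average", "complete"]:
        Z = linkage(pdist(X, metric=metric), method=method)
        labels = fcluster(Z, t=3, criterion="maxclust")  # cut at k = 3
        print(metric, method, labels)
```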

Ethics considerations

  • What type of feature engineering is required for \(k\)-means / hierarchical clustering? (See the scaling sketch after this list.)

  • How do you (can you?) know if your clustering has uncovered any real patterns in the data?
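
On the feature engineering question: \(k\)-means and hierarchical clustering are distance-based, so features measured on large scales dominate the distances unless the columns are standardized (centered and scaled) first. A scaling sketch (the columns and values are hypothetical):

```python
# Standardize columns before distance-based clustering, since a feature
# like income (dollars) would otherwise swamp one like age (years).
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[25, 40_000.0],
              [30, 42_000.0],
              [60, 41_000.0]])   # columns: age, income

X_scaled = StandardScaler().fit_transform(X)  # each column: mean 0, sd 1
print(X_scaled)
```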

Slides

Additional Resources