Support vector machines, like other classification and regression methods, can be applied quite generally. Here, however, we will use them only to classify objects that fall into exactly one of two classes, and we will work through the mathematical derivation of the binary-classification SVM.
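As a rough illustration of the binary setting, the sketch below fits a linear SVM to two labeled clusters. The course materials use R/tidymodels, so scikit-learn here is an assumption; the synthetic data and parameter choices are illustrative only.

```python
# A minimal sketch of binary classification with a linear SVM.
# scikit-learn is an assumption; the course materials use R/tidymodels.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two synthetic clusters, labeled 0 and 1.
X, y = make_blobs(n_samples=100, centers=2, random_state=47)

# A linear SVM learns a separating hyperplane: sign(w . x + b).
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# Only the support vectors determine the decision boundary.
print(clf.support_vectors_.shape)  # (number of support vectors, 2 features)
print(clf.score(X, y))
```

Note that only the points nearest the boundary (the support vectors) appear in the fitted decision rule, which is what gives the method its name.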
Class notes: Support Vector Machines
James, Witten, Hastie, and Tibshirani (2021), Support Vector Machines (Chapter 9), An Introduction to Statistical Learning.
Max Kuhn and Julia Silge (2021), Tidy Modeling with R
How is an SVM built (how do we find the model)?
Why is it often advantageous to transform the data into a higher dimensional space?
What is the kernel trick and how is it related to the SVM decision rule?
Can SVMs work on data that are not linearly separable (even in high dimensions)? How?
What are the advantages of the SVM algorithm?
What are the disadvantages of the SVM algorithm?
What type of feature engineering is required for Support Vector Machines?
Do Support Vector Machines produce a closed form “model” that can be written down or visualized and handed to a client?
If the model produces near perfect predictions on the test data, what are some potential concerns about putting that model into production?
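Several of the questions above (the kernel trick, non-separable data, and feature engineering) can be seen in one small experiment: data that no straight line can separate in two dimensions become separable once a kernel implicitly maps them to a higher-dimensional space. This is a sketch under the assumption of scikit-learn; the course materials themselves use R/tidymodels.

```python
# Sketch: concentric circles are not linearly separable in 2-D,
# but an RBF kernel separates them via an implicit higher-dimensional
# transformation (the "kernel trick").
# scikit-learn is an assumption; the course materials use R/tidymodels.
from sklearn.datasets import make_circles
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Two concentric rings of points: no line separates the classes.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=47)

# SVMs are distance-based, so centering and scaling the features
# (a standard piece of SVM feature engineering) is included in the pipeline.
linear_svm = make_pipeline(StandardScaler(), SVC(kernel="linear"))
rbf_svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="scale"))

print(linear_svm.fit(X, y).score(X, y))  # poor on this geometry
print(rbf_svm.fit(X, y).score(X, y))     # separates the rings well
```

The kernel never computes the high-dimensional coordinates explicitly; it only evaluates inner products between pairs of observations, which is why the trick is computationally cheap.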
In-class slides: support vector machines, for 11/9/21 and 11/11/21.
Julia Silge’s blog: SVMs to predict whether a post office is in Hawaii
Julia Silge’s blog: SVMs to predict whether Netflix shows are TV or movies
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/hardin47/m154-comp-stats, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".