9. k-NN + trees

k-Nearest Neighbors is a classification algorithm based on the premise that points that are close to one another (in some predictor space) are likely to be similar with respect to the outcome variable. Trees are a set of methods in which the predictor space is partitioned into regions, and the prediction for a new observation is the majority vote (for classification) or the average outcome (for regression) within its region.

Johanna Hardin https://m154-comp-stats.netlify.app/
2021-10-26
Figure 1: Predicting dragon weight from one continuous variable (height) and one categorical variable (whether or not the dragon is spotted). Artwork by @allison_horst.
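To make the two descriptions above concrete, here is a minimal sketch in R, not taken from the course materials: it builds a hypothetical `dragons` data frame (height, spotted, and a made-up light/heavy weight class), then classifies a held-out test set with `class::knn()` and with a decision tree from `rpart()`. The variable names, the 70/30 split, and the choice of k = 5 are illustrative assumptions.

```r
library(class)   # knn()
library(rpart)   # rpart()

set.seed(47)

# hypothetical toy data loosely mirroring the dragon illustration:
# height is continuous, spotted is categorical (coded 0/1 so that
# distances can be computed), and the outcome is a weight class
dragons <- data.frame(
  height       = c(runif(50, 2, 5), runif(50, 4, 8)),
  spotted      = rep(c(0, 1), each = 50),
  weight_class = factor(rep(c("light", "heavy"), each = 50))
)

# split into training and test sets
train_idx <- sample(nrow(dragons), 70)
train <- dragons[train_idx, ]
test  <- dragons[-train_idx, ]

# k-NN: each test dragon gets the majority class of its k = 5
# nearest training dragons in (height, spotted) space
knn_pred <- knn(
  train = train[, c("height", "spotted")],
  test  = test[, c("height", "spotted")],
  cl    = train$weight_class,
  k     = 5
)

# decision tree: recursively partition (height, spotted) space,
# then predict by majority vote within each terminal node
tree_fit  <- rpart(weight_class ~ height + spotted, data = train, method = "class")
tree_pred <- predict(tree_fit, newdata = test, type = "class")

# test-set accuracy of each classifier
mean(knn_pred == test$weight_class)
mean(tree_pred == test$weight_class)
```

Coding the categorical predictor as 0/1 is one simple way to let k-NN compute distances; with predictors on very different scales, standardizing before computing distances would matter.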

Agenda

October 26, 2021

  1. Redux - model process
  2. k-Nearest Neighbors
  3. Cross Validation
  4. Example

October 28, 2021

  1. Decision Trees
  2. Example

Readings

Reflection questions

Ethics considerations

Slides

Additional Resources

With the help of the Rand Corp., the city tried to measure fire response times, identify redundancies in service, and close or re-allocate fire stations accordingly. What resulted, though, was a perfect storm of bad data: The methodology was flawed, the analysis was rife with biases, and the results were interpreted in a way that stacked the deck against poorer neighborhoods. The slower response times allowed smaller fires to rage uncontrolled in the city’s most vulnerable communities.
