08. Trees

Tree-based methods partition the predictor space into regions and predict, for each region, the majority class (classification) or the average outcome (regression) of the training observations that fall in it.
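As a minimal sketch of that idea (not the CART fitting algorithm, which also searches for the split points), the Python below takes a partition as given and predicts by majority vote or by average within each region. The single predictor, the cutpoint at 5, the toy data, and the helper names `assign_region`, `fit_regions`, and `predict` are all assumptions made up for illustration.

```python
from collections import Counter
from statistics import mean


def assign_region(x, cutpoints):
    """Index of the interval of the predictor space that x falls into."""
    return sum(x >= c for c in cutpoints)


def fit_regions(xs, ys, cutpoints, task):
    """Summarize the training outcomes inside each region of a fixed partition."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(assign_region(x, cutpoints), []).append(y)
    if task == "classification":
        # Majority vote: the most common training label in the region.
        return {r: Counter(v).most_common(1)[0][0] for r, v in groups.items()}
    # Regression: the average training outcome in the region.
    return {r: mean(v) for r, v in groups.items()}


def predict(x, cutpoints, region_values):
    """Predict with the stored summary of the region containing x.

    Raises KeyError if no training observation fell in that region.
    """
    return region_values[assign_region(x, cutpoints)]


# Hypothetical toy data: one predictor, one split at x = 5.
xs = [1, 2, 3, 6, 7, 8]
labels = ["a", "a", "b", "b", "b", "a"]
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
cutpoints = [5]

clf = fit_regions(xs, labels, cutpoints, "classification")
reg = fit_regions(xs, values, cutpoints, "regression")

print(predict(2.5, cutpoints, clf))  # majority label among x < 5
print(predict(7.5, cutpoints, reg))  # mean outcome among x >= 5
```

A fitted tree differs from this sketch only in how the regions are found: CART chooses the splits greedily from the data, while the prediction step within each region is the same vote or average.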

Author

Johanna Hardin

Published

October 28, 2024

Agenda

October 28, 2024

Decision Trees
Example

October 30, 2024

Bagging
Example

Readings

Class notes: decision trees
James, Witten, Hastie, and Tibshirani (2021), k Nearest Neighbors (section 3.5), An Introduction to Statistical Learning.
James, Witten, Hastie, and Tibshirani (2021), the basics of decision trees (section 8.1), An Introduction to Statistical Learning.
Max Kuhn and Julia Silge (2021), Tidy Modeling with R

Reflection questions

What does CART stand for?
How does CART make predictions on test data?
Can CART be used for both classification and regression or only one of the two tasks?
Can you use categorical / character predictors with CART?
How is tree depth chosen?
What does it mean for CART to be high variance?
What are the advantages of the CART algorithm?
What are the disadvantages of the CART algorithm?

Ethics considerations

What type of feature engineering is required for CART?
If the model produces near perfect predictions on the test data, what are some potential concerns about putting that model into production?

Slides

In-class slides - decision trees for 10/28/24.
WS13 - decision trees

Additional Resources

Why the Bronx really burned – “adjusting” data to give the wrong information. FiveThirtyEight, Jody Avirgan, 10/29/2015.

With the help of the Rand Corp., the city tried to measure fire response times, identify redundancies in service, and close or re-allocate fire stations accordingly. What resulted, though, was a perfect storm of bad data: The methodology was flawed, the analysis was rife with biases, and the results were interpreted in a way that stacked the deck against poorer neighborhoods. The slower response times allowed smaller fires to rage uncontrolled in the city’s most vulnerable communities.

SF vs. NYC housing – a great example of a classification tree.
Julia Silge’s blog Tuning Decision Trees

Reuse

CC BY 4.0