09. Random Forests

Many trees make a forest. Bagging gives a FREE, independent assessment of the model — the out-of-bag (OOB) error — which can also be used for parameter tuning. Random Forests strike an excellent bias–variance trade-off.
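As a minimal sketch of the "free assessment" idea (assuming R's randomForest package and the built-in iris data, neither of which comes from the course materials): each tree is fit to a bootstrap sample, so the observations left out of that sample give an out-of-bag error estimate without any separate validation set.

```r
library(randomForest)

set.seed(47)
# each tree sees a bootstrap sample; observations left out of a tree's
# sample are used for that tree's out-of-bag (OOB) predictions
rf_fit <- randomForest(Species ~ ., data = iris,
                       mtry = 2,     # predictors sampled at each split
                       ntree = 500)  # number of bootstrapped trees

rf_fit                                  # printout includes the OOB estimate of error rate
predict(rf_fit, newdata = iris[1:5, ])  # predictions on new observations
```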

Published November 4, 2024

Artwork by @allison_horst.

Agenda

November 4, 2024

  1. Redux - CART
  2. Bagging process
  3. Bagging error rate (OOB error)

November 6, 2024

  1. Random Forests
  2. Example

Readings

Reflection questions

  • How does bagging improve on a single tree? How does tuning mtry (with aggregation) improve on a single tree? (That is, what advantage do forests have over single trees?)

  • How do Random Forests make predictions on test data?

  • Can Random Forests be used for both classification and regression or only one of the two tasks?

  • Can you use categorical / character predictors with Random Forests?

  • How are mtry and the number of trees chosen? (See the code sketch after these questions.)

  • How do the bias and variance change for different values of mtry and number of trees?

  • What are the advantages of the Random Forests algorithm?

  • What are the disadvantages of the Random Forest algorithm?
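As a rough sketch of how mtry might be chosen (again assuming the randomForest package and iris, both illustrative choices), fit a forest for each candidate mtry and compare the OOB errors — no held-out set is needed.

```r
library(randomForest)
set.seed(47)

# OOB error for each candidate value of mtry (iris has 4 predictors)
oob_err <- sapply(1:4, function(m) {
  fit <- randomForest(Species ~ ., data = iris, mtry = m, ntree = 500)
  fit$err.rate[500, "OOB"]   # OOB error rate after all 500 trees
})
oob_err   # choose the mtry with the smallest OOB error; the number of trees
          # mainly needs to be large enough for the OOB error to stabilize
```

The Julia Silge posts linked under Additional Resources walk through the same kind of tuning with the tidymodels framework.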

Ethics considerations

  • What type of feature engineering is required for Random Forests?

  • Do Random Forests produce a closed form “model” that can be written down or visualized and handed to a client?

  • If the model produces near perfect predictions on the test data, what are some potential concerns about putting that model into production?

Slides

Additional Resources

  • The end of science (???) “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete”, Chris Anderson, Wired, June 23, 2008.

  • Maybe not so fast. “10 things statistics taught us about big data analysis”, Jeff Leek, May 22, 2014.

  • Julia Silge’s blog: Tuning Random Forest parameters (https://juliasilge.com/blog/sf-trees-random-tuning/)

  • Julia Silge’s blog: Predicting water sources with Random Forests (https://juliasilge.com/blog/water-sources/)
