09. Random Forests

Many trees make a forest. Bagging gives a FREE, independent assessment of the model — the out-of-bag (OOB) error — which can also be used for parameter tuning. Random Forests strike an excellent bias–variance trade-off.
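As a minimal sketch of the "free assessment" idea (assuming R's randomForest package and the built-in iris data, neither of which comes from the course materials): each tree is fit to a bootstrap sample, so the observations left out of that sample give an out-of-bag error estimate without any separate validation set.

```r
library(randomForest)

set.seed(47)
# each tree sees a bootstrap sample; observations left out of a tree's
# sample are used for that tree's out-of-bag (OOB) predictions
rf_fit <- randomForest(Species ~ ., data = iris,
                       mtry = 2,     # predictors sampled at each split
                       ntree = 500)  # number of bootstrapped trees

rf_fit                                  # printout includes the OOB estimate of error rate
predict(rf_fit, newdata = iris[1:5, ])  # predictions on new observations
```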

Published November 4, 2024

Artwork by @allison_horst.

Agenda

November 4, 2024

  1. Redux - CART
  2. Bagging process
  3. Bagging error rate (OOB error)

November 6, 2024

  1. Random Forests
  2. Example

Readings

Reflection questions

  • How does bagging improve on a single tree? How does tuning mtry (with aggregation) improve on a single tree? (That is, what advantage do forests have over single trees?)

  • How do Random Forests make predictions on test data?

  • Can Random Forests be used for both classification and regression or only one of the two tasks?

  • Can you use categorical / character predictors with Random Forests?

  • How are mtry and the number of trees chosen? (See the code sketch after these questions.)

  • How do the bias and variance change for different values of mtry and number of trees?

  • What are the advantages of the Random Forests algorithm?

  • What are the disadvantages of the Random Forest algorithm?
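As a rough sketch of how mtry might be chosen (again assuming the randomForest package and iris, both illustrative choices), fit a forest for each candidate mtry and compare the OOB errors — no held-out set is needed.

```r
library(randomForest)
set.seed(47)

# OOB error for each candidate value of mtry (iris has 4 predictors)
oob_err <- sapply(1:4, function(m) {
  fit <- randomForest(Species ~ ., data = iris, mtry = m, ntree = 500)
  fit$err.rate[500, "OOB"]   # OOB error rate after all 500 trees
})
oob_err   # choose the mtry with the smallest OOB error; the number of trees
          # mainly needs to be large enough for the OOB error to stabilize
```

The Julia Silge posts linked under Additional Resources walk through the same kind of tuning with the tidymodels framework.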

Ethics considerations

  • What type of feature engineering is required for Random Forests?

  • Do Random Forests produce a closed form “model” that can be written down or visualized and handed to a client?

  • If the model produces near perfect predictions on the test data, what are some potential concerns about putting that model into production?

Slides

Additional Resources

  • The end of science (???) “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete”, Chris Anderson, Wired, June 23, 2008.

  • Maybe not so fast. “10 things statistics taught us about big data analysis”, Jeff Leek, May 22, 2014.

  • Julia Silge’s blog: Tuning Random Forest parameters (https://juliasilge.com/blog/sf-trees-random-tuning/)

  • Julia Silge’s blog: Predicting water sources with Random Forests (https://juliasilge.com/blog/water-sources/)
