Many trees make a forest. Bagging gives FREE independent model assessment and parameter tuning through the out-of-bag (OOB) observations. Random Forests have a fantastic bias-variance trade-off.
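To see why the assessment is "free", here is a minimal sketch in Python (standard library only, not the tidymodels code used in class). It draws bootstrap samples of row indices and measures how many rows each sample misses. The function name `oob_fraction` and the sizes used are illustrative assumptions, not part of the course materials.

```python
import random

def oob_fraction(n_obs, n_trees, seed=0):
    """Average share of observations left out of each bootstrap sample."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trees):
        # A bootstrap sample: n_obs draws with replacement from the row indices.
        in_bag = {rng.randrange(n_obs) for _ in range(n_obs)}
        # Rows that were never drawn are "out of bag" for this tree.
        total += (n_obs - len(in_bag)) / n_obs
    return total / n_trees

# Hypothetical sizes chosen only for illustration.
frac = oob_fraction(n_obs=1000, n_trees=200)
print(round(frac, 3))  # close to (1 - 1/1000) ** 1000, roughly one third
```

Each tree never sees roughly a third of the rows, so those rows can act as a built-in test set for that tree without setting aside any extra data.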
Class notes: bagging
Class notes: random forests
James, Witten, Hastie, and Tibshirani (2021), bagging & random forests (section 8.2), An Introduction to Statistical Learning.
Max Kuhn and Julia Silge (2021), Tidy Modeling with R
How does bagging improve on a single tree? How does tuning mtry (with aggregation) improve on a single tree? (That is, what advantage do forests have over single trees?)
How do Random Forests make predictions on test data?
Can Random Forests be used for both classification and regression or only one of the two tasks?
Can you use categorical / character predictors with Random Forests?
How are mtry and the number of trees chosen?
How do the bias and variance change for different values of mtry and number of trees?
What are the advantages of the Random Forests algorithm?
What are the disadvantages of the Random Forests algorithm?
What type of feature engineering is required for Random Forests?
Do Random Forests produce a closed form “model” that can be written down or visualized and handed to a client?
If the model produces near perfect predictions on the test data, what are some potential concerns about putting that model into production?
In-class slides: bagging & random forests for 11/2/21 and 11/4/21.
From last week: WU14 - CART
The end of science (???): “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete”, Wired, June 23, 2008.
Maybe not so fast. “10 things statistics taught us about big data analysis”, Jeff Leek, May 22, 2014.
Julia Silge’s blog: Tuning Random Forest parameters, https://juliasilge.com/blog/sf-trees-random-tuning/
Julia Silge’s blog: Predicting water sources with Random Forests, https://juliasilge.com/blog/water-sources/
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/hardin47/m154-comp-stats, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".