Data wrangling skills are among the most important to hone.
_join, pivoting, mapping
Class notes: Data Wrangling
Wickham (2017) Data transformation and Tidy data in R for Data Science.
If you aren’t super comfortable with R yet, check out Workflow: basics in R for Data Science.
How and why is %>% used? And how is it different from the layering symbol + used in ggplot()?
What are the main data wrangling verbs?
How do you distinguish the different _join functions? Are the _join keys formatted in the same way across the two datasets? Are the data recorded in the same way (e.g., is age recorded as a birthday or as age at the time of recording)?
What are some of the ways to distinguish a data verb from a typical function?
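As a starting point for the questions above, here is a minimal sketch in R, assuming the dplyr and ggplot2 packages are installed. The data frames people and visits, and all of their values, are made up for illustration; they are not from the course materials. The sketch contrasts %>% (which passes a result along as the first argument of the next call) with + (which adds a layer to a plot), and shows one way to check and align join keys before comparing a few _join functions.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data, invented for illustration only.
people <- tibble(id = c(1, 2, 3), age = c(34, 27, 51))
visits <- tibble(id = c("1", "2", "4"), clinic = c("A", "B", "C"))

# %>% passes the left-hand result in as the first argument of the next call,
# so the data verbs read top to bottom.
adults_over_30 <- people %>%
  filter(age > 30) %>%
  arrange(desc(age))

# + does not pass data along; it adds a layer to an existing ggplot object.
p <- ggplot(people, aes(x = id, y = age)) +
  geom_point()

# The keys differ in type (numeric vs. character), so put them in a
# common format before joining.
visits_clean <- visits %>%
  mutate(id = as.numeric(id))

# left_join keeps every row of people; unmatched rows get NA for clinic.
kept_all_people <- left_join(people, visits_clean, by = "id")

# inner_join keeps only rows whose key appears in both tables.
matches_only <- inner_join(people, visits_clean, by = "id")

# anti_join keeps rows of people with no match in visits_clean.
unmatched <- anti_join(people, visits_clean, by = "id")
```

Comparing nrow() across the three join results is a quick way to see how each function treats unmatched keys. Whether the values themselves are comparable (say, a birthday versus an age at recording) still has to be checked by reading how each dataset was collected.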
In-class slides for both 9/14/21 and 9/16/21.
RStudio cheatsheets
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/hardin47/m154-comp-stats, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".