class: right, top, my-title, title-slide # Data Viz & Introduction to ggplot2 ### Jo Hardin ### September 7, 2021 --- # Agenda 9/7/21 1. Cholera: what went (didn't go) well with the graphics? 2. Challenger: what didn't go (went) well with the graphics? 3. Thoughts on plotting Tufte (1997) <a href = "http://www.edwardtufte.com/tufte/books_textb" target = "_blank">Visual and Statistical Thinking: Displays of Evidence for Making Decisions</a>. (Use Google to find it.) --- # Preliminaries 1. Make the data stand out 2. Facilitate comparison 3. Add information --- # Preliminaries Tufte lists two main motivational steps to working with graphics as part of an argument. 1. "An essential analytic task in making decisions based on evidence is to understand how things work." 2. "Making decisions based on evidence requires the appropriate display of that evidence." --- # Cholera - a picture tells 1000 words <div class="figure" style="text-align: center"> <img src="../images/cholera1.jpg" alt="How many aspects of this graph can you point out which are relevant to figuring out that cholera infection was coming from a single pump? Are there any distracting aspects?" width="65%" /> <p class="caption">How many aspects of this graph can you point out which are relevant to figuring out that cholera infection was coming from a single pump? Are there any distracting aspects?</p> </div> --- # Cholera - difficult to interpret <div class="figure" style="text-align: center"> <img src="../images/cholera2.jpg" alt="Why would the outbreak already have begun to decline before the pump handle was removed?" width="80%" /> <p class="caption">Why would the outbreak already have begun to decline before the pump handle was removed?</p> </div> --- # Challenger - Problematic <div class="figure" style="text-align: center"> <img src="../images/challenger2.jpg" alt="One of the graphics which was particularly unconvincing in trying to explain that O-rings fail in the cold."
width="70%" /> <p class="caption">One of the graphics which was particularly unconvincing in trying to explain that O-rings fail in the cold.</p> </div> --- # Challenger - Better???? <div class="figure" style="text-align: center"> <img src="../images/challenger1.jpg" alt="A different graph of the Challenger information, now sorted by temperature" width="3524" /> <p class="caption">A different graph of the Challenger information, now sorted by temperature</p> </div> --- # Challenger - Improved <div class="figure" style="text-align: center"> <img src="../images/TuftestemperatureandOringrelationshi.jpg" alt="The graphic the engineers should have led with in trying to persuade the administrators not to launch. It is evident that the number of O-ring failures is quite highly associated with the ambient temperature. Note the *vital* information on the x-axis associated with the large number of launches at warm temperatures that had *zero* O-ring failures." width="1040" /> <p class="caption">The graphic the engineers should have led with in trying to persuade the administrators not to launch. It is evident that the number of O-ring failures is quite highly associated with the ambient temperature. 
Note the *vital* information on the x-axis associated with the large number of launches at warm temperatures that had *zero* O-ring failures.</p> </div> --- # Fonts matter <img src="../images/fontsmatter.png" title="image credit: Will Chase RStudio::conf 2020" alt="image credit: Will Chase RStudio::conf 2020" width="500px" style="display: block; margin: auto;" /> --- # Advice on plotting, specific - Avoid having other graph elements interfere with data - Use visually prominent symbols - Avoid over-plotting (One way to avoid over plotting: jitter the values) - Different values of data may obscure each other - Include all or nearly all of the data - Fill data region --- # Advice on plotting, general - Eliminate superfluous material - Facilitate comparisons - Choose the best scale - Make the plot data / information rich - Use good captions, alt text, conclusions --- # Simplify <div class="figure"> <img src="../images/data-ink-bar.gif" alt="A gif of a barplot which starts out cluttered with labels and slowly becomes simplified with the relevant information highlighted." width="85%" /> <p class="caption">image credit: https://www.darkhorseanalytics.com/portfolio-data-looks-better-naked</p> </div> --- # Simplified <div class="figure"> <img src="../images/barplot-gif-before.png" alt="The before and after images with the process of simplifying a barplot." width="47%" /><img src="../images/barplot-gif-after.png" alt="The before and after images with the process of simplifying a barplot." width="47%" /> <p class="caption">image credit: https://www.darkhorseanalytics.com/portfolio-data-looks-better-naked</p> </div> --- # NYT 9/7/21 <div class="figure"> <img src="../images/vacc-case.png" alt="A scatterplot showing that states with higher vaccination rates have lower COVID case rates. 
A few states are highlighted in stronger font: NY, CA, MA have low COVID rates and high vaccination rates; SC, GA, ID have high COVID rates and low vaccination rates; TX and USA are in the middle with medium vaccination and medium COVID rates." width="90%" /> <p class="caption">One in 5,000, NYT, D. Leonhardt 9/7/21; image credit: https://www.nytimes.com/2021/09/07/briefing/risk-breakthrough-infections-delta.html</p> </div> --- # Worth a Mention .pull-left[ W.E.B. Du Bois (1868-1963) * sociologist * data scientist] .pull-right[ <div class="figure"> <img src="../images/WEB_DuBois_1918.jpg" alt="image of W.E.B. Du Bois" width="40%" /> <p class="caption">image credit: wikipedia</p> </div> ] In 1900 Du Bois contributed approximately 60 data visualizations to an exhibit at the Exposition Universelle in Paris, an exhibit designed to illustrate the progress made by African Americans since the end of slavery (only 37 years prior, in 1863). --- # Beautiful & Informative Graphics https://drawingmatter.org/w-e-b-du-bois-visionary-infographics/ .pull-left[ <img src="../images/dubois-graphs1.png" title="figures from Du Bois's 1900 exhibition" alt="figures from Du Bois's 1900 exhibition" width="90%" /> ] .pull-right[ <img src="../images/dubois-graphs2.png" title="figures from Du Bois's 1900 exhibition" alt="figures from Du Bois's 1900 exhibition" width="90%" /> ] --- # Agenda 9/9/21 1. Grammar of graphics 2. `ggplot2` --- # Grammar of graphics Yau (2013) gives us nine visual cues, and Wickham (2014) translates them into a language using `ggplot2`. 1. Visual Cues: the aspects of the figure where we should focus. **Position** (numerical) where in relation to other things? **Length** (numerical) how big (in one dimension)? **Angle** (numerical) how wide? parallel to something else? **Direction** (numerical) at what slope? In a time series, going up or down? **Shape** (categorical) belonging to what group? **Area** (numerical) how big (in two dimensions)? Beware of improper scaling!
**Volume** (numerical) how big (in three dimensions)? Beware of improper scaling! **Shade** (either) to what extent? how severely? **Color** (either) to what extent? how severely? Beware of red/green color blindness. 2. Coordinate System: rectangular, polar, geographic, etc. 3. Scale: numeric (linear? logarithmic?), categorical (ordered?), time 4. Context: in comparison to what (think back to ideas from Tufte) --- ## Pieces of the Graph .pull-left[ Visual Cues of Yau (2013): **Position** (numerical) **Length** (numerical) **Angle** (numerical) **Direction** (numerical) **Shape** (categorical) **Area** (numerical) **Volume** (numerical) **Shade** (either) **Color** (either) ] .pull-right[ <img src="../images/Yau_viz_cues.png" width="100%" style="display: block; margin: auto;" /> ] --- ## Order Matters <img src="../images/Yau_order.png" width="100%" style="display: block; margin: auto;" /> --- ## Cues Together <img src="../images/Yau_cuestogether.png" width="100%" style="display: block; margin: auto;" /> --- # Goals of `ggplot2` What I will try to do * give a tour of `ggplot2` * explain how to think about plots the `ggplot2` way * prepare/encourage you to learn more later What I can't do in one session * show every bell and whistle * make you an expert at using `ggplot2` --- # Getting help 1. One of the best ways to get started with ggplot is to google what you want to do with the word ggplot. Then look through the images that come up. More often than not, the associated code is there. There are also ggplot galleries of images, one of them is here: https://plot.ly/ggplot2/ 2. Look at the end of this presentation and the syllabus. More help options there. <img src="../images/plotly.png" width="100%" style="display: block; margin: auto;" /> --- ## What are the visual cues on this plot? .pull-left[ ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-18-1.png)<!-- --> ] .pull-right[ * position * length * shape * area/volume * shade/color Coordinate System? 
Scale? ] --- ## What are the visual cues on this plot? .pull-left[ ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-20-1.png)<!-- --> ] .pull-right[ * position * length * shape * area/volume * shade/color Coordinate System? Scale? ] --- ## What are the visual cues on this plot? .pull-left[ ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-21-1.png)<!-- --> ] .pull-right[ * position * length * shape * area/volume * shade/color Coordinate System? Scale? ] --- ## The grammar of graphics `ggplot` **geom**: the geometric "shape" used to display data * bar, point, line, ribbon, text, etc. **aesthetic**: an attribute controlling how geom is displayed with respect to variables * x position, y position, color, fill, shape, size, etc. **scale**: adjust information in the aesthetic to map onto the plot * *particular* assignment of colors, shapes, sizes, etc.; making axes continuous or constrained to a particular range of values. **guide**: helps user convert visual data back into raw data (legends, axes) **stat**: a transformation applied to data before geom gets it * example: histograms work on binned data --- ## Set up ```r library(mosaic) data(Births78) # restore fresh version of Births78 head(Births78, 3) ``` ``` ## date births wday year month day_of_year day_of_month day_of_week ## 1 1978-01-01 7701 Sun 1978 1 1 1 1 ## 2 1978-01-02 7527 Mon 1978 1 2 2 2 ## 3 1978-01-03 8825 Tue 1978 1 3 3 3 ``` --- ## How do we make this plot? .pull-left[ ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-23-1.png)<!-- --> ] .pull-right[ Two Questions: 1. What do we want R to do? (What is the goal?) 2. What does R need to know? ] --- ## How do we make this plot? .pull-left[ ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-24-1.png)<!-- --> ] .pull-right[ 1. Goal: scatterplot = a plot with points 2. What does R need to know? * data source: `Births78` * aesthetics: * `date -> x` * `births -> y` * points (!) ] --- ## How do we make this plot?
.pull-left[ ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-25-1.png)<!-- --> ] .pull-right[ <code class ='r hljs remark-code'>ggplot(<span style='background-color:#ffff7f'>data=</span>Births78, <br> <span style='background-color:#ffff7f'>aes</span>(x=date, y=births)) + <br> geom_point() +<br> ggtitle("US Births in 1978")<br><br>ggplot() +<br> geom_point(<span style='background-color:#ffff7f'>data=</span>Births78, <br> <span style='background-color:#ffff7f'>aes</span>(x=date, y=births)) +<br> ggtitle("US Births in 1978")</code> ] --- ## How do we make this plot? .pull-left[ ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-27-1.png)<!-- --> ] .pull-right[ What has changed? * new aesthetic: mapping color to day of week Adding day of week to the data set The `wday()` function in the `lubridate` package computes the day of the week from a date. ] --- ## How do we make this plot? .pull-left[ ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-28-1.png)<!-- --> ] .pull-right[ <code class ='r hljs remark-code'>Births78 <- <br> Births78 %>% <br> mutate(<span style='background-color:#ffff7f'>day_of_week</span> = <br> wday(date, <br> label=TRUE))<br>Births78 %>%<br>ggplot(aes(x=date,<br> y=births, <br> color=<span style='background-color:#ffff7f'>day_of_week</span>)) +<br> geom_point() +<br> ggtitle("US Births in 1978")</code> ] --- ## How do we make this plot? .pull-left[ ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-30-1.png)<!-- --> ] --- ## How do we make this plot? .pull-left[ ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-31-1.png)<!-- --> ] .pull-right[ lines instead of dots! <code class ='r hljs remark-code'>Births78 %>%<br> ggplot(aes(x=date, <br> y=births,<br> color=day_of_week)) +<br> <span style='background-color:#ffff7f'>geom_line</span>() +<br> ggtitle("US Births in 1978")</code> ] --- ## How do we make this plot?
.pull-left[ ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-33-1.png)<!-- --> ] --- ## How do we make this plot? .pull-left[ ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-34-1.png)<!-- --> ] .pull-right[ Now there are two **layers**: one with points and one with lines <code class ='r hljs remark-code'>Births78 %>%<br> ggplot(aes(x=date, <br> y=births,<br> color=day_of_week)) + <br> <span style='background-color:#ffff7f'>geom_point</span>() + <br> <span style='background-color:#ffff7f'>geom_line</span>()+<br> ggtitle("US Births in 1978")</code> * The layers are placed one on top of the other: the points are *below* and the lines are *above*. * `data` and `aes` specified in `ggplot()` affect all geoms ] --- ## What does this code do? ```r Births78 %>% ggplot(aes(x=date, y=births, color="navy")) + geom_point() + ggtitle("US Births in 1978") ``` --- ## What does this code do? <code class ='r hljs remark-code'>Births78 %>% <br> ggplot(aes(x=date, y=births, <span style='background-color:#ffff7f'>color=</span>"navy")) + <br> geom_point() +<br> ggtitle("US Births in 1978")</code> ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-90-1.png)<!-- --> This is *mapping* the color aesthetic to a new variable with only one value ("navy"). So all the dots get set to the same color, but it's not navy. --- ## Setting vs. Mapping If we want to *set* the color to be navy for all of the dots, we do it outside the `aes()` designation: <code class ='r hljs remark-code'>Births78 %>%<br> ggplot(aes(x=date, y=births)) + # map variables <br> geom_point(<span style='background-color:#ffff7f'>color=</span>"navy") + # set attributes<br> ggtitle("US Births in 1978")</code> ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-91-1.png)<!-- --> * Note that `color = "navy"` is now outside of the aesthetics list. That's how `ggplot2` distinguishes between mapping and setting. --- ## How do we make this plot? 
.pull-left[ ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-39-1.png)<!-- --> ] --- ## How do we make this plot? .pull-left[ ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-40-1.png)<!-- --> ] .pull-right[ <code class ='r hljs remark-code'>Births78 %>%<br> ggplot(aes(x=date, <br> y=births)) + <br> geom_line(aes(<span style='background-color:#ffff7f'>color=</span>day_of_week)) + <br> geom_point(<span style='background-color:#ffff7f'>color=</span>"navy") + <br> ggtitle("US Births in 1978")</code> * `ggplot()` establishes the default data and aesthetics for the geoms, but each geom may change these defaults. * good practice: put into `ggplot()` the things that affect all (or most) of the layers; rest in `geom_blah()` ] --- ## Setting vs. Mapping (again) Information gets passed to the plot via: a. `map` the **variable** information inside the aes (aesthetic) command a. `set` the **non-variable** information outside the aes (aesthetic) command --- ## Other geoms ```r apropos("^geom_") ``` ``` [1] "geom_abline" "geom_area" [3] "geom_ash" "geom_bar" [5] "geom_barh" "geom_bin_2d" [7] "geom_bin2d" "geom_blank" [9] "geom_boxplot" "geom_boxploth" [11] "geom_col" "geom_colh" [13] "geom_contour" "geom_contour_filled" [15] "geom_count" "geom_crossbar" [17] "geom_crossbarh" "geom_curve" [19] "geom_density" "geom_density_2d" [21] "geom_density_2d_filled" "geom_density_line" [23] "geom_density_ridges" "geom_density_ridges_gradient" [25] "geom_density_ridges2" "geom_density2d" [27] "geom_density2d_filled" "geom_dotplot" [29] "geom_errorbar" "geom_errorbarh" [31] "geom_errorbarh" "geom_freqpoly" [33] "geom_function" "geom_hex" [35] "geom_histogram" "geom_histogramh" [37] "geom_hline" "geom_jitter" [39] "geom_label" "geom_line" [41] "geom_linerange" "geom_linerangeh" [43] "geom_lm" "geom_map" [45] "geom_path" "geom_point" [47] "geom_pointrange" "geom_pointrangeh" [49] "geom_polygon" "geom_qq" [51] "geom_qq_line" "geom_quantile" [53] "geom_rangeframe" 
"geom_raster" [55] "geom_rect" "geom_ribbon" [57] "geom_ridgeline" "geom_ridgeline_gradient" [59] "geom_rug" "geom_segment" [61] "geom_sf" "geom_sf_label" [63] "geom_sf_text" "geom_sina" [65] "geom_smooth" "geom_spline" [67] "geom_spoke" "geom_step" [69] "geom_text" "geom_tile" [71] "geom_tufteboxplot" "geom_violin" [73] "geom_violinh" "geom_vline" [75] "geom_vridgeline" ``` --- ## Other geoms help pages will tell you their aesthetics, default stats, etc. ```r ?geom_area # for example ``` --- ## Let's try `geom_area` .pull-left[ ```r Births78 %>% ggplot(aes(x=date, y=births, fill=day_of_week)) + geom_area()+ ggtitle("US Births in 1978") ``` ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-44-1.png)<!-- --> ] --- ## Let's try `geom_area` .pull-left[ ```r Births78 %>% ggplot(aes(x=date, y=births, fill=day_of_week)) + geom_area()+ ggtitle("US Births in 1978") ``` ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-45-1.png)<!-- --> ] .pull-right[ ... not a good plot * overplotting is hiding much of the data * extending y-axis to 0 may or may not be desirable. ] --- ## Side note: what makes a plot good? Most (all?) graphics are intended to help us make comparisons * How does something change over time? * Do my treatments matter? How much? * Do treatnent and control respond the same way? **Key plot metric:** Does my plot make the comparisions I am interested in * easily, and * accurately? --- ## Time for some different data HELPrct: Health Evaluation and Linkage to Primary care randomized clinical trial. Subjects admitted for treatment for addiction to one of three substances. 
```r head(HELPrct) ``` ``` ## age anysubstatus anysub cesd d1 daysanysub dayslink drugrisk e2b female ## 1 37 1 yes 49 3 177 225 0 NA 0 ## 2 37 1 yes 30 22 2 NA 0 NA 0 ## 3 26 1 yes 39 0 3 365 20 NA 0 ## 4 39 1 yes 15 2 189 343 0 1 1 ## 5 32 1 yes 39 12 2 57 0 1 0 ## 6 47 1 yes 6 1 31 365 0 NA 1 ## sex g1b homeless i1 i2 id indtot linkstatus link mcs pcs pss_fr ## 1 male yes housed 13 26 1 39 1 yes 25.111990 58.41369 0 ## 2 male yes homeless 56 62 2 43 NA <NA> 26.670307 36.03694 1 ## 3 male no housed 0 0 3 41 0 no 6.762923 74.80633 13 ## 4 female no housed 5 5 4 28 0 no 43.967880 61.93168 11 ## 5 male no homeless 10 13 5 38 1 yes 21.675755 37.34558 10 ## 6 female no housed 4 4 6 29 0 no 55.508991 46.47521 5 ## racegrp satreat sexrisk substance treat avg_drinks max_drinks ## 1 black no 4 cocaine yes 13 26 ## 2 white no 7 alcohol yes 56 62 ## 3 black no 2 heroin no 0 0 ## 4 white yes 4 heroin no 5 5 ## 5 black no 6 cocaine no 10 13 ## 6 black no 5 cocaine yes 4 4 ## hospitalizations ## 1 3 ## 2 22 ## 3 0 ## 4 2 ## 5 12 ## 6 1 ``` --- ## Who are the people in the study? .pull-left[ ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-48-1.png)<!-- --> ] .pull-right[ <code class ='r hljs remark-code'>HELP_data %>% <br> ggplot(aes(x=substance)) + <br> <span style='background-color:#ffff7f'>geom_bar</span>()+<br> ggtitle("HELP trial")</code> * Hmm. What's up with `y`? * `stat_count()` is being applied to the data before the `geom_bar()` gets to do its thing. Counting the rows in each category creates the `y` values. ] --- ## Who are the people in the study? .pull-left[ ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-50-1.png)<!-- --> ] .pull-right[ <code class ='r hljs remark-code'>HELP_data %>% <br> ggplot(aes(x=substance, <br> <span style='background-color:#ffff7f'>fill=</span>children)) + <br> geom_bar()+<br> ggtitle("HELP trial")</code> ] --- ## Who are the people in the study?
.pull-left[ ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-52-1.png)<!-- --> ] .pull-right[ <code class ='r hljs remark-code'>HELP_data %>% <br> ggplot(aes(x=substance, <br> fill=children)) + <br> geom_bar(<span style='background-color:#ffff7f'>position=</span>"fill") +<br> ylab("actually, percent")+<br> ggtitle("HELP trial")</code> ] --- ## How old are people in the HELP study? .pull-left[ ``` ## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`. ``` ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-54-1.png)<!-- --> ] .pull-right[ <code class ='r hljs remark-code'>HELP_data %>% <br> ggplot(aes(x=age)) + <br> <span style='background-color:#ffff7f'>geom_histogram</span>()+<br> ggtitle("HELP trial")</code> Notice the messages * `stat_bin`: Histograms are not mapping the raw data but binned data. `stat_bin()` performs the data transformation. * `binwidth`: a default binwidth has been selected, but we should really choose our own. ] --- ## Setting the binwidth manually .pull-left[ ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-56-1.png)<!-- --> ] .pull-right[ <code class ='r hljs remark-code'>HELP_data %>% <br> ggplot(aes(x=age)) + <br> geom_histogram(<span style='background-color:#ffff7f'>binwidth=</span>2)+<br> ggtitle("HELP trial")</code> ] --- ## How old are people in the HELP study? 
-- Other geoms <code class ='r hljs remark-code'>HELP_data %>% <br> ggplot(aes(x=age)) + <br> <span style='background-color:#ffff7f'>geom_freqpoly</span>(binwidth=2)+<br> ggtitle("HELP clinical trial at detoxification unit")</code> ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-98-1.png)<!-- --> <code class ='r hljs remark-code'>HELP_data %>% <br> ggplot(aes(x=age)) + <br> <span style='background-color:#ffff7f'>geom_density</span>()+<br> ggtitle("HELP clinical trial at detoxification unit")</code> ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-99-1.png)<!-- --> --- ## Selecting stat and geom manually Every geom comes with a default stat * for simple cases, the stat is `stat_identity()` which does nothing * we can mix and match geoms and stats however we like <code class ='r hljs remark-code'>HELP_data %>% <br> ggplot(aes(x=age)) + <br><span style='background-color:#ffff7f'> geom_line(stat="density")+</span><br> ggtitle("HELP clinical trial at detoxification unit")</code> ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-100-1.png)<!-- --> --- ## Selecting stat and geom manually Every stat comes with a default geom, every geom with a default stat * we can specify stats instead of geom, if we prefer * we can mix and match geoms and stats however we like <code class ='r hljs remark-code'>HELP_data %>% <br> ggplot(aes(x=age)) + <br><span style='background-color:#ffff7f'> stat_density( geom="line")+</span><br> ggtitle("HELP clinical trial at detoxification unit")</code> ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-101-1.png)<!-- --> --- ## More combinations <code class ='r hljs remark-code'>HELP_data %>% <br> ggplot(aes(x=age)) + <br><span style='background-color:#ffff7f'> geom_point(stat="bin", binwidth=3) + </span><br><span style='background-color:#ffff7f'> geom_line(stat="bin", binwidth=3) +</span><br> ggtitle("HELP clinical trial at detoxification unit")</code> 
![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-102-1.png)<!-- --> --- ## More combinations <code class ='r hljs remark-code'>HELP_data %>% <br> ggplot(aes(x=age)) + <br><span style='background-color:#ffff7f'> geom_area(stat="bin", binwidth=3) +</span><br> ggtitle("HELP clinical trial at detoxification unit")</code> ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-103-1.png)<!-- --> --- ## More combinations <code class ='r hljs remark-code'>HELP_data %>% <br> ggplot(aes(x=age)) + <br><span style='background-color:#ffff7f'> geom_point(stat="bin", binwidth=3, aes(size=..count..)) +</span><br><span style='background-color:#ffff7f'> geom_line(stat="bin", binwidth=3) +</span><br> ggtitle("HELP clinical trial at detoxification unit")</code> ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-104-1.png)<!-- --> --- ## How much drinking? (i1) ```r HELP_data %>% ggplot(aes(x=i1)) + geom_histogram()+ ggtitle("HELP clinical trial at detoxification unit") ``` ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-65-1.png)<!-- --> --- ## How much drinking? (i1) ```r HELP_data %>% ggplot(aes(x=i1)) + geom_density()+ ggtitle("HELP clinical trial at detoxification unit") ``` ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-66-1.png)<!-- --> --- ## How much drinking? 
(i1) ```r HELP_data %>% ggplot(aes(x=i1)) + geom_area(stat="density")+ ggtitle("HELP clinical trial at detoxification unit") ``` ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-67-1.png)<!-- --> --- ## Covariates: Adding in more variables Using color and linetype: <code class ='r hljs remark-code'>HELP_data %>% <br> ggplot(aes(x=i1, <span style='background-color:#ffff7f'>color=</span>substance, <span style='background-color:#ffff7f'>linetype=</span>children)) + <br> geom_line(stat="density")+<br> ggtitle("HELP clinical trial at detoxification unit")</code> ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-105-1.png)<!-- --> --- ## Using color and facets <code class ='r hljs remark-code'>HELP_data %>% <br> ggplot(aes(x=i1, color=substance)) + <br> geom_line(stat="density") + <span style='background-color:#ffff7f'>facet_grid</span>( . ~ children )+<br> ggtitle("HELP clinical trial at detoxification unit")</code> ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-106-1.png)<!-- --> <code class ='r hljs remark-code'>HELP_data %>% <br> ggplot(aes(x=i1, color=substance)) + <br> geom_line(stat="density") + <span style='background-color:#ffff7f'>facet_grid</span>( children ~ . )+<br> ggtitle("HELP clinical trial at detoxification unit")</code> ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-107-1.png)<!-- --> --- ## Boxplots Boxplots use `stat_boxplot()` (five number summary). The quantitative variable must be `y`, and there must be an additional `x` variable. ```r HELP_data %>% ggplot(aes(x=substance, y=age, color=children)) + geom_boxplot()+ ggtitle("HELP clinical trial at detoxification unit") ``` ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-71-1.png)<!-- --> --- ## Horizontal boxplots Horizontal boxplots are obtained by flipping the coordinate system: * `coord_flip()` may be used with other plots as well to reverse the roles of `x` and `y` on the plot.
<code class ='r hljs remark-code'>HELP_data %>% <br> ggplot(aes(x=substance, y=age, color=children)) + <br> geom_boxplot() +<br> <span style='background-color:#ffff7f'>coord_flip()</span>+<br> ggtitle("HELP clinical trial at detoxification unit")</code> ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-108-1.png)<!-- --> --- ## Axes scaling with boxplots We can scale the continuous axis <code class ='r hljs remark-code'>HELP_data %>% <br> ggplot(aes(x=substance, y=age, color=children)) + <br> geom_boxplot() +<br> <span style='background-color:#ffff7f'>coord_trans</span>(y="exp")+<br> ggtitle("HELP clinical trial at detoxification unit")</code> ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-109-1.png)<!-- --> --- ## Give me some space We've triggered a new feature: `dodge` (for dodging things left/right). We can control how much if we set the dodge manually. <code class ='r hljs remark-code'>HELP_data %>% <br> ggplot(aes(x=substance, y=age, color=children)) + <br><span style='background-color:#ffff7f'> geom_boxplot(position=position_dodge(width=1)) +</span><br> ggtitle("HELP clinical trial at detoxification unit")</code> ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-110-1.png)<!-- --> --- ## Issues with bigger data * Although we can see a generally positive association (as we would expect), the overplotting may be hiding information. <code class ='r hljs remark-code'>library(NHANES)<br>dim(NHANES)</code> ``` ## [1] 10000 76 ``` <code class ='r hljs remark-code'>NHANES %>% ggplot(aes(x=Height, y=Weight)) +<br> geom_point() + <span style='background-color:#ffff7f'>facet_grid</span>( Gender ~ PregnantNow )</code> ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-111-1.png)<!-- --> --- ## Using alpha (opacity) One way to deal with overplotting is to set the opacity low. 
<code class ='r hljs remark-code'>NHANES %>% <br> ggplot(aes(x=Height, y=Weight)) +<br> geom_point(<span style='background-color:#ffff7f'>alpha=0.01</span>) + facet_grid( Gender ~ PregnantNow )</code> ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-112-1.png)<!-- --> --- ## geom_density2d Alternatively (or simultaneously) we might prefer a different geom altogether. <code class ='r hljs remark-code'>NHANES %>% <br> ggplot(aes(x=Height, y=Weight)) +<br> <span style='background-color:#ffff7f'>geom_density2d()</span> + facet_grid( Gender ~ PregnantNow )</code> ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-113-1.png)<!-- --> --- ## Multiple layers .pull-left[ ```r ggplot( data=HELP_data, aes(x=children, y=age)) + geom_boxplot(outlier.size=0) + geom_point(alpha=.6) + coord_flip()+ ggtitle("HELP clinical trial at detoxification unit") ``` ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-78-1.png)<!-- --> ] .pull-right[ <code class ='r hljs remark-code'>ggplot( data=HELP_data, aes(x=children, y=age)) +<br> geom_boxplot(outlier.size=0) +<br><span style='background-color:#ffff7f'> geom_jitter(alpha=.6, width = 0.1) +</span><br> coord_flip()+<br> ggtitle("HELP clinical trial at detoxification unit")</code> ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-114-1.png)<!-- --> ] --- ## Multiple layers <code class ='r hljs remark-code'>ggplot( data=HELP_data, aes(x=children, y=age)) +<br> geom_boxplot(outlier.size=0) +<br><span style='background-color:#ffff7f'> geom_point(alpha=.6, position=position_jitter(width=.1, height=0)) +</span><br> coord_flip()+<br> ggtitle("HELP clinical trial at detoxification unit")</code> ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-115-1.png)<!-- --> --- ## Things I haven't mentioned (much) * coords (`coord_flip()` is good to know about) * themes (for customizing appearance) * position (`position_dodge()`, `position_jitterdodge()`, `position_stack()`, etc.)
* transforming axes ```r library(ggthemes) ggplot(Births78, aes(x=date, y=births)) + geom_point() + theme_wsj() ``` ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-81-1.png)<!-- --> --- ## Things I haven't mentioned (much) * coords (`coord_flip()` is good to know about) * themes (for customizing appearance) * position (`position_dodge()`, `position_jitterdodge()`, `position_stack()`, etc.) * transforming axes ```r ggplot( data=HELP_data, aes(x=substance, y=age, color=children)) + geom_boxplot(coef = 10, position=position_dodge()) + geom_point(aes(color=children, fill=children), position=position_jitterdodge()) + ggtitle("HELP clinical trial at detoxification unit") ``` ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-82-1.png)<!-- --> --- ## A little bit of everything ```r ggplot( data=HELP_data, aes(x=substance, y=age, color=children)) + geom_boxplot(coef = 10, position=position_dodge(width=1)) + geom_point(aes(fill=children), alpha=.5, position=position_jitterdodge(dodge.width=1, jitter.width = 0.2)) + facet_wrap(~homeless)+ ggtitle("HELP clinical trial at detoxification unit") ``` ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-83-1.png)<!-- --> <!-- ## Some short cuts 1. `qplot()` provides "quick plots" for `ggplot2` ```r qplot(length, width, data=KidsFeet) ``` ![](2021-09-07-ggplotPresent_files/figure-html/unnamed-chunk-84-1.png) 2. `mplot(dataframe)` provides an interactive plotting tool for both `ggplot2` and `lattice`. ```r mplot(HELPrct) ``` * quickly make several plots from a data frame * can show the expression so you can learn how to do it or copy and paste into another document --> --- ## Want to learn more? * [docs.ggplot2.org/](http://docs.ggplot2.org/) * [R for Data Science](https://r4ds.had.co.nz/) by Hadley Wickham and Garrett Grolemund --- ## What's around the corner?
`shiny` * interactive graphics / modeling * https://shiny.rstudio.com/ `plotly` > `Plotly` is an R package for creating interactive web-based graphs via plotly's JavaScript graphing library, `plotly.js`. The `plotly` R library contains the `ggplotly` function, which will convert `ggplot2` figures into a Plotly object. Furthermore, you have the option of manipulating the Plotly object with the `style` function. * https://plot.ly/ggplot2/getting-started/ `gganimate` * [`gganimate` tutorial](https://gganimate.com/articles/gganimate.html)
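
---

## A first look at `ggplotly`

The `ggplotly()` conversion described on the previous slide can be tried on one of the earlier `Births78` scatterplots. This is only a minimal sketch: it assumes the `plotly` and `mosaic` packages are installed, and the object names `p` and `p_interactive` are illustrative, not part of either package.

```r
library(mosaic)   # loads ggplot2 and the Births78 data, as in the Set up slide
library(plotly)   # assumption: plotly is installed

data(Births78)

# a static ggplot2 figure, same mapping as the earlier scatterplot slides
p <- ggplot(Births78, aes(x = date, y = births)) +
  geom_point() +
  ggtitle("US Births in 1978")

# convert the ggplot object into an interactive plotly object
p_interactive <- ggplotly(p)

# printing the object renders the interactive version (hover, zoom, pan)
p_interactive
```

The static plot `p` is left untouched, so the same `ggplot2` code can feed both the slides and an interactive version.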