class: center, middle, title-slide

.title[
# The Grammar of Graphics
]
.author[
### Luke Tierney
]
.institute[
### University of Iowa
]
.date[
### 2023-05-06
]

---
layout: true

<link rel="stylesheet" href="stat4580.css" type="text/css" />

## Background

---

The _Grammar of Graphics_ is a language proposed by Leland Wilkinson
for describing statistical graphs.

> Wilkinson, L. (2005), _The Grammar of Graphics_, 2nd ed., Springer.

--

The grammar of graphics has served as the foundation for the graphics
frameworks in [SPSS](https://www.ibm.com/products/spss-statistics),
[Vega-Lite](https://vega.github.io/vega-lite/) and several other
systems.

--

`ggplot2` represents an implementation and extension of the grammar of
graphics for R.

--

> Wickham, H. (2016), _ggplot2: Elegant Graphics for Data Analysis_,
> 2nd ed., Springer. [3rd ed. in progress](https://ggplot2-book.org/).

--

> Online documentation: <https://ggplot2.tidyverse.org/reference/index.html>.

--

> Wickham, H., and Grolemund, G. (2016),
> [_R for Data Science_](https://r4ds.had.co.nz/), O'Reilly.

--

> [Data visualization cheatsheet](https://raw.githubusercontent.com/rstudio/cheatsheets/main/data-visualization.pdf)

--

> Chang, W. (2018), [_R Graphics Cookbook_, 2nd
> edition](https://r-graphics.org/), O'Reilly. ([Book source on
> GitHub](https://github.com/wch/rgcookbook))

---

The idea is that any basic plot can be built out of a combination of

--

* a data set;

--

* one or more geometrical representations (_geoms_);

--

* mappings of values to _aesthetic_ features of the geom;

--

* a _stat_ to produce values to be mapped;

--

* position adjustments;

--

* a coordinate system;

--

* a scale specification;

--

* a faceting scheme.

--

`ggplot2` provides tools for specifying these components and adjusting
their features.

--

Many components and features are provided by default and do not need
to be specified explicitly unless the defaults are to be changed.
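---

A minimal sketch, assuming a current `ggplot2` release, that writes out each of these components for a simple `mpg` scatter plot. The explicit values are intended to match the defaults `ggplot2` supplies when only the data, geom, and mapping are given, so the result should look the same as the basic scatter plot:

.small-code[
```r
library(ggplot2)

## Every grammar component written out explicitly; these values are
## the defaults ggplot2 fills in when they are left unspecified.
p_explicit <- ggplot(data = mpg) +                    # data set
    geom_point(mapping = aes(x = displ, y = hwy),     # geom and mappings
               stat = "identity",                     # stat
               position = "identity") +               # position adjustment
    scale_x_continuous() +                            # scales
    scale_y_continuous() +
    coord_cartesian() +                               # coordinate system
    facet_null()                                      # faceting scheme (one panel)
p_explicit
```
]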
---
layout: true

## A Basic Template

---

The simplest graph needs a data set, a geom, and a mapping:

```r
ggplot(data = <DATA>) +
    <GEOM>(mapping = aes(<MAPPINGS>))
```

--

The appearance of geom objects is controlled by _aesthetic_ features.

--

Each geom has some required and some optional aesthetics.

--

For `geom_point` the required aesthetics are

* `x` position

* `y` position.

--

Optional aesthetics include

* `color`
* `fill`
* `shape`
* `size`

--

`geom_point` is used to produce a _scatter plot_.

---
layout: true

## Scatter Plots Using `geom_point`

---

The `mpg` data set included in the `ggplot2` package includes EPA fuel
economy data from 1999 to 2008 for 38 popular models of cars.

```r
mpg
## # A tibble: 234 × 11
##    manufacturer model      displ  year   cyl trans drv     cty   hwy fl    class
##    <chr>        <chr>      <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
##  1 audi         a4           1.8  1999     4 auto… f        18    29 p     comp…
##  2 audi         a4           1.8  1999     4 manu… f        21    29 p     comp…
##  3 audi         a4           2    2008     4 manu… f        20    31 p     comp…
##  4 audi         a4           2    2008     4 auto… f        21    30 p     comp…
##  5 audi         a4           2.8  1999     6 auto… f        16    26 p     comp…
##  6 audi         a4           2.8  1999     6 manu… f        18    26 p     comp…
##  7 audi         a4           3.1  2008     6 auto… f        18    27 p     comp…
##  8 audi         a4 quattro   1.8  1999     4 manu… 4        18    26 p     comp…
##  9 audi         a4 quattro   1.8  1999     4 auto… 4        16    25 p     comp…
## 10 audi         a4 quattro   2    2008     4 manu… 4        20    28 p     comp…
## # ℹ 224 more rows
```

---

.pull-left.width-45[
A simple scatter plot:

```r
ggplot(mpg) +
    geom_point(aes(x = displ, y = hwy))
```
]

--

.pull-right.width-55[
<img src="ggplot_files/figure-html/mpg-plain-1.png" style="display: block; margin: auto auto auto 0;" />
]

---

.pull-left.width-45[
Map color to vehicle class:

```r
ggplot(mpg) +
    geom_point(aes(x = displ, y = hwy,
*                  color = class))
```
]

.pull-right.width-55[
<img src="ggplot_files/figure-html/mpg-color-1.png" style="display: block; margin: auto auto auto 0;" />
]

---

.pull-left.width-45[
And map shape to number of cylinders:

```r
ggplot(mpg) +
    geom_point(aes(x = displ, y = hwy,
                   color = class,
*                  shape = factor(cyl)))
```
]

.pull-right.width-55[
<img src="ggplot_files/figure-html/mpg-color-shape-1.png" style="display: block; margin: auto auto auto 0;" />

{{content}}
]

--

<!-- -->

Perception:

* Too many colors;

* shapes are too small;

* interference between shapes and colors.

---

.pull-left.width-45[
Aesthetics can be mapped to a variable or set to a fixed common value.

{{content}}
]

--

This can be used to override default settings:

```r
ggplot(mpg) +
    geom_point(aes(x = displ, y = hwy),
               color = "blue", shape = 1)
```

--

.pull-right.width-55[
<img src="ggplot_files/figure-html/mpg-fixed-1.png" style="display: block; margin: auto auto auto 0;" />
]

---

.pull-left.width-45[
Changing the `size` aesthetic makes shapes easier to recognize:

```r
ggplot(mpg) +
    geom_point(aes(x = displ, y = hwy,
                   color = class,
                   shape = factor(cyl)),
*              size = 3)
```
]

.pull-right.width-55[
<img src="ggplot_files/figure-html/mpg-color-shape-large-1.png" style="display: block; margin: auto auto auto 0;" />
]

--

<!-- -->

Perception: Still too many colors; still have interference.

---

.pull-left.width-60[
Available point shapes are specified by number:

{{content}}
]

--

<img src="ggplot_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" />

--

.pull-right.width-40[
Shapes 0-20 have their color set by the `color` aesthetic and ignore
the `fill` aesthetic.

{{content}}
]

--

For shapes 21-25 the `color` aesthetic specifies the border color and
`fill` specifies the interior color.

---

Using `shape` 21 with `cyl` mapped to the `fill` aesthetic:

.pull-left.width-45[
```r
ggplot(mutate(mpg, cyl = factor(cyl))) +
    geom_point(aes(x = displ, y = hwy, fill = cyl),
               shape = 21, size = 4)
```
]

.pull-right.width-55[
<img src="ggplot_files/figure-html/mpg-fill-21-1.png" style="display: block; margin: auto;" />
]

--

<!-- -->

Perception: Borders, larger symbols, and fewer colors help.

---

Specifying a new default is very different from specifying a constant
value as an aesthetic.
--

.pull-left.small-code[
Constant aesthetic: Rarely what you want:

```r
ggplot(mpg) +
    geom_point(aes(x = displ, y = hwy,
*                  color = "blue"))
```

{{content}}
]

--

<img src="ggplot_files/figure-html/mpg-bad-color-1.png" style="display: block; margin: auto;" />

--

.pull-right.small-code[
Default: Probably what you want:

```r
ggplot(mpg) +
    geom_point(aes(x = displ, y = hwy),
*              color = "blue")
```

{{content}}
]

--

<img src="ggplot_files/figure-html/mpg-good-color-1.png" style="display: block; margin: auto;" />

---
layout: true

## Geometric Objects

---

`ggplot2` provides a number of geoms:

.small-code[
```r
geom_abline             geom_area               geom_bar
geom_bin_2d             geom_bin2d              geom_blank
geom_boxplot            geom_col                geom_contour
geom_contour_filled     geom_count              geom_crossbar
geom_curve              geom_density            geom_density_2d
geom_density_2d_filled  geom_density2d          geom_density2d_filled
geom_dotplot            geom_errorbar           geom_errorbarh
geom_freqpoly           geom_function           geom_hex
geom_histogram          geom_hline              geom_jitter
geom_label              geom_line               geom_linerange
geom_map                geom_path               geom_point
geom_pointrange         geom_polygon            geom_qq
geom_qq_line            geom_quantile           geom_raster
geom_rect               geom_ribbon             geom_rug
geom_segment            geom_sf                 geom_sf_label
geom_sf_text            geom_smooth             geom_spoke
geom_step               geom_text               geom_tile
geom_violin             geom_vline
```
]

--

Additional geoms are available in packages like `ggforce`, `ggridges`,
and others described on the [`ggplot2` extensions
site](https://exts.ggplot2.tidyverse.org/).

---

.pull-left.width-45[
Geoms can be added as _layers_ to a plot.

{{content}}
]

--

Mappings common to all, or most, geoms can be specified in the
`ggplot` call:

{{content}}

--

```r
ggplot(mpg, aes(x = displ, y = hwy)) +
    geom_smooth() +
    geom_point()
```

--

.pull-right.width-55[
<img src="ggplot_files/figure-html/mpg-smooth-1.png" style="display: block; margin: auto;" />
]

---

Geoms can also use different data sets.
--

.pull-left.small-code[
One way to highlight Europe in a plot of life expectancy against log
income for 2007 is to start with a plot of the full data:

{{content}}
]

--

```r
library(dplyr)
library(gapminder)
gm_2007 <- filter(gapminder, year == 2007)

(p <- ggplot(gm_2007, aes(x = gdpPercap, y = lifeExp)) +
     geom_point() +
     scale_x_log10())
```

--

.pull-right[
<img src="ggplot_files/figure-html/gm_2007-1.png" style="display: block; margin: auto;" />
]

---

Then add a layer showing only Europe:

.pull-left.small-code[
```r
gm_2007_eu <- filter(gm_2007, continent == "Europe")
p + geom_point(data = gm_2007_eu, color = "red", size = 3)
```
]

.pull-right[
<img src="ggplot_files/figure-html/gm_2007_eu-1.png" style="display: block; margin: auto;" />
]

---
layout: true

## Statistical Transformations

---

All geoms use a statistical transformation (_stat_) to convert raw
data to the values to be mapped to the object's features.

--

The available stats are

.small-code[
```r
stat_align              stat_bin                stat_bin_2d
stat_bin_hex            stat_bin2d              stat_binhex
stat_boxplot            stat_contour            stat_contour_filled
stat_count              stat_density            stat_density_2d
stat_density_2d_filled  stat_density2d          stat_density2d_filled
stat_ecdf               stat_ellipse            stat_function
stat_identity           stat_qq                 stat_qq_line
stat_quantile           stat_sf                 stat_sf_coordinates
stat_smooth             stat_spoke              stat_sum
stat_summary            stat_summary_2d         stat_summary_bin
stat_summary_hex        stat_summary2d          stat_unique
stat_ydensity
```
]

--

Each geom has a default stat, and each stat has a default geom.

--

* For `geom_point` the default stat is `stat_identity`.

--

* For `geom_bar` the default stat is `stat_count`.

--

* For `geom_histogram` the default is `stat_bin`.

---

Stats can provide _computed variables_ that can be mapped to aesthetic
features.

--

For `stat_bin` some of the computed variables are

* `count`: number of points in bin

* `density`: density of points in bin, scaled to integrate to 1

--

The `density` variable can be accessed as `after_stat(density)`.
--

Older approaches that also work but are now discouraged:

* `stat(density)`

* `..density..`

---

By default, `geom_histogram` uses `y = after_stat(count)`.

.pull-left.small-code[
```r
ggplot(faithful) +
    geom_histogram(aes(x = eruptions),
                   binwidth = 0.25,
                   fill = "grey",
                   color = "black")
```
]

.pull-right[
<img src="ggplot_files/figure-html/geyser-count-1.png" style="display: block; margin: auto;" />
]

---

Explicitly specifying `y = after_stat(count)` produces the same plot:

.pull-left.small-code[
```r
ggplot(faithful) +
    geom_histogram(aes(x = eruptions,
*                      y = after_stat(count)),
                   binwidth = 0.25,
                   fill = "grey",
                   color = "black")
```
]

.pull-right[
<img src="ggplot_files/figure-html/geyser-count-exp-1.png" style="display: block; margin: auto;" />
]

---

Using `y = after_stat(density)` produces a density-scaled axis.

.pull-left.small-code[
```r
(p <- ggplot(faithful) +
     geom_histogram(aes(x = eruptions,
*                       y = after_stat(density)),
                    binwidth = 0.25,
                    fill = "grey",
                    color = "black"))
```
]

.pull-right[
<img src="ggplot_files/figure-html/geyser-dentity-1.png" style="display: block; margin: auto;" />
]

---

`stat_function` can be used to add a density curve specified as a
mixture of two normal densities:

.pull-left.small-code[
```r
(ms <- mutate(faithful,
              type = ifelse(eruptions < 3, "short", "long")) %>%
     group_by(type) %>%
     summarize(mean = mean(eruptions),
               sd = sd(eruptions),
               n = n()) %>%
     mutate(p = n / sum(n)))
## # A tibble: 2 × 5
##   type   mean    sd     n     p
##   <chr> <dbl> <dbl> <int> <dbl>
## 1 long   4.29 0.411   175 0.643
## 2 short  2.04 0.267    97 0.357
```

```r
f <- function(x)
    ms$p[1] * dnorm(x, ms$mean[1], ms$sd[1]) +
        ms$p[2] * dnorm(x, ms$mean[2], ms$sd[2])

p + stat_function(fun = f, color = "red")
```
]

.pull-right[
<img src="ggplot_files/figure-html/geyser-hist-dens-1.png" style="display: block; margin: auto;" />
]

---
layout: true

## Position Adjustments

---

The available position adjustments:

```r
position_dodge
position_dodge2
position_fill
position_identity
position_jitter
position_jitterdodge
position_nudge
position_stack
```

---

A bar chart showing the counts for the different `cut` categories in
the `diamonds` data:

.pull-left.small-code[
```r
ggplot(diamonds, aes(x = cut)) +
    geom_bar()
```
]

.pull-right[
<img src="ggplot_files/figure-html/diamonds-cut-1.png" style="display: block; margin: auto;" />
]

---

Mapping `clarity` to `fill` shows the breakdown by both `cut` and
`clarity` in a _stacked bar chart_:

.pull-left.small-code[
```r
ggplot(diamonds, aes(x = cut,
*                    fill = clarity)) +
    geom_bar()
```
]

.pull-right[
<img src="ggplot_files/figure-html/diamonds-stack1-1.png" style="display: block; margin: auto;" />
]

---

The default `position` for bar charts is `position_stack`:

.pull-left.small-code[
```r
ggplot(diamonds, aes(x = cut, fill = clarity)) +
*   geom_bar(position = "stack")
```
]

.pull-right[
<img src="ggplot_files/figure-html/diamonds-stack2-1.png" style="display: block; margin: auto;" />
]

---

`position_dodge` produces _side-by-side bar charts_:

.pull-left.small-code[
```r
ggplot(diamonds, aes(x = cut, fill = clarity)) +
*   geom_bar(position = "dodge")
```
]

.pull-right[
<img src="ggplot_files/figure-html/diamonds-dodge-1.png" style="display: block; margin: auto;" />
]

---

`position_fill` rescales all bars to the same height to help compare
proportions within bars.

.pull-left.small-code[
```r
ggplot(diamonds, aes(x = cut, fill = clarity)) +
*   geom_bar(position = "fill")
```
]

.pull-right[
<img src="ggplot_files/figure-html/diamonds-fill-1.png" style="display: block; margin: auto;" />
]

---

Using the counts to scale the widths would produce a _spine plot_, a
variant of a _mosaic plot_.

--

This is easiest to do with the `ggmosaic` package.

--

`position_jitter` can be used with `geom_point` to avoid overplotting or
break up rounding artifacts.
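---

When a jittered plot needs to be reproducible, for example in a report that is re-rendered, `position_jitter` accepts a `seed` argument. A minimal sketch, assuming a `ggplot2` version that has this argument: `cty` and `hwy` in `mpg` are whole numbers, so many points coincide; the settings below shift each point by at most 0.3 units in each direction, and the fixed seed makes every rendering identical.

.small-code[
```r
library(ggplot2)

## cty and hwy are recorded as whole numbers, so many points overlap.
## width and height bound the random shift in each direction; the seed
## makes the displacement the same every time the plot is built.
jit <- position_jitter(width = 0.3, height = 0.3, seed = 1)
pj <- ggplot(mpg, aes(x = cty, y = hwy)) +
    geom_point(position = jit)
pj
```
]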
---

Another version of the Old Faithful data, available as `geyser` in
package `MASS`, has some rounding in the `duration` variable:

--

.pull-left.small-code[
```r
data(geyser, package = "MASS")
## Adjust for different meaning of `waiting` variable
geyser2 <- na.omit(mutate(geyser, duration = lag(duration)))
p <- ggplot(geyser2, aes(x = duration, y = waiting))
p + geom_point()
```
]

--

.pull-right[
<img src="ggplot_files/figure-html/geyser2-1.png" style="display: block; margin: auto;" />
]

---

_Jittering_ can help break up the distracting _heaping_ of values on
durations of 2 and 4 minutes.

--

.pull-left.small-code[
The default amount of jittering isn't quite enough in this case:

```r
p + geom_point(position = "jitter")
```
]

.pull-right[
<img src="ggplot_files/figure-html/geyser2-jit-1.png" style="display: block; margin: auto;" />
]

---

To jitter only horizontally and by a larger amount you can use

.pull-left.small-code[
```r
p + geom_point(position =
                   position_jitter(height = 0, width = 0.1))
```
]

.pull-right[
<img src="ggplot_files/figure-html/geyser2-jit2-1.png" style="display: block; margin: auto;" />
]

---
layout: true

## Coordinate Systems

---

Coordinate system functions include

```r
coord_cartesian  coord_equal      coord_fixed      coord_flip       coord_map
coord_munch      coord_polar      coord_quickmap   coord_sf         coord_trans
```

--

The default coordinate system is `coord_cartesian`.

---

### Cartesian Coordinates

`coord_cartesian` can be used to _zoom in_ on a particular region:

.pull-left.small-code[
```r
p + geom_point() + coord_cartesian(xlim = c(3, 4))
```
]

.pull-right[
<img src="ggplot_files/figure-html/geyser2-zoom-1.png" style="display: block; margin: auto;" />
]

---

`coord_fixed` and `coord_equal` fix the _aspect ratio_ for a Cartesian
coordinate system.

--

The aspect ratio is the ratio of the number of physical display units
per `y` unit to the number of physical display units per `x` unit.

--

The aspect ratio can be important for recognizing features and
patterns.
--

.small-code[
```r
river <- scan("https://www.stat.uiowa.edu/~luke/data/river.dat")
r <- data.frame(flow = river, month = seq_along(river))
```
]

.pull-left.small-code.width-40[
```r
ggplot(r, aes(x = month, y = flow)) +
    geom_point() +
    coord_fixed(ratio = 4)
```
]

.pull-right.width-60[
<img src="ggplot_files/figure-html/river-flat-1.png" style="display: block; margin: auto;" />
]

---

.pull-left.small-code[
### Polar Coordinates

A filled bar chart

```r
(p <- ggplot(diamonds) +
     geom_bar(aes(x = 1, fill = cut),
              position = "fill"))
```
]

.pull-right[
<img src="ggplot_files/figure-html/diamonds-fill-1-1.png" style="display: block; margin: auto;" />
]

---

.pull-left.small-code[
is turned into a pie chart by changing to polar coordinates:

```r
p + coord_polar(theta = "y")
```
]

.pull-right[
<img src="ggplot_files/figure-html/diamonds-pie-1.png" style="display: block; margin: auto;" />
]

---

### Coordinate Systems for Maps

Coordinate systems are particularly important for maps.

--

Polygons for many political and geographic boundaries are available
through the `map_data` function.

--

Boundaries for the lower 48 US states can be obtained as

```r
usa <- map_data("state")
```

---

Polygon vertices are encoded by longitude and latitude.
--

Plotting these in the default Cartesian coordinate system usually does
not work well:

.pull-left.small-code[
```r
usa <- map_data("state")
m <- ggplot(usa, aes(x = long, y = lat, group = group)) +
    geom_polygon(fill = "white", color = "black")
m
```
]

.pull-right[
<img src="ggplot_files/figure-html/usa-cart-1.png" style="display: block; margin: auto;" />
]

---

Using a fixed aspect ratio is better, but an aspect ratio of 1 does not
work well:

.pull-left.small-code[
```r
m + coord_equal()
```

<img src="ggplot_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" />
]

--

.pull-right[
The problem is that away from the equator a one degree change in
latitude corresponds to a larger distance than a one degree change in
longitude.

{{content}}
]

--

At the latitude of the middle of Iowa, about 41 degrees north, the
ratio of the distance spanned by one degree of longitude to the
distance spanned by one degree of latitude is

```r
longlat <- cos(41 / 90 * pi / 2)
longlat
## [1] 0.7547096
```

---

A better map is obtained using the aspect ratio `1 / longlat`:

.pull-left.small-code[
```r
m + coord_fixed(1 / longlat)
```
]

.pull-right[
<img src="ggplot_files/figure-html/usa-fixed-1.png" style="display: block; margin: auto;" />
]

---

The best approach is to use a coordinate system designed specifically
for maps.

--

.pull-left.small-code[
There are many _projections_ used in map making.

{{content}}
]

--

The default projection used by `coord_map` is the
[Mercator](https://en.wikipedia.org/wiki/Mercator_projection)
projection.
```r
m + coord_map()
```

--

.pull-right[
<img src="ggplot_files/figure-html/usa-mercator-1.png" style="display: block; margin: auto;" />
]

---

Proper map projections are non-linear; this is easier to see with an
Albers projection:

.pull-left.small-code[
```r
m + coord_map("albers", 20, 50)
```
]

.pull-right[
<img src="ggplot_files/figure-html/usa-albers-1.png" style="display: block; margin: auto;" />
]

---
layout: true

## Scales

---

Scales are used for controlling the mapping of values to physical
representations such as colors, shapes, and positions.

--

Scale functions are also responsible for producing _guides_ for
translating physical representations back to values, such as

--

* axis labels and marks;

--

* color or shape legends.

--

There are currently 131 scale functions; some examples are

```r
scale_color_gradient  scale_shape_manual  scale_x_log10
scale_color_manual    scale_size_area     scale_y_log10
scale_fill_gradient   scale_x_sqrt
scale_fill_manual     scale_y_sqrt
```

--

An [experimental tool](https://ggplot2tor.com/scales/) to help with
choosing scales has recently been introduced.
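---

Setting limits in a scale is not the same as zooming with a coordinate system. A minimal sketch, assuming the default out-of-bounds handling of continuous position scales: values outside scale limits are replaced by `NA` before any stat is computed (the corresponding points are then dropped, with a warning, when the plot is drawn), whereas `coord_cartesian` keeps all the data and only changes the visible region.

.small-code[
```r
library(ggplot2)

p_base <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()

## Scale limits: displ values outside [3, 4] become NA in the built data
p_scale <- p_base + scale_x_continuous(limits = c(3, 4))

## Coordinate limits: all values are kept; only the view changes
p_coord <- p_base + coord_cartesian(xlim = c(3, 4))

sum(is.na(layer_data(p_scale)$x))   # observations censored by the scale
sum(is.na(layer_data(p_coord)$x))   # no observations removed
```
]

The difference matters for stats such as `geom_smooth`: with scale limits the smooth is fit only to the retained points, while with coordinate limits it is fit to all of the data.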
---

Start with a basic scatter plot:

.pull-left.small-code[
```r
(p <- ggplot(mpg, aes(x = displ, y = hwy)) +
     geom_point())
```
]

.pull-right[
<img src="ggplot_files/figure-html/mpg-basic-1.png" style="display: block; margin: auto;" />
]

---

Remove the `x` tick marks and labels (this can also be done with theme
settings):

.pull-left.small-code[
```r
p + scale_x_continuous(labels = NULL, breaks = NULL)
```
]

.pull-right[
<img src="ggplot_files/figure-html/mpg-no-ticks-labs-1.png" style="display: block; margin: auto;" />
]

---

Change the tick locations and labels:

.pull-left.small-code[
```r
p + scale_x_continuous(labels = paste(c(2, 4, 6), "ltr"),
                       breaks = c(2, 4, 6))
```
]

.pull-right[
<img src="ggplot_files/figure-html/mpg-new-ticks-labs-1.png" style="display: block; margin: auto;" />
]

---

Use a logarithmic axis:

.pull-left.small-code[
```r
p + scale_x_log10(labels = paste(c(2, 4, 6), "ltr"),
                  breaks = c(2, 4, 6),
                  minor_breaks = c(3, 5, 7))
```
]

.pull-right[
<img src="ggplot_files/figure-html/mpg-log-x-1.png" style="display: block; margin: auto;" />
]

---

The [Scales](https://r4ds.had.co.nz/graphics-for-communication.html#scales)
section in [R for Data Science](https://r4ds.had.co.nz/) provides some
more details.

--

Color assignment can also be controlled by scale functions.

--

For example, for some presidential approval ratings data

.pull-left.small-code[
```r
pr_appr
##         pres appr party year
## 1      Obama   79     D 2009
## 2     Carter   78     D 1977
## 3    Clinton   68     D 1993
## 4  G.W. Bush   65     R 2001
## 5     Reagan   58     R 1981
## 6 G.H.W Bush   56     R 1989
## 7      Trump   40     R 2017
```

{{content}}
]

--

the default color scale is not ideal:

```r
ggplot(pr_appr, aes(x = appr, y = pres, fill = party)) +
    geom_col()
```

--

.pull-right[
<img src="ggplot_files/figure-html/pr-appr0-1.png" style="display: block; margin: auto;" />
]

---

The common assignment of red for Republican and blue for Democrat can
be obtained by

.pull-left.small-code[
```r
ggplot(pr_appr, aes(x = appr, y = pres, fill = party)) +
    geom_col() +
*   scale_fill_manual(values =
*                         c(R = "red", D = "blue"))
```
]

.pull-right[
<img src="ggplot_files/figure-html/pr-appr-1.png" style="display: block; margin: auto;" />
]

---

A better choice is to use a well-designed
[color palette](https://hclwizard.org/#color-palettes):

.pull-left.small-code[
```r
ggplot(pr_appr, aes(x = appr, y = pres, fill = party)) +
    geom_col() +
*   colorspace::scale_fill_discrete_diverging(
*       palette = "Blue-Red 2")
```
]

.pull-right[
<img src="ggplot_files/figure-html/pr-appr-2-1.png" style="display: block; margin: auto;" />
]

---
layout: true

## Facets

---

Faceting uses the _small multiples_ approach to introduce additional
variables.

--

For a single variable `facet_wrap` is usually used:

.pull-left.small-code[
```r
p <- ggplot(mpg) +
    geom_point(aes(x = displ, y = hwy))
p + facet_wrap(~ class)
```
]

.pull-right[
<img src="ggplot_files/figure-html/mpg-facet-wrap-1.png" style="display: block; margin: auto;" />
]

---

.pull-left.small-code[
For two variables, each with a modest number of categories,
`facet_grid` can be effective:

```r
p + facet_grid(factor(cyl) ~ drv)
```
]

.pull-right[
<img src="ggplot_files/figure-html/mpg-facet-grid-1.png" style="display: block; margin: auto;" />
]

---

<!-- Using the previous mpg facet plot would be better but this is
     one of the homework problems in HW3 -->

.pull-left.small-code.width-40[
To show common data in all facets make sure the data does not contain
the faceting variable.
{{content}}
]

--

This was used to show muted views of the full data in faceted plots.

{{content}}

--

A faceted plot of the `gapminder` data:

```r
library(gapminder)
years_to_keep <- c(1977, 1987, 1997, 2007)
gd <- filter(gapminder, year %in% years_to_keep)
ggplot(gd, aes(x = gdpPercap, y = lifeExp, color = continent)) +
    geom_point(size = 2.5) +
    scale_x_log10() +
    facet_wrap(~ year)
```

--

.pull-right.width-60[
<img src="ggplot_files/figure-html/gapminder-not-muted-1.png" style="display: block; margin: auto;" />
]

---

.pull-left.small-code.width-40[
Add a muted version of the full data in the background of each panel:

<!-- variant of code in prercep.Rmd -->

```r
library(gapminder)
years_to_keep <- c(1977, 1987, 1997, 2007)
gd <- filter(gapminder, year %in% years_to_keep)
*gd_no_year <- mutate(gd, year = NULL)
ggplot(gd, aes(x = gdpPercap, y = lifeExp, color = continent)) +
*   geom_point(data = gd_no_year,
*              color = "grey80") +
    geom_point(size = 2.5) +
    scale_x_log10() +
    facet_wrap(~ year)
```
]

.pull-right.width-60[
<img src="ggplot_files/figure-html/gapminder-muted-1.png" style="display: block; margin: auto;" />
]

---

Usually facets use common axis scales, but one or both can be allowed
to vary.

--

A useful approach for showing time series data with a good aspect
ratio can be to split the data into facets for non-overlapping
portions of the time axis.

--

.pull-left.small-code[
```r
pd <- rep(paste(seq(1, by = 32, length.out = 4),
                seq(32, by = 32, length.out = 4),
                sep = " - "),
          each = 32)
rd <- data.frame(month = seq_along(river), flow = river, panel = pd)
ggplot(rd, aes(x = month, y = flow)) +
    geom_point() +
    facet_wrap(~ panel, scales = "free_x", ncol = 1)
```
]

.pull-right[
<img src="ggplot_files/figure-html/river-facet-1.png" style="display: block; margin: auto;" />
]

---

Facet arrangement can also be used to convey other information, such
as geographic location.
--

The [`geofacet` package](https://hafen.github.io/geofacet/) allows
facets to be placed in approximate locations of different geographic
regions.

--

.pull-left.small-code[
An example for data from US states:

```r
library(geofacet)
ggplot(state_unemp, aes(year, rate)) +
    geom_line() +
    facet_geo(~ state, grid = "us_state_grid2", label = "code") +
    scale_x_continuous(labels = function(x) paste0("'", substr(x, 3, 4))) +
    labs(title = "Seasonally Adjusted US Unemployment Rate 2000-2016",
         caption = "Data Source: bls.gov",
         x = "Year",
         y = "Unemployment Rate (%)") +
    theme(strip.text.x = element_text(size = 6),
          axis.text = element_text(size = 5))
```
]

.pull-right[
<img src="ggplot_files/figure-html/geofacet-1.png" style="display: block; margin: auto;" />
]

--

Arrangement according to a calendar can also be useful.

---
layout: true

## Themes

---

`ggplot2` supports the notion of _themes_ for adjusting non-data
appearance aspects of a plot, such as

--

* plot titles

--

* axis and legend placement and titles

--

* background colors

--

* guide line placement

--

Theme elements can be customized in several ways:

--

* `theme()` can be used to adjust individual elements in a plot;

--

* `theme_set()` adjusts default settings for a session;

--

* pre-defined theme functions allow consistent style changes.

--

The [full documentation](https://ggplot2.tidyverse.org/reference/theme.html)
of the `theme` function lists many customizable elements.
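---

`theme_set()` returns the theme it replaces, which makes it easy to change the session default temporarily and then restore it. A minimal sketch, assuming `theme_minimal` with a larger `base_size` is the style wanted for a group of plots:

.small-code[
```r
library(ggplot2)

## Make theme_minimal with a larger base font the session default,
## keeping the previous default so it can be restored afterwards.
old_theme <- theme_set(theme_minimal(base_size = 14))

## Plots built now use the new default without an explicit theme call.
print(ggplot(mpg, aes(x = displ, y = hwy)) + geom_point())

## Restore the previous default.
theme_set(old_theme)
```
]

`theme_update()` can be used in a similar way to modify individual elements of the current default theme.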
---

One simple example:

.pull-left.small-code[
```r
ggplot(mutate(mpg, cyl = factor(cyl))) +
    geom_point(aes(x = displ, y = hwy, fill = cyl),
               shape = 21, size = 3) +
*   theme(legend.position = "top",
*         axis.text = element_text(size = 12),
*         axis.title = element_text(size = 14,
*                                   face = "bold"))
```
]

.pull-right[
<img src="ggplot_files/figure-html/theme-simple-1.png" style="display: block; margin: auto;" />
]

---

Another example:

.pull-left.small-code[
```r
gthm <- theme(plot.background = element_rect(fill = "lightblue",
                                             color = NA),
              panel.background = element_rect(fill = "pink"))
p + gthm
```
]

.pull-right[
<img src="ggplot_files/figure-html/theme-simple-2-1.png" style="display: block; margin: auto;" />
]

---

Some alternate complete themes provided by `ggplot2` are

```r
theme_bw       theme_gray     theme_minimal  theme_void
theme_classic  theme_grey     theme_dark     theme_light
```

--

.pull-left.small-code[
Some examples:

```r
p_bw <- p + theme_bw() + ggtitle("BW")
p_classic <- p + theme_classic() + ggtitle("Classic")
p_min <- p + theme_minimal() + ggtitle("Minimal")
p_void <- p + theme_void() + ggtitle("Void")
library(patchwork)
(p_bw + p_classic) / (p_min + p_void)
```
]

.pull-right[
<img src="ggplot_files/figure-html/alt-themes-1.png" style="display: block; margin: auto;" />
]

---

The [`ggthemes`](http://www.rpubs.com/Mentors_Ubiqum/ggthemes_1)
package provides some additional themes.
.pull-left.small-code[
Some examples:

```r
library(ggthemes)
p_econ <- p + theme_economist() + ggtitle("Economist")
p_wsj <- p + theme_wsj() + ggtitle("WSJ")
p_tufte <- p + theme_tufte() + ggtitle("Tufte")
p_few <- p + theme_few() + ggtitle("Few")
(p_econ + p_wsj) / (p_tufte + p_few)
```
]

.pull-right[
<img src="ggplot_files/figure-html/ggthemes-examples-1.png" style="display: block; margin: auto;" />
]

---

`ggthemes` also provides `theme_map`, which removes unnecessary
elements from maps:

```r
m + coord_map() + theme_map()
```

<img src="ggplot_files/figure-html/unnamed-chunk-15-1.png" style="display: block; margin: auto;" />

--

The [Themes](https://r4ds.had.co.nz/graphics-for-communication.html#themes)
section in [R for Data Science](https://r4ds.had.co.nz/) provides some
more details.

---
layout: false

## A More Complete Template

```r
ggplot(data = <DATA>) +
    <GEOM>(mapping = aes(<MAPPINGS>),
           stat = <STAT>,
           position = <POSITION>) +
    < ... MORE GEOMS ... > +
    <COORDINATE_ADJUSTMENT> +
    <SCALE_ADJUSTMENT> +
    <FACETING> +
    <THEME_ADJUSTMENT>
```

---
layout: true

## Labels and Annotations

---

A basic plot:

.pull-left.small-code[
```r
p <- ggplot(mpg, aes(x = displ, y = hwy))
p1 <- p + geom_point(aes(color = factor(cyl)), size = 2.5)
p1
```
]

.pull-right[
<img src="ggplot_files/figure-html/mpg-ann-1.png" style="display: block; margin: auto;" />
]

--

Axis labels are based on the expressions given to `aes`.

--

This is convenient for exploration but usually not ideal for a report.
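---

One way to replace these default labels is to give each scale a `name`, or to use the `xlab()` and `ylab()` shortcuts. A minimal sketch, assuming the same `mpg` plot with `factor(cyl)` mapped to `color` (rebuilt here as `p_cyl` so the example stands alone):

.small-code[
```r
library(ggplot2)

p_cyl <- ggplot(mpg, aes(x = displ, y = hwy)) +
    geom_point(aes(color = factor(cyl)), size = 2.5)

## Axis and legend titles supplied through the scales and ylab()
p_named <- p_cyl +
    scale_x_continuous(name = "Displacement (Liters)") +
    scale_color_discrete(name = "Cylinders") +
    ylab("Highway Miles Per Gallon")
p_named
```
]

Giving the name in the scale call is convenient when that scale is being specified anyway, for example to adjust its breaks.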
---

The `labs()` function can be used to change axis and legend labels:

.pull-left.small-code[
```r
p1 + labs(x = "Displacement (Liters)",
          y = "Highway Miles Per Gallon",
          color = "Cylinders")
```
]

.pull-right[
<img src="ggplot_files/figure-html/mpg-ann-labs-1.png" style="display: block; margin: auto;" />
]

---

The `labs()` function can also add a title, subtitle, and caption:

.pull-left.small-code[
```r
p2 <- p1 +
    labs(x = "Displacement (Liters)",
         y = "Highway Miles Per Gallon",
         color = "Cylinders",
         title = "Gas Mileage and Displacement",
         subtitle = paste("For models which had a new release every year",
                          "between 1999 and 2008"),
         caption = "Data Source: https://fueleconomy.gov/")
p2
```
]

.pull-right[
<img src="ggplot_files/figure-html/mpg-ann-labs-2-1.png" style="display: block; margin: auto;" />
]

---

Annotations can be used to provide _popout_ that draws a viewer's
attention to particular features.

--

The `annotate()` function is one option:

.pull-left.small-code[
```r
p2 +
    annotate("label", x = 2.8, y = 43,
             label = "Volkswagens") +
    annotate("rect", xmin = 1.7, xmax = 2.1,
             ymin = 40, ymax = 45,
             fill = NA, color = "black")
```
]

.pull-right[
<img src="ggplot_files/figure-html/mpg-ann-popout-1.png" style="display: block; margin: auto;" />
]

---

Often more convenient are some `geom_mark` objects provided by the
`ggforce` package:

.pull-left.small-code[
```r
library(ggforce)
p2 +
    geom_mark_hull(aes(filter = class == "2seater"),
                   description =
                       paste("2-Seaters have high displacement",
                             "values, but also high fuel efficiency",
                             "for their displacement.")) +
    geom_mark_rect(aes(filter = hwy > 40),
                   description = "These are Volkswagens") +
    geom_mark_circle(aes(filter = hwy == 12),
                     description = "Three pickups and an SUV.")
```
]

.pull-right[
<img src="ggplot_files/figure-html/mpg-ann-popout-2-1.png" style="display: block; margin: auto;" />
]

--

These annotations can be customized in a number of ways.
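---

A reference line is another common annotation. A minimal sketch, using only `ggplot2`, that marks the median highway mileage with `geom_hline()` and labels it with `annotate()`; the label position `x = 6.5` is an arbitrary choice for this data:

.small-code[
```r
library(ggplot2)

hwy_median <- median(mpg$hwy)

p_ref <- ggplot(mpg, aes(x = displ, y = hwy)) +
    geom_point() +
    ## Dashed reference line at the median highway mileage
    geom_hline(yintercept = hwy_median,
               linetype = "dashed", color = "grey40") +
    ## Text label placed just above the line
    annotate("text", x = 6.5, y = hwy_median, vjust = -0.5,
             label = paste("Median:", hwy_median, "MPG"))
p_ref
```
]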
---
layout: false

## Arranging Plots

There are several tools available for assembling ensemble plots.

--

The [`patchwork`](https://patchwork.data-imaginist.com/) package is a
good choice.

--

.pull-left.small-code[
A simple example:

```r
p1 <- ggplot(mpg, aes(x = displ, y = hwy)) +
    geom_point()
p2 <- ggplot(mpg, aes(x = cyl, y = hwy, group = cyl)) +
    geom_boxplot()
p3 <- ggplot(mpg, aes(x = cyl)) +
    geom_bar()
library(patchwork)
(p1 + p2) / p3
```
]

.pull-right[
<img src="ggplot_files/figure-html/mpg-patchwork-1.png" style="display: block; margin: auto;" />
]

---
layout: true

## Interaction

---

### Plotly

The `ggplotly` function in the [`plotly` package](https://plotly.com/r/)
can be used to add some interactive features to a plot created with
`ggplot2`.

--

* In an R session a call to `ggplotly()` opens a browser window with the
  interactive plot.

--

* In an R Markdown document the interactive plot is embedded in the
  `html` file.

--

Another interactive plotting approach that can be used from R is
described in an [Infoworld
article](https://www.infoworld.com/article/3607068/plot-in-r-with-echarts4r.html).

---

A simple example using `ggplotly()`:

.pull-left.small-code[
```r
library(ggplot2)
library(plotly)
p <- ggplot(mutate(mpg, cyl = factor(cyl))) +
    geom_point(aes(x = displ, y = hwy, fill = cyl),
               shape = 21, size = 3)

ggplotly(p)
```
]

.pull-right[
] --- Adding a `text` aesthetic allows the tooltip display to be customized: .pull-left.small-code[ ```r p <- ggplot(mutate(mpg, cyl = factor(cyl))) + geom_point(aes(x = displ, y = hwy, fill = cyl, * text = paste(year, * manufacturer, * model)), shape = 21, size = 3) ggplotly(p, tooltip = "text") %>% style(hoverlabel = list(bgcolor = "white")) ``` ] .pull-right[
] --- ### Ggiraph .pull-left.small-code[ The [`ggiraph` package](https://davidgohel.github.io/ggiraph/) provides another approach, based on interactive versions of the standard geoms. ```r library(ggplot2) library(dplyr) library(ggiraph) p <- ggplot(mutate(mpg, cyl = factor(cyl))) + * geom_point_interactive(aes(x = displ, y = hwy, fill = cyl, * tooltip = paste(year, * manufacturer, * model)), shape = 21, size = 3) *girafe(ggobj = p) ``` ] .pull-right[
] --- ### Grammar of Interactive Graphics -- There have been several efforts to develop a grammar of interactive graphics, including [`ggvis`](https://ggvis.rstudio.com/) and [`animint`](https://tdhock.github.io/animint/); neither seems to be under active development at this time. -- A promising approach is [Vega-Lite](https://vega.github.io/vega-lite/), which has a Python interface, [Altair](https://altair-viz.github.io/), and an R package, [altair](https://vegawidget.github.io/altair/), that works through the Python interface. --- An example using the `altair` package: .small-code[ ```r rub <- read.csv(here::here("rubber.csv")) library(altair) chartTH <- alt$Chart(rub)$ mark_point()$ encode(x = alt$X("H:Q", scale = alt$Scale(domain = range(rub$H))), y = alt$Y("T:Q", scale = alt$Scale(domain = range(rub$T)))) brush <- alt$selection_interval() chartTH_brush <- chartTH$add_selection(brush) chartTH_selection <- chartTH_brush$encode(color = alt$condition(brush, "Origin:N", alt$value("lightgray"))) chartAT <- chartTH_selection$ encode(x = alt$X("T:Q", scale = alt$Scale(domain = range(rub$T))), y = alt$Y("A:Q", scale = alt$Scale(domain = range(rub$A)))) chartAT | chartTH_selection ``` ] --- The resulting linked plots:
--- layout: false ## Notes * The [`gganimate`](https://github.com/thomasp85/gganimate) package, a recent project that adds animation to `ggplot`, looks very promising. -- * A number of other [`ggplot` extensions](https://exts.ggplot2.tidyverse.org/) are available. -- * A [blog post](https://medium.com/bbc-visual-and-data-journalism/how-the-bbc-visual-and-data-journalism-team-works-with-graphics-in-r-ed0b35693535) explains how the [BBC Visual and Data Journalism](https://medium.com/bbc-visual-and-data-journalism) team creates their graphics. More details are provided in an [_R cookbook_](https://bbc.github.io/rcookbook/). -- * A [blog post](https://blog.revolutionanalytics.com/2016/07/data-journalism-with-r-at-538.html) describes how [FiveThirtyEight](https://fivethirtyeight.com/) uses R and `ggplot`. The `ggthemes` package includes `theme_fivethirtyeight` to emulate their style. --- ## Reading Chapters [_Data visualization_](https://r4ds.had.co.nz/data-visualisation.html) and [_Graphics for communication_](https://r4ds.had.co.nz/graphics-for-communication.html) in [_R for Data Science_](https://r4ds.had.co.nz/), O'Reilly. Chapter [_Make a plot_](https://socviz.co/makeplot.html) in [_Data Visualization_](https://socviz.co/). Chapter [_ggplot2_](https://rafalab.dfci.harvard.edu/dsbook/ggplot2.html) in [_Introduction to Data Science: Data Analysis and Prediction Algorithms with R_](https://rafalab.dfci.harvard.edu/dsbook/). --- layout: false ## Interactive Tutorial An interactive [`learnr`](https://rstudio.github.io/learnr/) tutorial for these notes is [available](../tutorials/ggplot.Rmd). You can run the tutorial with ```r STAT4580::runTutorial("ggplot") ``` You can install the current version of the `STAT4580` package with ```r remotes::install_gitlab("luke-tierney/STAT4580") ``` You may need to install the `remotes` package from CRAN first.
--- layout: true ## Exercises --- 1) In the following expression, which value of the `shape` aesthetic produces a plot with points drawn as triangles outlined in black and colored according to the number of cylinders? <!-- ## nolint start --> ```r library(ggplot2) ggplot(mpg, aes(x = displ, y = hwy, fill = factor(cyl))) + geom_point(size = 4, shape = ---) ``` <!-- ## nolint end --> * a. 15 * b. 17 * c. 21 * d. 24 --- 2) It can sometimes be useful to plot text labels in a scatterplot instead of points. Consider the plot set up as ```r library(ggplot2) library(dplyr) data(gapminder, package = "gapminder") p <- filter(gapminder, year == 2007) %>% group_by(continent) %>% summarize(gdpPercap = mean(gdpPercap), lifeExp = mean(lifeExp)) %>% ggplot(aes(x = gdpPercap, y = lifeExp)) ``` Which of the following produces a plot with continent names on white rectangles? * a. `p + geom_text(aes(label = continent))` * b. `p + geom_label(aes(label = continent))` * c. `p + geom_label(label = continent)` * d. `p + geom_text(text = continent)` --- 3) The following code plots a _kernel density estimate_ for the `eruptions` variable in the `faithful` data set: ```r library(ggplot2) ggplot(faithful, aes(x = eruptions)) + geom_density(bw = 0.1) ``` Look at the help page for `geom_density`. Which of the following best describes what specifying a value for `bw` does? * a. Changes the _kernel_ used to construct the estimate. * b. Changes the _smoothing bandwidth_ to make the result more or less smooth. * c. Changes the `stat` used to `stat_bw`. * d. Has no effect on the result. --- 4) This code creates a map of Iowa counties. ```r library(ggplot2) p <- ggplot(map_data("county", "iowa"), aes(x = long, y = lat, group = group)) + geom_polygon(fill = "white", color = "black") ``` Which of these produces a plot with an aspect ratio that best matches the map on [this page](https://en.wikipedia.org/w/index.php?title=List_of_counties_in_Iowa&oldid=1001171082)? * a. `p` * b.
`p + coord_fixed(0.75)` * c. `p + coord_fixed(1.25)` * d. `p + coord_fixed(2)` --- 5) Consider the two plots created by this code (print the values of `p1` and `p2` to see the plots): ```r library(ggplot2) data(gapminder, package = "gapminder") p1 <- ggplot(gapminder, aes(x = log(gdpPercap), y = lifeExp)) + geom_point() + scale_x_continuous(name = "") p2 <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) + geom_point() + scale_x_log10(labels = scales::comma, name = "") ``` Which of these statements is true? * a. The `x` axis labels are identical in both plots. * b. The `x` axis labels in `p2` are in dollars; the labels in `p1` are in log dollars. * c. The `x` axis labels in `p1` are in dollars; the labels in `p2` are in log dollars. * d. There are no labels on the `x` axis in `p2`. --- 6) Consider the plot created by ```r library(ggplot2) data(gapminder, package = "gapminder") p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) + geom_point() + scale_x_log10(labels = scales::comma) ``` Which of these expressions produces a plot with a white background? * a. `p` * b. `p + theme_grey()` * c. `p + theme_classic()` * d. `p + ggthemes::theme_economist()` --- 7) There are many different ways to change the `x` axis label in `ggplot`. Consider the plot created by ```r library(ggplot2) p <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point() ``` Which of the following does **not** change the `x` axis label to _Displacement_? * a. `p + labs(x = "Displacement")` * b. `p + scale_x_continuous("Displacement")` * c. `p + xlab("Displacement")` * d. `p + theme(axis.title.x = "Displacement")`