---
#############################################################
#                                                           #
# In RStudio click on "Run Document" to run this tutorial   #
#                                                           #
#############################################################
title: "The Grammar of Graphics"
author: "Luke Tierney"
output: learnr::tutorial
runtime: shiny_prerendered
---

```{r setup, include = FALSE}
library(learnr)
library(tidyverse)
knitr::opts_chunk$set(echo = FALSE, comment = "", warning = FALSE)
```

```{r stop_when_browser_closes, context = "server"}
# stop the app when the browser is closed (or, unfortunately, refreshed)
session$onSessionEnded(stopApp)
```

## Scatter Plots

Using the `mpg` data frame included in `ggplot2`

```{r}
mpg
```

create a simple scatter plot of `hwy` against `displ`:

```{r mpg-simple, exercise = TRUE}

```

```{r mpg-simple-solution}
ggplot(mpg) +
    geom_point(aes(x = displ, y = hwy))
```

Show the vehicle class by mapping `color` to `class`.

```{r mpg-color, exercise = TRUE}

```

```{r mpg-color-solution}
ggplot(mpg) +
    geom_point(aes(x = displ, y = hwy, color = class))
```

Try increasing the size of the points to make the colors easier to
distinguish.

```{r mpg-color-size, exercise = TRUE}

```

```{r mpg-color-size-solution}
ggplot(mpg) +
    geom_point(aes(x = displ, y = hwy, color = class), size = 3)
```

It may also help to use a shape that supports the `fill` aesthetic and
can show a border. Try using one of the shapes that supports this.

```{r mpg-color-size-shape, exercise = TRUE}

```

```{r mpg-color-size-shape-solution}
ggplot(mpg) +
    geom_point(aes(x = displ, y = hwy, fill = class),
               shape = 21, size = 3)
```

The default grey background is not always ideal. An alternate theme is
provided by `theme_minimal()`. Change your plot to use this theme.

```{r mpg-minimal, exercise = TRUE}

```

```{r mpg-minimal-solution}
ggplot(mpg) +
    geom_point(aes(x = displ, y = hwy, fill = class),
               shape = 21, size = 3) +
    theme_minimal()
```

The minimal theme may be a little too minimal. Add a border to the
plot panel.

```{r mpg-minimal-border, exercise = TRUE}

```

```{r mpg-minimal-border-solution}
ggplot(mpg) +
    geom_point(aes(x = displ, y = hwy, fill = class),
               shape = 21, size = 3) +
    theme_minimal() +
    theme(panel.border = element_rect(color = "grey", fill = NA))
```

## State Murder Rates for 2010

The `murders` data set in the `dslabs` package contains FBI data on
the number of gun murders by state in 2010. The data set also contains
the state population counts for that year.

We can access the data set and compute the murder rate per 100,000
inhabitants as

```{r setup-murders, echo = TRUE}
data(murders, package = "dslabs")
library(dplyr)
murders <- mutate(murders, rate = total / population * 100000)
```

Use `geom_col()` to create a bar chart showing murder rates for
different states. Using the state abbreviation in the `abb` variable
is probably best. Try both vertical and horizontal bars and see which
works better.

```{r murder-bar, exercise = TRUE, exercise.setup = "setup-murders"}

```

```{r murder-bar-solution}
ggplot(murders, aes(y = abb, x = rate)) +
    geom_col()
```

Try vertical bars with faceting by state abbreviation. This will not
work well unless you use free scales for the `x` axis.

```{r murder-facet, exercise = TRUE, exercise.setup = "setup-murders", fig.height = 6}

```

```{r murder-facet-solution}
ggplot(murders, aes(x = abb, y = rate)) +
    geom_col() +
    facet_wrap(~abb, scales = "free_x")
```

Using `geofacet` may be more useful. Try that and see how it works.

```{r murder-geofacet, exercise = TRUE, exercise.setup = "setup-murders", fig.height = 6}

```

```{r murder-geofacet-solution}
library(geofacet)
ggplot(murders, aes(x = abb, y = rate)) +
    geom_col() +
    facet_geo(~abb, grid = "us_state_grid2", scales = "free_x")
```

To improve the result you can remove the redundant strips at the top
of the facets, use a simpler theme like `theme_minimal()`, and add a
better `y` axis label.

```{r murder-geofacet-improved, exercise = TRUE, exercise.setup = "setup-murders", fig.height = 6}

```

```{r murder-geofacet-improved-solution}
library(geofacet)
ggplot(murders, aes(x = abb, y = rate)) +
    geom_col() +
    facet_geo(~abb, grid = "us_state_grid2", scales = "free_x") +
    theme_minimal() +
    theme(strip.background = element_blank(),
          strip.text = element_blank()) +
    labs(x = NULL, y = "Murders Per 100,000")
```

## Exercises

### Exercise 1

In the following expression, which value of the `shape` aesthetic
produces a plot with points represented as triangles outlined in black
and filled according to the number of cylinders?

```{r triangles-exercise, exercise = TRUE}
library(ggplot2)
ggplot(mpg, aes(x = displ, y = hwy, fill = factor(cyl))) +
    geom_point(size = 4, shape = ---)
```

```{r triangles-question, echo = FALSE}
question(
    "",
    answer("15"),
    answer("17"),
    answer("21"),
    answer("24", correct = TRUE),
    allow_retry = TRUE
)
```

### Exercise 2

It can sometimes be useful to plot text labels in a scatterplot
instead of points.
Consider the plot set up as

```{r label-plot-exercise, exercise = TRUE}
library(ggplot2)
library(dplyr)
data(gapminder, package = "gapminder")
p <- filter(gapminder, year == 2007) %>%
    group_by(continent) %>%
    summarize(gdpPercap = mean(gdpPercap),
              lifeExp = mean(lifeExp)) %>%
    ggplot(aes(x = gdpPercap, y = lifeExp))
```

```{r label-plot-question, echo = FALSE}
question(
    "Which of the following produces a plot with continent names on white rectangles?",
    answer("`p + geom_text(aes(label = continent))`"),
    answer("`p + geom_label(aes(label = continent))`", correct = TRUE),
    answer("`p + geom_label(label = continent)`"),
    answer("`p + geom_text(text = continent)`"),
    allow_retry = TRUE
)
```

### Exercise 3

The following code plots a _kernel density estimate_ for the
`eruptions` variable in the `faithful` data set:

```{r kde-bw-exercise, exercise = TRUE}
library(ggplot2)
ggplot(faithful, aes(x = eruptions)) +
    geom_density(bw = 0.1)
```

Look at the help page for `geom_density`.

```{r kde-bw-question, echo = FALSE}
question(
    "Which of the following best describes what specifying a value for `bw` does:",
    answer("Changes the _kernel_ used to construct the estimate."),
    answer("Changes the _smoothing bandwidth_ to make the result more or less smooth.", correct = TRUE),
    answer("Changes the `stat` used to `stat_bw`."),
    answer("Has no effect on the result."),
    allow_retry = TRUE
)
```

### Exercise 4

This code creates a map of Iowa counties.

```{r iowa-map-exercise, exercise = TRUE}
library(ggplot2)
p <- ggplot(map_data("county", "iowa"),
            aes(x = long, y = lat, group = group)) +
    geom_polygon(fill = "White", color = "black")
```

```{r iowa-map-question, echo = FALSE}
question(
    "Which of these produces a plot with an aspect ratio that best matches the map on [this page](https://en.wikipedia.org/w/index.php?title=List_of_counties_in_Iowa&oldid=1001171082)?",
    answer("`p`"),
    answer("`p + coord_fixed(0.75)`"),
    answer("`p + coord_fixed(1.25)`", correct = TRUE),
    answer("`p + coord_fixed(2)`"),
    allow_retry = TRUE
)
```

### Exercise 5

Consider the two plots created by this code (print the values of `p1`
and `p2` to see the plots):

```{r x-axis-labels-exercise, exercise = TRUE}
library(ggplot2)
library(patchwork)
data(gapminder, package = "gapminder")
p1 <- ggplot(gapminder, aes(x = log(gdpPercap), y = lifeExp)) +
    geom_point() +
    scale_x_continuous(name = "") +
    labs(title = "p1")
p2 <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
    geom_point() +
    scale_x_log10(labels = scales::label_comma(), name = "") +
    labs(title = "p2")
p1 + p2
```

```{r x-axis-labels-question, echo = FALSE}
question(
    "Which of these statements is true?",
    answer('The `x` axis labels are identical in both plots.'),
    answer('The `x` axis labels in `p2` are in dollars; the labels in `p1` are in log dollars.', correct = TRUE),
    answer('The `x` axis labels in `p1` are in dollars; the labels in `p2` are in log dollars.'),
    answer('There are no labels on the `x` axis in `p2`.'),
    random_answer_order = TRUE,
    allow_retry = TRUE
)
```

### Exercise 6

Consider the plot created by

```{r white-background-exercise, exercise = TRUE}
library(ggplot2)
data(gapminder, package = "gapminder")
p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
    geom_point() +
    scale_x_log10(labels = scales::label_comma())
p
```

```{r white-background-question, echo = FALSE}
question(
    "Which of these expressions produces a plot with a white background?",
    answer('`p`'),
    answer('`p + theme_grey()`'),
    answer('`p + theme_classic()`', correct = TRUE),
    answer('`p + ggthemes::theme_economist()`'),
    random_answer_order = TRUE,
    allow_retry = TRUE
)
```

### Exercise 7

There are many different ways to change the `x` axis label in
`ggplot`. Consider the plot created by

```{r no-x-change-exercise, exercise = TRUE}
library(ggplot2)
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
    geom_point()
p
```

```{r no-x-change-question, echo = FALSE}
question(
    "Which of the following does **not** change the `x` axis label to _Displacement_?",
    answer('`p + labs(x = "Displacement")`'),
    answer('`p + theme(axis.title.x = "Displacement")`', correct = TRUE),
    answer('`p + xlab("Displacement")`'),
    answer('`p + scale_x_continuous("Displacement")`'),
    random_answer_order = TRUE,
    allow_retry = TRUE
)
```
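
The `y` axis title can be set through similar routes. The chunk below is a minimal sketch, not part of the exercise: the label text "Highway MPG", the chunk name, and the object names `p_labs`, `p_ylab`, and `p_scale` are illustrative assumptions.

```{r y-axis-label-sketch, echo = TRUE}
library(ggplot2)

# Same base plot as in Exercise 7
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
    geom_point()

# Three ways to set the y axis title (label text is illustrative)
p_labs  <- p + labs(y = "Highway MPG")
p_ylab  <- p + ylab("Highway MPG")
p_scale <- p + scale_y_continuous(name = "Highway MPG")

# Print one of them to see the result
p_labs
```

Printing `p_ylab` or `p_scale` in the same way lets you compare the three results side by side.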