Background
The Grammar of Graphics is a language proposed by Leland
Wilkinson for describing statistical graphs.
Wilkinson, L. (2005), The Grammar of Graphics, 2nd ed.,
Springer.
The grammar of graphics has served as the foundation for the graphics
frameworks in SPSS, Vega-Lite and several other
systems.
ggplot2 represents an implementation and extension of
the grammar of graphics for R.
Wickham, H. (2016), ggplot2: Elegant Graphics for Data
Analysis, 2nd ed., Springer. 3rd ed. in progress.
Online documentation: https://ggplot2.tidyverse.org/reference/index.html.
Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund (2023),
R for Data Science (2nd Edition), O’Reilly.
Data visualization cheatsheet
Winston Chang (2018), R Graphics Cookbook, 2nd edition, O’Reilly.
(Book source on GitHub)
The idea is that any basic plot can be built out of a combination
of
a data set;
one or more geometrical representations (geoms);
mappings of values to aesthetic features of the
geom;
a stat to produce values to be mapped;
position adjustments;
a coordinate system;
a scale specification;
a faceting scheme.
ggplot2 provides tools for specifying these components
and adjusting their features.
Many components and features are provided by default and do not need
to be specified explicitly unless the defaults are to be changed.
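As a rough illustration of how much the defaults fill in, the sketch below writes out the components of a simple scatter plot explicitly and compares the result with the short form. The object names `p_short` and `p_full` are made up for this example; the stat, position, coordinate system, scales, and faceting shown are the defaults that apply to `geom_point` with continuous `x` and `y`.

```r
library(ggplot2)

# Short form: only the data, a geom, and a mapping are given
p_short <- ggplot(mpg) +
    geom_point(aes(x = displ, y = hwy))

# Long form: the same plot with the default components spelled out
p_full <- ggplot(data = mpg) +
    geom_point(mapping = aes(x = displ, y = hwy),
               stat = "identity",
               position = "identity") +
    coord_cartesian() +
    scale_x_continuous() +
    scale_y_continuous() +
    facet_null()

# Compare the data computed for the layer in the two versions
all.equal(layer_data(p_short), layer_data(p_full))
```

Printing `p_short` and `p_full` should give the same picture; the long form is only useful when one of the components needs to be changed.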
A Basic Template
The simplest graph needs a data set, a geom, and a mapping:
ggplot(data = <DATA>) + <GEOM>(mapping = aes(<MAPPINGS>))
The appearance of geom objects is controlled by aesthetic
features.
Each geom has some required and some optional aesthetics.
For geom_point the required aesthetics are
x position
y position.
Optional aesthetics include
color (or colour)
fill
shape
size
geom_point is used to produce a scatter plot.
Scatter Plots Using geom_point
The mpg data set included in the ggplot2
package includes EPA fuel economy data from 1999 to 2008 for 38 popular
models of cars.
mpg
## # A tibble: 234 × 11
## manufacturer model displ year cyl trans drv cty hwy fl class
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 audi a4 1.8 1999 4 auto… f 18 29 p comp…
## 2 audi a4 1.8 1999 4 manu… f 21 29 p comp…
## 3 audi a4 2 2008 4 manu… f 20 31 p comp…
## 4 audi a4 2 2008 4 auto… f 21 30 p comp…
## 5 audi a4 2.8 1999 6 auto… f 16 26 p comp…
## 6 audi a4 2.8 1999 6 manu… f 18 26 p comp…
## 7 audi a4 3.1 2008 6 auto… f 18 27 p comp…
## 8 audi a4 quattro 1.8 1999 4 manu… 4 18 26 p comp…
## 9 audi a4 quattro 1.8 1999 4 auto… 4 16 25 p comp…
## 10 audi a4 quattro 2 2008 4 manu… 4 20 28 p comp…
## # ℹ 224 more rows
A simple scatter plot:
ggplot(mpg) +
geom_point(aes(x = displ,
y = hwy))
Map color to vehicle class:
ggplot(mpg) +
geom_point(aes(x = displ,
y = hwy,
color = class))
And map shape to number of cylinders:
ggplot(mpg) +
geom_point(aes(x = displ,
y = hwy,
color = class,
shape = factor(cyl)))
Perception:
Too many colors;
shapes are too small;
interference between shapes and colors.
Aesthetics can be mapped to a variable or set to a fixed common
value.
This can be used to override default settings:
ggplot(mpg) +
geom_point(aes(x = displ,
y = hwy),
color = "blue",
shape = 1)
Changing the size aesthetic makes shapes easier to
recognize:
ggplot(mpg) +
geom_point(aes(x = displ,
y = hwy,
color = class,
shape = factor(cyl)),
size = 3)
Perception: Still too many colors; still have interference.
Available point shapes are specified by number, from 0 to 25.
Shapes 0-20 have their color set by the color aesthetic
and ignore the fill aesthetic.
For shapes 21-25 the color aesthetic specifies the
border color and fill specifies the interior
color.
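A chart of the shape numbers can be drawn with `ggplot2` itself. The following is a minimal sketch; the data frame `shapes`, its `col` and `row` layout columns, and the plot name `p_shapes` are made up for this example. The red fill only shows up for shapes 21-25.

```r
library(ggplot2)

# One row per shape number, laid out on a grid with 7 columns
shapes <- data.frame(shape = 0:25)
shapes$col <- shapes$shape %% 7
shapes$row <- -(shapes$shape %/% 7)

p_shapes <- ggplot(shapes, aes(x = col, y = row)) +
    ## scale_shape_identity() uses the numbers in `shape` directly
    geom_point(aes(shape = shape),
               size = 5, color = "black", fill = "red") +
    ## print each shape number just below its symbol
    geom_text(aes(label = shape), nudge_y = -0.3) +
    scale_shape_identity() +
    theme_void()
p_shapes
```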
Using shape 21 with cyl mapped to the
fill aesthetic:
ggplot(mutate(mpg, cyl = factor(cyl))) +
geom_point(aes(x = displ,
y = hwy,
fill = cyl),
shape = 21,
size = 4)
Perception: Borders, larger symbols, fewer colors help.
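Specifying a new default (a value given outside aes()) is very
different from specifying a constant value as an aesthetic (a constant
mapped inside aes()).
Constant aesthetic: Rarely what you want, since the constant is treated
as data, the color scale picks the color, and a legend is added: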
ggplot(mpg) +
geom_point(aes(x = displ,
y = hwy,
color = "blue"))
Default: Probably what you want:
ggplot(mpg) +
geom_point(aes(x = displ,
y = hwy),
color = "blue")
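The difference between the two versions can be checked by looking at the colors that are actually used for drawing. This is a small sketch using `layer_data()`; the names `p_mapped` and `p_set` are made up for this example.

```r
library(ggplot2)

# "blue" inside aes(): a data value that is mapped through the color scale
p_mapped <- ggplot(mpg) +
    geom_point(aes(x = displ, y = hwy, color = "blue"))

# "blue" outside aes(): a fixed setting for the layer
p_set <- ggplot(mpg) +
    geom_point(aes(x = displ, y = hwy), color = "blue")

# Colors used for the points in each plot
unique(layer_data(p_mapped)$colour)  # a color chosen by the default scale
unique(layer_data(p_set)$colour)     # the requested color
```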
Geometric Objects
ggplot2 provides a number of geoms:
geom_abline geom_area geom_bar geom_bin_2d
geom_bin2d geom_blank geom_boxplot geom_col
geom_contour geom_contour_filled geom_count geom_crossbar
geom_curve geom_density geom_density_2d geom_density_2d_filled
geom_density2d geom_density2d_filled geom_dotplot geom_errorbar
geom_errorbarh geom_freqpoly geom_function geom_hex
geom_histogram geom_hline geom_jitter geom_label
geom_line geom_linerange geom_map geom_path
geom_point geom_pointrange geom_polygon geom_qq
geom_qq_line geom_quantile geom_raster geom_rect
geom_ribbon geom_rug geom_segment geom_sf
geom_sf_label geom_sf_text geom_smooth geom_spoke
geom_step geom_text geom_tile geom_violin
geom_vline
Additional geoms are available in packages like ggforce,
ggridges, and others described on the ggplot2
extensions site.
Geoms can be added as layers to a plot.
Mappings common to all, or most, geoms can be specified in the
ggplot call:
ggplot(mpg,
aes(x = displ,
y = hwy)) +
geom_smooth() +
geom_point()
Geoms can also use different data sets.
One way to highlight Europe in a plot of life expectancy against log
income for 2007 is to start with a plot of the full data:
library(dplyr)
library(gapminder)
gm_2007 <- filter(gapminder, year == 2007)
(p <- ggplot(gm_2007, aes(x = gdpPercap,
y = lifeExp)) +
geom_point() +
scale_x_log10())
Then add a layer showing only Europe:
gm_2007_eu <- filter(gm_2007, continent == "Europe")
p + geom_point(data = gm_2007_eu,
color = "red",
size = 3)
Statistical Transformations
All geoms use a statistical transformation (stat) to convert
raw data to the values to be mapped to the object’s features.
The available stats are
stat_align stat_bin stat_bin_2d
stat_bin_hex stat_bin2d stat_binhex
stat_boxplot stat_connect stat_contour
stat_contour_filled stat_count stat_density
stat_density_2d stat_density_2d_filled stat_density2d
stat_density2d_filled stat_ecdf stat_ellipse
stat_function stat_identity stat_manual
stat_qq stat_qq_line stat_quantile
stat_sf stat_sf_coordinates stat_smooth
stat_spoke stat_sum stat_summary
stat_summary_2d stat_summary_bin stat_summary_hex
stat_summary2d stat_unique stat_ydensity
Each geom has a default stat, and each stat has a default geom.
For geom_point the default stat is
stat_identity.
For geom_bar the default stat is
stat_count.
For geom_histogram the default stat is
stat_bin.
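Because of this pairing, a bar chart of counts can be requested from either side. The sketch below, with made-up object names `p_geom` and `p_stat`, builds the chart once through the geom and once through the stat and compares the computed counts with a direct tabulation.

```r
library(ggplot2)

# Start from the geom; its default stat is stat_count
p_geom <- ggplot(diamonds, aes(x = cut)) + geom_bar()

# Start from the stat; its default geom is geom_bar
p_stat <- ggplot(diamonds, aes(x = cut)) + stat_count()

# Both compute the same counts, which match a direct tabulation
layer_data(p_geom)$count
layer_data(p_stat)$count
as.vector(table(diamonds$cut))
```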
Stats can provide computed variables that can be mapped to
aesthetic features.
For stat_bin some of the computed variables are
count: number of points in bin
density: density of points in bin, scaled to integrate
to 1
The density variable can be accessed as
after_stat(density).
Older approaches that also work but are now discouraged:
stat(density)
..density..
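One way to see the computed variables is to look at the data frame a stat produces for a layer. This is a minimal sketch using `layer_data()`; the names `p_hist` and `d` are made up for this example, and the bin width of 0.25 is the one used in the histograms in these notes.

```r
library(ggplot2)

p_hist <- ggplot(faithful) +
    geom_histogram(aes(x = eruptions), binwidth = 0.25)

# Data frame produced by stat_bin for this layer
d <- layer_data(p_hist)
head(d[, c("x", "count", "density")])

# The counts add up to the number of observations
sum(d$count) == nrow(faithful)

# Density times bin width adds up to 1
all.equal(sum(d$density * 0.25), 1)
```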
By default, geom_histogram uses
y = after_stat(count).
ggplot(faithful) +
geom_histogram(aes(x = eruptions),
binwidth = 0.25,
fill = "grey",
color = "black")
Explicitly specifying y = after_stat(count) produces the
same plot:
ggplot(faithful) +
geom_histogram(aes(x = eruptions,
y = after_stat(count)),
binwidth = 0.25,
fill = "grey",
color = "black")
Using y = after_stat(density) produces a density scaled
axis.
(p <- ggplot(faithful) +
geom_histogram(aes(x = eruptions,
y = after_stat(density)),
binwidth = 0.25,
fill = "grey",
color = "black"))
stat_function can be used to add a density curve
specified as a mixture of two normal densities:
(ms <- mutate(faithful,
type = ifelse(eruptions < 3,
"short",
"long")) |>
group_by(type) |>
summarize(mean = mean(eruptions),
sd = sd(eruptions),
n = n()) |>
mutate(p = n / sum(n)))
## # A tibble: 2 × 5
## type mean sd n p
## <chr> <dbl> <dbl> <int> <dbl>
## 1 long 4.29 0.411 175 0.643
## 2 short 2.04 0.267 97 0.357
f <- function(x)
ms$p[1] * dnorm(x, ms$mean[1], ms$sd[1]) +
ms$p[2] * dnorm(x, ms$mean[2], ms$sd[2])
p + stat_function(fun = f, color = "red")
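The curve lines up with the density-scaled histogram because the mixture is itself a density. The sketch below checks this numerically; it recomputes the group summaries with base R instead of reusing `ms`, and the names `groups`, `mu`, `sigma`, `w`, and `f_mix` are made up for this example.

```r
# Split the eruption durations at 3 minutes, as above
type <- ifelse(faithful$eruptions < 3, "short", "long")
groups <- split(faithful$eruptions, type)

mu <- sapply(groups, mean)                     # group means
sigma <- sapply(groups, sd)                    # group standard deviations
w <- sapply(groups, length) / nrow(faithful)   # mixing proportions

# Mixture of two normal densities
f_mix <- function(x)
    w[["long"]] * dnorm(x, mu[["long"]], sigma[["long"]]) +
    w[["short"]] * dnorm(x, mu[["short"]], sigma[["short"]])

# The weights sum to 1, so the mixture should integrate to about 1
integrate(f_mix, lower = 0, upper = 7)
```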
Position Adjustments
The available position adjustments:
position_dodge position_dodge2 position_fill
position_identity position_jitter position_jitterdodge
position_nudge position_stack
A bar chart showing the counts for the different cut
categories in the diamonds data:
ggplot(diamonds, aes(x = cut)) +
geom_bar()
Mapping clarity to fill shows the breakdown
by both cut and clarity in a stacked bar
chart:
ggplot(diamonds, aes(x = cut,
fill = clarity)) +
geom_bar()
The default position for bar charts is
position_stack:
ggplot(diamonds, aes(x = cut,
fill = clarity)) +
geom_bar(position = "stack")
position_dodge produces side-by-side bar
charts:
ggplot(diamonds, aes(x = cut,
fill = clarity)) +
geom_bar(position = "dodge")
position_fill rescales all bars to be equal height to
help compare proportions within bars.
ggplot(diamonds, aes(x = cut,
fill = clarity)) +
geom_bar(position = "fill")
Using the counts to scale the widths would produce a spine
plot, a variant of a mosaic plot.
This is easiest to do with the ggmosaic package.
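What position_fill does can also be written out by hand. The sketch below computes the within-cut proportions with `dplyr` and draws them with `geom_col`, whose default stacking then gives bars of height 1. The data frame name `props` and its `prop` column are made up for this example.

```r
library(ggplot2)
library(dplyr)

# Proportion of each clarity level within each cut
props <- diamonds |>
    count(cut, clarity) |>
    group_by(cut) |>
    mutate(prop = n / sum(n)) |>
    ungroup()

# Stacking the proportions gives bars of equal height
ggplot(props, aes(x = cut, y = prop, fill = clarity)) +
    geom_col()
```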
position_jitter can be used with geom_point
to avoid overplotting or break up rounding artifacts.
Another version of the Old Faithful data available as
geyser in package MASS has some rounding in
the duration variable:
data(geyser, package = "MASS")
## Adjust for different meaning of `waiting` variable
geyser2 <- na.omit(mutate(geyser,
duration = lag(duration)))
p <- ggplot(geyser2, aes(x = duration, y = waiting))
p + geom_point()
Jittering can help break up the distracting heaping
of values on durations of 2 and 4 minutes.
The default amount of jittering isn’t quite enough in this case:
p + geom_point(position = "jitter")
To jitter only horizontally and by a larger amount you can use
p + geom_point(position =
position_jitter(height = 0,
width = 0.1))
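Jittering is random, so the picture changes each time the plot is drawn unless a seed is fixed. The sketch below uses the `seed` argument of `position_jitter` to make the jitter reproducible. To keep the example self-contained it uses the `mpg` data, where `displ` is rounded to one decimal place; the names `jit` and `p_jit` and the jitter width are made up for this example.

```r
library(ggplot2)

# Horizontal jitter only, with a fixed seed for reproducibility
jit <- position_jitter(width = 0.05, height = 0, seed = 1)

p_jit <- ggplot(mpg, aes(x = displ, y = hwy)) +
    geom_point(position = jit)
p_jit
```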
Coordinate Systems
Coordinate system functions include
coord_cartesian coord_equal coord_fixed coord_flip
coord_map coord_munch coord_polar coord_quickmap
coord_radial coord_sf coord_trans coord_transform
The default coordinate system is coord_cartesian.
Cartesian Coordinates
coord_cartesian can be used to zoom in on a
particular region:
p + geom_point() +
coord_cartesian(xlim = c(3, 4))
coord_fixed and coord_equal fix the
aspect ratio for a cartesian coordinate system.
The aspect ratio is the ratio of the number of physical display units
per y unit to the number of physical display units per
x unit.
The aspect ratio can be important for recognizing features and
patterns.
river <- scan("https://www.stat.uiowa.edu/~luke/data/river.dat")
r <- data.frame(flow = river, month = seq_along(river))
ggplot(r, aes(x = month, y = flow)) +
geom_point() +
coord_fixed(ratio = 4)
Polar Coordinates
A filled bar chart
(p <- ggplot(diamonds) +
geom_bar(aes(x = 1, fill = cut),
position = "fill"))
is turned into a pie chart by changing to polar coordinates:
p + coord_polar(theta = "y")
Coordinate Systems for Maps
Coordinate systems are particularly important for maps.
Polygons for many political and geographic boundaries are available
through the map_data function.
Boundaries for the lower 48 US states can be obtained as
usa <- map_data("state")
Polygon vertices are encoded by longitude and latitude.
Plotting these in the default cartesian coordinate system usually
does not work well:
usa <- map_data("state")
m <- ggplot(usa, aes(x = long,
y = lat,
group = group)) +
geom_polygon(fill = "white",
color = "black")
m
Using a fixed aspect ratio is better, but an aspect ratio of 1 does
not work well:
m + coord_equal()
The problem is that away from the equator a one degree change in
latitude corresponds to a larger distance than a one degree change in
longitude.
The ratio of one degree longitude separation to one degree latitude
separation at a latitude of 41 degrees, near the middle of Iowa, is
longlat <- cos(41 / 90 * pi / 2)
longlat
## [1] 0.7547096
A better map is obtained using the aspect ratio
1 / longlat:
m + coord_fixed(1 / longlat)
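The same calculation can be wrapped in a small function for use at other latitudes. This is only a sketch: `longlat_ratio` is a hypothetical helper, not part of `ggplot2`, and it treats the earth as a sphere.

```r
# Hypothetical helper: length of one degree of longitude relative to one
# degree of latitude at a given latitude (in degrees), for a spherical earth
longlat_ratio <- function(lat_deg) cos(lat_deg * pi / 180)

longlat_ratio(41)       # same value as longlat above
1 / longlat_ratio(41)   # aspect ratio to pass to coord_fixed()
```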
The best approach is to use a coordinate system designed specifically
for maps.
There are many projections used in map making.
The default projection used by coord_map is the Mercator
projection.
m + coord_map()
Proper map projections are non-linear; this is easier to see with an
Albers projection:
m + coord_map("albers", 20, 50)
Scales
Scales are used for controlling the mapping of values to physical
representations such as colors, shapes, and positions.
Scale functions are also responsible for producing guides
for translating physical representations back to values, such as
axis labels and marks;
color or shape legends.
There are currently 131 scale functions; some examples are
scale_color_gradient scale_shape_manual scale_x_log10
scale_color_manual scale_size_area scale_y_log10
scale_fill_gradient scale_x_sqrt
scale_fill_manual scale_y_sqrt
An experimental tool to
help choose scales is available.
Start with a basic scatter plot:
(p <- ggplot(mpg, aes(x = displ,
y = hwy)) +
geom_point())
Remove the x tick marks and labels (this can also be
done with theme settings):
p + scale_x_continuous(labels = NULL,
breaks = NULL)
Change the tick locations and labels:
p + scale_x_continuous(labels =
paste(c(2, 4, 6), "ltr"),
breaks = c(2, 4, 6))
Use a logarithmic axis:
p + scale_x_log10(labels = paste(c(2, 4, 6), "ltr"),
breaks = c(2, 4, 6),
minor_breaks = c(3, 5, 7))
The Scales
section in R for Data Science
provides some more details.
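Labels can also be given as a function of the break values instead of a fixed character vector, so the labels stay in step with the breaks. A minimal sketch, in which the function name `ltr` and the plot name `p_ltr` are made up for this example:

```r
library(ggplot2)

# Hypothetical labeling function: append a unit to each break value
ltr <- function(x) paste(x, "ltr")

p_ltr <- ggplot(mpg, aes(x = displ, y = hwy)) +
    geom_point() +
    scale_x_continuous(breaks = c(2, 4, 6),
                       labels = ltr)
p_ltr
```

Changing `breaks` then changes the labels automatically.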
Color assignment can also be controlled by scale functions.
For example, for some presidential approval ratings data
pr_appr
## pres appr party year
## 1 Obama 79 D 2009
## 2 Carter 78 D 1977
## 3 Clinton 68 D 1993
## 4 G.W. Bush 65 R 2001
## 5 Reagan 58 R 1981
## 6 G.H.W Bush 56 R 1989
## 7 Trump 40 R 2017
the default color scale is not ideal:
ggplot(pr_appr,
aes(x = appr, y = pres, fill = party)) +
geom_col()
The common assignment of red for Republican and blue for Democrat can
be obtained by
ggplot(pr_appr,
aes(x = appr, y = pres, fill = party)) +
geom_col() +
scale_fill_manual(values
= c(R = "red", D = "blue"))
A better choice is to use a well-designed color palette:
ggplot(pr_appr,
aes(x = appr, y = pres, fill = party)) +
geom_col() +
colorspace::scale_fill_discrete_diverging(
palette = "Blue-Red 2")
Facets
Faceting uses the small multiples approach to introduce
additional variables.
For a single variable facet_wrap is usually used:
p <- ggplot(mpg) +
geom_point(aes(x = displ,
y = hwy))
p + facet_wrap(~ class)
For two variables, each with a modest number of categories,
facet_grid can be effective:
p + facet_grid(factor(cyl) ~ drv)
To show common data in all facets, make sure the data does not contain
the faceting variable.
This can be used to show muted views of the full data in faceted
plots.
A faceted plot of the gapminder data:
library(gapminder)
years_to_keep <- c(1977, 1987, 1997, 2007)
gd <- filter(gapminder,
year %in% years_to_keep)
ggplot(gd,
aes(x = gdpPercap,
y = lifeExp,
color = continent)) +
geom_point(size = 2.5) +
scale_x_log10() +
facet_wrap(~ year)
Add a muted version of the full data in the background of each
panel:
library(gapminder)
years_to_keep <- c(1977, 1987, 1997, 2007)
gd <- filter(gapminder,
year %in% years_to_keep)
gd_no_year <- mutate(gd, year = NULL)
ggplot(gd,
aes(x = gdpPercap,
y = lifeExp,
color = continent)) +
geom_point(data = gd_no_year,
color = "grey80") +
geom_point(size = 2.5) +
scale_x_log10() +
facet_wrap(~ year)
Usually facets use common axis scales, but one or both can be allowed
to vary.
A useful approach for showing time series data with a good aspect
ratio can be to split the data into facets for non-overlapping portions
of the time axis.
pd <- rep(paste(seq(1, by = 32, length.out = 4),
seq(32, by = 32, length.out = 4),
sep = " - "),
each = 32)
rd <- data.frame(month = seq_along(river),
flow = river,
panel = pd)
ggplot(rd, aes(x = month,
y = flow)) +
geom_point() +
facet_wrap(~ panel,
scales = "free_x",
ncol = 1)
Facet arrangement can also be used to convey other information, such
as geographic location.
The geofacet
package allows facets to be placed in approximate locations of
different geographic regions.
An example for data from US states:
library(geofacet)
ggplot(state_unemp, aes(year, rate)) +
geom_line() +
facet_geo(~ state,
grid = "us_state_grid2",
label = "code") +
scale_x_continuous(labels =
function(x) paste0("'", substr(x, 3, 4))) +
labs(title = "Seasonally Adjusted US Unemployment Rate 2000-2016",
caption = "Data Source: bls.gov",
x = "Year",
y = "Unemployment Rate (%)") +
theme(strip.text.x = element_text(size = 6),
axis.text = element_text(size = 5))
Arrangement according to a calendar can also be useful.
Themes
ggplot2 supports the notion of themes for
adjusting non-data appearance aspects of a plot, such as titles, fonts,
background colors, grid lines, and legend placement.
Theme elements can be customized in several ways:
theme() can be used to adjust individual elements in
a plot.
theme_set() adjusts default settings for a
session;
pre-defined theme functions allow consistent style
changes.
The full
documentation of the theme function lists many
customizable elements.
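A session-wide change with `theme_set()` might look like the following sketch. `theme_set()` returns the previous default, which makes it easy to restore later; the names `old` and `p_default` are made up for this example.

```r
library(ggplot2)

# Make theme_bw() the session default; keep the previous default
old <- theme_set(theme_bw())

p_default <- ggplot(mpg, aes(x = displ, y = hwy)) +
    geom_point()
p_default    # drawn with the current default theme, theme_bw()

# theme() still adjusts individual elements for a single plot
p_default + theme(axis.title = element_text(face = "bold"))

# Restore the previous default
theme_set(old)
```

The default theme is looked up when a plot is drawn, not when it is created, so printing `p_default` after the final `theme_set(old)` uses the restored theme.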
One simple example:
ggplot(mutate(mpg, cyl = factor(cyl))) +
geom_point(aes(x = displ,
y = hwy,
fill = cyl),
shape = 21,
size = 3) +
theme(legend.position = "top",
axis.text = element_text(size = 12),
axis.title = element_text(size = 14,
face = "bold"))
Another example:
gthm <-
theme(plot.background =
element_rect(fill = "lightblue",
color = NA),
panel.background =
element_rect(fill = "pink"))
p + gthm
Some alternate complete themes provided by ggplot2
are
theme_bw theme_gray theme_minimal theme_void
theme_classic theme_grey theme_dark theme_light
Some examples:
p_bw <- p + theme_bw() + ggtitle("BW")
p_classic <- p + theme_classic() + ggtitle("Classic")
p_min <- p + theme_minimal() + ggtitle("Minimal")
p_void <- p + theme_void() + ggtitle("Void")
library(patchwork)
(p_bw + p_classic) / (p_min + p_void)
The ggthemes
package provides some additional themes.
Some examples:
library(ggthemes)
p_econ <- p + theme_economist() + ggtitle("Economist")
p_wsj <- p + theme_wsj() + ggtitle("WSJ")
p_tufte <- p + theme_tufte() + ggtitle("Tufte")
p_few <- p + theme_few() + ggtitle("Few")
(p_econ + p_wsj) / (p_tufte + p_few)
ggthemes also provides theme_map that
removes unnecessary elements from maps:
m + coord_map() + theme_map()
The Themes
section in R for Data Science
provides some more details.
A More Complete Template
ggplot(data = <DATA>) +
<GEOM>(mapping = aes(<MAPPINGS>),
stat = <STAT>,
position = <POSITION>) +
< ... MORE GEOMS ... > +
<COORDINATE_ADJUSTMENT> +
<SCALE_ADJUSTMENT> +
<FACETING> +
<THEME_ADJUSTMENT>
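As one possible way to fill in the template, the sketch below supplies a value for each slot using the `diamonds` data. The particular choices (a filled bar chart, flipped coordinates, percent labels, faceting on `color`, and `theme_minimal`) are made up for illustration, as is the name `p_template`.

```r
library(ggplot2)

p_template <- ggplot(data = diamonds) +
    geom_bar(mapping = aes(x = cut, fill = clarity),
             stat = "count",                               # <STAT>
             position = "fill") +                          # <POSITION>
    coord_flip() +                                         # <COORDINATE_ADJUSTMENT>
    scale_y_continuous(labels = scales::label_percent()) + # <SCALE_ADJUSTMENT>
    facet_wrap(~ color) +                                  # <FACETING>
    theme_minimal()                                        # <THEME_ADJUSTMENT>
p_template
```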
Labels and Annotations
A basic plot:
p <- ggplot(mpg, aes(x = displ,
y = hwy))
p1 <- p + geom_point(aes(color = factor(cyl)),
size = 2.5)
p1
Axis labels are based on the expressions given to
aes.
This is convenient for exploration but usually not ideal for a
report.
The labs() function can be used to change axis and
legend labels:
p1 + labs(x = "Displacement (Liters)",
y = "Highway Miles Per Gallon",
color = "Cylinders")
The labs() function can also add a title, subtitle, and
caption:
p2 <- p1 +
labs(x = "Displacement (Liters)",
y = "Highway Miles Per Gallon",
color = "Cylinders",
title = "Gas Mileage and Displacement",
subtitle = paste("For models which had a new release every year",
"between 1999 and 2008"),
caption = "Data Source: https://fueleconomy.gov/")
p2
Annotations can be used to provide popout that draws a viewer’s
attention to particular features.
The annotate() function is one option:
p2 +
annotate("label", x = 2.8, y = 43,
label = "Volkswagens") +
annotate("rect",
xmin = 1.7, xmax = 2.1,
ymin = 40, ymax = 45,
fill = NA, color = "black")
Often more convenient are some geom_mark objects
provided by the ggforce package:
library(ggforce)
p2 +
geom_mark_hull(aes(filter = class == "2seater"),
description =
paste("2-Seaters have high displacement",
"values, but also high fuel efficiency",
"for their displacement.")) +
geom_mark_rect(aes(filter = hwy > 40),
description =
"These are Volkswagens") +
geom_mark_circle(aes(filter = hwy == 12),
description =
"Three pickups and an SUV.")
These annotations can be customized in a number of ways.
Arranging Plots
There are several tools available for assembling ensemble plots.
The patchwork
package is a good choice.
A simple example:
p1 <- ggplot(mpg, aes(x = displ,
y = hwy)) +
geom_point()
p2 <- ggplot(mpg, aes(x = cyl,
y = hwy,
group = cyl)) +
geom_boxplot()
p3 <- ggplot(mpg, aes(x = cyl)) +
geom_bar()
library(patchwork)
(p1 + p2) / p3
Animation
The gganimate
package can be used to add animation to a ggplot graph.
Start with a plot p for all years in the
gapminder data, with year in the
background:
p <- gapminder |>
arrange(desc(pop)) |>
ggplot(aes(x = gdpPercap, y = lifeExp)) +
geom_text(aes(x = 5000, y = 55, label = as.character(year)),
size = 50, color = "grey",
hjust = "center", vjust = "center") +
geom_point(aes(size = pop, fill = continent), shape = 21) +
scale_x_log10(labels = scales::comma) +
ylim(c(20, 85)) +
scale_size_area(max_size = 20,
labels = scales::comma,
breaks = c(0.25 * 10 ^ 9, 0.5 * 10 ^ 9, 10 ^ 9)) +
scale_fill_manual(values = c(Africa = "deepskyblue",
Asia = "red",
Americas = "green",
Europe = "gold",
Oceania = "brown")) +
labs(x = "Income", y = "Life expectancy") +
## theme_minimal() comes first: adding a complete theme later would
## replace the theme() settings made before it
theme_minimal() +
theme(text = element_text(size = 16),
panel.border = element_rect(fill = NA, color = "grey20")) +
guides(fill = guide_legend(title = "Continent",
override.aes = list(size = 5),
order = 1),
size = guide_legend(title = "Population",
label.hjust = 1,
order = 2))
A GIF
animation:
library(gganimate)
animate(p +
transition_states(
year,
transition_length = 2,
state_length = 0))
A movie:
animate(p +
transition_states(
year,
transition_length = 2,
state_length = 0,
wrap = FALSE),
renderer = ffmpeg_renderer())
Interaction
Plotly
The ggplotly function in the plotly package can be used
to add some interactive features to a plot created with
ggplot2.
In an R session a call to ggplotly() may open a
browser window with the interactive plot.
In an RStudio session the plot appears in the graphics
panel.
In an R Markdown document the interactive plot is embedded in the
HTML file.
Another interactive plotting approach that can be used from R is
described in an Infoworld
article.
A simple example using ggplotly():
library(ggplot2)
library(plotly)
p <- ggplot(mutate(mpg, cyl = factor(cyl))) +
geom_point(aes(x = displ,
y = hwy,
fill = cyl),
shape = 21,
size = 3)
ggplotly(p)
Adding a text aesthetic allows the tooltip display to be
customized:
p <- ggplot(mutate(mpg, cyl = factor(cyl))) +
geom_point(aes(x = displ,
y = hwy,
fill = cyl,
text = paste(year,
manufacturer,
model)),
shape = 21,
size = 3)
ggplotly(p, tooltip = "text") |>
style(hoverlabel = list(bgcolor = "white"))
Ggiraph
The ggiraph
package provides another approach.
library(ggplot2)
library(ggiraph)
p <- ggplot(mutate(mpg, cyl = factor(cyl))) +
geom_point_interactive(
aes(x = displ,
y = hwy,
fill = cyl,
tooltip = paste(year,
manufacturer,
model)),
shape = 21,
size = 3)
girafe(ggobj = p)
Grammar of Interactive Graphics
There have been several efforts to develop a grammar of interactive
graphics, including ggvis and animint;
neither seems to be under active development at this time.
A promising approach is Vega-Lite, with a Python
interface Altair and an R
interface altair to
the Python interface.
An example using the altair package:
rub <- read.csv(here::here("rubber.csv"))
library(altair)
chartTH <- alt$Chart(rub)$
mark_point()$
encode(x = alt$X("H:Q", scale = alt$Scale(domain = range(rub$H))),
y = alt$Y("T:Q", scale = alt$Scale(domain = range(rub$T))))
brush <- alt$selection_interval()
chartTH_brush <- chartTH$add_selection(brush)
chartTH_selection <-
chartTH_brush$encode(color = alt$condition(brush,
"Origin:N",
alt$value("lightgray")))
chartAT <- chartTH_selection$
encode(x = alt$X("T:Q", scale = alt$Scale(domain = range(rub$T))),
y = alt$Y("A:Q", scale = alt$Scale(domain = range(rub$A))))
chartAT | chartTH_selection
The resulting linked plots:
Interactive Tutorial
An interactive learnr
tutorial for these notes is available.
You can run the tutorial with
STAT4580::runTutorial("ggplot")
You can install the current version of the STAT4580
package with
remotes::install_gitlab("luke-tierney/STAT4580")
You may need to install the remotes package from CRAN
first.
Exercises
In the following expression, which value of the shape
aesthetic produces a plot with points represented as triangles outlined
in black and filled according to the number of cylinders?
```r
library(ggplot2)
ggplot(mpg, aes(x = displ, y = hwy, fill = factor(cyl))) +
geom_point(size = 4, shape = ---)
```
a. 15
b. 17
c. 21
d. 24
It can sometimes be useful to plot text labels in a scatterplot
instead of points. Consider the plot set up as
library(ggplot2)
library(dplyr)
data(gapminder, package = "gapminder")
p <- filter(gapminder, year == 2007) |>
group_by(continent) |>
summarize(gdpPercap = mean(gdpPercap), lifeExp = mean(lifeExp)) |>
ggplot(aes(x = gdpPercap, y = lifeExp))
Which of the following produces a plot with continent names on white
rectangles?
p + geom_text(aes(label = continent))
p + geom_label(aes(label = continent))
p + geom_label(label = continent)
p + geom_text(text = continent)
The following code plots a kernel density estimate for
the eruptions variable in the faithful data
set:
library(ggplot2)
ggplot(faithful, aes(x = eruptions)) + geom_density(bw = 0.1)
Look at the help page for geom_density. Which of the
following best describes what specifying a value for bw
does:
Changes the kernel used to construct the estimate.
Changes the smoothing bandwidth to make the result more or
less smooth.
Changes the stat used to stat_bw.
Has no effect on the result.
This code creates a map of Iowa counties.
library(ggplot2)
p <- ggplot(map_data("county", "iowa"),
aes(x = long, y = lat, group = group)) +
geom_polygon(fill = "white", color = "black")
Which of these produces a plot with an aspect ratio that best matches
the map on this
page?
p + coord_fixed(0.5)
p + coord_fixed(0.75)
p + coord_fixed(1.35)
p + coord_fixed(1.95)
Consider the two plots created by this code (print the values of
p1 and p2 to see the plots):
library(ggplot2)
data(gapminder, package = "gapminder")
p1 <- ggplot(gapminder, aes(x = log(gdpPercap), y = lifeExp)) +
geom_point() +
scale_x_continuous(name = "")
p2 <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
geom_point() +
scale_x_log10(labels = scales::comma, name = "")
Which of these statements is true?
The x axis labels are identical in both plots.
The x axis labels in p2 are in dollars;
the labels in p1 are in log dollars.
The x axis labels in p1 are in dollars;
the labels in p2 are in log dollars.
There are no labels on the x axis in
p2.
Consider the plot created by
library(ggplot2)
data(gapminder, package = "gapminder")
p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
geom_point() +
scale_x_log10(labels = scales::comma)
Which of these expressions produces a plot with a white
background?
p
p + theme_grey()
p + theme_classic()
p + ggthemes::theme_economist()
There are many different ways to change the x axis
label in ggplot. Consider the plot created by
library(ggplot2)
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point()
Which of the following does not change the
x axis label to Displacement?
p + labs(x = "Displacement")
p + scale_x_continuous("Displacement")
p + xlab("Displacement")
p + theme(axis.title.x = "Displacement")
---
title: "The Grammar of Graphics"
output:
  html_document:
    toc: yes
    code_folding: show
    code_download: true
---

<link rel="stylesheet" href="stat4580.css" type="text/css" />

```{r setup, include = FALSE}
source(here::here("setup.R"))
knitr::opts_chunk$set(collapse = TRUE, message = FALSE,
                      fig.height = 5, fig.width = 6, fig.align = "center")

library(dplyr)
library(ggplot2)
library(lattice)
library(gridExtra)
set.seed(12345)
```


## Background

The _Grammar of Graphics_ is a language proposed by Leland Wilkinson
for describing statistical graphs.

> Wilkinson, L. (2005), _The Grammar of Graphics_, 2nd ed., Springer.

The grammar of graphics has served as the foundation for the graphics
frameworks in [SPSS](https://www.ibm.com/products/spss-statistics),
[Vega-Lite](https://vega.github.io/vega-lite/) and several other
systems.

`ggplot2` represents an implementation and extension of the grammar
of graphics for R.

> Wickham, H. (2016), _ggplot2: Elegant Graphics for Data Analysis_,
> 2nd ed., Springer. [3rd ed. in progress](https://ggplot2-book.org/).

> Online documentation: <https://ggplot2.tidyverse.org/reference/index.html>.

> Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund (2023),
> [_R for Data Science (2nd Edition)_](https://r4ds.hadley.nz/),
> O'Reilly.

> [Data visualization cheatsheet](https://raw.githubusercontent.com/rstudio/cheatsheets/main/data-visualization.pdf)

> Winston Chang (2018), [_R Graphics Cookbook_, 2nd
> edition](https://r-graphics.org/), O’Reilly. ([Book source on
> GitHub](https://github.com/wch/rgcookbook))

The idea is that any basic plot can be built out of a combination of

* a data set;

* one or more geometrical representations (_geoms_);

* mappings of values to _aesthetic_ features of the geom;

* a _stat_ to produce values to be mapped;

* position adjustments;

* a coordinate system;

* a scale specification;

* a faceting scheme.

`ggplot2` provides tools for specifying these components and adjusting
their features.

Many components and features are provided by default and do not need
to be specified explicitly unless the defaults are to be changed.


## A Basic Template

The simplest graph needs a data set, a geom, and a mapping:

```r
ggplot(data = <DATA>) + <GEOM>(mapping = aes(<MAPPINGS>))
```

The appearance of geom objects is controlled by _aesthetic_ features.

Each geom has some required and some optional aesthetics.

For `geom_point` the required aesthetics are

* `x` position
* `y` position.

Optional aesthetics include

* `color` (or `colour`)
* `fill`
* `shape`
* `size`

`geom_point` is used to produce a _scatter plot_.
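
One way to check which aesthetics a geom requires is to inspect its underlying `ggproto` object. This is a sketch that assumes the exported `GeomPoint` object and its `required_aes` and `default_aes` fields; these are implementation details and may change between `ggplot2` versions.

```r
library(ggplot2)

## Aesthetics that must be supplied for geom_point.
required <- GeomPoint$required_aes
required

## Optional aesthetics that have default values.
optional <- names(GeomPoint$default_aes)
optional
```

The help page `?geom_point` lists the same information in its _Aesthetics_ section.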


## Scatter Plots Using `geom_point`

The `mpg` data set included in the `ggplot2` package contains EPA
fuel economy data from 1999 to 2008 for 38 popular models of cars.

```{r}
mpg
```

```{r, include = FALSE}
fig_align <- if (using_xaringan) "left" else "center"
```

A simple scatter plot:

```{r mpg-plain, eval = FALSE}
ggplot(mpg) +
    geom_point(aes(x = displ,
                   y = hwy))
```

```{r mpg-plain, echo = FALSE, fig.width = 5.75, fig.align = fig_align}
```

Map color to vehicle class:

```{r mpg-color, eval = FALSE}
ggplot(mpg) +
    geom_point(aes(x = displ,
                   y = hwy,
                   color = class))
```

```{r mpg-color, echo = FALSE, fig.width = 7, fig.align = fig_align}
```

And map shape to number of cylinders:

```{r mpg-color-shape, eval = FALSE}
ggplot(mpg) +
    geom_point(aes(x = displ,
                   y = hwy,
                   color = class,
                   shape = factor(cyl)))
```

```{r mpg-color-shape, echo = FALSE, fig.width = 7, fig.align = fig_align}
```

<!-- --> Perception:

* Too many colors;
* shapes are too small;
* interference between shapes and colors.

Aesthetics can be mapped to a variable or set to a fixed common value.

This can be used to override default settings:

```{r mpg-fixed, eval = FALSE}
ggplot(mpg) +
    geom_point(aes(x = displ,
                   y = hwy),
               color = "blue",
               shape = 1)
```

```{r mpg-fixed, echo = FALSE, fig.width = 7, fig.align = fig_align}
```

Changing the `size` aesthetic makes shapes easier to recognize:

```{r mpg-color-shape-large, eval = FALSE}
ggplot(mpg) +
    geom_point(aes(x = displ,
                   y = hwy,
                   color = class,
                   shape = factor(cyl)),
               size = 3)
```

```{r mpg-color-shape-large, echo = FALSE, fig.width = 7, fig.align = fig_align}
```

<!-- --> Perception: Still too many colors; still have interference.

Available point shapes are specified by number:

```{r, echo = FALSE, eval = FALSE}
generateRPointShapes <- function() {
    oldPar <- par()
    par(font = 2, mar = c(0.5, 0, 0, 0))
    y <- rev(c(rep(1, 6), rep(2, 5), rep(3, 5), rep(4, 5), rep(5, 5)))
    x <- c(rep(1 : 5, 5), 6)
    plot(x, y, pch = 0 : 25, cex = 1.5, ylim = c(1, 5.5), xlim = c(1, 6.5),
         axes = FALSE, xlab = "", ylab = "", bg = "blue")
    text(x, y, labels = 0 : 25, pos = 3)
    par(mar = oldPar$mar, font = oldPar$font)
}
generateRPointShapes()
```
```{r, echo = FALSE}
ggplot(NULL, aes(x = rep(1 : 5, 5), y = rev(rep(1 : 5, each = 5)))) +
    geom_point(shape = 1 : 25, size = 5, fill = "blue") +
    geom_text(aes(label = 1 : 25), nudge_y = 0.25, size = 6) +
    theme_void()
```

Shapes 1-20 have their color set by the `color` aesthetic and ignore
the `fill` aesthetic.

For shapes 21-25 the `color` aesthetic specifies the _border color_ and
`fill` specifies the _interior color_.

Using `shape` 21 with `cyl` mapped to the `fill` aesthetic:

```{r mpg-fill-21, eval = FALSE}
ggplot(mutate(mpg, cyl = factor(cyl))) +
    geom_point(aes(x = displ,
                   y = hwy,
                   fill = cyl),
               shape = 21,
               size = 4)
```
```{r mpg-fill-21, echo = FALSE}
```

<!-- --> Perception: Borders, larger symbols, fewer colors help.

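Setting an aesthetic to a fixed value outside of `aes()` is very
different from mapping a constant value inside `aes()`.

A constant mapped inside `aes()` is treated as a variable with a single
level, so it is colored by the default scale and gets a legend. This is
rarely what you want: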

```{r mpg-bad-color, eval = FALSE}
ggplot(mpg) +
    geom_point(aes(x = displ,
                   y = hwy,
                   color = "blue"))
```

```{r mpg-bad-color, echo = FALSE, fig.height = 4.2}
```

A fixed value set outside of `aes()`: Probably what you want:
```{r mpg-good-color, eval = FALSE}
ggplot(mpg) +
    geom_point(aes(x = displ,
                   y = hwy),
               color = "blue")
```

```{r mpg-good-color, echo = FALSE, fig.height = 4.2}
```
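
The difference between the two plots can also be checked numerically. This is a minimal sketch using `layer_data()` to look at the color values each layer ends up with; the names `p_mapped` and `p_set` are introduced here only for illustration.

```r
library(ggplot2)

## Constant mapped inside aes(): treated as a one-level variable.
p_mapped <- ggplot(mpg) +
    geom_point(aes(x = displ, y = hwy, color = "blue"))

## Constant set outside aes(): used directly as the point color.
p_set <- ggplot(mpg) +
    geom_point(aes(x = displ, y = hwy), color = "blue")

## layer_data() shows the color values actually used for drawing.
mapped_colors <- unique(layer_data(p_mapped)$colour)
set_colors <- unique(layer_data(p_set)$colour)
mapped_colors   # a color chosen by the default discrete scale
set_colors
```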


## Geometric Objects

`ggplot2` provides a number of geoms:

```{r, echo = FALSE, results = "asis"}
showList <- function(v, ncol = 4, pad = 2) {
    w <- max(nchar(v)) + pad
    nrow <- ceiling(length(v) / ncol)
    v <- c(v, character(ncol * nrow - length(v)))

    cat("```r\n")
    for (i in seq_len(nrow)) {
        line <- v[ncol * (i - 1) + (1 : ncol)]
        for (j in 1 : ncol)
            if (j < ncol)
                cat(sprintf("%-*s", w, line[j]))
            else
                cat(sprintf("%s\n", line[j]))
        ## cat(sprintf("%-*s%-*s%-*s%s\n",
        ##             w, line[1], w, line[2], w, line[3], line[4]))
    }
    cat("```\n")
}
showList(ls("package:ggplot2", pat = "^geom_"))
```

Additional geoms are available in packages like `ggforce`, `ggridges`,
and others described on the [`ggplot2` extensions
site](https://exts.ggplot2.tidyverse.org/).

Geoms can be added as _layers_ to a plot.

Mappings common to all, or most, geoms can be specified in the
`ggplot` call:

```{r mpg-smooth, eval = FALSE}
ggplot(mpg,
       aes(x = displ,
           y = hwy)) +
    geom_smooth() +
    geom_point()
```

```{r mpg-smooth, echo = FALSE, message = FALSE}
```

Geoms can also use different data sets.

One way to highlight Europe in a plot of life expectancy against log
income for 2007 is to start with a plot of the full data:

```{r gm_2007, eval = FALSE}
library(dplyr)
library(gapminder)
gm_2007 <- filter(gapminder, year == 2007)

(p <- ggplot(gm_2007, aes(x = gdpPercap,
                          y = lifeExp)) +
     geom_point() +
     scale_x_log10())
```

```{r gm_2007, echo = FALSE}
```

Then add a layer showing only Europe:

```{r gm_2007_eu, eval = FALSE}
gm_2007_eu <- filter(gm_2007, continent == "Europe")

p + geom_point(data = gm_2007_eu,
               color = "red",
               size = 3)
```

```{r gm_2007_eu, echo = FALSE}
```
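
A layer's `data` argument can also be a function, which is applied to the plot's default data. This avoids creating a separate data frame for the highlighted subset. A minimal sketch using `mpg` so that it is self-contained; highlighting the `2seater` class is just an illustrative choice.

```r
library(ggplot2)

p_hi <- ggplot(mpg, aes(x = displ, y = hwy)) +
    geom_point() +
    ## The function receives the plot data and returns the rows to highlight.
    geom_point(data = function(d) subset(d, class == "2seater"),
               color = "red",
               size = 3)
p_hi
```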


## Statistical Transformations

All geoms use a statistical transformation (_stat_) to convert raw
data to the values to be mapped to the object's features.

The available stats are

```{r, echo = FALSE, results = "asis"}
showList(ls("package:ggplot2", pat = "^stat_"), ncol = 3)
```

Each geom has a default stat, and each stat has a default geom.

* For `geom_point` the default stat is `stat_identity`.

* For `geom_bar` the default stat is `stat_count`.

* For `geom_histogram` the default stat is `stat_bin`.
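
These defaults can be checked by inspecting the layer objects that the geom and stat functions return. This sketch assumes that layer objects expose `stat` and `geom` fields, which is an internal detail of `ggplot2` rather than documented API.

```r
library(ggplot2)

## Class of the default stat used by each geom.
class(geom_point()$stat)[1]
class(geom_bar()$stat)[1]
class(geom_histogram()$stat)[1]

## Class of the default geom used by stat_count.
class(stat_count()$geom)[1]
```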

Stats can provide _computed variables_ that can be mapped to aesthetic
features.

For `stat_bin` some of the computed variables are

* `count`: number of points in bin
* `density`: density of points in bin, scaled to integrate to 1

The `density` variable can be accessed as `after_stat(density)`.

Older approaches that still work but are now deprecated:

* `stat(density)`
* `..density..`

By default, `geom_histogram` uses `y = after_stat(count)`.

```{r geyser-count, eval = FALSE}
ggplot(faithful) +
    geom_histogram(aes(x = eruptions),
                   binwidth = 0.25,
                   fill = "grey",
                   color = "black")
```
```{r geyser-count, echo = FALSE}
```

Explicitly specifying `y = after_stat(count)` produces the same plot:

```{r geyser-count-exp, eval = FALSE}
ggplot(faithful) +
    geom_histogram(aes(x = eruptions,
                       y = after_stat(count)),
                   binwidth = 0.25,
                   fill = "grey",
                   color = "black")
```
```{r geyser-count-exp, echo = FALSE}
```

Using `y = after_stat(density)` produces a density scaled axis.

```{r geyser-density, eval = FALSE}
(p <- ggplot(faithful) +
     geom_histogram(aes(x = eruptions,
                        y = after_stat(density)),
                    binwidth = 0.25,
                    fill = "grey",
                    color = "black"))
```
```{r geyser-density, echo = FALSE}
```
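
The computed variables can be examined directly with `layer_data()`. This sketch rebuilds the density histogram under the name `p_dens` (a name introduced here) and checks that the bar areas add up to one.

```r
library(ggplot2)

p_dens <- ggplot(faithful) +
    geom_histogram(aes(x = eruptions, y = after_stat(density)),
                   binwidth = 0.25)

## layer_data() returns the values computed by the stat for a layer.
bins <- layer_data(p_dens, 1)
head(bins[, c("xmin", "xmax", "count", "density")])

## Bar areas (density times bin width) should add up to one.
total_area <- sum(bins$density * (bins$xmax - bins$xmin))
total_area
```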

`stat_function` can be used to add a density curve specified as a
mixture of two normal densities:

```{r}
(ms <- mutate(faithful,
              type = ifelse(eruptions < 3,
                            "short",
                            "long")) |>
     group_by(type) |>
     summarize(mean = mean(eruptions),
               sd = sd(eruptions),
               n = n()) |>
     mutate(p = n / sum(n)))
```

```{r geyser-hist-dens, eval = FALSE}
f <- function(x)
    ms$p[1] * dnorm(x, ms$mean[1], ms$sd[1]) +
        ms$p[2] * dnorm(x, ms$mean[2], ms$sd[2])

p + stat_function(fun = f, color = "red")
```
```{r geyser-hist-dens, echo = FALSE}
```
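
A quick numerical check that such a mixture is a proper density, using base R only. The names `short`, `long`, `p_short`, and `mix_density` are introduced here for illustration; the integration range of 0 to 7 minutes is an assumption chosen to cover essentially all of the probability mass.

```r
## Split the eruption durations at 3 minutes, as in the summary above.
short <- faithful$eruptions[faithful$eruptions < 3]
long <- faithful$eruptions[faithful$eruptions >= 3]
p_short <- length(short) / nrow(faithful)

## Mixture of two normal densities weighted by the group proportions.
mix_density <- function(x)
    p_short * dnorm(x, mean(short), sd(short)) +
        (1 - p_short) * dnorm(x, mean(long), sd(long))

## The area under the curve should be very close to one.
area <- integrate(mix_density, lower = 0, upper = 7)$value
area
```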


## Position Adjustments

The available position adjustments:

```{r, echo = FALSE, results = "asis"}
showList(ls("package:ggplot2", pat = "^position_"), ncol = 3)
```

A bar chart showing the counts for the different `cut` categories in
the `diamonds` data:

```{r diamonds-cut, eval = FALSE}
ggplot(diamonds, aes(x = cut)) +
    geom_bar()
```
```{r diamonds-cut, echo = FALSE}
```

Mapping `clarity` to `fill` shows the breakdown by both `cut` and
`clarity` in a _stacked bar chart_:

```{r diamonds-stack1, eval = FALSE}
ggplot(diamonds, aes(x = cut,
                     fill = clarity)) +
    geom_bar()
```
```{r diamonds-stack1, echo = FALSE}
```

The default `position` for bar charts is `position_stack`:

```{r diamonds-stack2, eval = FALSE}
ggplot(diamonds, aes(x = cut,
                     fill = clarity)) +
    geom_bar(position = "stack")
```
```{r diamonds-stack2, echo = FALSE}
```

`position_dodge` produces _side-by-side bar charts_:

```{r diamonds-dodge, eval = FALSE}
ggplot(diamonds, aes(x = cut,
                     fill = clarity)) +
    geom_bar(position = "dodge")
```
```{r diamonds-dodge, echo = FALSE}
```

`position_fill` rescales all bars to have equal height, which helps
compare proportions within bars.

```{r diamonds-fill, eval = FALSE}
ggplot(diamonds, aes(x = cut,
                     fill = clarity)) +
    geom_bar(position = "fill")
```
```{r diamonds-fill, echo = FALSE}
```
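
The proportions shown by the filled bar chart can be computed directly from the data. A minimal sketch using base R tables; `counts` and `props` are names introduced here.

```r
library(ggplot2)  # for the diamonds data

## Counts by cut (rows) and clarity (columns).
counts <- table(diamonds$cut, diamonds$clarity)

## Proportions of each clarity level within each cut.
props <- prop.table(counts, margin = 1)
round(props, 3)
```

Each row of `props` sums to one, which corresponds to every bar having the same height.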

Using the counts to scale the widths would produce a _spine plot_, a
variant of a _mosaic plot_.

This is easiest to do with the `ggmosaic` package.

`position_jitter` can be used with `geom_point` to avoid overplotting
or break up rounding artifacts.

Another version of the Old Faithful data available as `geyser` in
package `MASS` has some rounding in the `duration` variable:

```{r geyser2, eval = FALSE}
data(geyser, package = "MASS")

## Adjust for different meaning of `waiting` variable
geyser2 <- na.omit(mutate(geyser,
                          duration = lag(duration)))

p <- ggplot(geyser2, aes(x = duration, y = waiting))
p + geom_point()
```

```{r geyser2, echo = FALSE}
```

_Jittering_ can help break up the distracting _heaping_ of values on
durations of 2 and 4 minutes.

The default amount of jittering isn't quite enough in this case:

```{r geyser2-jit, eval = FALSE}
p + geom_point(position = "jitter")
```
```{r geyser2-jit, echo = FALSE}
```

To jitter only horizontally and by a larger amount you can use

```{r geyser2-jit2, eval = FALSE}
p + geom_point(position =
                   position_jitter(height = 0,
                                   width = 0.1))
```
```{r geyser2-jit2, echo = FALSE}
```
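
Jittering is random, so the plot changes a little each time it is drawn. The `seed` argument of `position_jitter` can make the result reproducible. This sketch rebuilds the data with base R so that it is self-contained; the seed value is arbitrary.

```r
library(ggplot2)
data(geyser, package = "MASS")

## Pair each waiting time with the previous duration, using base R only.
n <- nrow(geyser)
geyser2 <- data.frame(duration = geyser$duration[-n],
                      waiting = geyser$waiting[-1])

p_jit <- ggplot(geyser2, aes(x = duration, y = waiting)) +
    geom_point(position = position_jitter(height = 0,
                                          width = 0.1,
                                          seed = 123))
p_jit
```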


## Coordinate Systems

Coordinate system functions include

```{r, echo = FALSE, results = "asis"}
showList(ls("package:ggplot2", pat = "^coord_"))
```

The default coordinate system is `coord_cartesian`.


### Cartesian Coordinates

`coord_cartesian` can be used to _zoom in_ on a particular region:

```{r geyser2-zoom, eval = FALSE}
p + geom_point() +
    coord_cartesian(xlim = c(3, 4))
```
```{r geyser2-zoom, echo = FALSE}
```
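
Zooming with `coord_cartesian` is not the same as setting limits on a scale. A sketch of the difference, using `layer_data()` on the `mpg` data: with scale limits, values outside the limits are replaced by `NA` and are not drawn, while `coord_cartesian` keeps all the data and only changes the visible region. The plot names are introduced here for illustration.

```r
library(ggplot2)

p0 <- ggplot(mpg, aes(x = displ, y = hwy)) +
    geom_point()

## Zoom: all points are kept; only the visible region changes.
p_zoom <- p0 + coord_cartesian(xlim = c(3, 4))

## Scale limits: values outside the limits become NA and are not drawn.
p_lim <- p0 + scale_x_continuous(limits = c(3, 4))

n_na_zoom <- sum(is.na(layer_data(p_zoom)$x))
n_na_lim <- sum(is.na(layer_data(p_lim)$x))
c(zoom = n_na_zoom, limits = n_na_lim)
```

The distinction matters most for layers that compute summaries, such as smooths or histograms, since these would be computed from the reduced data when scale limits are used.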

`coord_fixed` and `coord_equal` fix the _aspect ratio_ for a Cartesian
coordinate system.

The aspect ratio is the ratio of the number of physical display units
per `y` unit to the number of physical display units per `x` unit.

The aspect ratio can be important for recognizing features and patterns.

```{r}
river <- scan("https://www.stat.uiowa.edu/~luke/data/river.dat")
r <- data.frame(flow = river, month = seq_along(river))
```

```{r river-flat, eval = FALSE}
ggplot(r, aes(x = month, y = flow)) +
    geom_point() +
    coord_fixed(ratio = 4)
```
```{r river-flat, echo = FALSE, fig.height = 2, fig.width = 8}
```


### Polar Coordinates

A filled bar chart

```{r diamonds-fill-1, eval = FALSE}
(p <- ggplot(diamonds) +
     geom_bar(aes(x = 1, fill = cut),
              position = "fill"))
```
```{r diamonds-fill-1, echo = FALSE}
```

is turned into a pie chart by changing to polar coordinates:

```{r diamonds-pie, eval = FALSE}
p + coord_polar(theta = "y")
```
```{r diamonds-pie, echo = FALSE}
```


### Coordinate Systems for Maps

Coordinate systems are particularly important for maps.

Polygons for many political and geographic boundaries are available
through the `map_data` function.

Boundaries for the lower 48 US states can be obtained as

```{r}
usa <- map_data("state")
```

Polygon vertices are encoded by longitude and latitude.

Plotting these in the default cartesian coordinate system usually does
not work well:

```{r usa-cart, eval = FALSE}
usa <- map_data("state")
m <- ggplot(usa, aes(x = long,
                     y = lat,
                     group = group)) +
    geom_polygon(fill = "white",
                 color = "black")
m
```
```{r usa-cart, echo = FALSE}
```

Using a fixed aspect ratio is better, but an aspect ratio of 1 does
not work well:

```{r}
m + coord_equal()
```

The problem is that away from the equator a one degree change in
latitude corresponds to a larger distance than a one degree change in
longitude.

The ratio of one degree longitude separation to one degree latitude
separation for the latitude at the middle of Iowa of 41 degrees is

```{r}
longlat <- cos(41 / 90 * pi / 2)
longlat
```

A better map is obtained using the aspect ratio `1 / longlat`:

```{r usa-fixed, eval = FALSE}
m + coord_fixed(1 / longlat)
```
```{r usa-fixed, echo = FALSE}
```
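
The same calculation can be wrapped in a small function for use at other latitudes. `lon_lat_ratio` is a hypothetical helper, not part of `ggplot2`, and it assumes a spherical earth.

```r
## Hypothetical helper: length of one degree of longitude relative to
## one degree of latitude at a given latitude (in degrees).
lon_lat_ratio <- function(lat_deg) cos(lat_deg * pi / 180)

lon_lat_ratio(41)            # middle of Iowa
1 / lon_lat_ratio(41)        # aspect ratio to pass to coord_fixed
lon_lat_ratio(c(0, 30, 60))  # the ratio shrinks away from the equator
```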

The best approach is to use a coordinate system designed specifically for
maps.

There are many _projections_ used in map making.

The default projection used by `coord_map` is the
[Mercator](https://en.wikipedia.org/wiki/Mercator_projection)
projection.

```{r usa-mercator, eval = FALSE}
m + coord_map()
```

```{r usa-mercator, echo = FALSE}
```

Proper map projections are non-linear; this is easier to see with an
Albers projection:

```{r usa-albers, eval = FALSE}
m + coord_map("albers", 20, 50)
```
```{r usa-albers, echo = FALSE}
```


## Scales

Scales are used for controlling the mapping of values to physical
representations such as colors, shapes, and positions.

Scale functions are also responsible for producing _guides_ for
translating physical representations back to values, such as

* axis labels and marks;

* color or shape legends.

There are currently `r length(ls("package:ggplot2", pat = "scale_"))`
scale functions; some examples are

```r
scale_color_gradient      scale_shape_manual     scale_x_log10
scale_color_manual        scale_size_area        scale_y_log10
scale_fill_gradient                              scale_x_sqrt
scale_fill_manual                                scale_y_sqrt

```

An [experimental tool](https://ggplot2tor.com/scales/) to help
choose scales is available.

Start with a basic scatter plot:

```{r mpg-basic, eval = FALSE}
(p <- ggplot(mpg, aes(x = displ,
                      y = hwy)) +
     geom_point())
```
```{r mpg-basic, echo = FALSE}
```

Remove the `x` tick marks and labels (this can also be done with theme
settings):

```{r mpg-no-ticks-labs, eval = FALSE}
p + scale_x_continuous(labels = NULL,
                       breaks = NULL)
```
```{r mpg-no-ticks-labs, echo = FALSE}
```

Change the tick locations and labels:

```{r mpg-new-ticks-labs, eval = FALSE}
p + scale_x_continuous(labels =
                           paste(c(2, 4, 6), "ltr"),
                       breaks = c(2, 4, 6))
```
```{r mpg-new-ticks-labs, echo = FALSE}
```
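
The `labels` argument can also be a function of the break values, which avoids repeating the breaks. A minimal sketch; `ltr_label` is a hypothetical helper introduced here.

```r
library(ggplot2)

## Hypothetical helper: format break values as liters.
ltr_label <- function(x) paste(x, "ltr")

p_ltr <- ggplot(mpg, aes(x = displ, y = hwy)) +
    geom_point() +
    scale_x_continuous(breaks = c(2, 4, 6),
                       labels = ltr_label)
p_ltr
```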

Use a logarithmic axis:

```{r mpg-log-x, eval = FALSE}
p + scale_x_log10(labels = paste(c(2, 4, 6), "ltr"),
                  breaks = c(2, 4, 6),
                  minor_breaks = c(3, 5, 7))
```
```{r mpg-log-x, echo = FALSE}
```

The
[Scales](https://r4ds.hadley.nz/communication.html#scales)
section in [R for Data Science](https://r4ds.hadley.nz/) provides some
more details.

Color assignment can also be controlled by scale functions.

For example, for some presidential approval ratings data

```{r, include = FALSE}
pr_appr <- data.frame(pres = c("Obama", "Carter", "Clinton",
                               "G.W. Bush", "Reagan", "G.H.W Bush", "Trump"),
                      appr = c(79, 78, 68, 65, 58, 56, 40),
                      party = c("D", "D", "D", "R", "R", "R", "R"),
                      year = c(2009, 1977, 1993, 2001, 1981, 1989, 2017))
pr_appr <- mutate(pr_appr, pres = reorder(pres, appr))
```
```{r}
pr_appr
```

the default color scale is not ideal:

```{r pr-appr0, eval = FALSE}
ggplot(pr_appr,
       aes(x = appr, y = pres, fill = party)) +
    geom_col()
```

```{r pr-appr0, echo = FALSE}
```

The common assignment of red for Republican and blue for Democrat can
be obtained by

```{r pr-appr, eval = FALSE}
ggplot(pr_appr,
       aes(x = appr, y = pres, fill = party)) +
    geom_col() +
    scale_fill_manual(values
                      = c(R = "red", D = "blue"))
```

```{r pr-appr, echo = FALSE}
```

A better choice is to use a well-designed [color
palette](https://hclwizard.org/#color-palettes):

```{r pr-appr-2, eval = FALSE}
ggplot(pr_appr,
       aes(x = appr, y = pres, fill = party)) +
    geom_col() +
    colorspace::scale_fill_discrete_diverging(
                    palette = "Blue-Red 2")
```

```{r pr-appr-2, echo = FALSE}
```


## Facets

Faceting uses the _small multiples_ approach to introduce additional
variables.

For a single variable `facet_wrap` is usually used:

```{r mpg-facet-wrap, eval = FALSE}
p <- ggplot(mpg) +
    geom_point(aes(x = displ,
                   y = hwy))
p + facet_wrap(~ class)
```
```{r mpg-facet-wrap, echo = FALSE}
```

For two variables, each with a modest number of categories,
`facet_grid` can be effective:

```{r mpg-facet-grid, eval = FALSE}
p + facet_grid(factor(cyl) ~ drv)
```
```{r mpg-facet-grid, echo = FALSE}
```

<!--
Using the previous mpg facet plot would be better but this is one of
the homework problems in HW3
-->

To show common data in all facets, make sure the data used for that
layer does not contain the faceting variable.

This was used to show muted views of the full data in faceted plots.

A faceted plot of the `gapminder` data:

```{r gapminder-not-muted, eval = FALSE}
library(gapminder)

years_to_keep <- c(1977, 1987, 1997, 2007)
gd <- filter(gapminder,
             year %in% years_to_keep)

ggplot(gd,
       aes(x = gdpPercap,
           y = lifeExp,
           color = continent)) +
    geom_point(size = 2.5) +
    scale_x_log10() +
    facet_wrap(~ year)
```

```{r gapminder-not-muted, echo = FALSE, fig.width = 8}
```

Add a muted version of the full data in the background of each panel:

<!-- variant of code in prercep.Rmd -->
```{r gapminder-muted, eval = FALSE}
library(gapminder)

years_to_keep <- c(1977, 1987, 1997, 2007)
gd <- filter(gapminder,
             year %in% years_to_keep)
gd_no_year <- mutate(gd, year = NULL)

ggplot(gd,
       aes(x = gdpPercap,
           y = lifeExp,
           color = continent)) +
    geom_point(data = gd_no_year,
               color = "grey80") +
    geom_point(size = 2.5) +
    scale_x_log10() +
    facet_wrap(~ year)
```

```{r gapminder-muted, echo = FALSE, fig.width = 8}
```
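
The effect of dropping the faceting variable can be checked with `layer_data()`. This sketch uses `mpg` faceted by `drv` so that it does not need the `gapminder` package; the names `mpg_no_drv` and `p_muted` are introduced here.

```r
library(ggplot2)

## Copy of the data without the faceting variable.
mpg_no_drv <- mpg[, names(mpg) != "drv"]

p_muted <- ggplot(mpg, aes(x = displ, y = hwy)) +
    geom_point(data = mpg_no_drv, color = "grey80") +
    geom_point() +
    facet_wrap(~ drv)

## The background layer is repeated in every panel;
## the foreground layer is split among the panels.
table(layer_data(p_muted, 1)$PANEL)
table(layer_data(p_muted, 2)$PANEL)
```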

Usually facets use common axis scales, but one or both can be allowed
to vary.

A useful approach for showing time series data with a good aspect
ratio can be to split the data into facets for non-overlapping
portions of the time axis.

```{r river-facet, eval = FALSE}
pd <- rep(paste(seq(1, by = 32, length.out = 4),
                seq(32, by = 32, length.out = 4),
                sep = " - "),
         each =  32)
rd <- data.frame(month = seq_along(river),
                 flow = river,
                 panel = pd)
ggplot(rd, aes(x = month,
               y = flow)) +
    geom_point() +
    facet_wrap(~ panel,
               scales = "free_x", #<<
               ncol = 1)
```
```{r river-facet, echo = FALSE}
```
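
The character panel labels used above happen to sort in the right order. When they would not, one option is to create the panel variable as a factor with `cut()`, which keeps the levels in position order. A sketch using a made-up series of 128 values in place of the river data:

```r
library(ggplot2)

## Stand-in series of 128 values; the values are made up for illustration.
sim <- data.frame(month = 1:128,
                  flow = sin(2 * pi * (1:128) / 12))

## cut() produces a factor whose levels are ordered by position.
sim$panel <- cut(sim$month,
                 breaks = seq(0, 128, by = 32),
                 labels = c("1 - 32", "33 - 64", "65 - 96", "97 - 128"))

p_panels <- ggplot(sim, aes(x = month, y = flow)) +
    geom_point() +
    facet_wrap(~ panel, scales = "free_x", ncol = 1)
p_panels
```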

Facet arrangement can also be used to convey other information, such
as geographic location.

The [`geofacet` package](https://hafen.github.io/geofacet/) allows
facets to be placed in approximate locations of different geographic
regions.

An example for data from US states:

```{r geofacet, eval = FALSE}
library(geofacet)
ggplot(state_unemp, aes(year, rate)) +
    geom_line() +
    facet_geo(~ state,
              grid = "us_state_grid2",
              label = "code") +
    scale_x_continuous(labels =
                           function(x) paste0("'", substr(x, 3, 4))) +
    labs(title = "Seasonally Adjusted US Unemployment Rate 2000-2016",
         caption = "Data Source: bls.gov",
         x = "Year",
         y = "Unemployment Rate (%)") +
    theme(strip.text.x = element_text(size = 6),
          axis.text = element_text(size = 5))
```
```{r geofacet, echo = FALSE, message = FALSE}
```

Arrangement according to a calendar can also be useful.


## Themes

`ggplot2` supports the notion of _themes_ for adjusting non-data
appearance aspects of a plot, such as

* plot titles

* axis and legend placement and titles

* background colors

* guide line placement

Theme elements can be customized in several ways:

* `theme()` can be used to adjust individual elements in a plot.

* `theme_set()` adjusts default settings for a session;

* pre-defined theme functions allow consistent style changes.

The
[full documentation](https://ggplot2.tidyverse.org/reference/theme.html)
of the `theme` function lists many customizable elements.
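
A sketch of changing the session default with `theme_set()` and restoring it afterwards. The choice of `theme_bw()` is only an example.

```r
library(ggplot2)

## theme_set() returns the previous default theme invisibly.
old_theme <- theme_set(theme_bw())

## Plots drawn now use theme_bw() unless they specify another theme.
p_bw_default <- ggplot(mpg, aes(x = displ, y = hwy)) +
    geom_point()
p_bw_default

## Restore the previous default when done.
theme_set(old_theme)
```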

One simple example:

```{r theme-simple, eval = FALSE}
ggplot(mutate(mpg, cyl = factor(cyl))) +
    geom_point(aes(x = displ,
                   y = hwy,
                   fill = cyl),
               shape = 21,
               size = 3) +
    theme(legend.position = "top",
          axis.text = element_text(size = 12),
          axis.title = element_text(size = 14,
                                    face = "bold"))
```
```{r theme-simple, echo = FALSE}
```

Another example:

```{r theme-simple-2, eval = FALSE}
gthm <-
    theme(plot.background =
              element_rect(fill = "lightblue",
                           color = NA),
          panel.background =
              element_rect(fill = "pink"))
p + gthm
```
```{r theme-simple-2, echo = FALSE}
```

Some alternate complete themes provided by `ggplot2` are

```r
theme_bw        theme_gray      theme_minimal   theme_void
theme_classic   theme_grey      theme_dark      theme_light
```

Some examples:

```{r alt-themes, eval = FALSE}
p_bw <- p + theme_bw() + ggtitle("BW")

p_classic <- p + theme_classic() + ggtitle("Classic")

p_min <- p + theme_minimal() + ggtitle("Minimal")

p_void <- p + theme_void() + ggtitle("Void")

library(patchwork)
(p_bw + p_classic) / (p_min + p_void)
```
```{r alt-themes, echo = FALSE}
```

The
[`ggthemes`](http://www.rpubs.com/Mentors_Ubiqum/ggthemes_1)
package provides some additional themes.

Some examples:

```{r ggthemes-examples, eval = FALSE}
library(ggthemes)

p_econ <- p + theme_economist() + ggtitle("Economist")

p_wsj <- p + theme_wsj() + ggtitle("WSJ")

p_tufte <- p + theme_tufte() + ggtitle("Tufte")

p_few <- p + theme_few() + ggtitle("Few")

(p_econ + p_wsj) / (p_tufte + p_few)
```
```{r ggthemes-examples, echo = FALSE}
```

`ggthemes` also provides `theme_map` that removes unnecessary elements
from maps:

```{r}
m + coord_map() + theme_map()
```

The
[Themes](https://r4ds.hadley.nz/communication.html#sec-themes)
section in [R for Data Science](https://r4ds.hadley.nz/) provides some
more details.


## A More Complete Template

```r
ggplot(data = <DATA>) +
    <GEOM>(mapping = aes(<MAPPINGS>),
           stat = <STAT>,
           position = <POSITION>) +
    < ... MORE GEOMS ... > +
    <COORDINATE_ADJUSTMENT> +
    <SCALE_ADJUSTMENT> +
    <FACETING> +
    <THEME_ADJUSTMENT>
```
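
One possible way to fill in this template, using the `mpg` data. The particular geoms, limits, breaks, and theme are arbitrary choices made for illustration.

```r
library(ggplot2)

p_full <- ggplot(data = mpg) +
    geom_point(mapping = aes(x = displ, y = hwy, color = drv),
               stat = "identity",
               position = "jitter") +
    geom_smooth(mapping = aes(x = displ, y = hwy),
                method = "lm", formula = y ~ x) +
    coord_cartesian(ylim = c(10, 45)) +
    scale_x_continuous(breaks = 2:7) +
    facet_wrap(~ year) +
    theme_bw()
p_full
```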


## Labels and Annotations

A basic plot:

```{r mpg-ann, eval = FALSE}
p <- ggplot(mpg, aes(x = displ,
                     y = hwy))
p1 <- p + geom_point(aes(color = factor(cyl)),
                     size = 2.5)
p1
```
```{r mpg-ann, echo = FALSE}
```

Axis labels are based on the expressions given to `aes`.

This is convenient for exploration but usually not ideal for a report.

The `labs()` function can be used to change axis and legend labels:

```{r mpg-ann-labs, eval = FALSE}
p1 + labs(x = "Displacement (Liters)",
          y = "Highway Miles Per Gallon",
          color = "Cylinders")
```
```{r mpg-ann-labs, echo = FALSE}
```

The `labs()` function can also add a title, subtitle, and caption:

```{r mpg-ann-labs-2, eval = FALSE}
p2 <- p1 +
    labs(x = "Displacement (Liters)",
         y = "Highway Miles Per Gallon",
         color = "Cylinders",
         title = "Gas Mileage and Displacement",
         subtitle = paste("For models which had a new release every year",
                           "between 1999 and 2008"),
         caption = "Data Source: https://fueleconomy.gov/")
p2
```
```{r mpg-ann-labs-2, echo = FALSE}
```

Annotations can be used to provide popout that draws a viewer's
attention to particular features.

The `annotate()` function is one option:

```{r mpg-ann-popout, eval = FALSE}
p2 +
    annotate("label", x = 2.8, y = 43,
             label = "Volkswagens") +
    annotate("rect",
             xmin = 1.7, xmax = 2.1,
             ymin = 40, ymax = 45,
             fill = NA, color = "black")
```
```{r mpg-ann-popout, echo = FALSE}
```
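
Another option is to add a text layer that uses only the rows to be labeled, so the labels come from the data rather than from hand-picked coordinates. A minimal sketch; the threshold `hwy > 40` and the name `high_hwy` are choices made here.

```r
library(ggplot2)

## Rows to label: the cars with the highest highway mileage.
high_hwy <- subset(mpg, hwy > 40)

p_lab <- ggplot(mpg, aes(x = displ, y = hwy)) +
    geom_point() +
    geom_text(data = high_hwy,
              aes(label = model),
              nudge_x = 0.1,
              hjust = "left")
p_lab
```

Labels for points at the same location will overlap; the `check_overlap` argument of `geom_text` can be used to drop overlapping labels.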

Often more convenient are some `geom_mark` objects provided by the
`ggforce` package:

```{r mpg-ann-popout-2, eval = FALSE}
library(ggforce)
p2 +
    geom_mark_hull(aes(filter = class == "2seater"),
                   description =
                       paste("2-Seaters have high displacement",
                             "values, but also high fuel efficiency",
                             "for their displacement.")) +
    geom_mark_rect(aes(filter = hwy > 40),
                   description =
                       "These are Volkswagens") +
    geom_mark_circle(aes(filter = hwy == 12),
                     description =
                         "Three pickups and an SUV.")
```
```{r mpg-ann-popout-2, echo = FALSE, fig.width = 7, fig.height = 5.5}
#| warning: false
```

These annotations can be customized in a number of ways.


## Arranging Plots

There are several tools available for assembling ensemble plots.

The [`patchwork`](https://patchwork.data-imaginist.com/) package is a
good choice.

A simple example:

```{r mpg-patchwork, eval = FALSE}
p1 <- ggplot(mpg, aes(x = displ,
                      y = hwy)) +
    geom_point()
p2 <- ggplot(mpg, aes(x = cyl,
                      y = hwy,
                      group = cyl)) +
    geom_boxplot()
p3 <- ggplot(mpg, aes(x = cyl)) +
    geom_bar()

library(patchwork)
(p1 + p2) / p3
```
```{r mpg-patchwork, echo = FALSE}
```
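
Relative sizes and panel tags can be adjusted with `plot_layout()` and `plot_annotation()` from `patchwork`. A sketch with plots named `q1`, `q2`, and `q3` to avoid overwriting the plots above; the height ratio is an arbitrary choice.

```r
library(ggplot2)
library(patchwork)

q1 <- ggplot(mpg, aes(x = displ, y = hwy)) +
    geom_point()
q2 <- ggplot(mpg, aes(x = factor(cyl), y = hwy)) +
    geom_boxplot()
q3 <- ggplot(mpg, aes(x = factor(cyl))) +
    geom_bar()

## Top row twice as tall as the bottom row, with panels tagged A, B, C.
combined <- (q1 + q2) / q3 +
    plot_layout(heights = c(2, 1)) +
    plot_annotation(tag_levels = "A")
combined
```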


## Animation

The [`gganimate`](https://github.com/thomasp85/gganimate) package
can be used to add animation to a `ggplot` graph.

Start with a plot `p` for all years in the `gapminder` data, with
`year` in the background:

```{r}
p <- gapminder |>
    arrange(desc(pop)) |>
    ggplot(aes(x = gdpPercap, y = lifeExp)) +
    geom_text(aes(x = 5000, y = 55, label = as.character(year)),
              size = 50, color = "grey",
              hjust = "center", vjust = "center") +
    geom_point(aes(size = pop, fill = continent), shape = 21) +
    scale_x_log10(labels = scales::comma) +
    ylim(c(20, 85)) +
    scale_size_area(max_size = 20,
                    labels = scales::comma,
                    breaks = c(0.25 * 10 ^ 9, 0.5 * 10 ^ 9, 10 ^ 9)) +
    scale_fill_manual(values = c(Africa = "deepskyblue",
                                 Asia = "red",
                                 Americas = "green",
                                 Europe = "gold",
                                 Oceania = "brown")) +
    labs(x = "Income", y = "Life expectancy") +
    guides(fill = guide_legend(title = "Continent",
                               override.aes = list(size = 5),
                               order = 1),
           size = guide_legend(title = "Population",
                               label.hjust = 1,
                               order = 2)) +
    ## A complete theme replaces earlier theme() settings, so it comes first.
    theme_minimal() +
    theme(text = element_text(size = 16),
          panel.border = element_rect(fill = NA, color = "grey20"))
```

```{r gapminder-full, echo = FALSE, fig.height = 6, fig.width = 8}
p
```

A [GIF](https://simple.wikipedia.org/wiki/Graphics_Interchange_Format)
animation:

```{r gapminder-anim, eval = FALSE}
library(gganimate)
animate(p +
        transition_states(
            year,
            transition_length = 2,
            state_length = 0))
```
```{r gapminder-anim, echo = FALSE, fig.height = 6, fig.width = 8}
```

A movie:

```{r gapminder-anim-movie, eval = FALSE}
animate(p +
        transition_states(
            year,
            transition_length = 2,
            state_length = 0,
            wrap = FALSE),
        renderer = ffmpeg_renderer())
```
<center> <!-- there should/may be a better way -->
```{r gapminder-anim-movie, echo = FALSE, fig.height = 6, fig.width = 8, out.width = "100%"}
```
</center>


## Interaction


### Plotly

The `ggplotly` function in the [`plotly` package](https://plotly.com/r/)
can be used to add some interactive features to a plot created with
`ggplot2`.

* In an R session a call to `ggplotly()` may open a browser
  window with the interactive plot.

* In an RStudio session the plot appears in the graphics panel.

* In an Rmarkdown document the interactive plot is embedded in the
  `html` file.

Another interactive plotting approach that can be used from R is
described in an [Infoworld
article](https://www.infoworld.com/article/3607068/plot-in-r-with-echarts4r.html).

A simple example using `ggplotly()`:

```{r mpg-plotly, eval = FALSE}
library(ggplot2)
library(plotly)
p <- ggplot(mutate(mpg, cyl = factor(cyl))) +
    geom_point(aes(x = displ,
                   y = hwy,
                   fill = cyl),
               shape = 21,
               size = 3)
ggplotly(p)
```
```{r mpg-plotly, echo = FALSE, message = FALSE}
```

Adding a `text` aesthetic allows the tooltip display to be customized:

```{r mpg-plotly-2, eval = FALSE}
p <- ggplot(mutate(mpg, cyl = factor(cyl))) +
    geom_point(aes(x = displ,
                   y = hwy,
                   fill = cyl,
                   text = paste(year,
                                manufacturer,
                                model)),
               shape = 21,
               size = 3)
ggplotly(p, tooltip = "text") |>
    style(hoverlabel = list(bgcolor = "white"))
```
```{r mpg-plotly-2, echo = FALSE, warning = FALSE, message = FALSE}
```


### Ggiraph

The [`ggiraph` package](https://davidgohel.github.io/ggiraph/)
provides another approach.

```{r mpg-ggiraph, eval = FALSE}
library(ggplot2)
library(ggiraph)
p <- ggplot(mutate(mpg, cyl = factor(cyl))) +
    geom_point_interactive(
        aes(x = displ,
            y = hwy,
            fill = cyl,
            tooltip = paste(year,
                            manufacturer,
                            model)),
        shape = 21,
        size = 3)
girafe(ggobj = p)
```

```{r mpg-ggiraph, echo = FALSE}
```


### Grammar of Interactive Graphics

There have been several efforts to develop a grammar of interactive
graphics, including [`ggvis`](https://ggvis.rstudio.com/) and
[`animint`](https://tdhock.github.io/animint/); neither seems to be
under active development at this time.

A promising approach is
[Vega-Lite](https://vega.github.io/vega-lite/), which has a Python
interface, [Altair](https://altair-viz.github.io/), and an R package,
[altair](https://vegawidget.github.io/altair/), that wraps the Python
interface.

An example using the `altair` package:

```{r rubber-altair, eval = FALSE}
rub <- read.csv(here::here("rubber.csv"))

library(altair)

chartTH <- alt$Chart(rub)$
    mark_point()$
    encode(x = alt$X("H:Q", scale = alt$Scale(domain = range(rub$H))),
           y = alt$Y("T:Q", scale = alt$Scale(domain = range(rub$T))))

brush <- alt$selection_interval()

chartTH_brush <- chartTH$add_selection(brush)

chartTH_selection <-
    chartTH_brush$encode(color = alt$condition(brush,
                                               "Origin:N",
                                               alt$value("lightgray")))

chartAT <- chartTH_selection$
    encode(x = alt$X("T:Q", scale = alt$Scale(domain = range(rub$T))),
           y = alt$Y("A:Q", scale = alt$Scale(domain = range(rub$A))))

chartAT | chartTH_selection
```

The resulting linked plots:

```{r rubber-altair, echo = FALSE, error = TRUE, warning = FALSE}
```


## Notes

* A number of other [`ggplot`
  extensions](https://exts.ggplot2.tidyverse.org/) are available.

* A [blog
  post](https://medium.com/bbc-visual-and-data-journalism/how-the-bbc-visual-and-data-journalism-team-works-with-graphics-in-r-ed0b35693535)
  explains how the [BBC Visual and Data
  Journalism](https://medium.com/bbc-visual-and-data-journalism) team
  creates their graphics. More details are provided in an [_R
  cookbook_](https://bbc.github.io/rcookbook/).

<!--
* A [blog
  post](https://blog.revolutionanalytics.com/2016/07/data-journalism-with-r-at-538.html)
  describes the use of R and `ggplot` by
  [FiveThirtyEight](https://fivethirtyeight.com/).  The `ggthemes`
  packages includes `theme_fivethirtyeight` to emulate their style.
-->


## Reading

Chapters [_Data
visualization_](https://r4ds.hadley.nz/data-visualize.html) and
[_Graphics for
communication_](https://r4ds.hadley.nz/communication.html)
in [_R for Data Science_](https://r4ds.hadley.nz/), O'Reilly.

Chapter [_Make a plot_](https://socviz.co/makeplot.html) in [_Data
Visualization_](https://socviz.co/).

Chapter
[_ggplot2_](https://rafalab.dfci.harvard.edu/dsbook-part-1/dataviz/ggplot2.html)
in [_Introduction to Data Science: Data Analysis and Prediction
Algorithms with R_](https://rafalab.dfci.harvard.edu/dsbook-part-1/).


## Interactive Tutorial

An interactive [`learnr`](https://rstudio.github.io/learnr/) tutorial
for these notes is [available](`r WLNK("tutorials/ggplot.Rmd")`).

You can run the tutorial with

```{r, eval = FALSE}
STAT4580::runTutorial("ggplot")
```

You can install the current version of the `STAT4580` package with

```{r, eval = FALSE}
remotes::install_gitlab("luke-tierney/STAT4580")
```

You may need to install the `remotes` package from CRAN first.
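
One way to handle this, sketched here under the assumption that a
CRAN mirror is configured for your R session, is to install `remotes`
only when it is missing:

```r
## Install remotes from CRAN only if it is not already available.
if (!requireNamespace("remotes", quietly = TRUE))
    install.packages("remotes")
```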


## Exercises

1. In the following expression, which value of the `shape` aesthetic
   produces a plot with points represented as triangles that are
   outlined in black and filled with a color determined by the number
   of cylinders?

<!-- ## nolint start -->
    ```r
    library(ggplot2)
    ggplot(mpg, aes(x = displ, y = hwy, fill = factor(cyl))) +
        geom_point(size = 4, shape = ---)
    ```
<!-- ## nolint end -->

    a. 15
    b. 17
    c. 21
    d. 24

2. It can sometimes be useful to plot text labels in a scatterplot
   instead of points. Consider the plot set up as

    ```r
    library(ggplot2)
    library(dplyr)
    data(gapminder, package = "gapminder")
    p <- filter(gapminder, year == 2007) |>
        group_by(continent) |>
        summarize(gdpPercap = mean(gdpPercap), lifeExp = mean(lifeExp)) |>
        ggplot(aes(x = gdpPercap, y = lifeExp))
    ```
    Which of the following produces a plot with continent
    names on white rectangles?

    a. `p + geom_text(aes(label = continent))`
    b. `p + geom_label(aes(label = continent))`
    c. `p + geom_label(label = continent)`
    d. `p + geom_text(text = continent)`

3. The following code plots a _kernel density estimate_ for the
   `eruptions` variable in the `faithful` data set:

    ```r
    library(ggplot2)
    ggplot(faithful, aes(x = eruptions)) + geom_density(bw = 0.1)
    ```
    Look at the help page for `geom_density`. Which of the following best
    describes what specifying a value for `bw` does?

    a. Changes the _kernel_ used to construct the estimate.
    b. Changes the _smoothing bandwidth_ to make the result more or less smooth.
    c. Changes the `stat` used to `stat_bw`.
    d. Has no effect on the result.

4. This code creates a map of Iowa counties.

    ```r
    library(ggplot2)
    p <- ggplot(map_data("county", "iowa"),
                aes(x = long, y = lat, group = group)) +
        geom_polygon(fill = "white", color = "black")
    ```
   
    Which of these produces a plot with an aspect ratio that best
    matches the map on [this
    page](https://en.wikipedia.org/w/index.php?title=List_of_counties_in_Iowa&oldid=1001171082)?

    a. `p + coord_fixed(0.5)` 
    b. `p + coord_fixed(0.75)`
    c. `p + coord_fixed(1.35)`
    d. `p + coord_fixed(1.95)`

5. Consider the two plots created by this code (print the values of
   `p1` and `p2` to see the plots):

    ```r
    library(ggplot2)
    data(gapminder, package = "gapminder")
    p1 <- ggplot(gapminder, aes(x = log(gdpPercap), y = lifeExp)) +
        geom_point() +
        scale_x_continuous(name = "")
    p2 <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
        geom_point() +
        scale_x_log10(labels = scales::comma, name = "") 
    ```

    Which of these statements is true?

    a. The `x` axis labels are identical in both plots.
    b. The `x` axis labels in `p2` are in dollars; the labels in `p1`
       are in log dollars. 
    c. The `x` axis labels in `p1` are in dollars; the labels in `p2`
       are in log dollars.
    d. There are no labels on the `x` axis in `p2`.

6. Consider the plot created by

    ```r
    library(ggplot2)
    data(gapminder, package = "gapminder")
    p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
        geom_point() +
        scale_x_log10(labels = scales::comma) 
    ```

    Which of these expressions produces a plot with a white background?

    a. `p`
    b. `p + theme_grey()`
    c. `p + theme_classic()`
    d. `p + ggthemes::theme_economist()`

7. There are many different ways to change the `x` axis label in
   `ggplot`.  Consider the plot created by

    ```r
    library(ggplot2)
    p <- ggplot(mpg, aes(x = displ, y = hwy)) +
        geom_point()
    ```

    Which of the following does **not** change the `x` axis label to
    _Displacement_?

    a. `p + labs(x = "Displacement")`
    b. `p + scale_x_continuous("Displacement")`
    c. `p + xlab("Displacement")`
    d. `p + theme(axis.title.x = "Displacement")`

