## Background

The Grammar of Graphics is a language proposed by Leland Wilkinson for describing statistical graphs.

• Wilkinson, L. (2005), The Grammar of Graphics, 2nd ed., Springer.

The grammar of graphics has served as the foundation for the graphics system in SPSS and several other systems.

ggplot2 represents an implementation and extension of the grammar for R.

The basic idea is that any basic plot can be built out of a combination of

• a data set
• one or more geometrical representations (geoms)
• mappings of values to aesthetic features of the geom
• a stat to produce values to be mapped
• a coordinate system
• a scale specification
• a faceting scheme

ggplot2 provides tools for specifying these components and adjusting their features.

Many are provided by default and do not need to be specified explicitly unless the defaults are to be changed.
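
Here is a minimal sketch of how these defaults fill in components that are not specified. The two plots below should compute the same layer data. The second plot spells out the defaults for geom_point: the identity stat and position, continuous position scales, Cartesian coordinates, and no faceting (facet_null()). The example assumes ggplot2 and its built-in mpg data.

```r
library(ggplot2)

# Only the data, a geom, and a mapping are specified; the rest are defaults.
p_default <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()

# The same plot with several default components written out explicitly.
p_explicit <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(stat = "identity", position = "identity") +
  scale_x_continuous() +
  scale_y_continuous() +
  coord_cartesian() +
  facet_null()

# Compare the data ggplot2 computes for the single layer of each plot.
same_layer_data <- isTRUE(all.equal(layer_data(p_default), layer_data(p_explicit)))
print(same_layer_data)
```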

## A Basic Template

The simplest graph needs a data set, a geom, and a mapping:

ggplot(data = <DATA>) + <GEOM>(mapping = aes(<MAPPINGS>))

The appearance of geom objects is controlled by aesthetic features.

Each geom has some required and some optional aesthetics.

For geom_point the required aesthetics are

• x position
• y position.

Optional aesthetics include

• color
• fill
• shape
• size
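
You can also look up the required and optional aesthetics from R. This sketch relies on GeomPoint, the ggproto object that implements geom_point. Its required_aes and default_aes fields list the aesthetics. Note that ggplot2 spells color as colour internally.

```r
library(ggplot2)

# Aesthetics geom_point cannot draw without
GeomPoint$required_aes

# Aesthetics with default values: these can be mapped or set, but need not be
names(GeomPoint$default_aes)
```
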

ggplot(mpg) + geom_point(aes(x = displ, y = hwy, color = class))

ggplot(mpg) + geom_point(aes(x = displ, y = hwy, color = class, shape = factor(cyl)))

Many optional aesthetics can also be set to a constant value outside aes() to override their defaults:

ggplot(mpg) + geom_point(aes(x = displ, y = hwy), color = "blue", shape = 1)

Available point shapes are specified by number, from 0 to 25. Some of these only work properly in certain combinations. For example, fill only works with shapes 21–25:

ggplot(mutate(mpg, cyl = factor(cyl))) +
geom_point(aes(x = displ, y = hwy, fill = cyl),
shape = 21, size = 4)

Setting a new default outside aes() is very different from mapping a constant value inside aes(), which is rarely what you want:

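ggplot(mpg) + geom_point(aes(x = displ, y = hwy, color = "blue"))

Inside aes(), "blue" is treated as a data value. It is mapped to the first color of the default palette, so the points are not blue, and a legend is added for it.

## Geometric Objects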

ggplot2 provides a number of geoms:

geom_abline      geom_density_2d  geom_linerange   geom_rug
geom_area        geom_density2d   geom_map         geom_segment
geom_bar         geom_dotplot     geom_path        geom_sf
geom_bin2d       geom_errorbar    geom_point       geom_sf_label
geom_blank       geom_errorbarh   geom_pointrange  geom_sf_text
geom_boxplot     geom_freqpoly    geom_polygon     geom_smooth
geom_col         geom_hex         geom_qq          geom_spoke
geom_contour     geom_histogram   geom_qq_line     geom_step
geom_count       geom_hline       geom_quantile    geom_text
geom_crossbar    geom_jitter      geom_raster      geom_tile
geom_curve       geom_label       geom_rect        geom_violin
geom_density     geom_line        geom_ribbon      geom_vline

Additional geoms are available in packages like ggbeeswarm and ggridges.
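
The table above comes from one ggplot2 release, and the set of geoms changes between versions. You can list the geoms in the installed version quickly. This assumes ggplot2 is attached, so that it appears on the search path as package:ggplot2:

```r
library(ggplot2)

# Exported objects whose names start with "geom_"
geoms <- grep("^geom_", ls("package:ggplot2"), value = TRUE)
length(geoms)
head(geoms)
```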

Geoms can be added as layers to a plot.

Mappings common to all, or most, geoms can be specified in the ggplot call:

ggplot(mpg, aes(x = displ, y = hwy)) +  geom_smooth() + geom_point()
## geom_smooth() using method = 'loess' and formula 'y ~ x'

Geoms can also use different data sets. This was used to show simulated QQ plots in the background behind the QQ plot for the Galton heights data:

father.son <- UsingR::father.son
s <- sd(Galton$parent)
p + stat_function(fun = function(x) dnorm(x, m, s), color = "red")

## Position Adjustments

Some available position adjustments:

position_dodge    position_identity      position_nudge
position_dodge2   position_jitter        position_stack
position_fill     position_jitterdodge

For bar charts, these allow choosing between stacked and side-by-side charts. The default is position_stack:

ggplot(diamonds, aes(x = cut, fill = clarity)) + geom_bar(position = "stack")

position_dodge produces side-by-side bar charts:

ggplot(diamonds, aes(x = cut, fill = clarity)) + geom_bar(position = "dodge")

position_fill rescales all bars to the same height to help compare proportions within bars. The y axis is still labeled count; adding labs(y = "proportion") gives a more accurate label.

ggplot(diamonds, aes(x = cut, fill = clarity)) + geom_bar(position = "fill")

Using the counts to scale the widths produces a spine plot, a variant of a mosaic plot. This is easiest to do with the ggmosaic package.

position_jitter can be used with geom_point to avoid overplotting or to break up rounding artifacts.

p <- ggplot(mpg, aes(x = displ, y = hwy))
p + geom_point(position = "jitter")

To jitter only horizontally, set the vertical jitter amount to zero:

p + geom_point(position = position_jitter(height = 0))

## Coordinate Systems

Coordinate system functions include

coord_cartesian   coord_flip    coord_polar      coord_trans
coord_equal       coord_map     coord_quickmap
coord_fixed       coord_munch   coord_sf

We already used coord_flip and coord_polar.

The default coordinate system is coord_cartesian. It can be used to zoom in on a particular region:

p + geom_point() + coord_cartesian(xlim = c(3, 4))

coord_fixed and coord_equal fix the aspect ratio for a Cartesian coordinate system. The aspect ratio is the number of physical display units per y unit divided by the number of physical display units per x unit. The aspect ratio can be important for recognizing features and patterns.

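
The definition of the aspect ratio translates directly into arithmetic. The function implied_panel_height below is a hypothetical helper written only for illustration; it is not part of ggplot2. It assumes the panel shows exactly the given data ranges and ignores the small expansion ggplot2 adds to each axis by default.

```r
# Hypothetical helper (not in ggplot2): panel height implied by a fixed
# aspect ratio, given the panel width and the x and y data ranges shown.
implied_panel_height <- function(width, xrange, yrange, ratio = 1) {
  units_per_x <- width / diff(xrange)  # display units per x unit
  units_per_y <- ratio * units_per_x   # ratio = units per y / units per x
  units_per_y * diff(yrange)           # height needed for the y range
}

# Both axes run from 0 to 1, as in a PP plot: ratio 1 gives a square panel.
implied_panel_height(width = 4, xrange = c(0, 1), yrange = c(0, 1))             # 4

# Doubling the ratio doubles the height for the same data ranges.
implied_panel_height(width = 4, xrange = c(0, 1), yrange = c(0, 1), ratio = 2)  # 8
```
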
In a PP plot the 45-degree line plays an important role, so using an aspect ratio of 1 is helpful:

library(gridExtra)
# Normal model fitted to the fathers' heights
m <- mean(father.son$fheight)
s <- sd(father.son$fheight)
n <- nrow(father.son)
# Plotting positions (i - 0.5) / n for the sorted observations
prop <- (1 : n) / n - 0.5 / n
pp1 <- ggplot(father.son) +
geom_point(aes(x = prop, y = sort(pnorm(fheight, m, s))))
# Same plot with one x unit and one y unit drawn at equal length
pp2 <- pp1 + coord_fixed(ratio = 1)
grid.arrange(pp1, pp2, nrow = 1)

Coordinate systems are particularly important for maps.

Polygons for many political and geographic boundaries are available through the map_data function, which requires the maps package.

usa <- map_data("state")

Polygon vertices are encoded by longitude and latitude. Plotting these in the default cartesian coordinate system usually does not work well:

m <- ggplot(usa, aes(x = long, y = lat, group = group)) +
geom_polygon(fill = "white", color = "black")
m

Using a fixed aspect ratio is better, but an aspect ratio of 1 does not work well:

m + coord_equal()

The problem is that away from the equator, a one-degree change in latitude corresponds to a larger distance than a one-degree change in longitude.

At 41 degrees north, roughly the middle of Iowa, the distance spanned by one degree of longitude relative to the distance spanned by one degree of latitude is

longlat <- cos(41/90 * pi /2)
longlat
## [1] 0.7547096
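
If we treat the Earth as a sphere, the same formula gives the ratio at any latitude. The latitudes below are picked only for illustration. The ratio falls from 1 at the equator toward 0 at the poles. As a result, a single fixed aspect ratio is only right near one latitude.

```r
# Longitude/latitude degree-length ratio at several latitudes (degrees north)
lat <- c(0, 30, 41, 60)
longlat_at <- cos(lat / 90 * pi / 2)
round(longlat_at, 3)  # 1.000 0.866 0.755 0.500
```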

A better map is obtained using the aspect ratio 1 / longlat:

m + coord_fixed(1 / longlat)

The best approach is to use a coordinate system designed specifically for maps.

m + coord_map()

There are many projections used in map making; the default projection used by coord_map is the Mercator projection.

Proper map projections are non-linear; this is easier to see with the Lagrange projection:

m + coord_map("lagrange")

## Scales

Scales are used for controlling the mapping of values to physical representations such as colors, shapes, and positions.

Scale functions are also responsible for producing guides for translating physical representations back to values, such as

• axis labels and marks;

• color or shape legends.

There are 94 scale functions in the version of ggplot2 used here; some examples are

scale_color_gradient      scale_shape_manual     scale_x_log10
scale_color_identity      scale_size_area        scale_y_log10
scale_fill_manual                                scale_y_sqrt
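
The exact number of scale functions depends on the ggplot2 version. The count is also inflated because both the color and colour spellings are exported. Here is a sketch for counting them in the installed version, again assuming ggplot2 is attached:

```r
library(ggplot2)

# Exported functions whose names start with "scale_"
scale_fns <- grep("^scale_", ls("package:ggplot2"), value = TRUE)
length(scale_fns)
```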


p <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()
p

Remove the tick marks and labels (this can also be done with theme settings):

p + scale_x_continuous(labels = NULL, breaks = NULL)

Change the tick locations and labels:

p + scale_x_continuous(labels = paste(c(2, 4, 6), "ltr"), breaks = c(2, 4, 6))

Use a logarithmic axis:

p + scale_x_log10(labels = paste(c(2, 4, 6), "ltr"),
breaks = c(2, 4, 6),
minor_breaks = c(3, 5, 7))

The Scales section in R for Data Science provides some more details.
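
Scales carry the break and label information that guides use. You can pull the trained scale out of a built plot with layer_scales(). The get_breaks() and get_labels() methods called below are ggproto internals, not a documented stable interface. Treat this as an inspection aid only.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()
p_ltr <- p + scale_x_continuous(labels = paste(c(2, 4, 6), "ltr"),
                                breaks = c(2, 4, 6))

# The trained x scale of the built plot
x_scale <- layer_scales(p_ltr)$x
x_scale$get_breaks()   # tick positions used by the axis guide
x_scale$get_labels()   # tick labels shown at those positions
```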

Color assignment can also be controlled by scale functions. For example, for the presidential approval ratings data

pr_appr <- data.frame(pres = c("Obama", "Carter", "Clinton",
"G.W. Bush", "Reagan", "G.H.W. Bush", "Trump"),
appr = c(79, 78, 68, 65, 58, 56, 40),
party = c("D", "D", "D", "R", "R", "R", "R"),
year = c(2009, 1977, 1993, 2001, 1981, 1989, 2017))
pr_appr <- mutate(pr_appr, pres = reorder(pres, appr))

the common assignment of red for Republican and blue for Democrat can be obtained by

ggplot(pr_appr, aes(x = pres, y = appr, fill = party)) +
geom_col() + coord_flip() +
scale_fill_manual(values = c(R = "red", D = "blue"))

## Themes

ggplot2 supports the notion of themes for adjusting non-data appearance aspects of a plot, such as

• plot titles
• axis and legend placement and titles
• background colors
• guide line placement

Theme elements can be customized in several ways:

• theme adjusts individual elements of a plot;
• theme_set adjusts the default settings for a session;
• pre-defined theme functions allow consistent style changes.
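
Here is a minimal sketch of theme_set. It changes the default theme for the rest of the session and invisibly returns the previous default, so the old default is easy to restore afterwards.

```r
library(ggplot2)

# Make theme_bw() the session default, keeping the old default
old <- theme_set(theme_bw())

p <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()
print(p)  # drawn with theme_bw() although no theme was added to p

# Restore the previous default
theme_set(old)
```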

The full documentation of the theme function lists many customizable elements.

One simple example:

ggplot(mutate(mpg, cyl = factor(cyl))) +
geom_point(aes(x = displ, y = hwy, fill = cyl),
shape = 21, size = 3) +
theme(legend.position = "top",
axis.text = element_text(size = 12),
axis.title = element_text(size = 14, face = "bold"))

Another example:

gthm <- theme(plot.background = element_rect(fill = "lightblue", color = NA),
panel.background = element_rect(fill = "lightblue2"))
p + gthm

Some alternate complete themes provided by ggplot2 are

theme_bw        theme_gray      theme_minimal   theme_void
theme_classic   theme_grey      theme_dark      theme_light

p_bw <- p + theme_bw() + ggtitle("BW")
p_classic <- p + theme_classic() + ggtitle("Classic")
p_min <- p + theme_minimal() + ggtitle("Minimal")
p_void <- p + theme_void() + ggtitle("Void")
grid.arrange(p_bw, p_classic, p_min, p_void, nrow = 2)

The ggthemes package provides some additional themes. Some examples:

library(ggthemes)
p_econ <- p + theme_economist() + ggtitle("Economist")
p_wsj <- p + theme_wsj() + ggtitle("WSJ")
p_tufte <- p + theme_tufte() + ggtitle("Tufte")
p_few <- p + theme_few() + ggtitle("Few")
grid.arrange(p_econ, p_wsj, p_tufte, p_few, nrow = 2)

ggthemes also provides theme_map, which removes unnecessary elements from maps:

m + coord_map() + theme_map()

The Themes section in R for Data Science provides some more details.
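
The order in which themes are added matters when you combine a custom theme object such as gthm with a complete theme like theme_bw() or theme_economist(). Here is a sketch; it repeats the gthm definition from above so that it runs on its own:

```r
library(ggplot2)

gthm <- theme(plot.background = element_rect(fill = "lightblue", color = NA),
              panel.background = element_rect(fill = "lightblue2"))

# Custom settings added after a complete theme are merged into it
t_custom_last <- theme_bw() + gthm
t_custom_last$panel.background$fill

# A complete theme added last replaces everything added before it
t_complete_last <- gthm + theme_bw()
t_complete_last$panel.background$fill
```

The same rule applies to plots. p + theme_bw() + gthm keeps the light blue panel, while p + gthm + theme_bw() does not.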

## Facets

Faceting uses the small multiples approach to introduce additional variables.

For a single variable facet_wrap is usually used:

p <- ggplot(mpg) + geom_point(aes(x = displ, y = hwy))
p + facet_wrap(~ class)

For two variables, each with a modest number of categories, facet_grid can be effective:

p + facet_grid(factor(cyl) ~ drv)

Facet arrangement can also be used to convey other information, such as geographic location.

The geofacet package allows facets to be placed at the approximate locations of the corresponding geographic regions.

An example for data from US states:

library(geofacet)
ggplot(state_unemp, aes(year, rate)) +
geom_line() +
facet_geo(~ state, grid = "us_state_grid2", label = "name") +
scale_x_continuous(labels = function(x) paste0("'", substr(x, 3, 4))) +
labs(title = "Seasonally Adjusted US Unemployment Rate 2000-2016",
caption = "Data Source: bls.gov",
x = "Year",
y = "Unemployment Rate (%)") +
theme(strip.text.x = element_text(size = 6))

Arrangement according to a calendar is also useful.
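
You can count the panels that facet_wrap and facet_grid create from the computed layer data, where the PANEL column is a factor with one level per panel. The sketch below uses the mpg examples from above. Note that facet_grid produces every row and column combination, even empty ones such as 5-cylinder cars with rear-wheel drive.

```r
library(ggplot2)

p <- ggplot(mpg) + geom_point(aes(x = displ, y = hwy))

# facet_wrap: one panel per class
n_wrap <- nlevels(layer_data(p + facet_wrap(~ class))$PANEL)

# facet_grid: one panel per cyl/drv combination, including empty ones
n_grid <- nlevels(layer_data(p + facet_grid(factor(cyl) ~ drv))$PANEL)

c(wrap = n_wrap, grid = n_grid)
```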

## A More Complete Template

ggplot(data = <DATA>) +
<GEOM>(mapping = aes(<MAPPINGS>),
stat = <STAT>,
position = <POSITION>) +
< ... MORE GEOMS ... > +
<FACETING> +
<THEME_ADJUSTMENT>
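
Here is one filled-in version of the template using the mpg data. The particular geoms, position adjustment, faceting variable, and theme are arbitrary choices for the sketch, not recommendations.

```r
library(ggplot2)

p <- ggplot(data = mpg) +
  # First geom: points, jittered horizontally to reduce overplotting
  geom_point(mapping = aes(x = displ, y = hwy, color = drv),
             stat = "identity",
             position = position_jitter(width = 0.1, height = 0)) +
  # Second geom: a loess smooth of the same variables
  geom_smooth(mapping = aes(x = displ, y = hwy),
              method = "loess", formula = y ~ x) +
  # Faceting: one panel per model year
  facet_wrap(~ year) +
  # Theme adjustment: a complete theme
  theme_bw()
print(p)
```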

## Interaction

The ggplotly function in the plotly package can be used to add some interactive features to a plot created with ggplot2.

• In an R session a call to ggplotly opens a browser window with the interactive plot.
• In an R Markdown document the interactive plot is embedded in the HTML file.

library(plotly)
p <- ggplot(mutate(mpg, cyl = factor(cyl))) +
geom_point(aes(x = displ, y = hwy, fill = cyl),
shape = 21, size = 3)
ggplotly(p)

Adding a text aesthetic allows the tooltip display to be customized:

p <- ggplot(mutate(mpg, cyl = factor(cyl))) +
geom_point(aes(x = displ, y = hwy, fill = cyl,
text = paste(year, manufacturer, model)),
shape = 21, size = 3)
## Warning: Ignoring unknown aesthetics: text
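ggplotly(p, tooltip = "text")

The warning comes from ggplot2, which does not use a text aesthetic for points; ggplotly still picks it up for the tooltips.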