Background
The Grammar of Graphics is a language proposed by Leland
Wilkinson for describing statistical graphs.
Wilkinson, L. (2005), The Grammar of Graphics, 2nd ed.,
Springer.
The grammar of graphics has served as the foundation for the graphics
frameworks in SPSS, Vega-Lite, and several other
systems.
ggplot2 represents an implementation and extension of
the grammar of graphics for R.
Wickham, H. (2016), ggplot2: Elegant Graphics for Data
Analysis, 2nd ed., Springer. 3rd ed. in progress.
Online documentation: https://ggplot2.tidyverse.org/reference/index.html
Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund (2023),
R for Data Science (2nd
Edition), O’Reilly.
Data
visualization cheatsheet
Winston Chang (2018), R
Graphics Cookbook, 2nd edition, O’Reilly. (Book source on GitHub)
The idea is that any basic plot can be built out of a combination
of
a data set;
one or more geometric representations (geoms);
mappings of values to aesthetic features of the
geom;
a stat to produce values to be mapped;
position adjustments;
a coordinate system;
a scale specification;
a faceting scheme.
ggplot2 provides tools for specifying these components
and adjusting their features.
Many components and features are provided by default and do not need
to be specified explicitly unless the defaults are to be changed.
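To make the role of the defaults concrete, the following sketch builds the same scatter plot twice: once relying on defaults, and once spelling out the stat, position, scales, coordinate system, and faceting that would otherwise be supplied automatically. The names p_default and p_explicit are illustrative only, and layer_data() is used here just to compare what each plot computes for its single layer.

```r
library(ggplot2)

# Relying on defaults: only data, mapping, and geom are given.
p_default <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# Spelling out the components that are otherwise filled in by default.
p_explicit <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(stat = "identity", position = "identity") +
  scale_x_continuous() +
  scale_y_continuous() +
  coord_cartesian() +
  facet_null()

# The computed data for the layer should agree between the two plots.
all.equal(layer_data(p_default), layer_data(p_explicit))
```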
A Basic Template
The simplest graph needs a data set, a geom, and a mapping:
ggplot(data = <DATA>) + <GEOM>(mapping = aes(<MAPPINGS>))
The appearance of geom objects is controlled by aesthetic
features.
Each geom has some required and some optional aesthetics.
For geom_point the required aesthetics are x and y.
Optional aesthetics include
color (or colour)
fill
shape
size
geom_point is used to produce a scatter plot.
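One way to check which aesthetics a geom requires or understands is to inspect the GeomPoint object that underlies geom_point(). This is a sketch for exploration; the exact set of optional aesthetics reported may differ between ggplot2 versions.

```r
library(ggplot2)

# Aesthetics that geom_point() cannot do without.
required <- GeomPoint$required_aes
required

# All aesthetics the geom understands, required and optional.
understood <- GeomPoint$aesthetics()
understood
```

The help page ?geom_point lists the same information in its Aesthetics section.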
Scatter Plots Using geom_point
The mpg data set included in the ggplot2
package includes EPA fuel economy data from 1999 to 2008 for 38 popular
models of cars.
mpg
## # A tibble: 234 × 11
## manufacturer model displ year cyl trans drv cty hwy fl class
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 audi a4 1.8 1999 4 auto… f 18 29 p comp…
## 2 audi a4 1.8 1999 4 manu… f 21 29 p comp…
## 3 audi a4 2 2008 4 manu… f 20 31 p comp…
## 4 audi a4 2 2008 4 auto… f 21 30 p comp…
## 5 audi a4 2.8 1999 6 auto… f 16 26 p comp…
## 6 audi a4 2.8 1999 6 manu… f 18 26 p comp…
## 7 audi a4 3.1 2008 6 auto… f 18 27 p comp…
## 8 audi a4 quattro 1.8 1999 4 manu… 4 18 26 p comp…
## 9 audi a4 quattro 1.8 1999 4 auto… 4 16 25 p comp…
## 10 audi a4 quattro 2 2008 4 manu… 4 20 28 p comp…
## # ℹ 224 more rows
A simple scatter plot:
ggplot(mpg) +
geom_point(aes(x = displ,
y = hwy))
Map color to vehicle class:
ggplot(mpg) +
geom_point(aes(x = displ,
y = hwy,
color = class))
And map shape to number of cylinders:
ggplot(mpg) +
geom_point(aes(x = displ,
y = hwy,
color = class,
shape = factor(cyl)))
Perception:
Too many colors;
shapes are too small;
interference between shapes and colors.
Aesthetics can be mapped to a variable or set to a fixed common
value.
This can be used to override default settings:
ggplot(mpg) +
geom_point(aes(x = displ,
y = hwy),
color = "blue",
shape = 1)
Changing the size aesthetic makes shapes easier to
recognize:
ggplot(mpg) +
geom_point(aes(x = displ,
y = hwy,
color = class,
shape = factor(cyl)),
size = 3)
Perception: Still too many colors; still have interference.
Available point shapes are specified by number:
Shapes 1-20 have their color set by the color aesthetic
and ignore the fill aesthetic.
For shapes 21-25 the color aesthetic specifies the
border color and fill specifies the interior
color.
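The chart of point shapes is not reproduced in this text. A minimal sketch that draws one, assuming a 5-column grid layout and the illustrative names shape_df and p_shapes, is:

```r
library(ggplot2)

# One row per shape number, laid out on a 5-column grid.
shape_df <- data.frame(shape = 1:25,
                       col = (1:25 - 1) %% 5,
                       row = -((1:25 - 1) %/% 5))

p_shapes <- ggplot(shape_df, aes(x = col, y = row)) +
  # color sets the outline (or the whole symbol for shapes 1-20);
  # fill only shows up for shapes 21-25.
  geom_point(aes(shape = shape),
             size = 5, color = "black", fill = "red") +
  geom_text(aes(label = shape), nudge_y = -0.35) +
  scale_shape_identity() +
  theme_void()
p_shapes
```

In the resulting chart only shapes 21-25 should show a red interior, since the fill setting is ignored by the other shapes.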
Using shape 21 with cyl mapped to the
fill aesthetic:
ggplot(mutate(mpg, cyl = factor(cyl))) +
geom_point(aes(x = displ,
y = hwy,
fill = cyl),
shape = 21,
size = 4)
Perception: Borders, larger symbols, fewer colors help.
Specifying a new default is very different from specifying a constant
value as an aesthetic.
Constant aesthetic: Rarely what you want:
ggplot(mpg) +
geom_point(aes(x = displ,
y = hwy,
color = "blue"))
Default: Probably what you want:
ggplot(mpg) +
geom_point(aes(x = displ,
y = hwy),
color = "blue")
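The difference between the two forms can be checked by looking at the colors each plot actually computes. The sketch below uses layer_data() for this; the names p_mapped and p_set are illustrative.

```r
library(ggplot2)

p_mapped <- ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy, color = "blue"))
p_set <- ggplot(mpg) +
  geom_point(aes(x = displ, y = hwy), color = "blue")

# Inside aes(): "blue" is treated as data with a single level, so the
# color scale assigns it a color from the default palette.
mapped_colors <- unique(layer_data(p_mapped)$colour)
mapped_colors

# Outside aes(): the points are drawn in the color named "blue".
set_colors <- unique(layer_data(p_set)$colour)
set_colors
```

The mapped version also produces a legend with a single entry labeled blue, which is another sign that the value was treated as data.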
Geometric Objects
ggplot2 provides a number of geoms:
geom_abline geom_area geom_bar geom_bin_2d
geom_bin2d geom_blank geom_boxplot geom_col
geom_contour geom_contour_filled geom_count geom_crossbar
geom_curve geom_density geom_density_2d geom_density_2d_filled
geom_density2d geom_density2d_filled geom_dotplot geom_errorbar
geom_errorbarh geom_freqpoly geom_function geom_hex
geom_histogram geom_hline geom_jitter geom_label
geom_line geom_linerange geom_map geom_path
geom_point geom_pointrange geom_polygon geom_qq
geom_qq_line geom_quantile geom_raster geom_rect
geom_ribbon geom_rug geom_segment geom_sf
geom_sf_label geom_sf_text geom_smooth geom_spoke
geom_step geom_text geom_tile geom_violin
geom_vline
Additional geoms are available in packages like ggforce,
ggridges, and others described on the ggplot2
extensions site.
Geoms can be added as layers to a plot.
Mappings common to all, or most, geoms can be specified in the
ggplot call:
ggplot(mpg,
aes(x = displ,
y = hwy)) +
geom_smooth() +
geom_point()
Geoms can also use different data sets.
One way to highlight Europe in a plot of life expectancy against log
income for 2007 is to start with a plot of the full data:
library(dplyr)
library(gapminder)
gm_2007 <- filter(gapminder, year == 2007)
(p <- ggplot(gm_2007, aes(x = gdpPercap,
y = lifeExp)) +
geom_point() +
scale_x_log10())
Then add a layer showing only Europe:
gm_2007_eu <- filter(gm_2007, continent == "Europe")
p + geom_point(data = gm_2007_eu,
color = "red",
size = 3)
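A layer's data argument can also be a function, which is applied to the plot's data to produce the data for that layer. The sketch below uses mpg rather than gapminder so that it needs no additional packages; the highlighted class is an arbitrary choice for illustration.

```r
library(ggplot2)

p_hl <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  # The function receives the plot's data and returns the subset
  # to be drawn by this layer only.
  geom_point(data = function(d) subset(d, class == "2seater"),
             color = "red",
             size = 3)
p_hl
```

This avoids creating a separate filtered data frame when the subset is only needed for one layer.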
Statistical Transformations
All geoms use a statistical transformation (stat) to convert
raw data to the values to be mapped to the object’s features.
The available stats are
stat_align stat_bin stat_bin_2d
stat_bin_hex stat_bin2d stat_binhex
stat_boxplot stat_connect stat_contour
stat_contour_filled stat_count stat_density
stat_density_2d stat_density_2d_filled stat_density2d
stat_density2d_filled stat_ecdf stat_ellipse
stat_function stat_identity stat_manual
stat_qq stat_qq_line stat_quantile
stat_sf stat_sf_coordinates stat_smooth
stat_spoke stat_sum stat_summary
stat_summary_2d stat_summary_bin stat_summary_hex
stat_summary2d stat_unique stat_ydensity
Each geom has a default stat, and each stat has a default geom.
For geom_point the default stat is
stat_identity.
For geom_bar the default stat is
stat_count.
For geom_histogram the default stat is
stat_bin.
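Because geoms and stats are paired by default, a layer can usually be created from either side. As a sketch, the two plots below should compute the same bar heights, and those heights should match a direct tabulation of the data; the variable names are illustrative.

```r
library(ggplot2)

p_geom <- ggplot(mpg, aes(x = class)) + geom_bar()    # default stat: count
p_stat <- ggplot(mpg, aes(x = class)) + stat_count()  # default geom: bar

counts_geom <- layer_data(p_geom)$count
counts_stat <- layer_data(p_stat)$count

# Both layers compute the same counts as a direct tabulation.
all.equal(counts_geom, counts_stat)
all.equal(counts_geom, as.numeric(table(mpg$class)))
```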
Stats can provide computed variables that can be mapped to
aesthetic features.
For stat_bin some of the computed variables are
count: number of points in bin
density: density of points in bin, scaled to integrate
to 1
The density variable can be accessed as
after_stat(density).
Older approaches that also work but are now discouraged:
stat(density)
..density..
By default, geom_histogram uses
y = after_stat(count).
ggplot(faithful) +
geom_histogram(aes(x = eruptions),
binwidth = 0.25,
fill = "grey",
color = "black")
Explicitly specifying y = after_stat(count) produces the
same plot:
ggplot(faithful) +
geom_histogram(aes(x = eruptions,
y = after_stat(count)),
binwidth = 0.25,
fill = "grey",
color = "black")
Using y = after_stat(density) produces a density scaled
axis.
(p <- ggplot(faithful) +
geom_histogram(aes(x = eruptions,
y = after_stat(density)),
binwidth = 0.25,
fill = "grey",
color = "black"))
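The relationship between the count and density variables can be checked numerically. This sketch rebuilds the density histogram under the illustrative name h and inspects the values computed by stat_bin with layer_data():

```r
library(ggplot2)

h <- ggplot(faithful) +
  geom_histogram(aes(x = eruptions, y = after_stat(density)),
                 binwidth = 0.25)
bins <- layer_data(h)

# Bar areas: density times bin width, summed over all bins.
total_area <- sum(bins$density * (bins$xmax - bins$xmin))
total_area

# count and density differ only by the factor n * binwidth.
all.equal(bins$count, bins$density * nrow(faithful) * 0.25)
```

The total area should be 1, which is what makes the density scale suitable for overlaying a density curve.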
stat_function can be used to add a density curve
specified as a mixture of two normal densities:
(ms <- mutate(faithful,
type = ifelse(eruptions < 3,
"short",
"long")) |>
group_by(type) |>
summarize(mean = mean(eruptions),
sd = sd(eruptions),
n = n()) |>
mutate(p = n / sum(n)))
## # A tibble: 2 × 5
## type mean sd n p
## <chr> <dbl> <dbl> <int> <dbl>
## 1 long 4.29 0.411 175 0.643
## 2 short 2.04 0.267 97 0.357
f <- function(x)
ms$p[1] * dnorm(x, ms$mean[1], ms$sd[1]) +
ms$p[2] * dnorm(x, ms$mean[2], ms$sd[2])
p + stat_function(fun = f, color = "red")
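For the curve to be comparable to the density-scaled histogram, the mixture itself should integrate to 1. The following base R sketch rebuilds the same two-component mixture without dplyr and checks this numerically; the names short, w_short, and mix_density are illustrative, and the integration range of 0 to 8 is an assumption chosen to cover essentially all of the mass.

```r
# Component membership: eruptions under 3 minutes count as "short".
short <- faithful$eruptions < 3
w_short <- mean(short)   # mixing proportion of short eruptions

mix_density <- function(x) {
  w_short * dnorm(x,
                  mean(faithful$eruptions[short]),
                  sd(faithful$eruptions[short])) +
    (1 - w_short) * dnorm(x,
                          mean(faithful$eruptions[!short]),
                          sd(faithful$eruptions[!short]))
}

# Numerical integral over a range covering essentially all the mass.
area <- integrate(mix_density, lower = 0, upper = 8)$value
area
```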
Position Adjustments
The available position adjustments:
position_dodge position_dodge2 position_fill
position_identity position_jitter position_jitterdodge
position_nudge position_stack
A bar chart showing the counts for the different cut
categories in the diamonds data:
ggplot(diamonds, aes(x = cut)) +
geom_bar()
Mapping clarity to fill shows the breakdown
by both cut and clarity in a stacked bar
chart:
ggplot(diamonds, aes(x = cut,
fill = clarity)) +
geom_bar()
The default position for bar charts is
position_stack:
ggplot(diamonds, aes(x = cut,
fill = clarity)) +
geom_bar(position = "stack")
position_dodge produces side-by-side bar
charts:
ggplot(diamonds, aes(x = cut,
fill = clarity)) +
geom_bar(position = "dodge")
position_fill rescales all bars to be equal height to
help compare proportions within bars.
ggplot(diamonds, aes(x = cut,
fill = clarity)) +
geom_bar(position = "fill")
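The rescaling done by position_fill can be verified from the computed layer data: within each cut category the segment heights should add up to 1. A sketch, with p_fill and segs as illustrative names:

```r
library(ggplot2)

p_fill <- ggplot(diamonds, aes(x = cut, fill = clarity)) +
  geom_bar(position = "fill")
segs <- layer_data(p_fill)

# Total height of the stacked segments within each cut category.
bar_heights <- tapply(segs$ymax - segs$ymin, as.integer(segs$x), sum)
bar_heights
```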
Using the counts to scale the widths would produce a spine
plot, a variant of a mosaic plot.
This is easiest to do with the ggmosaic package.
position_jitter can be used with geom_point
to avoid overplotting or break up rounding artifacts.
Another version of the Old Faithful data available as
geyser in package MASS has some rounding in
the duration variable:
data(geyser, package = "MASS")
## Adjust for different meaning of `waiting` variable
geyser2 <- na.omit(mutate(geyser,
duration = lag(duration)))
p <- ggplot(geyser2, aes(x = duration, y = waiting))
p + geom_point()
Jittering can help break up the distracting heaping
of values on durations of 2 and 4 minutes.
The default amount of jittering isn’t quite enough in this case:
p + geom_point(position = "jitter")
To jitter only horizontally and by a larger amount you can use
p + geom_point(position =
position_jitter(height = 0,
width = 0.1))
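Jittering is random, so a plot changes slightly each time it is drawn. position_jitter accepts a seed argument to make the result reproducible. The sketch below uses the mpg data so that it does not depend on MASS; the width, the seed value, and the names p_jit and jit are arbitrary choices for illustration.

```r
library(ggplot2)

p_jit <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point(position = position_jitter(width = 0.3,
                                        height = 0,
                                        seed = 123))
jit <- layer_data(p_jit)

# Horizontal displacement stays within +/- width; y is untouched.
range(jit$x - mpg$cty)
all(jit$y == mpg$hwy)
```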
Coordinate Systems
Coordinate system functions include
coord_cartesian coord_equal coord_fixed coord_flip
coord_map coord_munch coord_polar coord_quickmap
coord_radial coord_sf coord_trans coord_transform
The default coordinate system is coord_cartesian.
Cartesian Coordinates
coord_cartesian can be used to zoom in on a
particular region:
p + geom_point() +
coord_cartesian(xlim = c(3, 4))
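Zooming with coord_cartesian differs from setting limits on the scale: zooming keeps all observations and only changes the view, while scale limits treat observations outside the limits as missing. The sketch below illustrates this with the mpg data; the names p_base, zoomed, and limited are illustrative, and the counts are taken from the computed layer data.

```r
library(ggplot2)

p_base <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()

# Zooming: all observations are kept, only the view changes.
zoomed <- p_base + coord_cartesian(xlim = c(3, 4))
# Scale limits: observations outside the limits are treated as missing.
limited <- p_base + scale_x_continuous(limits = c(3, 4))

n_zoomed <- sum(!is.na(layer_data(zoomed)$x))
n_limited <- sum(!is.na(layer_data(limited)$x))
c(zoomed = n_zoomed, limited = n_limited)
```

The distinction matters most for layers that compute summaries, such as smooths or box plots, since those are based only on the observations that remain.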
coord_fixed and coord_equal fix the
aspect ratio for a cartesian coordinate system.
The aspect ratio is the ratio of the number of physical display units
per y unit to the number of physical display units per
x unit.
The aspect ratio can be important for recognizing features and
patterns.
river <- scan("https://www.stat.uiowa.edu/~luke/data/river.dat")
r <- data.frame(flow = river, month = seq_along(river))
ggplot(r, aes(x = month, y = flow)) +
geom_point() +
coord_fixed(ratio = 4)
Polar Coordinates
A filled bar chart
(p <- ggplot(diamonds) +
geom_bar(aes(x = 1, fill = cut),
position = "fill"))
is turned into a pie chart by changing to polar coordinates:
p + coord_polar(theta = "y")
Coordinate Systems for Maps
Coordinate systems are particularly important for maps.
Polygons for many political and geographic boundaries are available
through the map_data function.
Boundaries for the lower 48 US states can be obtained as
usa <- map_data("state")
Polygon vertices are encoded by longitude and latitude.
Plotting these in the default cartesian coordinate system usually
does not work well:
usa <- map_data("state")
m <- ggplot(usa, aes(x = long,
y = lat,
group = group)) +
geom_polygon(fill = "white",
color = "black")
m
Using a fixed aspect ratio is better, but an aspect ratio of 1 does
not work well:
m + coord_equal()
The problem is that away from the equator a one degree change in
latitude corresponds to a larger distance than a one degree change in
longitude.
The ratio of one degree longitude separation to one degree latitude
separation for the latitude at the middle of Iowa of 41 degrees is
longlat <- cos(41 / 90 * pi / 2)
longlat
## [1] 0.7547096
A better map is obtained using the aspect ratio
1 / longlat:
m + coord_fixed(1 / longlat)
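The same calculation can be wrapped in a small helper that takes a latitude in degrees. The function name map_aspect is hypothetical and is not part of ggplot2.

```r
# Aspect ratio for coord_fixed() that roughly equalizes distances
# near a given latitude (in degrees).
map_aspect <- function(lat_deg) {
  1 / cos(lat_deg * pi / 180)
}

map_aspect(41)   # latitude used above for the middle of Iowa
map_aspect(0)    # at the equator the ratio is 1
```

With this helper the map above could be drawn as m + coord_fixed(map_aspect(41)). coord_quickmap applies a similar approximation automatically, based on the region being plotted.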
The best approach is to use a coordinate system designed specifically
for maps.
There are many projections used in map making.
The default projection used by coord_map is the Mercator
projection.
m + coord_map()
Proper map projections are non-linear; this is easier to see with an
Albers projection:
m + coord_map("albers", 20, 50)
Scales
Scales are used for controlling the mapping of values to physical
representations such as colors, shapes, and positions.
Scale functions are also responsible for producing guides
for translating physical representations back to values, such as
axis labels and marks;
color or shape legends.
There are currently 131 scale functions; some examples are
scale_color_gradient scale_shape_manual scale_x_log10
scale_color_manual scale_size_area scale_y_log10
scale_fill_gradient scale_x_sqrt
scale_fill_manual scale_y_sqrt
An experimental tool to
help choose scales is available.
Start with a basic scatter plot:
(p <- ggplot(mpg, aes(x = displ,
y = hwy)) +
geom_point())
Remove the x tick marks and labels (this can also be
done with theme settings):
p + scale_x_continuous(labels = NULL,
breaks = NULL)
Change the tick locations and labels:
p + scale_x_continuous(labels =
paste(c(2, 4, 6), "ltr"),
breaks = c(2, 4, 6))
Use a logarithmic axis:
p + scale_x_log10(labels = paste(c(2, 4, 6), "ltr"),
breaks = c(2, 4, 6),
minor_breaks = c(3, 5, 7))
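The labels argument also accepts a function of the break values, which avoids repeating the breaks in two places. A sketch, where ltr is a hypothetical helper and p_ltr an illustrative name:

```r
library(ggplot2)

# Hypothetical helper: append a unit to whatever breaks are used.
ltr <- function(x) paste(x, "ltr")

p_ltr <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  scale_x_continuous(breaks = c(2, 4, 6), labels = ltr)
p_ltr

# The labels the helper produces for the chosen breaks.
ltr(c(2, 4, 6))
```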
The Scales
section in R for Data Science
provides some more details.
Color assignment can also be controlled by scale functions.
For example, for some presidential approval ratings data
pr_appr
## pres appr party year
## 1 Obama 79 D 2009
## 2 Carter 78 D 1977
## 3 Clinton 68 D 1993
## 4 G.W. Bush 65 R 2001
## 5 Reagan 58 R 1981
## 6 G.H.W Bush 56 R 1989
## 7 Trump 40 R 2017
the default color scale is not ideal:
ggplot(pr_appr,
aes(x = appr, y = pres, fill = party)) +
geom_col()
The common assignment of red for Republican and blue for Democrat can
be obtained by
ggplot(pr_appr,
aes(x = appr, y = pres, fill = party)) +
geom_col() +
scale_fill_manual(values
= c(R = "red", D = "blue"))
A better choice is to use a well-designed color palette:
ggplot(pr_appr,
aes(x = appr, y = pres, fill = party)) +
geom_col() +
colorspace::scale_fill_discrete_diverging(
palette = "Blue-Red 2")
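The code that creates pr_appr is not shown in these notes. To run the examples, the data frame can be rebuilt from the printed values; this sketch does so and repeats the manual red and blue assignment under the illustrative name p_appr.

```r
library(ggplot2)

# Values copied from the printed pr_appr table.
pr_appr <- data.frame(
  pres = c("Obama", "Carter", "Clinton", "G.W. Bush",
           "Reagan", "G.H.W Bush", "Trump"),
  appr = c(79, 78, 68, 65, 58, 56, 40),
  party = c("D", "D", "D", "R", "R", "R", "R"),
  year = c(2009, 1977, 1993, 2001, 1981, 1989, 2017)
)

p_appr <- ggplot(pr_appr,
                 aes(x = appr, y = pres, fill = party)) +
  geom_col() +
  # Named values tie each color to a party regardless of level order.
  scale_fill_manual(values = c(R = "red", D = "blue"))
p_appr
```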
Facets
Faceting uses the small multiples approach to introduce
additional variables.
For a single variable facet_wrap is usually used:
p <- ggplot(mpg) +
geom_point(aes(x = displ,
y = hwy))
p + facet_wrap(~ class)
For two variables, each with a modest number of categories,
facet_grid can be effective:
p + facet_grid(factor(cyl) ~ drv)
To show common data in all facets make sure the data does not contain
the faceting variable.
This approach can be used to show a muted view of the full data behind
each facet.
A faceted plot of the gapminder data:
library(gapminder)
years_to_keep <- c(1977, 1987, 1997, 2007)
gd <- filter(gapminder,
year %in% years_to_keep)
ggplot(gd,
aes(x = gdpPercap,
y = lifeExp,
color = continent)) +
geom_point(size = 2.5) +
scale_x_log10() +
facet_wrap(~ year)
Add a muted version of the full data in the background of each
panel:
library(gapminder)
years_to_keep <- c(1977, 1987, 1997, 2007)
gd <- filter(gapminder,
year %in% years_to_keep)
gd_no_year <- mutate(gd, year = NULL)
ggplot(gd,
aes(x = gdpPercap,
y = lifeExp,
color = continent)) +
geom_point(data = gd_no_year,
color = "grey80") +
geom_point(size = 2.5) +
scale_x_log10() +
facet_wrap(~ year)
Usually facets use common axis scales, but one or both can be allowed
to vary.
A useful approach for showing time series data with a good aspect
ratio can be to split the data into facets for non-overlapping portions
of the time axis.
pd <- rep(paste(seq(1, by = 32, length.out = 4),
seq(32, by = 32, length.out = 4),
sep = " - "),
each = 32)
rd <- data.frame(month = seq_along(river),
flow = river,
panel = pd)
ggplot(rd, aes(x = month,
y = flow)) +
geom_point() +
facet_wrap(~ panel,
scales = "free_x",
ncol = 1)
Facet arrangement can also be used to convey other information, such
as geographic location.
The geofacet
package allows facets to be placed in approximate locations of
different geographic regions.
An example for data from US states:
library(geofacet)
ggplot(state_unemp, aes(year, rate)) +
geom_line() +
facet_geo(~ state,
grid = "us_state_grid2",
label = "code") +
scale_x_continuous(labels =
function(x) paste0("'", substr(x, 3, 4))) +
labs(title = "Seasonally Adjusted US Unemployment Rate 2000-2016",
caption = "Data Source: bls.gov",
x = "Year",
y = "Unemployment Rate (%)") +
theme(strip.text.x = element_text(size = 6),
axis.text = element_text(size = 5))
Arrangement according to a calendar can also be useful.
Themes
ggplot2 supports the notion of themes for
adjusting non-data appearance aspects of a plot, such as legend
position, text size and face, and background colors.
Theme elements can be customized in several ways:
theme() can be used to adjust individual elements in
a plot.
theme_set() adjusts default settings for a
session;
pre-defined theme functions allow consistent style
changes.
The full
documentation of the theme function lists many
customizable elements.
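As a sketch of the session-wide approach, theme_set() returns the previous default theme, which makes it easy to restore later. The particular theme and legend position used here are arbitrary choices, and the variable names are illustrative.

```r
library(ggplot2)

# theme_set() returns the previous default so it can be restored.
old_theme <- theme_set(theme_bw() + theme(legend.position = "top"))
during <- theme_get()$legend.position

# Plots drawn while this default is in effect use it.
p_top <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point()
p_top

# Restore the earlier default.
theme_set(old_theme)
after <- theme_get()$legend.position
```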
One simple example:
ggplot(mutate(mpg, cyl = factor(cyl))) +
geom_point(aes(x = displ,
y = hwy,
fill = cyl),
shape = 21,
size = 3) +
theme(legend.position = "top",
axis.text = element_text(size = 12),
axis.title = element_text(size = 14,
face = "bold"))
Another example:
gthm <-
theme(plot.background =
element_rect(fill = "lightblue",
color = NA),
panel.background =
element_rect(fill = "pink"))
p + gthm
Some alternate complete themes provided by ggplot2
are
theme_bw theme_gray theme_minimal theme_void
theme_classic theme_grey theme_dark theme_light
Some examples:
p_bw <- p + theme_bw() + ggtitle("BW")
p_classic <- p + theme_classic() + ggtitle("Classic")
p_min <- p + theme_minimal() + ggtitle("Minimal")
p_void <- p + theme_void() + ggtitle("Void")
library(patchwork)
(p_bw + p_classic) / (p_min + p_void)
The ggthemes
package provides some additional themes.
Some examples:
library(ggthemes)
p_econ <- p + theme_economist() + ggtitle("Economist")
p_wsj <- p + theme_wsj() + ggtitle("WSJ")
p_tufte <- p + theme_tufte() + ggtitle("Tufte")
p_few <- p + theme_few() + ggtitle("Few")
(p_econ + p_wsj) / (p_tufte + p_few)
ggthemes also provides theme_map that
removes unnecessary elements from maps:
m + coord_map() + theme_map()
The Themes
section in R for Data Science
provides some more details.
A More Complete Template
ggplot(data = <DATA>) +
<GEOM>(mapping = aes(<MAPPINGS>),
stat = <STAT>,
position = <POSITION>) +
< ... MORE GEOMS ... > +
<COORDINATE_ADJUSTMENT> +
<SCALE_ADJUSTMENT> +
<FACETING> +
<THEME_ADJUSTMENT>
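One possible way to fill in every slot of this template is sketched below. The particular choices (jittered points, a linear smooth, the axis limits, faceting by year, and the name p_full) are arbitrary and only meant to show where each component goes.

```r
library(ggplot2)

p_full <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = drv),
             stat = "identity",
             position = "jitter") +                       # geom, stat, position
  geom_smooth(mapping = aes(x = displ, y = hwy),
              method = "lm", formula = y ~ x) +           # a second geom
  coord_cartesian(ylim = c(10, 45)) +                     # coordinate adjustment
  scale_x_continuous(breaks = 2:7) +                      # scale adjustment
  facet_wrap(~ year) +                                    # faceting
  theme_bw()                                              # theme adjustment
p_full
```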
Labels and Annotations
A basic plot:
p <- ggplot(mpg, aes(x = displ,
y = hwy))
p1 <- p + geom_point(aes(color = factor(cyl)),
size = 2.5)
p1
Axis labels are based on the expressions given to
aes.
This is convenient for exploration but usually not ideal for a
report.
The labs() function can be used to change axis and
legend labels:
p1 + labs(x = "Displacement (Liters)",
y = "Highway Miles Per Gallon",
color = "Cylinders")
The labs() function can also add a title, subtitle, and
caption:
p2 <- p1 +
labs(x = "Displacement (Liters)",
y = "Highway Miles Per Gallon",
color = "Cylinders",
title = "Gas Mileage and Displacement",
subtitle = paste("For models which had a new release every year",
"between 1999 and 2008"),
caption = "Data Source: https://fueleconomy.gov/")
p2
Annotations can be used to provide popout that draws a viewer’s
attention to particular features.
The annotate() function is one option:
p2 +
annotate("label", x = 2.8, y = 43,
label = "Volkswagens") +
annotate("rect",
xmin = 1.7, xmax = 2.1,
ymin = 40, ymax = 45,
fill = NA, color = "black")
Often more convenient are some geom_mark objects
provided by the ggforce package:
library(ggforce)
p2 +
geom_mark_hull(aes(filter = class == "2seater"),
description =
paste("2-Seaters have high displacement",
"values, but also high fuel efficiency",
"for their displacement.")) +
geom_mark_rect(aes(filter = hwy > 40),
description =
"These are Volkswagens") +
geom_mark_circle(aes(filter = hwy == 12),
description =
"Three pickups and an SUV.")
These annotations can be customized in a number of ways.
Arranging Plots
There are several tools available for assembling ensemble plots.
The patchwork
package is a good choice.
A simple example:
p1 <- ggplot(mpg, aes(x = displ,
y = hwy)) +
geom_point()
p2 <- ggplot(mpg, aes(x = cyl,
y = hwy,
group = cyl)) +
geom_boxplot()
p3 <- ggplot(mpg, aes(x = cyl)) +
geom_bar()
library(patchwork)
(p1 + p2) / p3
Animation
The gganimate
package can be used to add animation to a ggplot graph.
Start with a plot p for all years in the
gapminder data, with year in the
background:
p <- gapminder |>
arrange(desc(pop)) |>
ggplot(aes(x = gdpPercap, y = lifeExp)) +
geom_text(aes(x = 5000, y = 55, label = as.character(year)),
size = 50, color = "grey",
hjust = "center", vjust = "center") +
geom_point(aes(size = pop, fill = continent), shape = 21) +
scale_x_log10(labels = scales::comma) +
ylim(c(20, 85)) +
scale_size_area(max_size = 20,
labels = scales::comma,
breaks = c(0.25 * 10 ^ 9, 0.5 * 10 ^ 9, 10 ^ 9)) +
scale_fill_manual(values = c(Africa = "deepskyblue",
Asia = "red",
Americas = "green",
Europe = "gold",
Oceania = "brown")) +
labs(x = "Income", y = "Life expectancy") +
## A complete theme replaces earlier theme() settings, so apply it first
theme_minimal() +
theme(text = element_text(size = 16),
panel.border = element_rect(fill = NA, color = "grey20")) +
guides(fill = guide_legend(title = "Continent",
override.aes = list(size = 5),
order = 1),
size = guide_legend(title = "Population",
label.hjust = 1,
order = 2))
A GIF
animation:
library(gganimate)
animate(p +
transition_states(
year,
transition_length = 2,
state_length = 0))
A movie:
animate(p +
transition_states(
year,
transition_length = 2,
state_length = 0,
wrap = FALSE),
renderer = ffmpeg_renderer())
Interaction
Plotly
The ggplotly function in the plotly package can be used
to add some interactive features to a plot created with
ggplot2.
In an R session a call to ggplotly() may open a
browser window with the interactive plot.
In an RStudio session the plot appears in the graphics
panel.
In an Rmarkdown document the interactive plot is embedded in the
html file.
Another interactive plotting approach that can be used from R is
described in an Infoworld
article.
A simple example using ggplotly():
library(ggplot2)
library(plotly)
p <- ggplot(mutate(mpg, cyl = factor(cyl))) +
geom_point(aes(x = displ,
y = hwy,
fill = cyl),
shape = 21,
size = 3)
ggplotly(p)
Adding a text aesthetic allows the tooltip display to be
customized:
p <- ggplot(mutate(mpg, cyl = factor(cyl))) +
geom_point(aes(x = displ,
y = hwy,
fill = cyl,
text = paste(year,
manufacturer,
model)),
shape = 21,
size = 3)
ggplotly(p, tooltip = "text") |>
style(hoverlabel = list(bgcolor = "white"))
Ggiraph
The ggiraph
package provides another approach.
library(ggplot2)
library(ggiraph)
p <- ggplot(mutate(mpg, cyl = factor(cyl))) +
geom_point_interactive(
aes(x = displ,
y = hwy,
fill = cyl,
tooltip = paste(year,
manufacturer,
model)),
shape = 21,
size = 3)
girafe(ggobj = p)
Grammar of Interactive Graphics
There have been several efforts to develop a grammar of interactive
graphics, including ggvis and animint;
neither seems to be under active development at this time.
A promising approach is Vega-Lite, with a Python
interface Altair and an R
interface altair to
the Python interface.
An example using the altair package:
rub <- read.csv(here::here("rubber.csv"))
library(altair)
chartTH <- alt$Chart(rub)$
mark_point()$
encode(x = alt$X("H:Q", scale = alt$Scale(domain = range(rub$H))),
y = alt$Y("T:Q", scale = alt$Scale(domain = range(rub$T))))
brush <- alt$selection_interval()
chartTH_brush <- chartTH$add_selection(brush)
chartTH_selection <-
chartTH_brush$encode(color = alt$condition(brush,
"Origin:N",
alt$value("lightgray")))
chartAT <- chartTH_selection$
encode(x = alt$X("T:Q", scale = alt$Scale(domain = range(rub$T))),
y = alt$Y("A:Q", scale = alt$Scale(domain = range(rub$A))))
chartAT | chartTH_selection
The resulting linked plots are interactive and are not shown here; rendering them requires the Python altair module to be available to reticulate.
Interactive Tutorial
An interactive learnr
tutorial for these notes is available.
You can run the tutorial with
STAT4580::runTutorial("ggplot")
You can install the current version of the STAT4580
package with
remotes::install_gitlab("luke-tierney/STAT4580")
You may need to install the remotes package from CRAN
first.
Exercises
In the following expression, which value of the shape
aesthetic produces a plot with points represented as triangles outlined
in black and filled according to the number of cylinders?
```r
library(ggplot2)
ggplot(mpg, aes(x = displ, y = hwy, fill = factor(cyl))) +
geom_point(size = 4, shape = ---)
```
a. 15
b. 17
c. 21
d. 24
It can sometimes be useful to plot text labels in a scatterplot
instead of points. Consider the plot set up as
library(ggplot2)
library(dplyr)
data(gapminder, package = "gapminder")
p <- filter(gapminder, year == 2007) |>
group_by(continent) |>
summarize(gdpPercap = mean(gdpPercap), lifeExp = mean(lifeExp)) |>
ggplot(aes(x = gdpPercap, y = lifeExp))
Which of the following produces a plot with continent names on white
rectangles?
a. p + geom_text(aes(label = continent))
b. p + geom_label(aes(label = continent))
c. p + geom_label(label = continent)
d. p + geom_text(text = continent)
The following code plots a kernel density estimate for
the eruptions variable in the faithful data
set:
library(ggplot2)
ggplot(faithful, aes(x = eruptions)) + geom_density(bw = 0.1)
Look at the help page for geom_density. Which of the
following best describes what specifying a value for bw
does:
a. Changes the kernel used to construct the estimate.
b. Changes the smoothing bandwidth to make the result more or less smooth.
c. Changes the stat used to stat_bw.
d. Has no effect on the result.
This code creates a map of Iowa counties.
library(ggplot2)
p <- ggplot(map_data("county", "iowa"),
aes(x = long, y = lat, group = group)) +
geom_polygon(fill = "white", color = "black")
Which of these produces a plot with an aspect ratio that best matches
the map on this page?
a. p + coord_fixed(0.5)
b. p + coord_fixed(0.75)
c. p + coord_fixed(1.35)
d. p + coord_fixed(1.95)
Consider the two plots created by this code (print the values of
p1 and p2 to see the plots):
library(ggplot2)
data(gapminder, package = "gapminder")
p1 <- ggplot(gapminder, aes(x = log(gdpPercap), y = lifeExp)) +
geom_point() +
scale_x_continuous(name = "")
p2 <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
geom_point() +
scale_x_log10(labels = scales::comma, name = "")
Which of these statements is true?
a. The x axis labels are identical in both plots.
b. The x axis labels in p2 are in dollars; the labels in p1 are in log dollars.
c. The x axis labels in p1 are in dollars; the labels in p2 are in log dollars.
d. There are no labels on the x axis in p2.
Consider the plot created by
library(ggplot2)
data(gapminder, package = "gapminder")
p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
geom_point() +
scale_x_log10(labels = scales::comma)
Which of these expressions produces a plot with a white
background?
a. p
b. p + theme_grey()
c. p + theme_classic()
d. p + ggthemes::theme_economist()
There are many different ways to change the x axis
label in ggplot. Consider the plot created by
library(ggplot2)
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
geom_point()
Which of the following does not change the
x axis label to Displacement?
a. p + labs(x = "Displacement")
b. p + scale_x_continuous("Displacement")
c. p + xlab("Displacement")
d. p + theme(axis.title.x = "Displacement")
