## 1. Gapminder Tooltips

First create the plot object with a text aesthetic mapped to country:

library(dplyr)
library(ggplot2)
library(gapminder)
gap <- filter(gapminder, year %% 10 == 7 & year >= 1977)
p <- ggplot(gap, aes(x = gdpPercap, y = lifeExp,
                     color = continent,
                     size = pop,
                     text = country)) +
    geom_point() + scale_size_area(max_size = 8) + facet_wrap(~year)

Then specify the text aesthetic as the tooltip in the ggplotly call:

library(plotly)
ggplotly(p, tooltip = "text")
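If more than the country name is wanted in the tooltip, one possible approach is to build the text in the data before plotting. The sketch below relies on plotly rendering `<br>` in hover text as a line break; the `label` column and the names `gap2`, `p2`, and `w` are introduced here for illustration only and are not part of the original solution.

```r
library(dplyr)
library(ggplot2)
library(gapminder)
library(plotly)

# Same subset of years as above: 1977, 1987, 1997, 2007
gap2 <- filter(gapminder, year %% 10 == 7 & year >= 1977)

# Build a multi-line tooltip string; `label` is a hypothetical column name
gap2 <- mutate(gap2,
               label = paste0(country, "<br>",
                              "Life expectancy: ", round(lifeExp, 1), "<br>",
                              "GDP per capita: ", round(gdpPercap)))

p2 <- ggplot(gap2, aes(x = gdpPercap, y = lifeExp,
                       color = continent,
                       size = pop,
                       text = label)) +
    geom_point() + scale_size_area(max_size = 8) + facet_wrap(~year)

# Show only the custom text in the tooltip; print `w` to display the widget
w <- ggplotly(p2, tooltip = "text")
```

Keeping the tooltip text in a data column means the same string can be reused or adjusted without touching the plot specification.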

## 2. Arrival and Departure Delays

It is a good idea to start with a look at the distributions of the two variables individually, such as a pair of density plots:

library(nycflights13)
fl <- tidyr::gather(flights, which, delay, ends_with("delay"))
ggplot(fl) + geom_density(aes(x = delay)) + facet_wrap(~which)
## Warning: Removed 17685 rows containing non-finite values (stat_density).

Most flights have departure and arrival delays of no more than one hour:

mean(flights$dep_delay <= 60 & flights$arr_delay <= 60, na.rm = TRUE) * 100
## [1] 90.23529

The percentage of flights with departure delays of more than 10 hours is extremely small:

mean(flights$dep_delay > 600, na.rm = TRUE) * 100
## [1] 0.01217578
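The same calculation can be repeated for several cutoffs at once to see how quickly the tail thins out. This is a minimal sketch; the thresholds and the name `pct_over` are chosen here for illustration.

```r
library(nycflights13)

# Departure-delay cutoffs in minutes (illustrative choices)
thresholds <- c(60, 180, 600)

# Percentage of flights with a departure delay above each cutoff,
# ignoring flights whose departure delay is missing
pct_over <- sapply(thresholds,
                   function(m) mean(flights$dep_delay > m, na.rm = TRUE) * 100)
names(pct_over) <- paste0("> ", thresholds, " min")
pct_over
```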

For flights with departure delays over three hours the departure and arrival delays are close to identical:

ggplot(filter(flights, dep_delay >= 180),
       aes(x = dep_delay, y = arr_delay)) +
    geom_abline(intercept = 0, slope = 1, color = "blue", lty = 2) +
    geom_point(alpha = 0.1)
## Warning: Removed 57 rows containing missing values (geom_point).
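A numerical check can complement the plot. The sketch below summarizes the difference between arrival and departure delay for these flights and computes their correlation; the names `long_delays` and `delay_diff` are introduced here for illustration.

```r
library(dplyr)
library(nycflights13)

# Flights departing at least three hours late, with a recorded arrival delay
long_delays <- filter(flights, dep_delay >= 180, !is.na(arr_delay))

# Minutes gained (negative) or lost (positive) between departure and arrival
delay_diff <- long_delays$arr_delay - long_delays$dep_delay
summary(delay_diff)

# Linear association between the two delays
cor(long_delays$dep_delay, long_delays$arr_delay)
```

A difference distribution centered near zero and a correlation near one would be consistent with the points hugging the dashed identity line.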

A single scatterplot that captures these observations could:

• use a sample of about 1% of the data, limited to departure delays of less than 10 hours;
• use alpha blending and point size adjustment to reduce over-plotting:
set.seed(12345)
fl <- sample_n(filter(flights, dep_delay <= 600), 3000)
ggplot(fl, aes(x = dep_delay, y = arr_delay)) +
    geom_point(alpha = 0.1, size = 1)
## Warning: Removed 10 rows containing missing values (geom_point).

Raising the alpha level to make the extreme delays more visible and using density contours to show the concentration near the origin is another option.

ggplot(fl, aes(x = dep_delay, y = arr_delay)) +
    geom_point(alpha = 0.3, size = 1) +
    geom_density2d(bins = 5, color = "red")
## Warning: Removed 10 rows containing non-finite values (stat_density2d).
## Warning: Removed 10 rows containing missing values (geom_point).

## 3. Evaluate a Visualization

The Vox visualization attracted some attention on the internet; some examples:

Analysis of the visualization:

• Items: diseases and associated measurements.

• Attributes: disease, money raised, deaths.

• Marks: circles, text.

• Channels: vertical position, area, color (hue), text.

• Mappings:

• Ranks within the numeric variables are mapped to vertical position.

• Magnitudes of numeric variables are mapped to circle areas.

• Magnitudes are also mapped to text labels.

• Disease is mapped to color (hue).

A goal of the visualization is to show the discrepancy between the relative amounts raised and the relative numbers of deaths. To see this relation, readers must use color to match the corresponding circles before comparing their positions or sizes, and color is a weaker channel.

One good alternative, used in one of the links above, is a scatter plot:

Other options:

• a Tufte-style slope graph using standardized variables or ranks (essentially a parallel coordinates plot; used in another of the links above);

• visualizing a derived variable, such as funds per death.
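As a rough illustration of the last two options, the sketch below computes ranks (as a slope graph would use) and a funds-per-death variable. The data frame `d`, its column names, and every value in it are hypothetical placeholders, not the figures from the Vox graphic.

```r
# Hypothetical placeholder data: NOT the values shown in the Vox graphic
d <- data.frame(cause  = c("Cause A", "Cause B", "Cause C"),
                raised = c(250, 50, 10),    # money raised, arbitrary units
                deaths = c(40, 140, 600))   # deaths, arbitrary units

# Derived variable: money raised per death
d$raised_per_death <- d$raised / d$deaths

# Ranks within each numeric variable (1 = largest), as used by a slope graph
d$rank_raised <- rank(-d$raised)
d$rank_deaths <- rank(-d$deaths)

# Sort so the causes with the most funding per death come first
d[order(-d$raised_per_death), ]
```

A single derived column puts the discrepancy on one common scale, so it can be shown with position alone, for example in a sorted dot plot.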

There are issues with the data; some of these are discussed in the articles linked to above.