## Some River Flow Data

river <- scan("https://www.stat.uiowa.edu/~luke/data/river.dat")
plot(river)

## A Simple Model of Visual Perception

The eyes acquire an image, which is processed through three stages of memory:

• Iconic memory

• Working memory, or short-term memory

• Long-term memory

### Iconic Memory

The first processing stage of an image happens in iconic memory.

• Images remain in iconic memory for less than a second.

• Processing in iconic memory is massively parallel and automatic.

• This is called preattentive processing.

Preattentive processing is a fast recognition process.

### Working Memory

Meaningful visual chunks are moved from iconic memory to short term memory.

These chunks are used by conscious, or attentive, processing.

Attentive processing often involves conscious comparisons or search.

Short term memory is limited;

• information is retained for only a few seconds;

• only three or fours chunks can be held at a time.

Chunks can be of varying size; a coherent pattern can form a single chunk even if it is quite large.

If more chunks are needed or chunks are needed longer they need to be reacquired or retrieved from long term memory.

### Long Term Memory

Long term visual memory is built up over a lifetime, though infrequently used visual chunks may become lost.

Chunks processed repeatedly in working memory may be transferred to long term memory.

Common patterns and contextual information can be retrieved from long term memory for attentive processing in working memory.

### Visual Design Implications

Try to make as much use of preattentive features as possible.

Recognize when preattentive features might mislead.

For features that require attentive processing keep in mind that working memory is limited.

## Some Examples of Challenges

### Context Matters

Which of the inner circles is larger, or are they the same size?

Which of the lines is longer, or are they the same length?

The sine Illusion: which of the bars are longer, or are they the same length?

x <- seq(0, 5 * pi, length.out = 100)
w <- 0.5
plot(x, sin(x), ylim = c(-1, 1 + w), type = "n")
segments(x0 = x, y0 = sin(x), y1 = sin(x) + w, lwd = 3)

Which of the squares A and B is darker, or are they the same shade?

### Some Optical Illusions

R implementations of some optical illusions by Kohske Takahashi:

Are these lines parallel?

Again, are these lines parallel?

Scintillating grid illusion: Black dots at the intersections appear and disappear; are they real?

Hermann grid illusion: “Ghost-like” grey blobs at the intersections of white lines.

Grid illusion in a cartogram:

A case where there are more dots than we can see at once:

Some illusions in pictures:

A large collection of optical illusions is available at http://www.michaelbach.de/ot/index.html.

Other links to optical illusions can be found here.

### Motion

n <- 50
x <- 2 * (1 : n)
y <- rep(2, n)
lim <- c(min(x) + 0.1 * (max(x) - min(x)), max(x) - 0.1 * (max(x) - min(x)))
v <- TRUE
for (i in 1 : 2) {
plot(x + v, y, xlim = lim)
v <- ! v
Sys.sleep(0.1)
}

Images contain no information about the direction of the rotation.

library(rgl)
d <- data.frame(x = rnorm(1000), y = rnorm(1000), z = rnorm(1000))
points3d(d)
par3d(FOV = 1)  ## removes perspective distortion
if (interactive())
play3d(spin3d(axis = c(0, 0, 1), rpm = 30), duration = 20)

Interactive rotation can help.

Depth cueing and perspective can also help.

### Popout and Distractors

Where is the red dot:

## Items, Attributes, Marks, and Channels

To evaluate or design a visualization it is useful to have some terms for the components.

Several schools have developed different but similar sets of terms.

Some References:

• Bertin, J. (1967), The Semiology of Graphics.
• Cleveland, W. S. (1988), The Elements of Graphing Data.
• Cleveland, W. S. (1993), Visualizing Data.
• Few, S. (2012), Show Me the Numbers: Designing Tables and Graphs to Enlighten, 2nd ed.
• Munzner, T. (2014), Visualization Analysis and Design
• Ware, C. (2012), Information Visualization: Perception for Design, 3rd ed.
• Wilkinson, L. (2005), The Grammar of Graphics, 2nd ed.

Munzner uses the terminology of items, attributes, links, marks, and channels:

• Items are the basic units on which data is collected: the entities represented by the rows in a tidy data frame.

• Attributes are the numerical or categorical features of the data items we want to represent; the variables in a tidy data frame.

• Links are relations among items: e.g. months within a year, or countries within a continent.

• Marks are the geometric entities used to represent items: points, lines, areas. These correspond to the simple geom objects in ggplot.

• Visual channels are features of marks that can be used to reflect values of attributes.

Channels correspond approximately to aesthetics in ggplot but are more focused on the visual aspect:

• An x aesthetic that is transformed to polar coordinates is a different channel than an x aesthetic representing a position on a standard linear scale.

Channels are used to encode attributes (aesthetic mappings).

A single attribute can be encoded in several channels.

• This makes the attribute easier to perceive.

• But is also uses up available channels.

Some channels are well suited to encode quantitative or ordered values; they are quantitatively perceived.

Others are only suited for nominal values.

A useful classification, adapted from Few(2012):

Type Channel Quantitatively Perceived?
Form Length Yes
Width Yes, but limited
Orientation No
Size Yes, but limited
Shape No
Color Hue No
Intensity Yes, but limited
Position 2D position Yes

Munzner uses the terms magnitude channels and identity channels.

Ideally,

• numeric and ordered attributes should be represented by quantitatively perceived channels;

• unordered categorical attributes should be represented by non-quantitative channels

A useful principle: The most important attributes should be mapped to the most effective channels.

## Channel Effectiveness

• What kind and how much information can a channel encode?

• Are some channels better than others?

• Can channels be used independently, or do they interfere?

Some criteria for evaluating channels:

• Accuracy: How well can a viewer decode the information in the channel?

• Discriminability: How easily can differences between attribute levels be perceived?

• Separability: Can channels be used independently or is there interference?

• Popout: can a channel provide popout where a difference is perceived preattentively?

• Grouping: can a channel show perceptual grouping of items?

### Channel Accuracy

Stevens (1957) argues that accuracy of magnitude channels can be described by a power law:

$\text{perceived sensation} = (\text{physical intensity})^\gamma$

Experiments by Stevens suggest these values for some visual channels:

Others have raised concerns about the validity of these findings.

Another approach has used controlled experiments to assess accuracy of various channels used in visualizations:

• William S. Cleveland and Robert McGill (1984), “Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods,” Journal of the American Statistical Association 79, 531–554.

• William S. Cleveland and Robert McGill (1987), “Graphical Perception: The Visual Decoding of Quantitative Information on Graphical Displays of Data” Journal of the Royal Statistical Society. Series A, 192-229.

• Jeffrey Heer and Michael Bostock (2010) “Crowdsourcing Graphical Perception: Using Mechanical Turk to Assess Visualization Design,” Proceedings of the SIGCHI, 203-212.

Munzner’s ordering by accuracy:

This ordering is sometimes referred to as a perceptual ladder.

Line width is another channel; not sure there is agreement on its accuracy, but it is not high.

### Discriminability

Many channels, in particular identity channels, can only support a limited number of discriminable levels.

• Line width is one of the most limited with perhaps 3 levels.

• Using more than 5 or 6 color hues is not recommended.

• Similarly, using more than 5 or 6 symbol shapes can create difficulties.

If the number of levels that can be represented by a channel is smaller than the number of attribute levels then some form of meaningful aggregation is needed.

### Separability

Some encodings can be used independently of each other; others interfere with each other to some degree.

• Vertical an horizontal position can be used independently.

• Color (hue) and position can be used independently

• Size and hue interfere somewhat; hue is harder to perceive on smaller objects.

• Width and height do not function well independently; the result is perceived primarily as shape.

• Encoding two different values in the red and green channels as a hue does not work at all.

### Popout

Many channels support visual popout: having one item or a few items immediately stand out from the others.

• Color (hue and intensity) do this well.

• Shape and size can also be used effectively to create popout.

Annotation can also be used to create popout.

### Grouping

Perceptual grouping can be achieved in several ways:

• Using an identity channel to to represent items as a group.

• By enclosure.

• By spatial proximity.

## Experimental Evidence

### Cleveland-McGill

The 1984 paper is available from JSTOR.

The paper formulates a theory for ranking Elementary Perceptual Tasks; these correspond to channel mappings.

Some orderings were addressed by informal experiments (obvious to the authors at least).

Others were assessed by formal experiments with about 50 subjects.

Experiments focused on accuracy of decoding, though this is not viewed as the primary purpose of a graph:

One must be careful not to fall into a conceptual trap of adopting accuracy as a criterion. … The power of a graph is its ability to enable one to take in the quantitative information, organize it, and see patterns and structure not readily revealed by other means of studying the data.

Their premise:

A graphical form that involves elementary perceptual tasks that lead to more accurate judgments than another graphical form (with the same quantitative information) will result in better organization and increase the chances of a correct perception of patterns and behavior."

• Identify which of two marked items is smaller.

• Estimate the percentage the smaller is of the larger.

Results:

Percent large errors:

Absolute error:

### Heer and Bostock

Heer and Bostock (2010) set out to replicate the Cleveland McGill experiment using crowd sourcing via Amazon Mechanical Turk

They used the five position stimuli from Cleveland and McGill and some new ones:

50 subjects were recruited for each task.

Results were consistent with Cleveland-McGill results:

Use of Mechanical Turk was deemed a success.

### Pie Chart Experiments

Pie charts are popular but somewhat controversial.

• Pie charts are inferior for comparisons to bar charts.

• Pie charts are quite good at representing part-whole relationships.

• Cleveland and McGill suggested pie charts are read by angle.

• Kosara and Skau report experiments that suggest this is not the case.

• If it were, donut charts would be even less effective, but they seem to be very comparable.

• Kosara’s blog provides a review of other pie chart studies.

## Improving Some Common Charts

Cleveland and McGill set out to suggest improvements to some common charts. This is a selection of their examples.

### Dot Charts

Cleveland and McGill use their perceptual ladder to argue strongly for using dot charts in place of bar charts and pie charts.

### Playfair’s Balance of Trade plots

Playfair presented a number of plots showing imports and exports between England and other nations.

A primary goal was to show the balance of trade, the difference between exports and imports:

Assessing the differences from a plot showing exports and imports as separate curves requires length judgments, which are less accurate than comparisons to a common stale.

Plotting the difference makes the balance of trade much easier to assess:

### Framed Unaligned Bars

It is difficult to compare lengths of unaligned rectangles when the lengths are close.

Adding a frame moves the task up the perceptual ladder to an unaligned comparison against a common scale.

Comparing to a common scale is still the most effective approach:

But this does suggest that using unaligned framed rectangles to encode a third variable, with position encoding the two primary variables may be effective.

### Framed Rectangle Maps

A choropleth map is a common way to depict a quantitative variable in a geographic context.

Cleveland and McGill suggest the use of framed rectangles positioned on the map as an alternative.

This does not seem to have caught on so far, though you do sometimes see the use of other glyphs, such as pie charts.

## Analyzing a Design

Graph layout involves several levels:

• Primary data representation

• items, attributes
• marks, channels
• Supporting features

• trend lines, reference lines, annotations
• axes, legends

A useful structure for describing the primary features:

• What are the data items?
• What are the attributes?
• What marks are used?
• What channels are used?
• Which attribute is matched to channel 1
• Which attribute is matched to channel 2

Useful questions:

• Are the most important attributes mapped to the strongest channels?
• Do the mappings do a good job of conveying the primary message?
• If not, can the graph be improved by adjusting the mappings?

### A Gapminder Plot

One of the frames of a plot from the GapMinder site:

• Items are countries (in a particular year)

• Attributes in the plot:
• life expectancy
• income per person
• population
• continent
• country name
• year
• Marks: points (or bubbles).

• Channels and mappings:
• vertical position, mapped to life expectancy
• area, mapped to population
• color (hue), mapped to continent
• interactive: mouse-over, mapped to country name
• interactive: time (or frame), mapped to year

The basic plot can be created with ggplot and aesthetic mappings:

library(gapminder)
ggplot(filter(gapminder, year == 2007)) +
geom_point(aes(x = gdpPercap,
y = lifeExp,
size = pop,
color = continent)) +
scale_x_log10() +
ylim(c(20, 85))

The data in the gapminder package differ somewhat from the data used by the Gapminder site, but overall the plot designs are very close.

With some adjustments the basic plot can be brought close to the Gapminder version in appearance:

gm2007 <- filter(gapminder, year == 2007) %>%
arrange(desc(pop)) ## sort to avoid over-plotting
ggplot(gm2007) +
geom_point(aes(x = gdpPercap,
y = lifeExp,
size = pop,
fill = continent),
shape = 21) + ## to allow the using fill aesthetic
scale_x_log10() +
ylim(c(20, 85)) +
scale_size_area(max_size = 20,
labels = comma,
breaks = c(0.25 * 10 ^ 9, 0.5 * 10 ^ 9, 10 ^ 9)) +
scale_fill_manual(values = c(Africa = "deepskyblue",
Asia = "red",
Americas = "green",
Europe = "gold",
Oceania = "brown")) +
labs(x = "Income", y = "Life expectancy") +
theme(text = element_text(size = 16)) +
guides(fill = guide_legend(title = "Continent",
override.aes = list(size = 5),
order = 1),
size = guide_legend(title = "Population",
label.hjust = 1,
order = 2)) +
theme_minimal() +
theme(panel.border = element_rect(fill = NA, color = "grey20"))

Some notes:

• Using larger bubbles makes the plot more engaging.

• Using larger bubbles makes differences in population easier to assess, but makes the strength of the relationship between life expectancy and income harder to assess.

• Using larger bubbles also increases the risk of over-plotting. Sorting the rows so larger bubble are drawn first helps reduce the risk somewhat.

These plots show a fairly strong marginal association between life expectancy and income.

There does not seem to be a strong association of population with the other two variables, but these plots are not ideal for that assessment.

To judge the marginal association between life expectancy and population size we can change the channel mapping:

• map the horizontal axis to population;
• map area to income.
ggplot(filter(gapminder, year == 2007)) +
geom_point(aes(x = pop,
y = lifeExp,
size = gdpPercap)) +
scale_x_log10() +
ylim(c(20, 85)) +
theme_minimal() +
theme(panel.border =
element_rect(fill = NA,
color = "grey20"))

This confirms that there is very little association between life expectancy and population.

The association between life expectancy and income is still visible, but is easier to assess when these two variables are mapped to 2D position.

### Michelin Stars

This image, from a blog post, shows the total number of stars for different countries:

• Items: Countries (in a particular year)

• Attributes:
• country name
• number of stars
• Marks: bubbles, text

• Channels and mappings:
• bubble color, mapped to country
• bubble area, mapped to number of stars
• text, mapped to country name (where possible)
• text, mapped to number of stars (where possible)

Observations:

• None of the channels are very strong.

• The strongest channels, 2D position, are not used.

• The number of colors used is too high.

A simple dot plot would convey the distribution better.

A dot plot using 2017 data from an article in The Telegraph:

michelin <- read.table(WLNK("michelin.dat"),
michelin <- mutate(michelin,
stars = one + 2 * two + 3 * three,
country = reorder(country, stars))
ggplot(michelin, aes(x = stars, y = country)) +
geom_point() +
labs(x = "Stars", y = NULL) +
theme_minimal() +
theme(text = element_text(size = 16)) +
theme(panel.border =
element_rect(fill = NA,
color = "grey20"))

Even if a bubble plot is desired for aesthetic reasons, position could be used to

• group countries by continent;
• show countries on a map.

A plot like the original can be constructed by

• computing locations for a set of packed spheres with specified radii;
• using geom_point and geom_text.
## create randomly located  packed spheres with specified areas
library(packcircles)
set.seed(54321)
packing <- circleProgressiveLayout(michelin$stars) ## merge circle locations with starts data mcirc <- bind_cols(packing, michelin) %>% mutate(country = factor(country)) ## clean out stray attributes ## compute some colors to use nr <- nrow(mcirc) pal <- colorRampPalette(RColorBrewer::brewer.pal(9, "Set1")) cols <- sample(pal(nr), nr) ## these need tuning for the screen or R markdown csize <- 45 tsize <- 2.5 ## the basic plot ## uses shape = 19 and color aesthetic p <- ggplot(mcirc, aes(x = x, y = y)) + geom_point(aes(size = radius ^ 2, color = country), shape = 19) + scale_size_area(max_size = csize) + geom_text(aes(label = paste(country, stars, sep = "\n")), data = filter(mcirc, stars >= 120), size = tsize) + coord_fixed() ## adjustments p + guides(size = "none") + scale_color_manual(values = cols) + xlim(with(mcirc, range(x) + c(-1, 1) * max(radius))) + ylim(with(mcirc, range(y) + c(-1, 1) * max(radius))) + theme_void() A somewhat more robust approach • creates vertex data for polygon approximations to the circles; • merges the stars data with the polygon data; • uses geom_polygon and geom_text. This is very similar to the way simple choropleth maps are created. ## compute polygon approximations to spheres mcircpoly <- circleLayoutVertices(mcirc, idcol = "country", npoints = 100) %>% rename(country = id) ## merge the stars data into the polygon data sttab <- select(michelin, country, stars) mcircpoly <- left_join(mcircpoly, sttab, "country") ## create the plot ggplot(mcircpoly, aes(x = x, y = y)) + geom_polygon(aes(fill = country)) + geom_text(aes(label = paste(country, stars, sep = "\n")), data = filter(mcirc, stars >= 120), size = tsize) + coord_fixed() + scale_fill_manual(values = cols) + theme_void() ## Aspect Ratio and Perception The river flow data shows how important aspect ratio can be to our ability to detect patterns: Using a line plot the basic periodicity becomes apparent even in the first aspect ratio. p2 <- p0 + geom_line() p2 + coord_fixed(ratio = 35) But the steeper increase/shallower decrease of most periods is easier to see in the second aspect ratio: p2 + coord_fixed(ratio = 4) The aspect ratio also influences interpretation of results. Some alternative views of (suspect) data on the number of people on government assistance over a time period: library(readr) w <- read_csv(WLNK("hw2-welfare.csv")) w <- mutate(w, onAssistance = onAssistance / 10 ^ 6) w <- mutate(w, date = seq(as.Date("2009-01-01"), by = "quarter", length.out = 10)) p0 <- ggplot(w, aes(x = date, y = onAssistance)) + theme_minimal() + theme(panel.border = element_rect(fill = NA, color = "grey20")) p1 <- p0 + geom_line(aes(group = 1)) grid.arrange(p1, p1 + coord_fixed(ratio = 8), p1 + ylim(0, max(w$onAssistance)),
p0 + geom_col(width = 40), ncol = 2)

Some notes:

• Automated choices of axis scaling can affect the aspect ratio of the content of a plot.

• A zero base line supports ratio comparisons.

• The preattentive response to bar charts is always to compare ratios, so using a zero base line is important.

• Using a non-zero base line for line plots and scatter plots encourages interval, or difference, comparisons.

• Research on the effect of aspact ratio on perception has focused on accuracy of slope comparisons.

• The general message is that keeping away from slopes that are too steep or too shallow is best.

• Banking to 45 degrees, or choosing an aspect ratio so the slope magnitudes are distributed around 45 degrees is often recommended.

• This also tends to be a useful “neutral ground” when political implications are involved.

Some references:

• A blog post by Robert Kosara.

• William S. Cleveland, Marylyn E. McGill and Robert McGill (1988), “The Shape Parameter of a Two-Variable Graph”, Journal of the American Statistical Association (JSTOR)

• Justin Talbot, John Gerth, Pat Hanrahan (2012), “An Empirical Model of Slope Ratio Comparisons”, IEEE Trans. Visualization & Comp. Graphics (Proc. InfoVis) (PDF)

## Ensemble Plots and Faceting

Using multiple channels allows a single plot to show a lot of information.

But over-plotting and interference can become problems.

One alternative is to use several related views in a useful arrangement.

Such arrangements are sometimes called ensemble plots.

There are a number of variations; a few are introduced below.

### Similar Plots with Different Variables and Shared Encodings

One way to show three continuous variables is with two plots that share an axis:

library(patchwork)
pe_thm <- theme_minimal() +
theme(panel.background = element_rect(color = "black",
size = 0.5,
fill = NA))

p1 <- ggplot(gm2007, aes(x = gdpPercap, y = lifeExp, color = continent)) +
geom_point() +
scale_x_log10() +
guides(color = "none") +
pe_thm
p2 <- ggplot(gm2007, aes(x = pop / 10 ^ 6, y = lifeExp, color = continent)) +
geom_point() +
scale_x_log10() +
pe_thm +
theme(axis.title.y = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank())
p1 + p2

Life expectancy is mapped to the vertical position in both plots.

Continent is mapped to color in both plots.

Guides can be shared when encodings are shared.

### Different Plots with Shared Encodings

Multiple views of the data are often helpful.

Sharing encodings makes the relations between views easier to perceive.

p3 <- ggplot(gm2007,
aes(x = pop / 10 ^ 6,
y = reorder(continent, pop, FUN = sum),
fill = continent)) +
geom_col() +
ylab(NULL) +
pe_thm
p1 + p3

### Small Multiples

Small multiples refers to a collection of plots with identical structure showing different subsets of the data and organized in a useful way.

These plot collections are also called trellis plots, lattice plots, or faceted plots.

A plot of life expectancy against income per capita in 2007 faceted by continent:

gd <- filter(gapminder, year %in% c(1977, 1987, 1997, 2007))
gd2007 <- filter(gapminder, year == 2007)
fct_thm <- theme_minimal() +
theme(panel.background = element_rect(color = "black",
size = 0.5,
fill = NA))
ggplot(gd2007, aes(x = gdpPercap, y = lifeExp, color = continent)) +
geom_point(size = 2.5) +
scale_x_log10() +
facet_wrap(~ continent) +
fct_thm

A useful variation is to show a muted view of the full data in the background:

ggplot(gd2007, aes(x = gdpPercap, y = lifeExp, color = continent)) +
geom_point(data = mutate(gd2007, continent = NULL), color = "grey80") +
geom_point(size = 2.5) +
scale_x_log10() +
facet_wrap(~ continent) +
fct_thm

Data can be facetet on two variables.

This plot shows the full data faceted by both continent and a set of years:

ggplot(gd, aes(x = gdpPercap, y = lifeExp, color = continent)) +
geom_point(size = 2.5) +
scale_x_log10() +
facet_grid(continent ~ year) +
fct_thm

Adding muted data for each year helps regonizing where each continent group fits within a year

ggplot(gd, aes(x = gdpPercap, y = lifeExp, color = continent)) +
geom_point(data = mutate(gd, continent = NULL), color = "grey80") +
geom_point(size = 2.5) +
scale_x_log10() +
facet_grid(continent ~ year) +
fct_thm

## Exercises

1. Which of the following channels are magnitude channels and which are identity channels?

1. Position on a common scale
2. Length
3. Color hue (red, green, etc.)
4. Symbol shape (dot, cross, etc.)
2. Consider the following visualizations of the 2017 Michelin star data:

Which plot makes it easier to compare the numbers of stars for different countries? Explain your conclusion by identifying the channels used and their relative strengths.

1. Identify the items, attributes, marks, channels, and mappings use in the following plot: