A Simple Model of Visual Perception

The eyes acquire an image, which is processed through three stages of memory:

Iconic Memory

The first processing stage of an image happens in iconic memory.

  • Images remain in iconic memory for less than a second.

  • Processing in iconic memory is massively parallel and automatic.

  • This is called preattentive processing.

Preattentive processing is a fast recognition process.

Working Memory

Meaningful visual chunks are moved from iconic memory to short term memory.

  • These chunks are used by conscious, or attentive, processing.

  • Attentive processing often involves conscious comparisons or search.

  • Short term memory is limited;

    • information is retained for only a few seconds;

    • only three or four chunks can be held at a time.

  • Chunks can be of varying size; a coherent pattern can form a single chunk even if it is quite large.

  • If more chunks are needed, or chunks are needed for longer, they must be reacquired or retrieved from long term memory.

Long Term Memory

Long term visual memory is built up over a lifetime, though infrequently used visual chunks may be lost.

  • Chunks processed repeatedly in working memory may be transferred to long term memory.

  • Common patterns and contextual information can be retrieved from long term memory for attentive processing in working memory.

Visual Design Implications

  • Try to make as much use of preattentive features as possible.

  • Recognize when preattentive features might mislead.

  • For features that require attentive processing, keep in mind that working memory is limited.

Some Examples of Challenges

Context Matters

Which of the inner circles is larger, or are they the same size?

Which of the lines is longer, or are they the same length?

The sine Illusion: which of the bars are longer, or are they the same length?

## Sine illusion: vertical bars of equal length w hanging from a sine curve
x <- seq(0, 5 * pi, length.out = 100)
w <- 0.5
plot(x, sin(x), ylim = c(-1, 1 + w), type = "n")   # set up the axes only
segments(x0 = x, y0 = sin(x), y1 = sin(x) + w, lwd = 3)

Which of the squares A and B is darker, or are they the same shade?

Some Optical Illusions

R implementations of some optical illusions by Kohske Takahashi:

Are these lines parallel?

Again, are these lines parallel?

Black dots at the intersections appear and disappear; are they real?

A large collection of optical illusions is available at http://www.michaelbach.de/ot/index.html.

Other links to optical illusions can be found here.


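## Alternately shift a row of points left and right by one unit
n <- 50
x <- 2 * (1 : n)
y <- rep(2, n)
lim <- c(min(x) + 0.1 * (max(x) - min(x)), max(x) - 0.1 * (max(x) - min(x)))
v <- TRUE
for (i in 1 : 100) {   # show 100 frames; x + v adds 1 when v is TRUE
    plot(x + v, y, xlim = lim)
    v <- ! v
    Sys.sleep(0.1)
}

## Spin a cloud of random points in 3D (requires the rgl package)
library(rgl)
d <- data.frame(x = rnorm(1000), y = rnorm(1000), z = rnorm(1000))
plot3d(d)
par3d(FOV = 1)  ## removes perspective distortion
play3d(spin3d(axis = c(0, 0, 1), rpm = 30), duration = 20)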

Popout and Distractors

Where is the red dot:

Items, Attributes, Marks, and Channels

To evaluate or design a visualization it is useful to have some terms for the components.

Several schools have developed different but similar sets of terms.

Some References:

Munzner uses the terminology of items, attributes, marks, and channels:

Channels correspond approximately to aesthetics in ggplot but are more focused on the visual aspect:

Channels are used to encode attributes (aesthetic mappings).

A single attribute can be encoded in several channels.

Some channels are well suited to encode quantitative or ordered values; they are quantitatively perceived.

Others are only suited for nominal values.

A useful classification, adapted from Few (2012):

Type       Channel        Quantitatively Perceived?
Form       Length         Yes
           Width          Yes, but limited
           Orientation    No
           Size           Yes, but limited
           Shape          No
Color      Hue            No
           Intensity      Yes, but limited
Position   2D position    Yes

Munzner uses the terms magnitude channels (for ordered and quantitative attributes) and identity channels (for categorical attributes).


A useful principle: The most important attributes should be mapped to the most effective channels.

Channel Effectiveness

Some questions about channels:

Some criteria for evaluating channels:

Channel Accuracy

Stevens (1957) argues that the perceived magnitude of a stimulus on a magnitude channel can be described by a power law in its physical intensity:

\[ \text{perceived sensation} = (\text{physical intensity})^\gamma \]
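One way to see what the exponent does is to compute the perceived change when the physical intensity doubles. This is a minimal sketch; the exponents used here (1 for length, about 0.7 for area, about 1.7 for color saturation) are assumed, illustrative values rather than values taken from these notes.

```r
## Stevens' power law: perceived = intensity^gamma.
## The gamma values below are assumed, illustrative exponents.
perceived_ratio <- function(ratio, gamma) ratio^gamma
gammas <- c(length = 1.0, area = 0.7, saturation = 1.7)

## Perceived change when the physical intensity doubles
round(perceived_ratio(2, gammas), 2)
```

With these exponents, doubling a length reads as doubling, doubling an area reads as only about a 1.62-fold increase, and doubling saturation reads as about a 3.25-fold increase: exponents below 1 compress differences and exponents above 1 exaggerate them.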

Experiments by Stevens suggest these values for some visual channels:

Others have raised concerns about the validity of these findings.

Another approach has used controlled experiments to assess accuracy of various channels used in visualizations:

  • William S. Cleveland and Robert McGill (1984), “Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods,” Journal of the American Statistical Association 79, 531–554.

  • William S. Cleveland and Robert McGill (1987), “Graphical Perception: The Visual Decoding of Quantitative Information on Graphical Displays of Data,” Journal of the Royal Statistical Society, Series A, 192–229.

  • Jeffrey Heer and Michael Bostock (2010), “Crowdsourcing Graphical Perception: Using Mechanical Turk to Assess Visualization Design,” Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 203–212.

Munzner’s ordering by accuracy:

Magnitude Channels (Ordered, Numerical)   Identity Channels (Categorical)
Position on common scale                  Spatial grouping
Position on unaligned scale               Color hue
Length (1D size)                          Shape
Tilt, angle
Area (2D size)
Depth (3D position)
Color luminance, saturation
Curvature, volume (3D size)

Line width is another channel; there does not seem to be agreement on its accuracy, but it is not high.


Many channels, in particular identity channels, can only support a limited number of discriminable levels.

  • Line width is one of the most limited with perhaps 3 levels.

  • Using more than 5 or 6 color hues is not recommended.

  • Similarly, using more than 5 or 6 symbol shapes can create difficulties.

If the number of levels that can be represented by a channel is smaller than the number of attribute levels, then some form of meaningful aggregation is needed.
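A common form of aggregation for a nominal attribute is to keep the most frequent levels and lump the rest into a single "Other" level. The helper `lump_levels` below is a hypothetical function written for this sketch; the forcats package provides `fct_lump_n()` for the same purpose.

```r
## Hypothetical helper: keep the k most frequent levels and lump the rest
## into a single "Other" level, so a channel needs at most k + 1 levels.
lump_levels <- function(f, k, other = "Other") {
    f <- as.character(f)
    counts <- sort(table(f), decreasing = TRUE)
    keep <- names(counts)[seq_len(min(k, length(counts)))]
    factor(ifelse(f %in% keep, f, other))
}

## Eight simulated categories reduced to at most five hues plus "Other"
set.seed(1)
g <- sample(letters[1 : 8], 200, replace = TRUE, prob = 8 : 1)
table(lump_levels(g, 5))
```

Whether lumping is meaningful depends on the data; grouping by a natural higher-level category is often better than grouping by frequency.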


Some encodings can be used independently of each other; others interfere with each other to some degree.

  • Vertical and horizontal position can be used independently.

  • Color (hue) and position can be used independently.

  • Size and hue interfere somewhat; hue is harder to perceive on smaller objects.

  • Width and height do not function well independently; the result is perceived primarily as shape.

  • Encoding two different values in the red and green channels as a hue does not work at all.


Many channels support visual popout: having one item or a few items immediately stand out from the others.

  • Color (hue and intensity) does this well.

  • Shape and size can also be used effectively to create popout.

Annotation can also be used to create popout.
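A small popout demonstration in base R graphics: a single red point among grey ones. The point count and colors are arbitrary choices for this sketch.

```r
## Popout demo: one red target point among many grey distractors
set.seed(2)
n <- 60
x <- runif(n)
y <- runif(n)
col <- rep("grey60", n)
col[sample(n, 1)] <- "red"   # a single target item
plot(x, y, col = col, pch = 19, axes = FALSE, xlab = "", ylab = "")
box()
```

Replacing the hue difference with a small shape difference (for example `pch = 17` for the target and `pch = 19` for the rest, all in one color) typically makes the target noticeably harder to find.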


Perceptual grouping can be achieved in several ways:

  • Using an identity channel to represent items as a group.

  • Using link marks.

  • By enclosure.

  • By spatial proximity.

Experimental Evidence


  • The 1984 paper is available from JSTOR.

  • The paper formulates a theory for ranking Elementary Perceptual Tasks; these correspond to channel mappings.

  • Some orderings were addressed by informal experiments (the results were obvious, at least to the authors).

  • Others were assessed by formal experiments with about 50 subjects.

  • Experiments focused on accuracy of decoding, though this is not viewed as the primary purpose of a graph:

    • “One must be careful not to fall into a conceptual trap of adopting accuracy as a criterion. … The power of a graph is its ability to enable one to take in the quantitative information, organize it, and see patterns and structure not readily revealed by other means of studying the data.”

  • Their premise:

    • “A graphical form that involves elementary perceptual tasks that lead to more accurate judgments than another graphical form (with the same quantitative information) will result in better organization and increase the chances of a correct perception of patterns and behavior.”

  • The tasks, for each setting:

    • Identify which of two marked items is smaller.

    • Estimate the percentage the smaller is of the larger.

  • Results:

    • Percent large errors:

    • Absolute error:
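Accuracy in these experiments is measured on a log scale: the error of a judgment is log2(|judged percent − true percent| + 1/8). A minimal sketch of this measure; the example judgments are made up.

```r
## Cleveland-McGill style accuracy measure: log2(|judged - true| + 1/8).
## Percentages are on a 0-100 scale; the 1/8 offset avoids log2(0) = -Inf
## when a judgment is exactly right.
log_abs_error <- function(judged, true) log2(abs(judged - true) + 1/8)

## Three hypothetical judgments of a true value of 50 percent
log_abs_error(judged = c(50, 52, 60), true = 50)
```

A perfect judgment scores -3, and each doubling of the absolute error adds roughly one unit once errors are well above 1/8.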

Heer and Bostock

  • Heer and Bostock (2010) set out to replicate the Cleveland-McGill experiment using crowdsourcing via Amazon Mechanical Turk.

  • They used the five position stimuli and some new ones.

  • 50 subjects were recruited for each task.

  • Results were consistent with Cleveland-McGill results:

  • Use of Mechanical Turk was deemed a success.

Pie Chart Experiments

Pie charts are popular but somewhat controversial.

  • Pie charts are inferior to bar charts for comparisons.

  • Pie charts are quite good at representing part-whole relationships.

  • Cleveland and McGill suggested pie charts are read by angle.

  • Kosara and Skau report experiments that suggest this is not the case.

  • If it were, donut charts would be even less effective, but the two seem to be about equally effective.

  • Kosara’s blog provides a review of other pie chart studies.

Improving Some Common Charts

Cleveland and McGill set out to suggest improvements to some common charts. This is a selection of their examples.

Dot Charts

Cleveland and McGill use their perceptual ladder to argue strongly for using dot charts in place of bar charts and pie charts.
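A side-by-side comparison of a bar chart and a dot chart for the same values, using base R's `barplot()` and `dotchart()`. The data are the first 15 cities of the built-in `precip` data (average annual precipitation in inches), chosen arbitrarily for this sketch.

```r
## Bar chart and dot chart of the same sorted values
vals <- sort(head(precip, 15))
op <- par(mfrow = c(1, 2), mar = c(4, 8, 1, 1))
barplot(vals, horiz = TRUE, las = 1, cex.names = 0.7, xlab = "inches")
dotchart(vals, cex = 0.7, xlab = "inches")
par(op)   # restore the previous graphics settings
```

Both encode the values by position along a common scale; the dot chart uses much less ink and leaves room for a reference grid, which is part of Cleveland and McGill's argument.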

Tukey’s Hanging Rootogram

Suggested earlier, but in line with their principles:

  • Comparing a histogram to a theoretical frequency model requires judging the differences between bar tops and a curve, which are not aligned against a common baseline.

  • Deviations in the tails, where frequencies are small, are also hard to see.

  • Tukey suggested the hanging rootogram:

    • plot square roots of frequencies instead of frequencies;

    • move the bars so their tops line up with the theoretical frequencies.

  • The assessment can now be based on comparing the bottoms of the bars to the zero line; thus comparisons are on an aligned scale, the highest ranked channel.
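The two steps can be carried out by hand in base R graphics. This is a minimal sketch on simulated Poisson data with an assumed mean of 4; bars of height sqrt(observed) hang from the curve sqrt(expected), so the bar bottoms show the deviations.

```r
## Hanging rootogram by hand for simulated Poisson counts
set.seed(42)
lambda <- 4
x <- rpois(500, lambda)
k <- 0 : max(x)
observed <- tabulate(x + 1, nbins = length(k))   # counts for k = 0, 1, ...
expected <- length(x) * dpois(k, lambda)
top <- sqrt(expected)                            # bars hang from the model curve
bottom <- top - sqrt(observed)                   # deviations show up at the bottoms
plot(k, top, type = "n", ylim = range(0, bottom, top),
     xlab = "x", ylab = "sqrt(frequency)")
rect(k - 0.4, bottom, k + 0.4, top, col = "grey")
lines(k, top, type = "b", col = "red")
abline(h = 0, lty = 2)                           # the aligned reference line
```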

A simple example:

The rootogram function in the latticeExtra package implements this idea:

library(latticeExtra)
lambda <- 500
n <- 1000
x <- rpois(n, lambda = lambda)
rootogram(~x, dfun = function(x) dpois(x, lambda = lambda))

The help page shows an example for use with binned continuous data. This approach is also more suitable for this discrete example with many possible levels:

h <- hist(x, plot = FALSE)
scale.factor <- sum(dpois(h$mids, lambda))
rootogram(counts ~ mids, data = h,
          dfun = function(x) dpois(x, lambda = lambda) / scale.factor)

The vcd package provides another version that may be more suitable for data with a small number of distinct values:

observed <- table(rnbinom(200, size = 1.5, prob = 0.8))
fitted <- dnbinom(as.numeric(names(observed)),
                  size = 1.5, prob = 0.8) * sum(observed)
vcd::rootogram(observed, fitted)

Playfair’s Balance of Trade plots

  • Playfair presented a number of plots showing imports and exports between England and other nations.

  • A primary goal was to show the balance of trade, the difference between exports and imports:

  • Assessing the differences from a plot showing exports and imports as separate curves requires length judgments, which are less accurate than comparisons to a common scale.

  • Plotting the difference makes the balance of trade much easier to assess:
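The contrast can be reproduced with made-up series. The export and import curves below are hypothetical and are not Playfair's data; they are chosen so the balance changes sign partway through.

```r
## Hypothetical export and import series (not Playfair's data)
year <- 1700 : 1780
s <- year - 1700
exports <- 100 + 1.5 * s
imports <- 80 + 0.03 * s^2
balance <- exports - imports

op <- par(mfrow = c(1, 2))
## Left: two curves; the balance must be judged as the gap between them
matplot(year, cbind(exports, imports), type = "l", lty = 1,
        col = c("red", "blue"), ylab = "value")
## Right: the balance itself, judged against a common zero line
plot(year, balance, type = "l", ylab = "exports - imports")
abline(h = 0, lty = 2)
par(op)
```

In the right panel, the year the balance turns negative is read directly off the zero line.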

Framed Unaligned Bars

It is difficult to compare lengths of unaligned rectangles when the lengths are close.

Adding a frame moves the task up the perceptual ladder to an unaligned comparison against a common scale.

Comparing to a common scale is still the most effective approach:

But this does suggest that using unaligned framed rectangles to encode a third variable, with position encoding the two primary variables, may be effective.
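A minimal sketch of that idea in base R graphics, using simulated data: each point gets a frame of fixed height, and the filled part of the frame shows the third variable relative to its maximum.

```r
## Framed rectangles encoding a third variable z on a scatterplot (simulated data)
set.seed(3)
n <- 12
x <- runif(n)
y <- runif(n)
z <- runif(n)
w <- 0.01                 # half-width of each frame
h <- 0.08                 # full frame height, representing max(z)
fill <- h * z / max(z)    # filled height proportional to z
plot(x, y, type = "n", xlim = c(0, 1), ylim = c(0, 1 + h))
rect(x - w, y, x + w, y + h)                     # empty frames give a common scale
rect(x - w, y, x + w, y + fill, col = "grey30")  # filled part encodes z
```

The same glyph placed at region centers on a map gives the framed rectangle map discussed next.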

Framed Rectangle Maps

  • A choropleth map is a common way to depict a quantitative variable in a geographic context.

  • Shading is quite low on the perceptual ladder.

  • Cleveland and McGill suggest the use of framed rectangles positioned on the map as an alternative.

  • This does not seem to have caught on so far, though you do sometimes see the use of other glyphs, such as pie charts.

Analyzing a Design

Graph layout involves several levels:

A useful structure for describing the primary features:

Useful questions:

Is Uber Replacing Taxis in Central Manhattan?

Overall pickups among taxis, Uber, and similar services changed very little within central Manhattan from 2014 to 2015.

Taxi pickups decreased and Uber pickups increased. Was this a substitution of Uber for taxis?

An article on FiveThirtyEight examines this question with a graph:

  • Items: Taxi zones in Manhattan’s core.
  • Attributes/variables:
    • change in taxi pickups
    • change in Uber pickups
    • total number of pickups in the 2015 period
  • Marks: points.
  • Channels:
    • horizontal position, mapped to Uber change
    • vertical position, mapped to taxi change
    • point size, mapped to total number of pickups
  • Supporting features:
    • Negative 45 degree line
    • Regression line
    • Annotations

An approximate reconstruction using data read off the graph:

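library(ggplot2)
## nyuber.dat must already be in the working directory
if (! file.exists("nyuber.dat"))
    stop("nyuber.dat not found")
tu <- read.table("nyuber.dat", header = TRUE)
ggplot(tu, aes(x = uber, y = taxi)) +
    geom_point(aes(size = rides)) +
    geom_abline(slope = -1, linetype = 2) +                  # 1:1 replacement line
    geom_smooth(method = "lm", se = FALSE, color = "red")    # regression line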

The primary question is to what degree, at the zone level, decreases in taxi pickups match increases in Uber pickups.

The strongest channels, 2D position, have been used for the two change variables.

Answering the question requires assessing the position of a point relative to the 1:1 replacement line. This is much harder than assessing position relative to an axis.

An alternate approach is to plot the number of taxi pickups lost per Uber ride gained against Uber gains or taxi losses:

This allows the taxi/Uber tradeoff to be assessed by comparison to a common scale.
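A sketch of the ratio plot using simulated zone-level changes (not the FiveThirtyEight data). The variable names `uber` and `taxi` follow the reconstruction code; the data frame name `zones` and the replacement rates between 0.5 and 1.5 are assumptions for illustration.

```r
library(ggplot2)
## Simulated zone-level changes: Uber gains and taxi losses
set.seed(7)
zones <- data.frame(uber = runif(20, 10000, 100000))
zones$taxi <- -zones$uber * runif(20, 0.5, 1.5)
## Taxi pickups lost per Uber pickup gained; 1 means one-for-one replacement
zones$ratio <- -zones$taxi / zones$uber
p <- ggplot(zones, aes(x = uber, y = ratio)) +
    geom_point() +
    geom_hline(yintercept = 1, linetype = 2)   # common reference: full replacement
print(p)
```

Points above the dashed line are zones where more taxi rides were lost than Uber rides gained; the judgment is now a position comparison against a horizontal line rather than against a diagonal.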

Several other variations are possible.

Michelin Stars

This image, from a blog post, shows the total number of stars for different countries:

  • Items: Countries
  • Attributes:
    • country name
    • number of stars
  • Marks: circles, text
  • Channels:
    • color, mapped to country
    • area, mapped to number of stars
    • text, mapped to country name (where possible)
    • text, mapped to number of stars (where possible)


  • None of the channels are very strong.

  • The strongest channels, 2D position, are not used.

  • The number of colors used is too high.

  • A simple dot plot would convey the distribution better.

  • Even if a bubble plot is desired for aesthetic reasons, position could be used
    • to group countries by continent
    • to show countries on a map
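A sketch combining a dot plot with grouping by continent via facets in ggplot2. The country labels and star counts are hypothetical placeholders, not Michelin data.

```r
library(ggplot2)
## Hypothetical star counts for six unnamed countries (not real Michelin data)
stars <- data.frame(
    country = c("A", "B", "C", "D", "E", "F"),
    continent = c("Europe", "Europe", "Europe", "Asia", "Asia", "Americas"),
    n = c(120, 80, 40, 100, 30, 20))
## Dot plot: position encodes the count; facets group countries by continent
p <- ggplot(stars, aes(x = n, y = reorder(country, n))) +
    geom_point() +
    facet_grid(continent ~ ., scales = "free_y", space = "free_y") +
    labs(x = "Number of stars", y = NULL)
print(p)
```

Using `space = "free_y"` keeps the row spacing equal across panels, so continents with more countries get taller panels.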

Using 2017 data from an article in The Telegraph, one alternative is:

Aspect Ratio and Perception

The river flow data shows how important aspect ratio can be to our ability to detect patterns:

Using a line plot, the basic periodicity becomes apparent even in the first aspect ratio.

But the steeper increase/shallower decrease of most periods is easier to see in the second aspect ratio:

The aspect ratio also influences the interpretation of results.

Some variations on the graph from HW2:

library(readr)
library(ggplot2)
library(gridExtra)
w <- read_csv("hw2-welfare.csv")
p0 <- ggplot(w, aes(x = quarter, y = onAssistance))
p1 <- p0 + geom_line(aes(group = 1))

## default; fixed aspect ratio; y axis starting at zero; bar chart
grid.arrange(p1, p1 + coord_fixed(ratio = 1e-7),
    p1 + ylim(0, max(w$onAssistance)),
    p0 + geom_col(width = 0.3), ncol = 2)

Automated choices of axis scaling can also affect the aspect ratio of the content of a plot.

A simple simulation of coin flips to illustrate the law of large numbers:

library(dplyr)
library(ggplot2)
n <- 100000
prob <- 0.5
flips <- data.frame(trial = 1 : n, flip = rbinom(n, 1, prob))
flips <- mutate(flips, phat = cumsum(flip) / trial)   # running proportion of heads
head(flips, 10)
##    trial flip      phat
## 1      1    0 0.0000000
## 2      2    0 0.0000000
## 3      3    0 0.0000000
## 4      4    1 0.2500000
## 5      5    1 0.4000000
## 6      6    0 0.3333333
## 7      7    1 0.4285714
## 8      8    1 0.5000000
## 9      9    1 0.5555556
## 10    10    1 0.6000000
p <- ggplot(flips, aes(x = trial, y = phat)) +
    geom_line() +
    geom_abline(intercept = prob, slope = 0, lty = 2)

Focusing the y axis around 1/2 shows the early fluctuations near 1/2 more clearly:

p + coord_cartesian(ylim = c(0.45, 0.55))

Focusing on the x axis for the last half of the trials shows that, as expected, the sample proportion is very close to 1/2.

The y axis scale still reflects the range of the complete data.

p + coord_cartesian(xlim = c(50000, 100000))

Using a subset of the data for trials >= 50000 produces a very different picture that emphasizes the details of the fluctuations:
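A sketch of that subset plot. It recreates the simulated flips so it runs on its own (a new random draw, so the details will differ from the run above) and uses dplyr's `filter()` so that ggplot computes the y range from the later trials only.

```r
library(dplyr)
library(ggplot2)
## Recreate the simulated flips so this block runs on its own
n <- 100000
prob <- 0.5
flips <- data.frame(trial = 1 : n, flip = rbinom(n, 1, prob))
flips <- mutate(flips, phat = cumsum(flip) / trial)
## Keep only the second half; the y axis now adapts to these trials
late <- filter(flips, trial >= 50000)
p2 <- ggplot(late, aes(x = trial, y = phat)) +
    geom_line() +
    geom_hline(yintercept = prob, linetype = 2)
print(p2)
```

Unlike `coord_cartesian()`, which only zooms the view, subsetting changes the data the scales are trained on, which is why the y axis is rescaled.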

Research on the effect of aspect ratio on perception has focused on the accuracy of slope comparisons.

The general message is that keeping away from slopes that are too steep or too shallow is best.

Banking to 45 degrees, or choosing an aspect ratio so the slope magnitudes are distributed around 45 degrees, is often recommended.

This also tends to be a useful “neutral ground” when political implications are involved.
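One simple banking rule, sketched below, picks the aspect ratio (height / width) so that the median absolute slope of the line segments, measured on screen, is 1. After rescaling each axis to its data range, a segment's screen slope is its rescaled slope times the aspect ratio, so the ratio is 1 / median(|rescaled slope|). The function name `bank_aspect` is hypothetical; lattice offers built-in banking through `aspect = "xy"`.

```r
library(ggplot2)
## Median absolute slope banking: aspect ratio (height / width) that makes the
## median absolute on-screen segment slope equal to 1 (45 degrees)
bank_aspect <- function(x, y) {
    ## segment slopes after rescaling each axis to its data range
    s <- (diff(y) / diff(range(y))) / (diff(x) / diff(range(x)))
    1 / median(abs(s))
}

## Example with the built-in annual lynx trapping series
d <- data.frame(year = as.numeric(time(lynx)), n = as.numeric(lynx))
a <- bank_aspect(d$year, d$n)
p <- ggplot(d, aes(x = year, y = n)) +
    geom_line() +
    theme(aspect.ratio = a)   # panel height / width
print(p)
```

ggplot's default axis expansion enlarges both ranges by the same proportion, so it does not change the banked ratio.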

Some references: