Background

The goals of a visualization can vary:

Usually

A project may involve all three.

Four terms are in common use:

Some view these as interchangeable; others view them as a continuum.

Visualizations can be

Historically, only static graphics were available.

Static graphics remain very useful for exploration, especially if they can be created quickly and easily.

Interactive graphics are very effective for engagement and are used heavily in on-line publications.

Traditional scientific publications are mostly limited to static visualizations, though on-line supplements are becoming more common.

We will focus primarily on static visualizations but will also look at a few interactive options.

Visualization in the Data Analysis Process

A data-driven project typically involves several cycles of

A figure that is often used to capture these steps:

Visualization can help at each stage and is often crucial for

Visualizing the data should almost always come before modeling or summarizing.

A famous example created by Anscombe (1973):

The fitted regression lines for all four data sets are essentially identical, even though the scatterplots look very different!
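The numerical side of Anscombe's point can be checked directly. The sketch below fits an ordinary least-squares line to each of the four data sets using plain Python. The values are transcribed from the `anscombe` data frame that ships with R (columns x1 to x4 and y1 to y4), so in R `lm(y1 ~ x1, data = anscombe)` gives the first fit. The names `QUARTET`, `fit_line`, and `FITS` are illustrative only.

```python
# Anscombe's (1973) quartet, transcribed from R's built-in `anscombe` data.
# Data sets I, II and III share the same x values; data set IV does not.
X_123 = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
X_4 = [8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 19.0, 8.0, 8.0, 8.0]

QUARTET = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (X_4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Fit each data set separately: the coefficients agree, the plots do not.
FITS = {name: fit_line(x, y) for name, (x, y) in QUARTET.items()}

if __name__ == "__main__":
    for name, (a, b) in FITS.items():
        print(f"{name:>3}: y = {a:.2f} + {b:.3f} x")
```

Every data set prints as y = 3.00 + 0.500 x. Nothing in these numbers reveals the curvature in data set II, the outlier in III, or the single high-leverage point in IV. Only a plot shows those differences.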

Another set of examples in the same spirit is provided by the package datasauRus; the package vignette shows the examples.

Some Historical Graphics

Constructing graphics easily relies heavily on computation, but a computer is not required to create good graphics.

Many graphical ideas and elaborate statistical graphs were created in the 1800s and before.

The following are some classical examples.

William Playfair

Playfair’s The Commercial and Political Atlas (1786) and Statistical Breviary (1801) introduced a number of new graphs, including:

A bar graph:

A pie chart:

Charles Joseph Minard

Minard developed many elaborate graphs, including an illustration of Napoleon’s 1812 Russian campaign.

This can be recreated approximately in R.

Florence Nightingale

Florence Nightingale used a polar area diagram to illustrate causes of death among British troops in the Crimean War.
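The encoding behind a polar area diagram can be written out directly. This assumes, as is usually described for Nightingale's diagrams, that each of the 12 monthly wedges spans the same angle θ, and that a wedge's area A (not its radius r) is proportional to the count v it represents:

```latex
\theta = \frac{2\pi}{12}, \qquad
A = \tfrac{1}{2}\,\theta\,r^{2} \propto v
\quad\Longrightarrow\quad
r = \sqrt{\frac{2A}{\theta}} \propto \sqrt{v}
```

A month with four times as many deaths therefore gets a wedge only twice as long. Making the radius proportional to v would instead give that wedge sixteen times the area, exaggerating the difference.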

An approximate recreation in R is available.

John Snow

John Snow used a map to identify the source of the 1854 London cholera epidemic.

The data are available and have been used for several interactive visualizations.

A short movie was produced in 2013.

Statistical Atlas of the United States

A Statistical Atlas of the US from the late 1800s shows a number of nice examples.

The complete atlases are also available.

There is also a project that shows modern data in a similar style.

Some References

Graphics Software

Most statistical systems provide software for producing static graphics.

Statistical static graphics software typically provides

Some systems are more flexible than others.

Non-statistical graph or chart software often emphasizes appearance over content: results may look pretty, but content is hard to extract (e.g. 3D pie charts).

Chart-drawing packages can be used to produce good statistical graphs, but they may not make it easy.

Some newspapers and magazines have very good graphics departments, including

Sometimes tools like Adobe Illustrator or Inkscape can be used to edit and improve graphics produced by statistical software.

NY Times graphics designers often create initial graphs in R and then enhance them in Adobe Illustrator.

Graphics in R

R has several flexible static graphics systems, including base graphics, lattice, and ggplot2.

We will mostly be using ggplot.

Some Task Levels for Visualization

In evaluating visualization methods, it can be useful to think about several levels of tasks that a visualization might accomplish.

A useful list, from highest to lowest level:

Each higher level builds on the levels below.

As we look at different methods, it is useful to consider the tasks they are suited for.

Scalability

Data sets come in many different sizes and shapes.

Some techniques work well for smaller data sets but deteriorate in effectiveness as size increases.

Sometimes modifications are available that slow the deterioration.

Other methods scale better, though usually at the expense of giving up some level of detail.
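Binning is one common way to make a scatterplot scale. Instead of drawing every point, you count the points in each cell of a grid and shade each cell by its count. ggplot2 offers this as `geom_bin2d()` and `geom_hex()` (the latter needs the hexbin package). The plain-Python sketch below shows only the counting step, using square cells. The simulated data, the cell width of 0.5, and the name `bin_counts` are illustrative assumptions.

```python
import math
import random
from collections import Counter


def bin_counts(points, width):
    """Aggregate (x, y) points into square cells of side `width`.

    Returns a Counter mapping integer cell indices (i, j) to the number
    of points that fall in that cell; empty cells are not stored.
    """
    return Counter(
        (math.floor(x / width), math.floor(y / width)) for x, y in points
    )


# Hypothetical simulated data: 100,000 correlated standard normal pairs.
rng = random.Random(1)
N = 100_000
points = []
for _ in range(N):
    x = rng.gauss(0.0, 1.0)
    y = 0.6 * x + 0.8 * rng.gauss(0.0, 1.0)
    points.append((x, y))

cells = bin_counts(points, width=0.5)

if __name__ == "__main__":
    print(f"{N} points summarized by {len(cells)} non-empty cells")
    print("busiest cell:", cells.most_common(1)[0])
```

For data on this scale, the summary stays at a few hundred cells however large N grows, so drawing cost no longer depends on the number of points. The cost is that the position of individual points within a cell is lost, which is exactly the trade-off described above.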

As we look at different methods it is useful to consider the scale of data sets they are suited for.

Some Interactive Visualizations

