Background

The goals of a visualization can vary:

Usually

A project may involve all three.

Four terms are in common use:

Some view these as interchangeable; others view them as a continuum.

Visualizations can be

Historically, only static graphics were available.

Static graphics remain very useful for exploration, especially if they can be created quickly and easily.

Interactive graphics are very effective for engagement and are used heavily in online publications. Some examples:

Traditional scientific publications are mostly limited to static visualizations, though online supplements are becoming more common.

Visualization in the Data Analysis Process

A data-driven project typically involves several cycles of

Visualization can help at each stage and is often crucial for

Visualizing the data should almost always be the first step. A famous example created by Anscombe (1973):

The fitted regression lines for all four data sets are essentially identical, even though scatterplots show that the data sets themselves look completely different!
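A minimal sketch of this check in R, using the `anscombe` data frame that ships with R's datasets package (columns `x1`, ..., `x4` and `y1`, ..., `y4`). Fitting a least-squares line to each pair gives nearly the same intercept and slope; only the plots reveal how different the four data sets are.

```r
# anscombe is a built-in data frame with columns x1..x4 and y1..y4
data(anscombe)

# Fit y_i ~ x_i for each of the four data sets and collect the coefficients
fits <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  coef(lm(y ~ x))
})
colnames(fits) <- paste("set", 1:4)
round(fits, 2)  # intercepts and slopes agree to two decimals

# The numbers hide the differences; a 2 x 2 panel of scatterplots shows them
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  plot(x, y, main = paste("Set", i), xlab = paste0("x", i), ylab = paste0("y", i))
  abline(lm(y ~ x), col = "red")
}
par(op)
```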

Some Historical Graphics

Constructing graphics easily relies heavily on computation, but a computer is not necessary for creating graphics.

Many graphical ideas and elaborate statistical graphs were created in the 1800s and before.

Some classical examples:

Some references:

Graphics Software

Most statistical systems provide software for producing static graphics. Statistical static graphics software typically provides

Some software is more flexible than other software.

Dynamic graphical software should provide similar flexibility but often does not.

Non-statistical graph or chart software often emphasizes appearance over content: results may look pretty, but content is hard to extract (e.g. 3D pie charts).

Chart drawing packages can be used to produce good statistical graphs, but they may not make it easy.

Some newspapers and magazines have very good graphics departments, including

Sometimes tools like Adobe Illustrator or Inkscape can be used to edit and improve graphics produced by statistical software.

NY Times graphics creators often create initial graphs in R and enhance them in Adobe Illustrator.

Graphics in R

R has several flexible static graphics systems, including

Some interactive exploratory graphics packages include

Some Internal Structure

Static graphics output is produced by a device. These can be

  • screen devices, like Windows, Quartz, X11
  • file devices, like pdf or png
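A minimal sketch of using a file device from R: `pdf()` opens the device, plotting commands draw on it, and `dev.off()` closes it and finishes the file. The file name is arbitrary, and the built-in `faithful` data set is used only for illustration.

```r
# Open a pdf file device; plots now go to the file instead of the screen
out <- file.path(tempdir(), "faithful.pdf")
pdf(out, width = 6, height = 4)

plot(waiting ~ eruptions, data = faithful,
     xlab = "Eruption time (min)", ylab = "Waiting time (min)")

# Close the device; the file is complete only after this call
dev.off()
file.exists(out)
```

Screen devices are opened the same way with functions such as `x11()` or `quartz()`, and `png()` works like `pdf()` for bitmap output.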

Devices are used through a device-independent layer implemented in the grDevices package.

Base graphics is implemented directly using this layer.

lattice and ggplot2 are built on an intermediate framework known as grid graphics and implemented in the grid package.
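A sketch illustrating the two routes to a device: a base graphics call draws immediately, while a lattice call returns a `trellis` object that is drawn through grid only when it is printed. After printing, `grid.ls()` lists the grid viewports and graphical objects lattice created. lattice and grid ship with R; ggplot2 objects behave like lattice objects here but are left out to keep the example free of add-on packages.

```r
library(lattice)
library(grid)

out <- file.path(tempdir(), "systems.pdf")
pdf(out)

# Base graphics: drawn directly through the grDevices layer
plot(waiting ~ eruptions, data = faithful)

# lattice: the call only builds an object; nothing is drawn yet
p <- xyplot(waiting ~ eruptions, data = faithful)
class(p)

# Printing draws the plot using grid viewports and grobs
print(p)

# List the grid objects and viewports lattice created on this page
grid.ls(viewports = TRUE)

dev.off()
```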

Using grid features is occasionally useful for arranging multiple lattice or ggplot2 plots on a page.
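One way to do this, sketched with two lattice plots of the built-in `faithful` data: push a grid viewport with a 1 x 2 layout, then print each plot into one cell with `newpage = FALSE` so lattice draws inside the current viewport instead of starting a new page. For ggplot2 plots the analogous step is `print(p, vp = viewport(layout.pos.col = 1))`.

```r
library(lattice)
library(grid)

p1 <- histogram(~ eruptions, data = faithful, xlab = "Eruption time (min)")
p2 <- xyplot(waiting ~ eruptions, data = faithful,
             xlab = "Eruption time (min)", ylab = "Waiting time (min)")

out <- file.path(tempdir(), "two-panels.pdf")
pdf(out, width = 8, height = 4)

grid.newpage()
# A top-level viewport split into one row and two columns
pushViewport(viewport(layout = grid.layout(nrow = 1, ncol = 2)))

# Left cell: draw p1 inside this viewport rather than on a fresh page
pushViewport(viewport(layout.pos.row = 1, layout.pos.col = 1))
print(p1, newpage = FALSE)
popViewport()

# Right cell
pushViewport(viewport(layout.pos.row = 1, layout.pos.col = 2))
print(p2, newpage = FALSE)
popViewport(2)  # pop the cell and the layout viewport

dev.off()
```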

Some Task Levels for Visualization

In evaluating visualization methods it can be useful to think about several levels of tasks that might be accomplished with a visualization. A useful list, from highest to lowest level:

Each higher level builds on the levels below.

As we look at different methods, it is useful to consider the tasks they are suited for.

Scalability

Data sets come in many different sizes and shapes.

Some techniques work well for smaller data sets but deteriorate in effectiveness as size increases.

Sometimes modifications are available that slow the deterioration.
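A common modification for scatterplots of many points is alpha blending: each point is drawn nearly transparent, so regions where many points overlap show up darker instead of as a uniform blob. A minimal sketch with simulated data; the sample size and `alpha.f = 0.05` are arbitrary choices that usually need tuning to the data at hand.

```r
set.seed(42)
n <- 50000
x <- rnorm(n)
y <- x + rnorm(n)

out <- file.path(tempdir(), "alpha.pdf")
pdf(out, width = 8, height = 4)  # the pdf device supports semi-transparent colors
op <- par(mfrow = c(1, 2))

# Opaque points: the center of the cloud saturates and density differences are lost
plot(x, y, pch = 16, main = "Opaque points")

# Nearly transparent points: overlapping points accumulate, revealing density
plot(x, y, pch = 16, col = adjustcolor("black", alpha.f = 0.05),
     main = "Alpha blending")

par(op)
dev.off()
```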

Other methods scale better, though usually at the expense of giving up some level of detail.
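Binning is one example of this trade-off: counts in the cells of a grid replace the individual points, so the display no longer gets more crowded as the number of points grows, but the location of any single point is lost. A sketch in base R with simulated data; the 40 x 40 grid is an arbitrary choice.

```r
set.seed(42)
n <- 50000
x <- rnorm(n)
y <- x + rnorm(n)

# 41 break points per axis give a 40 x 40 grid spanning the data range
xbrk <- seq(min(x), max(x), length.out = 41)
ybrk <- seq(min(y), max(y), length.out = 41)

# Count the points falling in each cell
counts <- table(cut(x, xbrk, include.lowest = TRUE),
                cut(y, ybrk, include.lowest = TRUE))

out <- file.path(tempdir(), "binned.pdf")
pdf(out)
# Use the break points as cell boundaries and color each cell by its count
image(xbrk, ybrk, unclass(counts), col = hcl.colors(20),
      xlab = "x", ylab = "y", main = "Binned counts")
dev.off()
```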

As we look at different methods it is useful to consider the scale of data sets they are suited for.