---
title: "Visualizing Three or More Numeric Variables"
output:
html_document:
toc: yes
code_folding: show
code_download: true
---
```{r setup, include = FALSE, message = FALSE}
source(here::here("setup.R"))
knitr::opts_chunk$set(collapse = TRUE, message = FALSE,
fig.height = 5, fig.width = 6, fig.align = "center")
set.seed(12345)
library(dplyr)
library(tidyr)
library(ggplot2)
library(lattice)
library(gridExtra)
library(patchwork)
source(here::here("datasets.R"))
theme_set(theme_minimal() +
theme(text = element_text(size = 16),
panel.border = element_rect(color = "grey30", fill = NA)))
```
## Moving Beyond Two Dimensions
Paper and screens are two-dimensional.
We live in a three-dimensional world.
For visualizing three-dimensional data we can take advantage of our
visual system's ability to reconstruct three dimensional scenes from
two-dimensional images using:
* perspective rendering, lighting, and shading;
* motion with animation and interaction;
* stereo viewing methods.
Most of us have no intuition for four and more dimensions.
Some techniques that work in three dimensions but can also be used in
higher dimensions:
* _Grouping_ by encoding additional variables in color or shape channels.
* _Conditioning_ by using small multiples for different levels of
additional variables.
Higher dimensions maybe up to ten; the [_curse of
dimensionality_](https://en.wikipedia.org/wiki/Curse_of_dimensionality)
is a limiting factor.
The `lattice` package provides some facilities not easily available in
`ggplot` so I will use `lattice` in a few examples.
## Scatterplot Matrices
A _scatterplot matrix_ is a useful overview that shows all pairwise
scatterplots.
There are many options for creating scatterplot matrices in R; a few are:
* `pairs` in base graphics;
* `splom` in package `lattice`
* `ggpairs` in `GGally`.
Some examples using the `mpg` data:
```{r, class.source = "fold-hide"}
library(lattice)
splom(select(mpg, cty, hwy, displ),
cex = 0.5, pch = 19)
```
```{r, class.source = "fold-hide"}
library(GGally)
ggpairs(select(mpg, cty, hwy, displ),
lower = list(continuous =
wrap("points",
size = 1)))
```
Some variations:
* diagonal left-top to right-bottom or left-bottom to right-top;
* how to use the panels in the two triangles;
* how to use the panels on the diagonal.
Some things to look for in the panels:
* clusters or separation of groups;
* strong relationships;
* outliers, rounding, clumping.
Notes:
* Scatterplot matrices were popularized by Cleveland and co-workers at
Bell Laboratories in the 1980s.
* Cleveland recommends using the full version displaying both triangles of
plots to facilitate _visual linking_.
* If you do use only one triangle, and one variable is a response,
then it is a good idea to arrange for that variable to be shown on
the vertical axis against all other variables.
* The symmetry in the plot with the diagonal running from bottom-left
to top-tight as produced by `splom` is simpler than the symmetry in
the plot with the diagonal running from top-left t bottom-right
produced by `pairs` and `ggpairs`.
## Three Data Sets
Thee useful data sets to explore:
* The `ethanol` data frame in the `lattice` package.
* Soil resistivity data from from Cleveland's _Visualizing Data_ book.
* The `quakes` data frame in the `datasets` package.
### Ethanol Data
The `ethanol` data frame in the `lattice` package contains data from
an experiment on efficiency and emissions in small one-cylinder
engines.
The data frame contains 88 observations on three variables:
* `NOx`: Concentration of nitrogen oxides (NO and NO2) in micrograms.
* `C` Compression ratio of the engine.
* `E` Equivalence ratio, a measure of the richness of the air and
ethanol fuel mixture.
A scatterplot matrix:
```{r, class.source = "fold-hide"}
library(lattice)
splom(ethanol, ced = 0.5, pch = 19)
```
A goal is to understand the relationship between the pollutant `NOx` and the
controllable variables `E` and `C`.
### Soil Resistivity Data
[Data](http://www.stat.uiowa.edu/~luke/data/soil.dat) from Cleveland's
_Visualizing Data_ book contains measurements of soil resistivity of
an agricultural field along a roughly rectangular grid.
A scatterplot matrix of the `resistivity`, `northing` and `easting`
variables:
```{r, class.source = "fold-hide"}
if (! file.exists("soil.dat"))
download.file("http://www.stat.uiowa.edu/~luke/data/soil.dat",
"soil.dat")
soil <- read.table("soil.dat")
splom(soil[1 : 3], cex = 0.1, pch = 19)
```
The data is quite noisy but there is some structure.
A goal is to understand how resistivity varies across the
field.
### Earth Quake Locations and Magnitudes
The `quakes` data frame contains data on locations of seismic events
of magnitude 4.0 or larger in a region near Fiji.
The time frame is from 1964 to perhaps 2000.
More recent data is available from a number of sources on the web.
A scatter plot matrix:
```{r, fig.height = 7, fig.align = "center", class.source = "fold-hide"}
library(lattice)
splom(quakes, cex = 0.1, pch = 19)
```
Quake locations:
```{r, class.source = "fold-hide"}
md <- map_data("world2", c("Fiji", "Tonga", "New Zealand"))
ggplot(quakes, aes(long, lat)) +
geom_polygon(aes(group = group), data = md, color = "black", fill = NA) +
geom_point(size = 0.5, color = "red") +
coord_map() +
ggthemes::theme_map()
```
Some goals:
* understand the three-dimensional location of the quakes;
* see if there is any association between location and magnitude.
## Grouping for Conditioning
For the `ethanol` data there are only a small number of distinct
levels for `C`.
This suggests considering a plot mapping the level to color.
```{r ethanol-gradient, eval = FALSE}
ggplot(ethanol,
aes(E, NOx,
color = C)) +
geom_point(size = 2)
```
```{r ethanol-gradient, echo = FALSE}
```
A qualitative scheme can help distinguish the levels.
```{r ethanol-factor, eval = FALSE}
ggplot(ethanol,
aes(E, NOx,
color = factor(C))) +
geom_point(size = 2)
```
```{r ethanol-factor, echo = FALSE}
```
Adding smooths further helps the visual grouping:
```{r ethanol, eval = FALSE}
ggplot(ethanol,
aes(E, NOx,
color = factor(C))) +
geom_point(size = 2) +
geom_smooth(se = FALSE)
```
```{r ethanol, echo = FALSE}
```
Some observations:
* At each level of `C` there is a strong non-linear relation between
`NOx` and `E`.
* At levels of `E` above 1 the value of `C` has little effect.
* For lower levels of `E` the `NOx` level increases with `C`.
For the `quakes` data, breaking the `depth` values into thirds gives
some insights:
```{r quakes-thirds, eval = FALSE}
quakes2 <-
mutate(quakes,
depth_cut = cut_number(depth, 3))
ggplot(quakes2, aes(x = long,
y = lat,
color = depth_cut)) +
geom_point() +
theme_bw() +
coord_map()
```
```{r quakes-thirds, echo = FALSE}
```
## Conditioning Plots (Coplots)
One way to try to get a handle on higher dimensional data is to try to
fix values of some variables and visualize the values of others in 2D.
This can be done with
* interactive tools;
* small multiples with lattice/trellis displays or faceting.
A conditioning plot, or _coplot_:
* Shows a collection of plots of two variables for different
settings of one or more additional variables, the conditioning
variables.
* For ordered conditioning variables the plots are arranged in a way
that reflects the order.
* When a conditioning variable is numeric, or ordered categorical with
many levels, the values of the conditioning variable are grouped
into bins.
For the soil resistivity data, a coplot of `resistivity` against
`easting`, conditioning on `northing` with bins of size 0.5:
```{r, class.source = "fold-hide"}
p1 <- ggplot(soil,
aes(easting, resistivity)) +
geom_point(size = 0.5) +
facet_wrap(~ cut_width(northing,
width = 0.5,
center = 0))
p1
```
Adding a smooth is often helpful.
With a large amount of data the smooth is hard to see.
Some options:
* Omit the data and only show the smooth.
* Show the data in a less intense color, such as light gray.
* Use a contrasting color for the smooth curves.
* Show the data using alpha blending.
This uses a muted representation of the data:
```{r, class.source = "fold-hide"}
p2 <- ggplot(soil,
aes(easting, resistivity)) +
geom_point(size = 0.5,
color = "lightgrey") +
facet_wrap(~ cut_width(northing,
width = 0.5,
center = 0)) +
geom_smooth()
p2
```
The conditioning bins are quite wide.
Using rounding and keeping only points within 0.05 of the rounded
values reduces the variability:
```{r, class.source = "fold-hide"}
soil_trm <-
mutate(soil,
nrnd = round(northing * 2) / 2) %>%
filter(abs(northing - nrnd) < 0.05)
p1 %+% soil_trm +
facet_wrap(~ cut_width(northing,
width = 0.1,
center = 0))
```
```{r, class.source = "fold-hide"}
p2 %+% soil_trm +
facet_wrap(~ cut_width(northing,
width = 0.1,
center = 0))
```
For the quakes data a plot of latitude against longitude conditioned
on three depth levels:
```{r, class.source = "fold-hide"}
qthm <- theme(panel.border = element_rect(color = "grey30", fill = NA))
ggplot(quakes2, aes(x = long, y = lat)) +
geom_point(color = scales::muted("blue"),
size = 0.5) +
facet_wrap(~ depth_cut,
nrow = 1) +
coord_map() +
qthm
```
The relative positions of the depth groups are much harder to see than
in the grouped conditioning plot.
Adding the full data for background context, and using a more intense
color for the panel subset, helps a lot:
```{r, eval = TRUE, class.source = "fold-hide"}
## quakes does not contain the depth_cut
## variable used in the facet
ggplot(quakes2, aes(x = long, y = lat)) +
geom_point(data = quakes,
color = "gray", size = 0.5) +
geom_point(color = "blue", size = 0.5) +
facet_wrap(~ depth_cut, nrow = 1) +
coord_map() +
qthm
```
Switching latitude and depth shows another aspect:
```{r, class.source = "fold-hide"}
quakes3 <-
mutate(quakes,
lat_cut = cut_width(lat,
width = 5,
boundary = 0))
ggplot(quakes3, aes(x = long, y = depth)) +
geom_point(data = quakes,
color = "gray", size = 0.5) +
geom_point(color = "blue", size = 0.5) +
scale_y_reverse() +
facet_wrap(~ lat_cut) +
qthm
```
Coplot for the `ethanol` data:
```{r, class.source = "fold-hide"}
ggplot(ethanol, aes(x = E, y = NOx)) +
geom_point() +
facet_wrap(~ C)
```
Adding muted full data for context:
```{r, class.source = "fold-hide"}
ggplot(ethanol, aes(x = E, y = NOx)) +
geom_point(color = "grey",
data = mutate(ethanol, C = NULL)) +
geom_point() +
facet_wrap(~ C)
```
## Contour and Level Plots for Surfaces
A number of methods can be used to estimate a smooth signal surface as
a function of the two location variables.
One option is the `loess` local polynomial smoother; another is `gam`
from package `mgcv`.
The estimated surface level can be computed on a grid of points using
the `predict` method of the fit.
These estimated surfaces can be visualized using contour plots or
level plots.
```{r, class.source = "fold-hide"}
m <- loess(resistivity ~ easting * northing, span = 0.25,
degree = 2, data = soil)
eastseq <- seq(.15, 1.410, by = .015)
northseq <- seq(.150, 3.645, by = .015)
soi.grid <- expand.grid(easting = eastseq, northing = northseq)
soi.fit <- predict(m, soi.grid)
soi.fit.df <- mutate(soi.grid, fit = as.numeric(soi.fit))
```
### Contour Plots
_Contour plots_ compute contours, or level curves, as polygons at a
set of levels.
Contour plots draw the level curves, often with a level annotation.
Contour plots can also have their polygons filled in with colors
representing the levels.
A basic contour plot of the fit soil resistivity surface in `ggplot`
using `geom_contour`:
```{r, class.source = "fold-hide"}
p <- ggplot(soi.fit.df,
aes(x = easting,
y = northing,
z = fit)) +
coord_fixed()
p + geom_contour()
```
Neither `lattice` nor `ggplot` seem to make it easy to fill in the
contours.
The base function `filled.contour` is available for this:
```{r, fig.asp = 1, class.source = "fold-hide"}
cm.rev <- function(...) rev(cm.colors(...))
filled.contour(eastseq, northseq, soi.fit,
asp = 1,
color.palette = cm.rev)
```
### Level Plots
A _level plot_ colors a grid spanned by two variables by the color of a
third variable.
Level plots are also called _image plots_
The term _heat map_ is also used, in particular with a specific color
scheme.
But heat map often means a more complex visualization with
an image plot at its core.
`ggplot` provides `geom_tile` that can be used for a level plot:
```{r, class.source = "fold-hide"}
p + geom_tile(aes(fill = fit)) +
scale_fill_gradientn(
colors = rev(cm.colors(100)))
```
Superimposing contours on a level plot is often helpful.
```{r, class.source = "fold-hide"}
p + geom_tile(aes(fill = fit)) +
geom_contour() +
scale_fill_gradientn(
colors = rev(cm.colors(100)))
```
Level plots do not require computing contours, but are not not as
smooth as filled contour plots.
Visually, image plots and filled contour plots are very similar for
fine grids, but image plots are less smooth for coarse ones.
Lack of smoothness is less of an issue when the data values
themselves are noisy.
The grid for the
[`volcano` data](https://teara.govt.nz/en/photograph/3920/maungawhau-mt-eden)
set is coarser and illustrates the lack of smoothness.
```{r, class.source = "fold-hide"}
vd <- expand.grid(x = seq_len(nrow(volcano)), y = seq_len(ncol(volcano)))
vd$z <- as.numeric(volcano)
ggplot(vd, aes(x, y, fill = z)) +
geom_tile() +
scale_fill_gradientn(colors = rev(cm.colors(100))) +
coord_equal()
```
A `filled.contour` plot looks like this:
```{r, class.source = "fold-hide"}
filled.contour(seq_len(nrow(volcano)),
seq_len(ncol(volcano)),
volcano,
nlevels = 10,
color.palette = cm.rev,
asp = 1)
```
A coarse grid can be interpolated to a finer grid.
Irregularly spaced data can also be interpolated to a grid.
The `interp` function in the
[`akima`](https://cran.r-project.org/package=interp) or
[`interp`](https://cran.r-project.org/package=interp) packages is
useful for this kind of interpolation.
(`interp` has a more permissive license.)
## Three-Dimensional Views
There are several options for viewing surfaces or collections of
points as three-dimensional objects:
* Fixed views of rotated projections.
* Animated or interactive views showing a sequence of rotated projections.
### Fixed Views
The `lattice` function `cloud()` shows a projection of a rotated point
cloud in three dimensions.
For the soil resistivity data:
```{r, class.source = "fold-hide"}
cloud(resistivity ~ easting * northing,
pch = 19, cex = 0.1, data = soil)
```
For the `quakes` data:
```{r}
cloud(-depth ~ long * lat,
data = quakes)
```
A surface can also be visualized using a _wire frame_ plot showing a
3D view of the surface from a particular viewpoint.
A simple wire frame plot is often sufficient.
Lighting and shading can be used to enhance the 3D effect.
A basic wire frame plot for the volcano data:
```{r, class.source = "fold-hide"}
wireframe(z ~ x * y,
data = vd,
aspect = c(61 / 89, 0.3))
```
_Wire frame_ is a bit of a misnomer since surface panels in front
occlude lines behind them.
For a fine grid, as in the soil surface, the lines are too dense.
The use of shading for the surfaces can help.
```{r, class.source = "fold-hide"}
wireframe(z ~ x * y,
data = vd,
aspect = c(61 / 89, 0.3),
shade = TRUE)
```
A wire frame plot with shading for the fit soil resistivity surface:
```{r}
asp <- with(soi.grid,
diff(range(northing)) /
diff(range(easting)))
wf <- wireframe(
soi.fit ~
soi.grid$easting * soi.grid$northing,
aspect = asp, shade = TRUE,
screen = list(z = -50, x = -30),
xlab = "Easting (km",
ylab = "Northing (km)")
wf
```
Both ways of looking at a surface are useful:
```{r, fig.height = 6, class.source = "fold-hide"}
lv <-
levelplot(soi.fit ~
soi.grid$easting *
soi.grid$northing,
cuts = 9,
aspect = asp,
contour = TRUE,
xlab = "Easting (km)",
ylab = "Northing (km)")
print(lv, split = c(1, 1, 2, 1),
more = TRUE)
print(wf, split = c(2, 1, 2, 1))
```
* The level plot/contour representation is useful for recognizing
locations of key features.
* The wire frame view helps build a mental model of the 3D structure.
* Being able to interactively adjust the viewing position for a wire
frame model greatly enhances our ability to understand the 3D
structure.
### Interactive 3D Plots Using OpenGL
[OpenGL](https://www.opengl.org/) is a standardized framework for high
performance graphics.
The `rgl` package provides an R interface to some of OpenGL's
capabilities.
[WebGL](https://www.khronos.org/webgl/) is a JavaScript framework
for using OpenGL within a browser window.
Most desktop browsers support WebGL; some mobile browsers do as well.
In some cases support may be available but not enabled by default.
You may be able to get help at .
`knitr` and `rgl` provide support for embedding OpenGL images in web
pages.
It is also possible to embed OpenGL images in PDF files, but not all
PDF viewers support this.
Start by creating the fit surface data frame.
```{r, eval = FALSE, class.source = "fold-hide"}
library(dplyr)
soil <- read.table("http://www.stat.uiowa.edu/~luke/data/soil.dat")
m <- loess(resistivity ~ easting * northing, span = 0.25,
degree = 2, data = soil)
eastseq <- seq(.15, 1.410, by = .015)
northseq <- seq(.150, 3.645, by = .015)
soi.grid <- expand.grid(easting = eastseq, northing = northseq)
soi.fit <- predict(m, soi.grid)
soi.fit.df <- mutate(soi.grid, fit = as.numeric(soi.fit))
```
This code run in R will open a new window containing an interactive 3D
scene (but this may not work on FastX and is not available on the RStudio
server):
```{r soi.fit_rgl, eval = FALSE}
library(rgl)
bg3d(color = "white")
clear3d()
par3d(mouseMode = "trackball")
surface3d(eastseq, northseq,
soi.fit / 100,
color = rep("red", length(soi.fit)))
```
This will work in the RStudio notebook server:
```{r, eval = FALSE}
options(rgl.useNULL = TRUE)
<>
rglwidget()
```
To embed an image in an HTML document, first set the `webgl` hook with
a code chunk like this:
```{r}
knitr::knit_hooks$set(webgl = rgl::hook_webgl)
options(rgl.useNULL = TRUE)
```
Then a chunk with the option `webgl = TRUE` can produce an embedded OpenGL
image:
```{r, ref.label = "soi.fit_rgl", webgl = TRUE, eval = FALSE}
```
```{r, ref.label = "soi.fit_rgl", webgl = TRUE, echo = FALSE}
```
A view that includes the points and uses alpha blending to make the
surface translucent:
```{r, fig.height = 7, fig.width = 8, webgl = TRUE, fig.align = "center", class.source = "fold-hide"}
library(rgl)
clear3d()
points3d(soil$easting,
soil$northing,
soil$resistivity / 100,
col = rep("black", nrow(soil)))
surface3d(eastseq, northseq,
soi.fit / 100,
col = rep("red", length(soi.fit)),
alpha = 0.9,
front = "fill",
back = "fill")
```
A view of the `volcano` surface:
```{r, webgl = TRUE}
library(rgl)
knitr::knit_hooks$set(webgl = rgl::hook_webgl)
options(rgl.useNULL = TRUE)
clear3d()
surface3d(x = 10 * seq_len(nrow(volcano)),
y = 10 * seq_len(ncol(volcano)),
z = 2 * volcano, color = "red")
```
The
[WebGL vignette](https://cran.r-project.org/web/packages/rgl/vignettes/WebGL.html)
in the `rgl` package shows some more examples.
The embedded graphs show properly for me on most browser/OS
combinations I have tried.
To include a static view of an rgl scene you can use the
`rgl.snapshot` function.
To use the view found interactively with rgl for creating a `lattice`
`wireframe` view, you can use the `rglToLattice` function:
```{r, eval = FALSE}
wireframe(soi.fit ~ soi.grid$easting * soi.grid$northing,
aspect = c(asp, 0.7), shade = TRUE,
xlab = "Easting (km)",
ylab = "Northing (km)", screen = rglToLattice())
```
Ethanol data:
```{r, webgl = TRUE, class.source = "fold-hide"}
library(rgl)
clear3d()
with(ethanol, points3d(E, C / 5, NOx / 5,
size = 4, col = rep("black", nrow(ethanol))))
```
Quakes data:
```{r, webgl = TRUE, class.source = "fold-hide"}
library(rgl)
clear3d()
with(quakes, points3d(long, lat, -depth / 50, size = 2,
col = rep("black", nrow(quakes))))
```
### Other Animation Technologies
Animated GIF or PNG images can be used to show a _fly around_ of a
surface.
```{r, echo = FALSE, eval = FALSE}
## I could not figure out how to get either wireframe or persp to use
## a fixed target box, so attempting to animate would appear to move
## the viewer closer or farther depending on the rotation. Hacking one
## of the mirc3d functions was the best option I could find.
library(misc3d)
ds <- function(scene, light = c(0, 0, 1), screen = list(z = 40, x = -60),
scale = TRUE, R.mat = diag(4), perspective = FALSE,
distance = if (perspective) 0.2 else 0,
fill = TRUE, xlim = NULL, ylim = NULL, zlim = NULL,
aspect = c(1, 1),
col.mesh = if (fill) NA else "black", polynum = 100,
lighting = phongLighting, add = FALSE, engine = "standard",
col.bg = "transparent", depth = 0, newpage = TRUE,
fixbox)
{
scene <- colorScene(scene)
sr <- sceneRanges(scene, xlim, ylim, zlim)
if (add)
rot.mat <- R.mat
else rot.mat <- makeViewTransform(sr, scale, aspect, screen,
R.mat)
scene <- transformScene(scene, rot.mat)
scene <- lightScene(scene, lighting, light)
if (distance > 0) {
scene <- addPerspective(scene, distance)
rot.mat <- makePerspMatrix(distance) %*% rot.mat
}
if (missing(fixbox)) {
box <- as.matrix(expand.grid(sr$xlim, sr$ylim, sr$zlim))
box <- trans3dto3d(box, rot.mat)
}
else {
lims <- fixbox * c(-1, 1)
box <- as.matrix(expand.grid(lims, lims, lims))
}
renderScene(scene, box, fill, col.mesh, add, engine, polynum,
col.bg, depth, newpage)
invisible(t(rot.mat))
}
environment(ds) <- environment(drawScene)
f <- function(z) {
ds(vtri, light = c(1, 1.5, 0), list(z = z, x = -60, y = 3),
scale = FALSE, fixbox = 1.2)
}
library(animation)
ani.options(interval = 0.1, nmax = 100)
saveGIF(for (i in seq(0, 360, by = 5)) f(i + 40))
```
```{r, echo = FALSE}
knitr::include_graphics(IMG("animvolcano.gif"))
```
A fly-around can also be recorded as a movie.
A movie can be paused and replayed; animated images typically cannot.
### Alternatives and Variations
Interactive or animated views rely on our visual system's ability to
extract [_depth
queues_](https://en.wikipedia.org/wiki/Depth_perception) from motion.
This is called _motion parallax_.
When the motion stops, the 3D illusion is lost.
Other options for viewing a 3D scene that do not rquire motion:
* [Anaglyph 3D](http://en.wikipedia.org/wiki/Anaglyph_3D) using
red/cyan glasses.
* [Polarized 3D](http://en.wikipedia.org/wiki/Polarized_3D_system) as
currently used in many 3D movies.
* [Stereograms, stereoscopy](http://en.wikipedia.org/wiki/Stereoscopy)
A stereogram
* presents each eye with an image from a slightly different viewing angle;
* the brain fuses the images into a 3D view in a process called
[_stereopsis_](https://en.wikipedia.org/wiki/Stereopsis).
This approach is or was used
* with virtual reality displays like
[Oculus Rift](https://www.oculus.com/rift/);
* in the [View-Master toy](https://en.wikipedia.org/wiki/View-Master);
* in aerial photography using viewers like this:
```{r, echo = FALSE, out.width = "80%"}
knitr::include_graphics("https://upload.wikimedia.org/wikipedia/commons/3/31/Pocket_stereoscope.jpg")
```
You can create your own viewer with paper or cardboard tubes, for example.
A stereo image for the soil data surface:
```{r, message = FALSE, fig.width = 10, class.source = "fold-hide"}
tp <- trellis.par.get()
trellis.par.set(theme = col.whitebg())
ocol <- trellis.par.get("axis.line")$col
oclip <- trellis.par.get("clip")$panel
trellis.par.set(list(axis.line = list(col = "transparent"),
clip = list(panel = FALSE)))
print(wireframe(soi.fit ~ soi.grid$easting * soi.grid$northing,
cuts = 9,
aspect = diff(range(soi.grid$n)) / diff(range(soi.grid$e)),
xlab = "Easting (km)",
ylab = "Northing (km)", shade = TRUE,
screen = list(z = 40, x = -70, y = 3)),
split = c(1, 1, 2, 1), more = TRUE)
print(wireframe(soi.fit ~ soi.grid$easting * soi.grid$northing,
cuts = 9,
aspect = diff(range(soi.grid$n)) / diff(range(soi.grid$e)),
xlab = "Easting (km)",
ylab = "Northing (km)", shade = TRUE,
screen = list(z = 40, x = -70, y = 0)),
split = c(2, 1, 2, 1))
```
```{r, echo = FALSE}
trellis.par.set(list(axis.line = list(col = ocol), clip = list(panel = oclip)))
trellis.par.set(tp)
```
A stereo view of the `volcano` surface:
```{r, fig.width = 10, class.source = "fold-hide"}
library(lattice)
v1 <- wireframe(volcano, shade = TRUE,
aspect = c(61 / 87, 0.4),
light.source = c(10, 0, 10),
screen = list(z = 40, x = -60, y = 0))
v2 <- wireframe(volcano, shade = TRUE,
aspect = c(61 / 87, 0.4),
light.source = c(10, 0, 10),
screen = list(z = 40, x = -60, y = 3))
print(v1, split = c(1, 1, 2, 1), more = TRUE)
print(v2, split = c(2, 1, 2, 1))
```
A stereo view of the `quakes` data:
```{r, fig.width = 10, class.source = "fold-hide"}
q1 <- cloud(-depth ~ long * lat, data = quakes, pch = 19, cex = 0.3,
screen = list(z = 40, x = -60, y = 0))
q2 <- cloud(-depth ~ long * lat, data = quakes, pch = 19, cex = 0.3,
screen = list(z = 40, x = -60, y = 3))
print(q1, split = c(1, 1, 2, 1), more = TRUE)
print(q2, split = c(2, 1, 2, 1))
```
## Coplots for Surfaces
We can also use the idea of a coplot for examining a surface a few
slices at a time.
For the soil resistivity surface:
```{r, class.source = "fold-hide"}
wireframe(
soi.fit ~
soi.grid$easting * soi.grid$northing,
cuts = 9,
aspect = diff(range(soi.grid$n)) /
diff(range(soi.grid$e)),
xlab = "Easting (km)",
ylab = "Northing (km)", shade = TRUE,
screen = list(z = 40, x = -70, y = 0))
```
Choosing 12 approximately equally spaced slices along `easting`:
```{r, class.source = "fold-hide"}
sf <- soi.grid
sf$fit <- as.numeric(soi.fit)
sube <- eastseq[seq_len(length(eastseq)) %% 7 == 0]
sube <- eastseq[round(seq(1, length(eastseq), length.out = 12))]
ssf <- filter(sf, easting %in% sube)
ggplot(ssf, aes(x = northing, y = fit)) +
geom_line() +
facet_wrap(~ easting)
```
For examining a surface this way we fix one variable at a specific value.
For examining data it is also sometimes useful to choose a narrow window.
* A narrow window minimizes the variation within the variable we
are conditioning on.
* Too narrow a window contains to few observations to see a signal.
## Conditioning with a Single Plot
It is possible to show conditioning in a single plot using an identity
channel to distinguish the conditions.
```{r, class.source = "fold-hide"}
sf <- soi.grid
sf$fit <- as.numeric(soi.fit)
sube4 <- eastseq[round(seq(1, length(eastseq), length.out = 4))]
ssf4 <- filter(sf, easting %in% sube4)
ggplot(mutate(ssf4, easting = factor(easting)),
aes(x = northing, y = fit,
color = easting,
group = easting)) +
geom_line()
```
This is most useful when the effect of the conditioning variable is
a level shift.
The number of different levels that can be used effectively is lower.
Over-plotting becomes an issue when used with data points.
## 3D Density Estimates
Density estimates can be used with 3D data as well.
The density is a function of three variables, so the density surface
is in 4D.
The points in 3D with a common density level for a _contour surface_.
The contour surface for a higher density level will be inside the
surface for a lower level.
Some artificial data:
```{r, webgl = TRUE, class.source = "fold-hide"}
library(rgl)
library(misc3d)
x <- matrix(rnorm(9000), ncol = 3)
x[1001 : 2000, 2] <- x[1001 : 2000, 2] + 4
x[2001 : 3000, 3] <- x[2001 : 3000, 3] + 4
clear3d()
bg3d(col = "white")
points3d(x = x[, 1], y = x[, 2], z = x[, 3],
size = 2, color = "black")
```
3D plot of one contour surface:
```{r, webgl = TRUE, class.source = "fold-hide"}
g <- expand.grid(x = seq(-4, 8.5, len = 30),
y = seq(-4, 8.5, len = 30),
z = seq(-4, 8.5, len = 30))
d <- kde3d(x[, 1], x[, 2], x[, 3],
0.5, 30, c(-4, 8.5))$d
clear3d()
xv <- seq(-4, 8.5, len = 30)
contour3d(d, 20 / 3000, xv, xv, xv,
color = "red")
```
Adding the data:
```{r, webgl = TRUE, class.source = "fold-hide"}
d2 <- kde3d(x[, 1], x[, 2], x[, 3],
0.5, 30, c(-4, 8.5))$d
clear3d()
points3d(x = x[, 1], y = x[, 2], z = x[, 3],
size = 2, color = "black")
contour3d(d2, 20 / 3000, xv, xv, xv,
color = "red", alpha = 0.5,
add = TRUE)
```
3D plot of several contours for bw = 0.5:
```{r, webgl = TRUE, class.source = "fold-hide"}
clear3d()
contour3d(d2, c(10, 25, 40) / 3000, xv, xv, xv,
color = c("red", "green", "blue"),
alpha = c(0.2, 0.5, 0.7),
add = TRUE)
```
Density contour for the `quakes` data:
```{r, webgl = TRUE, class.source = "fold-hide"}
bg3d(col = "white")
clear3d()
d <- kde3d(quakes$long, quakes$lat, -quakes$depth, n = 40)
contour3d(d$d, level = exp(-12),
x = d$x / 22, y = d$y / 28, z = d$z / 640,
color = "green", ## color2 = "gray",
scale = FALSE, add = TRUE)
box3d(col = "gray")
```
Adding data and second contour:
```{r, webgl = TRUE, class.source = "fold-hide"}
clear3d()
points3d(quakes$long / 22, quakes$lat / 28, -quakes$depth / 640,
size = 2, col = rep("black", nrow(quakes)))
box3d(col = "gray")
d <- kde3d(quakes$long, quakes$lat, -quakes$depth, n = 40)
contour3d(d$d, level = exp(c(-10, -12)),
x = d$x / 22, y = d$y / 28, z = d$z / 640,
color = c("red", "green"), alpha = 0.1,
scale = FALSE, add = TRUE)
```
## Parallel Coordinates Plots
The same idea as a slope graph, but usually with more variables.
Some references:
> A [post](https://eagereyes.org/techniques/parallel-coordinates) by
> Robert Kosara.
> [Wikipedia entry](https://en.wikipedia.org/wiki/Parallel_coordinates).
> [Paper](http://www.ifs.tuwien.ac.at/~mlanzenberger/teaching/ps/ws07/stuff/00146402.pdf)
> on recognizing mathematical objects in parallel coordinate plots.
Some R implementations include `parallelplot()` in `lattice` and
`ggparcoord()` in `GGally`
A parallel coordinate plot of a data set on the chemical composition of
coffee samples:
```{r, class.source = "fold-hide"}
library(gridExtra)
library(GGally)
data(coffee, package = "pgmm")
coffee <- mutate(coffee,
Type = ifelse(Variety == 1,
"Arabica",
"Robusta"))
ggparcoord(coffee[order(coffee$Type), ],
columns = 3 : 14,
groupColumn = "Type",
scale = "uniminmax") +
xlab("") +
ylab("") +
scale_colour_manual(
values = c("grey", "red")) +
theme(axis.ticks.y = element_blank(),
axis.text.y = element_blank(),
axis.text.x =
element_text(angle = 45,
vjust = 1,
hjust = 1),
legend.position = "top")
```
There are 43 samples from 29 countries and two varieties, _Arabica_ or
_Robusta_.
Examining the plot shows that the varieties are distinguished by their
fat and caffeine contents:
```{r, class.source = "fold-hide"}
ggplot(coffee,
aes(x = Fat,
y = Caffine,
colour = Type)) +
geom_point(size = 3) +
scale_colour_manual(
values = c("grey", "red"))
```
Some useful adjustments:
* alpha blending for larger data sets;
* vary axis scaling;
* reorder axes;
* reverse axes;
Interactive implementations that support these and more are available.
## Australian Crabs
The data frame `crabs` in the `MASS` package
contains measurement on crabs of a species that has been split into
two based on color, orange or blue.
Preserved specimens lose their color (and possibly ability to identify
sex).
Data were collected to help classify preserved specimens.
It would be useful to have a fairly simple classification rule.
A scatterplot matrix view:
```{r, fig.height = 7, fig.width = 10, class.source = "fold-hide"}
data(crabs, package = "MASS")
## splom(~ crabs[4 : 8],
## group = paste(sex, sp),
## data = crabs,
## auto.key = TRUE,
## pscale = 0)
ggpairs(crabs,
aes(color = interaction(sp, sex)),
columns = 4 : 8,
upper = list(continuous = "points"),
legend = 1)
```
The variables shown are:
* `FL` frontal lobe size (mm);
* `RW` rear width (mm);
* `CL` carapace length (mm);
* `CW` carapace width (mm);
* `BD` body depth (mm).
The variables are highly correlated, reflecting overall size and age.
The `RW * CL` or `RW * CW` plots separate males and females well, at
least for larger crabs.
A parallel coordinates view:
```{r, eval = TRUE, class.source = "fold-hide"}
ggparcoord(crabs,
columns = 4 : 8,
groupColumn = "sp") +
scale_color_manual(
values = c(B = "blue", O = "orange"))
```
A possible next step: Reduce the correlation by a scaling by one of
the variables.
```{r}
cr <- mutate(crabs,
FLCL = FL / CL,
RWCL = RW / CL,
CWCL = CW / CL,
BDCL = BD / CL)
```
```{r, fig.height = 7, fig.width = 10, class.source = "fold-hide"}
## splom(~ cr[9 : 12], group = sp,
## data = cr,
## auto.key = TRUE, pscale = 0)
ggpairs(cr, aes(color = sp),
columns = 9 : 12,
upper = list(continuous = "points"),
legend = 1) +
scale_color_manual(
values = c(B = "blue", O = "orange")) +
scale_fill_manual(
values = c(B = "blue", O = "orange"))
```
The `CWCL * BDCL` plot shows good species separation by a line.
A parallel coordinates plot after scaling by `CL`:
```{r, class.source = "fold-hide"}
ggparcoord(cr,
columns = 9 : 12,
groupColumn = "sp") +
scale_color_manual(
values = c(B = "blue", O = "orange"))
```
Reorder the variables:
```{r}
ggparcoord(cr,
columns = c(10, 9, 11, 12),
groupColumn = "sp") +
scale_color_manual(
values = c(B = "blue", O = "orange"))
```
Reorder again:
```{r, class.source = "fold-hide"}
ggparcoord(cr,
columns = c(10, 9, 12, 11),
groupColumn = "sp") +
scale_color_manual(
values = c(B = "blue", O = "orange"))
```
Reverse the `CWCL` variable:
```{r, class.source = "fold-hide"}
ggparcoord(mutate(cr, CWCL = -CWCL),
columns = c(10, 9, 12, 11),
groupColumn = "sp") +
scale_color_manual(
values = c(B = "blue", O = "orange"))
```
The patterns for `FLCL`, `CWLC`, and `BDCL` for the two species differ.
## Dimension Reduction by PCA
A number of methods are available for extracting a lower dimensional
representation of a data set that captures most important features.
One approach is _principal component analysis_, or _PCA_.
* Principal component analysis identifies a rotation of the data that
produces uncorrelated scores.
* Components are ordered by the amount they contribute to the overall
variation in the data.
* Sometimes the first few principal components capture most of the
interesting variation.
* The components are linear combinations of the original variables;
they may not be easy to interpret.
The first two principal components for the scaled crabs data:
```{r, class.source = "fold-hide"}
fit <- select(cr, 9 : 12) %>%
mutate(across(everything(), scale)) %>%
prcomp()
cr_PC <- cbind(cr, fit$x)
p <- ggplot(cr_PC, aes(PC1, PC2, color = sp)) +
geom_point() +
scale_color_manual(values = c(O = "orange", B = "blue"))
library(ggrepel)
p + geom_segment(aes(x = 0, y = 0, xend = PC1, yend = PC2),
data = as.data.frame(fit$rotation),
color = "black",
arrow = arrow(length = unit(0.03, "npc"))) +
geom_text_repel(aes(x = PC1, y = PC2,
label = rownames(fit$rotation)),
data = as.data.frame(fit$rotation),
color = "black")
```
The two species are separated quite well by the first principal
component alone.
For the coffee data the first principal component also separates the
varieties well:
```{r, class.source = "fold-hide"}
fit <- select(coffee, 4:14) %>%
mutate(across(everything(), scale)) %>%
prcomp()
coffee_PC <- cbind(coffee, fit$x)
ggplot(coffee_PC, aes(x = PC1, y = PC2, color = Type)) +
geom_point(size = 3) +
scale_colour_manual(values = c("grey", "red"))
```
The first principal component is a weighted combination of the
original variables; there is some weight, positive or negative, on
almost every variable, which makes the result hard to interpret.
```{r, class.source = "fold-hide"}
data.frame(variable = rownames(fit$rotation),
weight = as.vector(fit$rotation[, 1])) %>%
ggplot(aes(x = weight, y = variable)) +
geom_col(width = 0.2)
```
## Grand Tours
```{r, echo = FALSE}
knitr::include_graphics(IMG("fleatour.gif"))
```
The grand tour can be viewed as carrying out a sequence of rotations
in high dimensional space and showing images of some of the
coordinates.
This can be useful for discovering groups, outliers, and some
lower-dimensional structures.
The rotations can be random or selected to optimize some criterion
(guided tours).
The `tourr` package provides one implementation.
Good interactive interfaces do not seem to be readily available at the
moment.
## Reading
Chapter [_Visualizing associations among two or more quantitative
variables_](https://clauswilke.com/dataviz/visualizing-associations.html)
in [_Fundamentals of Data
Visualization_](https://clauswilke.com/dataviz/).