class: center, middle, title-slide # Visualizing Three or More Numeric Variables ### Luke Tierney ### University of Iowa ### 2022-04-03 --- layout: true <link rel="stylesheet" href="stat4580.css" type="text/css" /> <style type="text/css"> .remark-code { font-size: 85%; } </style> <!-- title based on Wilke's chapter --> ## Moving Beyond Two Dimensions --- Paper and screens are two-dimensional. -- We live in a three-dimensional world. -- For visualizing three-dimensional data we can take advantage of our visual system's ability to reconstruct three dimensional scenes from two-dimensional images using: -- * perspective rendering, lighting, and shading; -- * motion with animation and interaction; -- * stereo viewing methods. -- Most of us have no intuition for four and more dimensions. --- Some techniques that work in three dimensions but can also be used in higher dimensions: -- * _Grouping_ by encoding additional variables in color or shape channels. -- * _Conditioning_ by using small multiples for different levels of additional variables. -- Higher dimensions maybe up to ten; the [_curse of dimensionality_](https://en.wikipedia.org/wiki/Curse_of_dimensionality) is a limiting factor. -- The `lattice` package provides some facilities not easily available in `ggplot` so I will use `lattice` in a few examples. --- layout: true ## Scatterplot Matrices --- A _scatterplot matrix_ is a useful overview that shows all pairwise scatterplots. -- There are many options for creating scatterplot matrices in R; a few are: -- * `pairs` in base graphics; -- * `splom` in package `lattice` -- * `ggpairs` in `GGally`. --- Some examples using the `mpg` data: -- .pull-left[ .hide-code[ ```r library(lattice) splom(select(mpg, cty, hwy, displ), cex = 0.5, pch = 19) ``` <img src="threeplus_files/figure-html/unnamed-chunk-1-1.png" style="display: block; margin: auto;" /> ] ] -- .pull-right[ .hide-code[ ```r library(GGally) ggpairs(select(mpg, cty, hwy, displ), lower = list(continuous = wrap("points", size = 1))) ``` <img src="threeplus_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" /> ] ] --- Some variations: -- * diagonal left-top to right-bottom or left-bottom to right-top; -- * how to use the panels in the two triangles; -- * how to use the panels on the diagonal. -- Some things to look for in the panels: -- * clusters or separation of groups; -- * strong relationships; -- * outliers, rounding, clumping. --- Notes: * Scatterplot matrices were popularized by Cleveland and co-workers at Bell Laboratories in the 1980s. -- * Cleveland recommends using the full version displaying both triangles of plots to facilitate _visual linking_. -- * If you do use only one triangle, and one variable is a response, then it is a good idea to arrange for that variable to be shown on the vertical axis against all other variables. -- * The symmetry in the plot with the diagonal running from bottom-left to top-tight as produced by `splom` is simpler than the symmetry in the plot with the diagonal running from top-left t bottom-right produced by `pairs` and `ggpairs`. --- layout: false ## Three Data Sets Thee useful data sets to explore: -- * The `ethanol` data frame in the `lattice` package. -- * Soil resistivity data from from Cleveland's _Visualizing Data_ book. -- * The `quakes` data frame in the `datasets` package. --- layout: true ### Ethanol Data --- The `ethanol` data frame in the `lattice` package contains data from an experiment on efficiency and emissions in small one-cylinder engines. -- The data frame contains 88 observations on three variables: -- * `NOx`: Concentration of nitrogen oxides (NO and NO2) in micrograms. -- * `C` Compression ratio of the engine. -- * `E` Equivalence ratio, a measure of the richness of the air and ethanol fuel mixture. --- .pull-left[ A scatterplot matrix: .hide-code[ ```r library(lattice) splom(ethanol, ced = 0.5, pch = 19) ``` <img src="threeplus_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" /> ] ] -- .pull-left[ A goal is to understand the relationship between the pollutant `NOx` and the controllable variables `E` and `C`. ] --- layout: false ### Soil Resistivity Data [Data](http://www.stat.uiowa.edu/~luke/data/soil.dat) from Cleveland's _Visualizing Data_ book contains measurements of soil resistivity of an agricultural field along a roughly rectangular grid. <!-- Something interesting is going on with the `track` variable, but I'm not sure what. --> -- .pull-left[ A scatterplot matrix of the `resistivity`, `northing` and `easting` variables: .hide-code[ ```r if (! file.exists("soil.dat")) download.file("http://www.stat.uiowa.edu/~luke/data/soil.dat", "soil.dat") soil <- read.table("soil.dat") splom(soil[1 : 3], cex = 0.1, pch = 19) ``` <img src="threeplus_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" /> ] ] -- .pull-right[ The data is quite noisy but there is some structure. {{content}} ] -- A goal is to understand how resistivity varies across the field. --- layout: true ### Earth Quake Locations and Magnitudes --- .pull-left[ The `quakes` data frame contains data on locations of seismic events of magnitude 4.0 or larger in a region near Fiji. {{content}} ] -- The time frame is from 1964 to perhaps 2000. {{content}} -- More recent data is available from a number of sources on the web. {{content}} -- A scatter plot matrix: -- .pull-right[ .hide-code[ ```r library(lattice) splom(quakes, cex = 0.1, pch = 19) ``` <img src="threeplus_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" /> ] ] --- .pull-left[ Quake locations: .hide-code[ ```r md <- map_data("world2", c("Fiji", "Tonga", "New Zealand")) ggplot(quakes, aes(long, lat)) + geom_polygon(aes(group = group), data = md, color = "black", fill = NA) + geom_point(size = 0.5, color = "red") + coord_map() + ggthemes::theme_map() ``` <img src="threeplus_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" /> ] ] -- .pull-right[ Some goals: {{content}} ] -- * understand the three-dimensional location of the quakes; {{content}} -- * see if there is any association between location and magnitude. --- layout: true ## Grouping for Conditioning --- .pull-left[ For the `ethanol` data there are only a small number of distinct levels for `C`. {{content}} ] -- This suggests considering a plot mapping the level to color. {{content}} -- ```r ggplot(ethanol, aes(E, NOx, * color = C)) + geom_point(size = 2) ``` -- .pull-right[ <img src="threeplus_files/figure-html/ethanol-gradient-1.png" style="display: block; margin: auto;" /> ] --- .pull-left[ A qualitative scheme can help distinguish the levels. ```r ggplot(ethanol, aes(E, NOx, * color = factor(C))) + geom_point(size = 2) ``` ] .pull-right[ <img src="threeplus_files/figure-html/ethanol-factor-1.png" style="display: block; margin: auto;" /> ] --- .pull-left[ Adding smooths further helps the visual grouping: ```r ggplot(ethanol, aes(E, NOx, color = factor(C))) + geom_point(size = 2) + * geom_smooth(se = FALSE) ``` ] .pull-right[ <img src="threeplus_files/figure-html/ethanol-1.png" style="display: block; margin: auto;" /> ] --- .pull-left[ <img src="threeplus_files/figure-html/ethanol-1.png" style="display: block; margin: auto;" /> ] .pull-right[ Some observations: {{content}} ] -- * At each level of `C` there is a strong non-linear relation between `NOx` and `E`. {{content}} -- * At levels of `E` above 1 the value of `C` has little effect. {{content}} -- * For lower levels of `E` the `NOx` level increases with `C`. --- .pull-left[ For the `quakes` data, breaking the `depth` values into thirds gives some insights: ```r quakes2 <- mutate(quakes, depth_cut = cut_number(depth, 3)) ggplot(quakes2, aes(x = long, y = lat, color = depth_cut)) + geom_point() + theme_bw() + coord_map() ``` ] .pull-right[ <img src="threeplus_files/figure-html/quakes-thirds-1.png" style="display: block; margin: auto;" /> ] --- layout: true ## Conditioning Plots (Coplots) --- One way to try to get a handle on higher dimensional data is to try to fix values of some variables and visualize the values of others in 2D. -- This can be done with -- * interactive tools; -- * small multiples with lattice/trellis displays or faceting. -- A conditioning plot, or _coplot_: -- * Shows a collection of plots of two variables for different settings of one or more additional variables, the conditioning variables. -- * For ordered conditioning variables the plots are arranged in a way that reflects the order. -- * When a conditioning variable is numeric, or ordered categorical with many levels, the values of the conditioning variable are grouped into bins. --- .pull-left[ For the soil resistivity data, a coplot of `resistivity` against `easting`, conditioning on `northing` with bins of size 0.5: ] .pull-right[ .hide-code[ ```r p1 <- ggplot(soil, aes(easting, resistivity)) + geom_point(size = 0.5) + facet_wrap(~ cut_width(northing, width = 0.5, center = 0)) p1 ``` <img src="threeplus_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" /> ] ] --- .pull-left[ Adding a smooth is often helpful. {{content}} ] -- With a large amount of data the smooth is hard to see. {{content}} -- Some options: {{content}} -- * Omit the data and only show the smooth. {{content}} -- * Show the data in a less intense color, such as light gray. {{content}} -- * Use a contrasting color for the smooth curves. {{content}} -- * Show the data using alpha blending. {{content}} -- This uses a muted representation of the data: -- .pull-right[ .hide-code[ ```r p2 <- ggplot(soil, aes(easting, resistivity)) + geom_point(size = 0.5, color = "lightgrey") + facet_wrap(~ cut_width(northing, width = 0.5, center = 0)) + geom_smooth() p2 ``` <img src="threeplus_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" /> ] ] --- The conditioning bins are quite wide. -- Using rounding and keeping only points within 0.05 of the rounded values reduces the variability: -- .pull-left[ .hide-code[ ```r soil_trm <- mutate(soil, nrnd = round(northing * 2) / 2) %>% filter(abs(northing - nrnd) < 0.05) p1 %+% soil_trm + facet_wrap(~ cut_width(northing, width = 0.1, center = 0)) ``` <img src="threeplus_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" /> ] ] -- .pull-right[ .hide-code[ ```r p2 %+% soil_trm + facet_wrap(~ cut_width(northing, width = 0.1, center = 0)) ``` <img src="threeplus_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" /> ] ] --- .pull-left[ For the quakes data a plot of latitude against longitude conditioned on three depth levels: ] .pull-right[ .hide-code[ ```r qthm <- theme(panel.border = element_rect(color = "grey30", fill = NA)) ggplot(quakes2, aes(x = long, y = lat)) + geom_point(color = scales::muted("blue"), size = 0.5) + facet_wrap(~ depth_cut, nrow = 1) + coord_map() + qthm ``` <img src="threeplus_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" /> ] ] --- .pull-left[ The relative positions of the depth groups are much harder to see than in the grouped conditioning plot. Adding the full data for background context, and using a more intense color for the panel subset, helps a lot: ] .pull-right[ .hide-code[ ```r ## quakes does not contain the depth_cut ## variable used in the facet ggplot(quakes2, aes(x = long, y = lat)) + geom_point(data = quakes, color = "gray", size = 0.5) + geom_point(color = "blue", size = 0.5) + facet_wrap(~ depth_cut, nrow = 1) + coord_map() + qthm ``` <img src="threeplus_files/figure-html/unnamed-chunk-12-1.png" style="display: block; margin: auto;" /> ] ] --- .pull-left[ Switching latitude and depth shows another aspect: ] .pull-right[ .hide-code[ ```r quakes3 <- mutate(quakes, lat_cut = cut_width(lat, width = 5, boundary = 0)) ggplot(quakes3, aes(x = long, y = depth)) + geom_point(data = quakes, color = "gray", size = 0.5) + geom_point(color = "blue", size = 0.5) + scale_y_reverse() + facet_wrap(~ lat_cut) + qthm ``` <img src="threeplus_files/figure-html/unnamed-chunk-13-1.png" style="display: block; margin: auto;" /> ] ] --- .pull-left[ Coplot for the `ethanol` data: .hide-code[ ```r ggplot(ethanol, aes(x = E, y = NOx)) + geom_point() + facet_wrap(~ C) ``` <img src="threeplus_files/figure-html/unnamed-chunk-14-1.png" style="display: block; margin: auto;" /> ] ] -- .pull-right[ Adding muted full data for context: .hide-code[ ```r ggplot(ethanol, aes(x = E, y = NOx)) + geom_point(color = "grey", data = mutate(ethanol, C = NULL)) + geom_point() + facet_wrap(~ C) ``` <img src="threeplus_files/figure-html/unnamed-chunk-15-1.png" style="display: block; margin: auto;" /> ] ] --- layout: false ## Contour and Level Plots for Surfaces -- A number of methods can be used to estimate a smooth signal surface as a function of the two location variables. -- One option is the `loess` local polynomial smoother; another is `gam` from package `mgcv`. -- The estimated surface level can be computed on a grid of points using the `predict` method of the fit. -- These estimated surfaces can be visualized using contour plots or level plots. .hide-code[ ```r m <- loess(resistivity ~ easting * northing, span = 0.25, degree = 2, data = soil) eastseq <- seq(.15, 1.410, by = .015) northseq <- seq(.150, 3.645, by = .015) soi.grid <- expand.grid(easting = eastseq, northing = northseq) soi.fit <- predict(m, soi.grid) soi.fit.df <- mutate(soi.grid, fit = as.numeric(soi.fit)) ``` ] --- layout: true ### Contour Plots --- .pull-left[ _Contour plots_ compute contours, or level curves, as polygons at a set of levels. {{content}} ] -- Contour plots draw the level curves, often with a level annotation. {{content}} -- Contour plots can also have their polygons filled in with colors representing the levels. {{content}} -- A basic contour plot of the fit soil resistivity surface in `ggplot` using `geom_contour`: -- .pull-right[ .hide-code[ ```r p <- ggplot(soi.fit.df, aes(x = easting, y = northing, z = fit)) + coord_fixed() p + geom_contour() ``` <img src="threeplus_files/figure-html/unnamed-chunk-17-1.png" style="display: block; margin: auto;" /> ] ] --- .pull-left[ Neither `lattice` nor `ggplot` seem to make it easy to fill in the contours. The base function `filled.contour` is available for this: ] .pull-right[ .hide-code[ ```r cm.rev <- function(...) rev(cm.colors(...)) filled.contour(eastseq, northseq, soi.fit, asp = 1, color.palette = cm.rev) ``` <img src="threeplus_files/figure-html/unnamed-chunk-18-1.png" style="display: block; margin: auto;" /> ] ] --- layout: true ### Level Plots --- .pull-left[ A _level plot_ colors a grid spanned by two variables by the color of a third variable. {{content}} ] -- Level plots are also called _image plots_ {{content}} -- The term _heat map_ is also used, in particular with a specific color scheme. {{content}} -- But heat map often means a more complex visualization with an image plot at its core. {{content}} -- `ggplot` provides `geom_tile` that can be used for a level plot: -- .pull-right[ .hide-code[ ```r p + geom_tile(aes(fill = fit)) + scale_fill_gradientn( colors = rev(cm.colors(100))) ``` <img src="threeplus_files/figure-html/unnamed-chunk-19-1.png" style="display: block; margin: auto;" /> ] ] --- .pull-left[ Superimposing contours on a level plot is often helpful. ] .pull-right[ .hide-code[ ```r p + geom_tile(aes(fill = fit)) + geom_contour() + scale_fill_gradientn( colors = rev(cm.colors(100))) ``` <img src="threeplus_files/figure-html/unnamed-chunk-20-1.png" style="display: block; margin: auto;" /> ] ] --- .pull-left[ Level plots do not require computing contours, but are not not as smooth as filled contour plots. {{content}} ] -- Visually, image plots and filled contour plots are very similar for fine grids, but image plots are less smooth for coarse ones. {{content}} -- Lack of smoothness is less of an issue when the data values themselves are noisy. {{content}} -- The grid for the [`volcano` data](https://teara.govt.nz/en/photograph/3920/maungawhau-mt-eden) set is coarser and illustrates the lack of smoothness. -- .pull-right[ .hide-code[ ```r vd <- expand.grid(x = seq_len(nrow(volcano)), y = seq_len(ncol(volcano))) vd$z <- as.numeric(volcano) ggplot(vd, aes(x, y, fill = z)) + geom_tile() + scale_fill_gradientn(colors = rev(cm.colors(100))) + coord_equal() ``` <img src="threeplus_files/figure-html/unnamed-chunk-21-1.png" style="display: block; margin: auto;" /> ] ] --- .pull-left[ A `filled.contour` plot looks like this: ] .pull-right[ .hide-code[ ```r filled.contour(seq_len(nrow(volcano)), seq_len(ncol(volcano)), volcano, nlevels = 10, color.palette = cm.rev, asp = 1) ``` <img src="threeplus_files/figure-html/unnamed-chunk-22-1.png" style="display: block; margin: auto;" /> ] ] --- A coarse grid can be interpolated to a finer grid. -- Irregularly spaced data can also be interpolated to a grid. -- The `interp` function in the [`akima`](https://cran.r-project.org/package=interp) or [`interp`](https://cran.r-project.org/package=interp) packages is useful for this kind of interpolation. -- (`interp` has a more permissive license.) --- layout: false ## Three-Dimensional Views There are several options for viewing surfaces or collections of points as three-dimensional objects: -- * Fixed views of rotated projections. -- * Animated or interactive views showing a sequence of rotated projections. --- layout: true ### Fixed Views --- The `lattice` function `cloud()` shows a projection of a rotated point cloud in three dimensions. -- .pull-left[ For the soil resistivity data: .hide-code[ ```r cloud(resistivity ~ easting * northing, pch = 19, cex = 0.1, data = soil) ``` <img src="threeplus_files/figure-html/unnamed-chunk-23-1.png" style="display: block; margin: auto;" /> ] ] -- .pull-right[ .hide-code[ For the `quakes` data: ```r cloud(-depth ~ long * lat, data = quakes) ``` <img src="threeplus_files/figure-html/unnamed-chunk-24-1.png" style="display: block; margin: auto;" /> ] ] --- .pull-left[ A surface can also be visualized using a _wire frame_ plot showing a 3D view of the surface from a particular viewpoint. {{content}} ] -- A simple wire frame plot is often sufficient. {{content}} -- Lighting and shading can be used to enhance the 3D effect. {{content}} -- A basic wire frame plot for the volcano data: -- .pull-right[ .hide-code[ ```r wireframe(z ~ x * y, data = vd, aspect = c(61 / 89, 0.3)) ``` <img src="threeplus_files/figure-html/unnamed-chunk-25-1.png" style="display: block; margin: auto;" /> ] ] --- .pull-left[ _Wire frame_ is a bit of a misnomer since surface panels in front occlude lines behind them. For a fine grid, as in the soil surface, the lines are too dense. The use of shading for the surfaces can help. ] .pull-right[ .hide-code[ ```r wireframe(z ~ x * y, data = vd, aspect = c(61 / 89, 0.3), shade = TRUE) ``` <img src="threeplus_files/figure-html/unnamed-chunk-26-1.png" style="display: block; margin: auto;" /> ] ] --- .pull-left[ A wire frame plot with shading for the fit soil resistivity surface: ] .pull-right[ .hide-code[ ```r asp <- with(soi.grid, diff(range(northing)) / diff(range(easting))) wf <- wireframe( soi.fit ~ soi.grid$easting * soi.grid$northing, aspect = asp, shade = TRUE, screen = list(z = -50, x = -30), xlab = "Easting (km", ylab = "Northing (km)") wf ``` <img src="threeplus_files/figure-html/unnamed-chunk-27-1.png" style="display: block; margin: auto;" /> ] ] --- .pull-left[ Both ways of looking at a surface are useful: .hide-code[ ```r lv <- levelplot(soi.fit ~ soi.grid$easting * soi.grid$northing, cuts = 9, aspect = asp, contour = TRUE, xlab = "Easting (km)", ylab = "Northing (km)") print(lv, split = c(1, 1, 2, 1), more = TRUE) print(wf, split = c(2, 1, 2, 1)) ``` <img src="threeplus_files/figure-html/unnamed-chunk-28-1.png" style="display: block; margin: auto;" /> ] ] -- .pull-right[ * The level plot/contour representation is useful for recognizing locations of key features. {{content}} ] -- * The wire frame view helps build a mental model of the 3D structure. {{content}} -- * Being able to interactively adjust the viewing position for a wire frame model greatly enhances our ability to understand the 3D structure. --- layout: true ### Interactive 3D Plots Using OpenGL --- [OpenGL](https://www.opengl.org/) is a standardized framework for high performance graphics. -- The `rgl` package provides an R interface to some of OpenGL's capabilities. -- [WebGL](https://www.khronos.org/webgl/) is a JavaScript framework for using OpenGL within a browser window. -- Most desktop browsers support WebGL; some mobile browsers do as well. -- In some cases support may be available but not enabled by default. You may be able to get help at <https://get.webgl.org/>. -- `knitr` and `rgl` provide support for embedding OpenGL images in web pages. -- It is also possible to embed OpenGL images in PDF files, but not all PDF viewers support this. --- Start by creating the fit surface data frame. .hide-code[ ```r library(dplyr) soil <- read.table("http://www.stat.uiowa.edu/~luke/data/soil.dat") m <- loess(resistivity ~ easting * northing, span = 0.25, degree = 2, data = soil) eastseq <- seq(.15, 1.410, by = .015) northseq <- seq(.150, 3.645, by = .015) soi.grid <- expand.grid(easting = eastseq, northing = northseq) soi.fit <- predict(m, soi.grid) soi.fit.df <- mutate(soi.grid, fit = as.numeric(soi.fit)) ``` ] -- .pull-left[ This code run in R will open a new window containing an interactive 3D scene (but this may not work on FastX and is not available on the RStudio server): ```r library(rgl) bg3d(color = "white") clear3d() par3d(mouseMode = "trackball") surface3d(eastseq, northseq, soi.fit / 100, color = rep("red", length(soi.fit))) ``` ] -- .pull-right[ This will work in the RStudio notebook server: ```r *options(rgl.useNULL = TRUE) library(rgl) bg3d(color = "white") clear3d() par3d(mouseMode = "trackball") surface3d(eastseq, northseq, soi.fit / 100, color = rep("red", length(soi.fit))) *rglwidget() ``` ] --- To embed an image in an HTML document, first set the `webgl` hook with a code chunk like this: ```r knitr::knit_hooks$set(webgl = rgl::hook_webgl) options(rgl.useNULL = TRUE) ``` -- Then a chunk with the option `webgl = TRUE` can produce an embedded OpenGL image: .pull-left[ ```r library(rgl) bg3d(color = "white") clear3d() par3d(mouseMode = "trackball") surface3d(eastseq, northseq, soi.fit / 100, color = rep("red", length(soi.fit))) ``` ] -- .pull-right[