class: center, middle, title-slide .title[ # Visualizing Three or More Numeric Variables ] .author[ ### Luke Tierney ] .institute[ ### University of Iowa ] .date[ ### 2023-05-06 ] --- layout: true <link rel="stylesheet" href="stat4580.css" type="text/css" /> <style type="text/css"> .remark-code { font-size: 85%; } </style> <!-- title based on Wilke's chapter --> ## Moving Beyond Two Dimensions --- Paper and screens are two-dimensional. -- We live in a three-dimensional world. -- For visualizing three-dimensional data we can take advantage of our visual system's ability to reconstruct three dimensional scenes from two-dimensional images using: -- * perspective rendering, lighting, and shading; -- * motion with animation and interaction; -- * stereo viewing methods. -- Most of us have no intuition for four and more dimensions. --- Some techniques that work in three dimensions but can also be used in higher dimensions: -- * _Grouping_ by encoding additional variables in color or shape channels. -- * _Conditioning_ by using small multiples for different levels of additional variables. -- Higher dimensions maybe up to ten; the [_curse of dimensionality_](https://en.wikipedia.org/wiki/Curse_of_dimensionality) is a limiting factor. -- The `lattice` package provides some facilities not easily available in `ggplot` so I will use `lattice` in a few examples. --- layout: true ## Scatterplot Matrices --- A _scatterplot matrix_ is a useful overview that shows all pairwise scatterplots. -- There are many options for creating scatterplot matrices in R; a few are: -- * `pairs` in base graphics; -- * `splom` in package `lattice` -- * `ggpairs` in `GGally`. --- Some examples using the `mpg` data: -- .pull-left[ .hide-code[ ```r library(lattice) splom(select(mpg, cty, hwy, displ), cex = 0.5, pch = 19) ``` <img src="threeplus_files/figure-html/unnamed-chunk-1-1.png" style="display: block; margin: auto;" /> ] ] -- .pull-right[ .hide-code[ ```r library(GGally) ggpairs(select(mpg, cty, hwy, displ), lower = list(continuous = wrap("points", size = 1))) ``` <img src="threeplus_files/figure-html/unnamed-chunk-2-1.png" style="display: block; margin: auto;" /> ] ] --- Some variations: -- * diagonal left-top to right-bottom or left-bottom to right-top; -- * how to use the panels in the two triangles; -- * how to use the panels on the diagonal. -- Some things to look for in the panels: -- * clusters or separation of groups; -- * strong relationships; -- * outliers, rounding, clumping. --- Notes: * Scatterplot matrices were popularized by Cleveland and co-workers at Bell Laboratories in the 1980s. -- * Cleveland recommends using the full version displaying both triangles of plots to facilitate _visual linking_. -- * If you do use only one triangle, and one variable is a response, then it is a good idea to arrange for that variable to be shown on the vertical axis against all other variables. -- * The symmetry in the plot with the diagonal running from bottom-left to top-tight as produced by `splom` is simpler than the symmetry in the plot with the diagonal running from top-left t bottom-right produced by `pairs` and `ggpairs`. --- layout: false ## Three Data Sets Thee useful data sets to explore: -- * The `ethanol` data frame in the `lattice` package. -- * Soil resistivity data from from Cleveland's _Visualizing Data_ book. -- * The `quakes` data frame in the `datasets` package. --- layout: true ### Ethanol Data --- The `ethanol` data frame in the `lattice` package contains data from an experiment on efficiency and emissions in small one-cylinder engines. -- The data frame contains 88 observations on three variables: -- * `NOx`: Concentration of nitrogen oxides (NO and NO2) in micrograms. -- * `C` Compression ratio of the engine. -- * `E` Equivalence ratio, a measure of the richness of the air and ethanol fuel mixture. --- .pull-left[ A scatterplot matrix: .hide-code[ ```r library(lattice) splom(ethanol, ced = 0.5, pch = 19) ``` <img src="threeplus_files/figure-html/unnamed-chunk-3-1.png" style="display: block; margin: auto;" /> ] ] -- .pull-left[ A goal is to understand the relationship between the pollutant `NOx` and the controllable variables `E` and `C`. ] --- layout: false ### Soil Resistivity Data [Data](https://stat.uiowa.edu/~luke/data/soil.dat) from Cleveland's _Visualizing Data_ book contains measurements of soil resistivity of an agricultural field along a roughly rectangular grid. <!-- Something interesting is going on with the `track` variable, but I'm not sure what. --> -- .pull-left[ A scatterplot matrix of the `resistivity`, `northing` and `easting` variables: .hide-code[ ```r if (! file.exists("soil.dat")) download.file("http://www.stat.uiowa.edu/~luke/data/soil.dat", "soil.dat") soil <- read.table("soil.dat") splom(soil[1 : 3], cex = 0.1, pch = 19) ``` <img src="threeplus_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" /> ] ] -- .pull-right[ The data is quite noisy but there is some structure. {{content}} ] -- A goal is to understand how resistivity varies across the field. --- layout: true ### Earth Quake Locations and Magnitudes --- .pull-left[ The `quakes` data frame contains data on locations of seismic events of magnitude 4.0 or larger in a region near Fiji. {{content}} ] -- The time frame is from 1964 to perhaps 2000. {{content}} -- More recent data is available from a number of sources on the web. {{content}} -- A scatter plot matrix: -- .pull-right[ .hide-code[ ```r library(lattice) splom(quakes, cex = 0.1, pch = 19) ``` <img src="threeplus_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" /> ] ] --- .pull-left[ Quake locations: .hide-code[ ```r md <- map_data("world2", c("Fiji", "Tonga", "New Zealand")) ggplot(quakes, aes(long, lat)) + geom_polygon(aes(group = group), data = md, color = "black", fill = NA) + geom_point(size = 0.5, color = "red") + coord_map() + ggthemes::theme_map() ``` <img src="threeplus_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" /> ] ] -- .pull-right[ Some goals: {{content}} ] -- * understand the three-dimensional location of the quakes; {{content}} -- * see if there is any association between location and magnitude. --- layout: true ## Grouping for Conditioning --- .pull-left[ For the `ethanol` data there are only a small number of distinct levels for `C`. {{content}} ] -- This suggests considering a plot mapping the level to color. {{content}} -- ```r ggplot(ethanol, aes(E, NOx, * color = C)) + geom_point(size = 2) ``` -- .pull-right[ <img src="threeplus_files/figure-html/ethanol-gradient-1.png" style="display: block; margin: auto;" /> ] --- .pull-left[ A qualitative scheme can help distinguish the levels. ```r ggplot(ethanol, aes(E, NOx, * color = factor(C))) + geom_point(size = 2) ``` ] .pull-right[ <img src="threeplus_files/figure-html/ethanol-factor-1.png" style="display: block; margin: auto;" /> ] --- .pull-left[ Adding smooths further helps the visual grouping: ```r ggplot(ethanol, aes(E, NOx, color = factor(C))) + geom_point(size = 2) + * geom_smooth(se = FALSE) ``` ] .pull-right[ <img src="threeplus_files/figure-html/ethanol-1.png" style="display: block; margin: auto;" /> ] --- .pull-left[ <img src="threeplus_files/figure-html/ethanol-1.png" style="display: block; margin: auto;" /> ] .pull-right[ Some observations: {{content}} ] -- * At each level of `C` there is a strong non-linear relation between `NOx` and `E`. {{content}} -- * At levels of `E` above 1 the value of `C` has little effect. {{content}} -- * For lower levels of `E` the `NOx` level increases with `C`. --- .pull-left[ For the `quakes` data, breaking the `depth` values into thirds gives some insights: ```r quakes2 <- mutate(quakes, depth_cut = cut_number(depth, 3)) ggplot(quakes2, aes(x = long, y = lat, color = depth_cut)) + geom_point() + theme_bw() + coord_map() ``` ] .pull-right[ <img src="threeplus_files/figure-html/quakes-thirds-1.png" style="display: block; margin: auto;" /> ] --- layout: true ## Conditioning Plots (Coplots) --- One way to try to get a handle on higher dimensional data is to try to fix values of some variables and visualize the values of others in 2D. -- This can be done with -- * interactive tools; -- * small multiples with lattice/trellis displays or faceting. -- A conditioning plot, or _coplot_: -- * Shows a collection of plots of two variables for different settings of one or more additional variables, the conditioning variables. -- * For ordered conditioning variables the plots are arranged in a way that reflects the order. -- * When a conditioning variable is numeric, or ordered categorical with many levels, the values of the conditioning variable are grouped into bins. --- .pull-left[ For the soil resistivity data, a coplot of `resistivity` against `easting`, conditioning on `northing` with bins of size 0.5: ] .pull-right[ .hide-code[ ```r p1 <- ggplot(soil, aes(easting, resistivity)) + geom_point(size = 0.5) + facet_wrap(~ cut_width(northing, width = 0.5, center = 0)) p1 ``` <img src="threeplus_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" /> ] ] --- .pull-left[ Adding a smooth is often helpful. {{content}} ] -- With a large amount of data the smooth is hard to see. {{content}} -- Some options: {{content}} -- * Omit the data and only show the smooth. {{content}} -- * Show the data in a less intense color, such as light gray. {{content}} -- * Use a contrasting color for the smooth curves. {{content}} -- * Show the data using alpha blending. {{content}} -- This uses a muted representation of the data: -- .pull-right[ .hide-code[ ```r p2 <- ggplot(soil, aes(easting, resistivity)) + geom_point(size = 0.5, color = "lightgrey") + facet_wrap(~ cut_width(northing, width = 0.5, center = 0)) + geom_smooth() p2 ``` <img src="threeplus_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" /> ] ] --- The conditioning bins are quite wide. -- Using rounding and keeping only points within 0.05 of the rounded values reduces the variability: -- .pull-left[ .hide-code[ ```r soil_trm <- mutate(soil, nrnd = round(northing * 2) / 2) %>% filter(abs(northing - nrnd) < 0.05) p1 %+% soil_trm + facet_wrap(~ cut_width(northing, width = 0.1, center = 0)) ``` <img src="threeplus_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" /> ] ] -- .pull-right[ .hide-code[ ```r p2 %+% soil_trm + facet_wrap(~ cut_width(northing, width = 0.1, center = 0)) ``` <img src="threeplus_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" /> ] ] --- .pull-left[ For the quakes data a plot of latitude against longitude conditioned on three depth levels: ] .pull-right[ .hide-code[ ```r qthm <- theme(panel.border = element_rect(color = "grey30", fill = NA)) ggplot(quakes2, aes(x = long, y = lat)) + geom_point(color = scales::muted("blue"), size = 0.5) + facet_wrap(~ depth_cut, nrow = 1) + coord_map() + qthm ``` <img src="threeplus_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" /> ] ] --- .pull-left[ The relative positions of the depth groups are much harder to see than in the grouped conditioning plot. Adding the full data for background context, and using a more intense color for the panel subset, helps a lot: ] .pull-right[ .hide-code[ ```r ## quakes does not contain the depth_cut ## variable used in the facet ggplot(quakes2, aes(x = long, y = lat)) + geom_point(data = quakes, color = "gray", size = 0.5) + geom_point(color = "blue", size = 0.5) + facet_wrap(~ depth_cut, nrow = 1) + coord_map() + qthm ``` <img src="threeplus_files/figure-html/unnamed-chunk-12-1.png" style="display: block; margin: auto;" /> ] ] --- .pull-left[ Switching latitude and depth shows another aspect: ] .pull-right[ .hide-code[ ```r quakes3 <- mutate(quakes, lat_cut = cut_width(lat, width = 5, boundary = 0)) ggplot(quakes3, aes(x = long, y = depth)) + geom_point(data = quakes, color = "gray", size = 0.5) + geom_point(color = "blue", size = 0.5) + scale_y_reverse() + facet_wrap(~ lat_cut) + qthm ``` <img src="threeplus_files/figure-html/unnamed-chunk-13-1.png" style="display: block; margin: auto;" /> ] ] --- .pull-left[ Coplot for the `ethanol` data: .hide-code[ ```r ggplot(ethanol, aes(x = E, y = NOx)) + geom_point() + facet_wrap(~ C) ``` <img src="threeplus_files/figure-html/unnamed-chunk-14-1.png" style="display: block; margin: auto;" /> ] ] -- .pull-right[ Adding muted full data for context: .hide-code[ ```r ggplot(ethanol, aes(x = E, y = NOx)) + geom_point(color = "grey", data = mutate(ethanol, C = NULL)) + geom_point() + facet_wrap(~ C) ``` <img src="threeplus_files/figure-html/unnamed-chunk-15-1.png" style="display: block; margin: auto;" /> ] ] --- layout: false ## Contour and Level Plots for Surfaces -- A number of methods can be used to estimate a smooth signal surface as a function of the two location variables. -- One option is the `loess` local polynomial smoother; another is `gam` from package `mgcv`. -- The estimated surface level can be computed on a grid of points using the `predict` method of the fit. -- These estimated surfaces can be visualized using contour plots or level plots. .hide-code[ ```r m <- loess(resistivity ~ easting * northing, span = 0.25, degree = 2, data = soil) eastseq <- seq(.15, 1.410, by = .015) northseq <- seq(.150, 3.645, by = .015) soi.grid <- expand.grid(easting = eastseq, northing = northseq) soi.fit <- predict(m, soi.grid) soi.fit.df <- mutate(soi.grid, fit = as.numeric(soi.fit)) ``` ] --- layout: true ### Contour Plots --- .pull-left[ _Contour plots_ compute contours, or level curves, as polygons at a set of levels. {{content}} ] -- Contour plots draw the level curves, often with a level annotation. {{content}} -- Contour plots can also have their polygons filled in with colors representing the levels. {{content}} -- A basic contour plot of the fit soil resistivity surface in `ggplot` using `geom_contour`: -- .pull-right[ .hide-code[ ```r p <- ggplot(soi.fit.df, aes(x = easting, y = northing, z = fit)) + coord_fixed() p + geom_contour() ``` <img src="threeplus_files/figure-html/unnamed-chunk-17-1.png" style="display: block; margin: auto;" /> ] ] --- .pull-left[ Neither `lattice` nor `ggplot` seem to make it easy to fill in the contours. The base function `filled.contour` is available for this: ] .pull-right[ .hide-code[ ```r cm.rev <- function(...) rev(cm.colors(...)) filled.contour(eastseq, northseq, soi.fit, asp = 1, color.palette = cm.rev) ``` <img src="threeplus_files/figure-html/unnamed-chunk-18-1.png" style="display: block; margin: auto;" /> ] ] --- layout: true ### Level Plots --- .pull-left[ A _level plot_ colors a grid spanned by two variables by the color of a third variable. {{content}} ] -- Level plots are also called _image plots_ {{content}} -- The term _heat map_ is also used, in particular with a specific color scheme. {{content}} -- But heat map often means a more complex visualization with an image plot at its core. {{content}} -- `ggplot` provides `geom_tile` that can be used for a level plot: -- .pull-right[ .hide-code[ ```r p + geom_tile(aes(fill = fit)) + scale_fill_gradientn( colors = rev(cm.colors(100))) ``` <img src="threeplus_files/figure-html/unnamed-chunk-19-1.png" style="display: block; margin: auto;" /> ] ] --- .pull-left[ Superimposing contours on a level plot is often helpful. ] .pull-right[ .hide-code[ ```r p + geom_tile(aes(fill = fit)) + geom_contour() + scale_fill_gradientn( colors = rev(cm.colors(100))) ``` <img src="threeplus_files/figure-html/unnamed-chunk-20-1.png" style="display: block; margin: auto;" /> ] ] --- .pull-left[ Level plots do not require computing contours, but are not not as smooth as filled contour plots. {{content}} ] -- Visually, image plots and filled contour plots are very similar for fine grids, but image plots are less smooth for coarse ones. {{content}} -- Lack of smoothness is less of an issue when the data values themselves are noisy. {{content}} -- The grid for the [`volcano` data](https://teara.govt.nz/en/photograph/3920/maungawhau-mt-eden) set is coarser and illustrates the lack of smoothness. -- .pull-right[ .hide-code[ ```r vd <- expand.grid(x = seq_len(nrow(volcano)), y = seq_len(ncol(volcano))) vd$z <- as.numeric(volcano) ggplot(vd, aes(x, y, fill = z)) + geom_tile() + scale_fill_gradientn(colors = rev(cm.colors(100))) + coord_equal() ``` <img src="threeplus_files/figure-html/unnamed-chunk-21-1.png" style="display: block; margin: auto;" /> ] ] --- .pull-left[ A `filled.contour` plot looks like this: ] .pull-right[ .hide-code[ ```r filled.contour(seq_len(nrow(volcano)), seq_len(ncol(volcano)), volcano, nlevels = 10, color.palette = cm.rev, asp = 1) ``` <img src="threeplus_files/figure-html/unnamed-chunk-22-1.png" style="display: block; margin: auto;" /> ] ] --- A coarse grid can be interpolated to a finer grid. -- Irregularly spaced data can also be interpolated to a grid. -- The `interp` function in the [`akima`](https://cran.r-project.org/package=interp) or [`interp`](https://cran.r-project.org/package=interp) packages is useful for this kind of interpolation. -- (`interp` has a more permissive license.) --- layout: false ## Three-Dimensional Views There are several options for viewing surfaces or collections of points as three-dimensional objects: -- * Fixed views of rotated projections. -- * Animated or interactive views showing a sequence of rotated projections. --- layout: true ### Fixed Views --- The `lattice` function `cloud()` shows a projection of a rotated point cloud in three dimensions. -- .pull-left[ For the soil resistivity data: .hide-code[ ```r cloud(resistivity ~ easting * northing, pch = 19, cex = 0.1, data = soil) ``` <img src="threeplus_files/figure-html/unnamed-chunk-23-1.png" style="display: block; margin: auto;" /> ] ] -- .pull-right[ .hide-code[ For the `quakes` data: ```r cloud(-depth ~ long * lat, data = quakes) ``` <img src="threeplus_files/figure-html/unnamed-chunk-24-1.png" style="display: block; margin: auto;" /> ] ] --- .pull-left[ A surface can also be visualized using a _wire frame_ plot showing a 3D view of the surface from a particular viewpoint. {{content}} ] -- A simple wire frame plot is often sufficient. {{content}} -- Lighting and shading can be used to enhance the 3D effect. {{content}} -- A basic wire frame plot for the volcano data: -- .pull-right[ .hide-code[ ```r wireframe(z ~ x * y, data = vd, aspect = c(61 / 89, 0.3)) ``` <img src="threeplus_files/figure-html/unnamed-chunk-25-1.png" style="display: block; margin: auto;" /> ] ] --- .pull-left[ _Wire frame_ is a bit of a misnomer since surface panels in front occlude lines behind them. For a fine grid, as in the soil surface, the lines are too dense. The use of shading for the surfaces can help. ] .pull-right[ .hide-code[ ```r wireframe(z ~ x * y, data = vd, aspect = c(61 / 89, 0.3), shade = TRUE) ``` <img src="threeplus_files/figure-html/unnamed-chunk-26-1.png" style="display: block; margin: auto;" /> ] ] --- .pull-left[ A wire frame plot with shading for the fit soil resistivity surface: ] .pull-right[ .hide-code[ ```r asp <- with(soi.grid, diff(range(northing)) / diff(range(easting))) wf <- wireframe( soi.fit ~ soi.grid$easting * soi.grid$northing, aspect = asp, shade = TRUE, screen = list(z = -50, x = -30), xlab = "Easting (km", ylab = "Northing (km)") wf ``` <img src="threeplus_files/figure-html/unnamed-chunk-27-1.png" style="display: block; margin: auto;" /> ] ] --- .pull-left[ Both ways of looking at a surface are useful: .hide-code[ ```r lv <- levelplot(soi.fit ~ soi.grid$easting * soi.grid$northing, cuts = 9, aspect = asp, contour = TRUE, xlab = "Easting (km)", ylab = "Northing (km)") print(lv, split = c(1, 1, 2, 1), more = TRUE) print(wf, split = c(2, 1, 2, 1)) ``` <img src="threeplus_files/figure-html/unnamed-chunk-28-1.png" style="display: block; margin: auto;" /> ] ] -- .pull-right[ * The level plot/contour representation is useful for recognizing locations of key features. {{content}} ] -- * The wire frame view helps build a mental model of the 3D structure. {{content}} -- * Being able to interactively adjust the viewing position for a wire frame model greatly enhances our ability to understand the 3D structure. --- layout: true ### Interactive 3D Plots Using OpenGL --- [OpenGL](https://www.opengl.org/) is a standardized framework for high performance graphics. -- The `rgl` package provides an R interface to some of OpenGL's capabilities. -- [WebGL](https://www.khronos.org/webgl/) is a JavaScript framework for using OpenGL within a browser window. -- Most desktop browsers support WebGL; some mobile browsers do as well. -- In some cases support may be available but not enabled by default. You may be able to get help at <https://get.webgl.org/>. -- `knitr` and `rgl` provide support for embedding OpenGL images in web pages. -- It is also possible to embed OpenGL images in PDF files, but not all PDF viewers support this. --- Start by creating the fit surface data frame. .hide-code[ ```r library(dplyr) soil <- read.table("http://www.stat.uiowa.edu/~luke/data/soil.dat") m <- loess(resistivity ~ easting * northing, span = 0.25, degree = 2, data = soil) eastseq <- seq(.15, 1.410, by = .015) northseq <- seq(.150, 3.645, by = .015) soi.grid <- expand.grid(easting = eastseq, northing = northseq) soi.fit <- predict(m, soi.grid) soi.fit.df <- mutate(soi.grid, fit = as.numeric(soi.fit)) ``` ] -- .pull-left[ This code run in R will open a new window containing an interactive 3D scene (but this may not work on FastX and is not available on the RStudio server): ```r library(rgl) bg3d(color = "white") clear3d() par3d(mouseMode = "trackball") surface3d(eastseq, northseq, soi.fit / 100, color = rep("red", length(soi.fit))) ``` ] -- .pull-right[ This will work in the RStudio notebook server: <!-- ## nolint start --> ```r *options(rgl.useNULL = TRUE) library(rgl) bg3d(color = "white") clear3d() par3d(mouseMode = "trackball") surface3d(eastseq, northseq, soi.fit / 100, color = rep("red", length(soi.fit))) *rglwidget() ``` <!-- ## nolint end --> ] --- .pull-left[ To embed an image in an HTML document, first set the `webgl` hook with a code chunk like this: ```r knitr::knit_hooks$set(webgl = rgl::hook_webgl) options(rgl.useNULL = TRUE) ``` {{content}} ] -- Then a chunk with the option ``` webgl = TRUE ``` can produce an embedded OpenGL image: ```r library(rgl) bg3d(color = "white") clear3d() par3d(mouseMode = "trackball") surface3d(eastseq, northseq, soi.fit / 100, color = rep("red", length(soi.fit))) ``` -- .pull-right[
] --- .pull-left[ A view that includes the points and uses alpha blending to make the surface translucent: ] .pull-right[ .hide-code[ ```r library(rgl) clear3d() points3d(soil$easting, soil$northing, soil$resistivity / 100, col = rep("black", nrow(soil))) surface3d(eastseq, northseq, soi.fit / 100, col = rep("red", length(soi.fit)), alpha = 0.9, front = "fill", back = "fill") ```
] ] --- .pull-left[ A view of the `volcano` surface: ] .pull-right[ .hide-code[ ```r library(rgl) knitr::knit_hooks$set(webgl = rgl::hook_webgl) options(rgl.useNULL = TRUE) clear3d() surface3d(x = 10 * seq_len(nrow(volcano)), y = 10 * seq_len(ncol(volcano)), z = 2 * volcano, color = "red") ```
] ] --- The [WebGL vignette](https://cran.r-project.org/package=rgl/vignettes/WebGL.html) in the `rgl` package shows some more examples. -- The embedded graphs show properly for me on most browser/OS combinations I have tried. -- To include a static view of an rgl scene you can use the `rgl.snapshot` function. -- To use the view found interactively with rgl for creating a `lattice` `wireframe` view, you can use the `rglToLattice` function: ```r wireframe(soi.fit ~ soi.grid$easting * soi.grid$northing, aspect = c(asp, 0.7), shade = TRUE, xlab = "Easting (km)", ylab = "Northing (km)", screen = rglToLattice()) ``` --- .pull-left[ Ethanol data: .hide-code[ ```r library(rgl) clear3d() with(ethanol, points3d(E, C / 5, NOx / 5, size = 4, col = rep("black", nrow(ethanol)))) ```
] ] -- .pull-right[ Quakes data: .hide-code[ ```r library(rgl) clear3d() with(quakes, points3d(long, lat, -depth / 50, size = 2, col = rep("black", nrow(quakes)))) ```
] ] --- layout: true ### Other Animation Technologies --- Animated GIF or PNG images can be used to show a _fly around_ of a surface. <!-- ## nolint start --> <!-- ## nolint end --> <img src="../img/animvolcano.gif" style="display: block; margin: auto;" /> --- A fly-around can also be recorded as a movie. <center> <video width="640" height="480" controls> <source src="../img/animvolcano.mp4" type="video/mp4"> </video> </center> A movie can be paused and replayed; animated images typically cannot. --- layout: true ### Alternatives and Variations --- Interactive or animated views rely on our visual system's ability to extract [_depth queues_](https://en.wikipedia.org/wiki/Depth_perception) from motion. -- This is called _motion parallax_. -- When the motion stops, the 3D illusion is lost. -- Other options for viewing a 3D scene that do not rquire motion: -- * [Anaglyph 3D](https://en.wikipedia.org/wiki/Anaglyph_3D) using red/cyan glasses. -- * [Polarized 3D](https://en.wikipedia.org/wiki/Polarized_3D_system) as currently used in many 3D movies. -- * [Stereograms, stereoscopy](https://en.wikipedia.org/wiki/Stereoscopy) --- A stereogram -- * presents each eye with an image from a slightly different viewing angle; -- * the brain fuses the images into a 3D view in a process called [_stereopsis_](https://en.wikipedia.org/wiki/Stereopsis). -- .pull-left[ This approach is or was used {{content}} ] -- * with virtual reality displays like [Oculus Rift](https://www.meta.com/quest/products/quest-2/); {{content}} -- * in the [View-Master toy](https://en.wikipedia.org/wiki/View-Master); {{content}} -- * in aerial photography using viewers like this: -- .pull-right[ <!-- Image: "https://upload.wikimedia.org/wikipedia/commons/3/31/Pocket_stereoscope.jpg" --> <img src="../img/Pocket_stereoscope.jpg" width="80%" style="display: block; margin: auto;" /> {{content}} ] -- You can create your own viewer with paper or cardboard tubes, for example. --- A stereo image for the soil data surface: .hide-code[ ```r tp <- trellis.par.get() trellis.par.set(theme = col.whitebg()) ocol <- trellis.par.get("axis.line")$col oclip <- trellis.par.get("clip")$panel trellis.par.set(list(axis.line = list(col = "transparent"), clip = list(panel = FALSE))) print(wireframe(soi.fit ~ soi.grid$easting * soi.grid$northing, cuts = 9, aspect = diff(range(soi.grid$n)) / diff(range(soi.grid$e)), xlab = "Easting (km)", ylab = "Northing (km)", shade = TRUE, screen = list(z = 40, x = -70, y = 3)), split = c(1, 1, 2, 1), more = TRUE) print(wireframe(soi.fit ~ soi.grid$easting * soi.grid$northing, cuts = 9, aspect = diff(range(soi.grid$n)) / diff(range(soi.grid$e)), xlab = "Easting (km)", ylab = "Northing (km)", shade = TRUE, screen = list(z = 40, x = -70, y = 0)), split = c(2, 1, 2, 1)) ``` <img src="threeplus_files/figure-html/unnamed-chunk-42-1.png" style="display: block; margin: auto;" /> ] --- A stereo view of the `volcano` surface: .hide-code[ ```r library(lattice) v1 <- wireframe(volcano, shade = TRUE, aspect = c(61 / 87, 0.4), light.source = c(10, 0, 10), screen = list(z = 40, x = -60, y = 0)) v2 <- wireframe(volcano, shade = TRUE, aspect = c(61 / 87, 0.4), light.source = c(10, 0, 10), screen = list(z = 40, x = -60, y = 3)) print(v1, split = c(1, 1, 2, 1), more = TRUE) print(v2, split = c(2, 1, 2, 1)) ``` <img src="threeplus_files/figure-html/unnamed-chunk-44-1.png" style="display: block; margin: auto;" /> ] --- A stereo view of the `quakes` data: .hide-code[ ```r q1 <- cloud(-depth ~ long * lat, data = quakes, pch = 19, cex = 0.3, screen = list(z = 40, x = -60, y = 0)) q2 <- cloud(-depth ~ long * lat, data = quakes, pch = 19, cex = 0.3, screen = list(z = 40, x = -60, y = 3)) print(q1, split = c(1, 1, 2, 1), more = TRUE) print(q2, split = c(2, 1, 2, 1)) ``` <img src="threeplus_files/figure-html/unnamed-chunk-45-1.png" style="display: block; margin: auto;" /> ] --- layout: true ## Coplots for Surfaces --- We can also use the idea of a coplot for examining a surface a few slices at a time. .pull-left[ For the soil resistivity surface: .hide-code[ ```r wireframe( soi.fit ~ soi.grid$easting * soi.grid$northing, cuts = 9, aspect = diff(range(soi.grid$n)) / diff(range(soi.grid$e)), xlab = "Easting (km)", ylab = "Northing (km)", shade = TRUE, screen = list(z = 40, x = -70, y = 0)) ``` <img src="threeplus_files/figure-html/unnamed-chunk-46-1.png" style="display: block; margin: auto;" /> ] ] -- .pull-right[ Choosing 12 approximately equally spaced slices along `easting`: <!-- **** switch to conditioning on northing? not worth it for now --> .hide-code[ ```r sf <- soi.grid sf$fit <- as.numeric(soi.fit) sube <- eastseq[seq_len(length(eastseq)) %% 7 == 0] sube <- eastseq[round(seq(1, length(eastseq), length.out = 12))] ssf <- filter(sf, easting %in% sube) ggplot(ssf, aes(x = northing, y = fit)) + geom_line() + facet_wrap(~ easting) ``` <img src="threeplus_files/figure-html/unnamed-chunk-47-1.png" style="display: block; margin: auto;" /> ] ] --- For examining a surface this way we fix one variable at a specific value. -- For examining data it is also sometimes useful to choose a narrow window. -- * A narrow window minimizes the variation within the variable we are conditioning on. -- * Too narrow a window contains to few observations to see a signal. --- layout: false ## Conditioning with a Single Plot It is possible to show conditioning in a single plot using an identity channel to distinguish the conditions. .pull-left[ .hide-code[ ```r sf <- soi.grid sf$fit <- as.numeric(soi.fit) sube4 <- eastseq[round(seq(1, length(eastseq), length.out = 4))] ssf4 <- filter(sf, easting %in% sube4) ggplot(mutate(ssf4, easting = factor(easting)), aes(x = northing, y = fit, color = easting, group = easting)) + geom_line() ``` <img src="threeplus_files/figure-html/unnamed-chunk-48-1.png" style="display: block; margin: auto;" /> ] ] -- .pull-right[ This is most useful when the effect of the conditioning variable is a level shift. {{content}} ] -- The number of different levels that can be used effectively is lower. {{content}} -- Over-plotting becomes an issue when used with data points. <!-- **** look into quakes and magnitudes doesn't look worth it --> --- layout: true ## Example: Ozone Levels --- .pull-left[ From Cleveland, William S. (1993) _Visualizing Data_: ```r data(environmental, package = "lattice") ``` Daily measurements of ozone concentration, wind speed, temperature and solar radiation in New York City from May to September of 1973. A data frame with 111 observations on the following 4 variables. - ozone: Average ozone concentration - radiation: Solar radiation - temperature: Maximum daily emperature - wind: Average wind speed ] -- .pull-right[ .hide-code[ ```r pairs(environmental) ``` <img src="threeplus_files/figure-html/unnamed-chunk-50-1.png" style="display: block; margin: auto;" /> ] ] --- .pull-left[ Cleveland suggests a cube root transformation for `ozone`: ] .pull-right[ .hide-code[ ```r env2 <- mutate(environmental, ozone = ozone ^ (1 / 3)) pairs(env2) ``` <img src="threeplus_files/figure-html/unnamed-chunk-51-1.png" style="display: block; margin: auto;" /> ] ] --- .pull-left[ There is monotone marginal relationship between the transformed `ozone` and `temperature`. It might be worth looking at this relationship for different levels of `wind` and `radiation`. Two levels each might be a good start: ] .pull-right[ .hide-code[ ```r env2 <- mutate(env2, wind_cut = cut_number(wind, 2, labels = c("low wind", "high wind")), rad_cut = cut_number(radiation, 2, labels = c("low rad", "high rad"))) ggplot(env2, aes(x = temperature, y = ozone)) + geom_point() + geom_smooth(method = "gam") + facet_grid(rad_cut ~ wind_cut) ``` <img src="threeplus_files/figure-html/unnamed-chunk-52-1.png" style="display: block; margin: auto;" /> ] ] --- .pull-left[ Faceting on `wind`: ] .pull-right[ .hide-code[ ```r ggplot(env2, aes(x = temperature, y = ozone, color = rad_cut)) + geom_point() + geom_smooth(method = "gam") + facet_wrap(~ wind_cut) ``` <img src="threeplus_files/figure-html/unnamed-chunk-53-1.png" style="display: block; margin: auto;" /> ] ] --- .pull-left[ Faceting on `radiation`: ] .pull-right[ .hide-code[ ```r ggplot(env2, aes(x = temperature, y = ozone, color = wind_cut)) + geom_point() + geom_smooth(method = "gam") + facet_wrap(~ rad_cut) ``` <img src="threeplus_files/figure-html/unnamed-chunk-54-1.png" style="display: block; margin: auto;" /> ] ] --- layout: true ## 3D Density Estimates --- Density estimates can be used with 3D data as well. -- The density is a function of three variables, so the density surface is in 4D. -- The points in 3D with a common density level form a _contour surface_. -- The contour surface for a higher density level will be inside the surface for a lower level. --- .pull-left[ Some artificial data: .hide-code[ ```r library(rgl) library(misc3d) x <- matrix(rnorm(9000), ncol = 3) x[1001 : 2000, 2] <- x[1001 : 2000, 2] + 4 x[2001 : 3000, 3] <- x[2001 : 3000, 3] + 4 clear3d() bg3d(col = "white") points3d(x = x[, 1], y = x[, 2], z = x[, 3], size = 2, color = "black") ```
] ] -- .pull-right[ 3D plot of one contour surface: .hide-code[ ```r g <- expand.grid(x = seq(-4, 8.5, len = 30), y = seq(-4, 8.5, len = 30), z = seq(-4, 8.5, len = 30)) d <- kde3d(x[, 1], x[, 2], x[, 3], 0.5, 30, c(-4, 8.5))$d clear3d() xv <- seq(-4, 8.5, len = 30) contour3d(d, 20 / 3000, xv, xv, xv, color = "red") ```
] ] --- .pull-left[ Adding the data: .hide-code[ ```r d2 <- kde3d(x[, 1], x[, 2], x[, 3], 0.5, 30, c(-4, 8.5))$d clear3d() points3d(x = x[, 1], y = x[, 2], z = x[, 3], size = 2, color = "black") contour3d(d2, 20 / 3000, xv, xv, xv, color = "red", alpha = 0.5, add = TRUE) ```
] ] -- .pull-right[ 3D plot of several contours for bw = 0.5: .hide-code[ ```r clear3d() contour3d(d2, c(10, 25, 40) / 3000, xv, xv, xv, color = c("red", "green", "blue"), alpha = c(0.2, 0.5, 0.7), add = TRUE) ```
] ] --- .pull-left[ Density contour for the `quakes` data: .hide-code[ ```r bg3d(col = "white") clear3d() d <- kde3d(quakes$long, quakes$lat, -quakes$depth, n = 40) contour3d(d$d, level = exp(-12), x = d$x / 22, y = d$y / 28, z = d$z / 640, color = "green", ## color2 = "gray", scale = FALSE, add = TRUE) box3d(col = "gray") ```
] ] -- .pull-right[ Adding data and second contour: .hide-code[ ```r clear3d() points3d(quakes$long / 22, quakes$lat / 28, -quakes$depth / 640, size = 2, col = rep("black", nrow(quakes))) box3d(col = "gray") d <- kde3d(quakes$long, quakes$lat, -quakes$depth, n = 40) contour3d(d$d, level = exp(c(-10, -12)), x = d$x / 22, y = d$y / 28, z = d$z / 640, color = c("red", "green"), alpha = 0.1, scale = FALSE, add = TRUE) ```
] ] --- layout: true ## Parallel Coordinates Plots --- -- The same idea as a slope graph, but usually with more variables. -- Some references: -- > A [post](https://eagereyes.org/techniques/parallel-coordinates) by > Robert Kosara. > [Wikipedia entry](https://en.wikipedia.org/wiki/Parallel_coordinates). > [Paper](http://www.ifs.tuwien.ac.at/~mlanzenberger/teaching/ps/ws07/stuff/00146402.pdf) > on recognizing mathematical objects in parallel coordinate plots. -- Some R implementations include `parallelplot()` in `lattice` and `ggparcoord()` in `GGally` --- .pull-left[ A parallel coordinate plot of a data set on the chemical composition of coffee samples: ] .pull-right[ .hide-code[ ```r library(gridExtra) library(GGally) data(coffee, package = "pgmm") coffee <- mutate(coffee, Type = ifelse(Variety == 1, "Arabica", "Robusta")) ggparcoord(coffee[order(coffee$Type), ], columns = 3 : 14, groupColumn = "Type", scale = "uniminmax") + xlab("") + ylab("") + scale_colour_manual( values = c("grey", "red")) + theme(axis.ticks.y = element_blank(), axis.text.y = element_blank(), axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1), legend.position = "top") ``` <img src="threeplus_files/figure-html/unnamed-chunk-61-1.png" style="display: block; margin: auto;" /> ] ] --- .pull-left[ There are 43 samples from 29 countries and two varieties, _Arabica_ or _Robusta_. Examining the plot shows that the varieties are distinguished by their fat and caffeine contents: ] .pull-right[ .hide-code[ ```r ggplot(coffee, aes(x = Fat, y = Caffine, colour = Type)) + geom_point(size = 3) + scale_colour_manual( values = c("grey", "red")) ``` <img src="threeplus_files/figure-html/unnamed-chunk-62-1.png" style="display: block; margin: auto;" /> ] ] --- Some useful adjustments: -- * alpha blending for larger data sets; -- * vary axis scaling; -- * reorder axes; -- * reverse axes; -- Interactive implementations that support these and more are available. --- layout: true ## Australian Crabs --- The data frame `crabs` in the `MASS` package contains measurement on crabs of a species that has been split into two based on color, orange or blue. -- Preserved specimens lose their color (and possibly ability to identify sex). -- Data were collected to help classify preserved specimens. -- It would be useful to have a fairly simple classification rule. --- .pull-left[ A scatterplot matrix view: .hide-code[ ```r data(crabs, package = "MASS") ## splom(~ crabs[4 : 8], ## group = paste(sex, sp), ## data = crabs, ## auto.key = TRUE, ## pscale = 0) ggpairs(crabs, aes(color = interaction(sp, sex)), columns = 4 : 8, upper = list(continuous = "points"), legend = 1) ``` <img src="threeplus_files/figure-html/unnamed-chunk-63-1.png" style="display: block; margin: auto;" /> ] ] -- .pull-right[ The variables shown are: * `FL` frontal lobe size (mm); * `RW` rear width (mm); * `CL` carapace length (mm); * `CW` carapace width (mm); * `BD` body depth (mm). {{content}} ] -- The variables are highly correlated, reflecting overall size and age. {{content}} -- The `RW * CL` or `RW * CW` plots separate males and females well, at least for larger crabs. --- .pull-left[ A parallel coordinates view: ] .pull-right[ .hide-code[ ```r ggparcoord(crabs, columns = 4 : 8, groupColumn = "sp") + scale_color_manual( values = c(B = "blue", O = "orange")) ``` <img src="threeplus_files/figure-html/unnamed-chunk-64-1.png" style="display: block; margin: auto;" /> ] ] --- .pull-left[ A possible next step: Reduce the correlation by a scaling by one of the variables. {{content}} ] -- ```r cr <- mutate(crabs, FLCL = FL / CL, RWCL = RW / CL, CWCL = CW / CL, BDCL = BD / CL) ``` -- .pull-right[ .hide-code[ ```r ## splom(~ cr[9 : 12], group = sp, ## data = cr, ## auto.key = TRUE, pscale = 0) ggpairs(cr, aes(color = sp), columns = 9 : 12, upper = list(continuous = "points"), legend = 1) + scale_color_manual( values = c(B = "blue", O = "orange")) + scale_fill_manual( values = c(B = "blue", O = "orange")) ``` <img src="threeplus_files/figure-html/unnamed-chunk-66-1.png" style="display: block; margin: auto;" /> ] {{content}} ] -- The `CWCL * BDCL` plot shows good species separation by a line. --- .pull-left[ A parallel coordinates plot after scaling by `CL`: ] .pull-right[ .hide-code[ ```r ggparcoord(cr, columns = 9 : 12, groupColumn = "sp") + scale_color_manual( values = c(B = "blue", O = "orange")) ``` <img src="threeplus_files/figure-html/unnamed-chunk-67-1.png" style="display: block; margin: auto;" /> ] ] --- .pull-left[ Reorder the variables: ] .pull-right[ .hide-code[ ```r ggparcoord(cr, columns = c(10, 9, 11, 12), groupColumn = "sp") + scale_color_manual( values = c(B = "blue", O = "orange")) ``` <img src="threeplus_files/figure-html/unnamed-chunk-68-1.png" style="display: block; margin: auto;" /> ] ] --- .pull-left[ Reorder again: ] .pull-right[ .hide-code[ ```r ggparcoord(cr, columns = c(10, 9, 12, 11), groupColumn = "sp") + scale_color_manual( values = c(B = "blue", O = "orange")) ``` <img src="threeplus_files/figure-html/unnamed-chunk-69-1.png" style="display: block; margin: auto;" /> ] ] --- .pull-left[ Reverse the `CWCL` variable: ] .pull-right[ .hide-code[ ```r ggparcoord(mutate(cr, CWCL = -CWCL), columns = c(10, 9, 12, 11), groupColumn = "sp") + scale_color_manual( values = c(B = "blue", O = "orange")) ``` <img src="threeplus_files/figure-html/unnamed-chunk-70-1.png" style="display: block; margin: auto;" /> ] {{content}} ] -- The patterns for `FLCL`, `CWLC`, and `BDCL` for the two species differ. --- layout: true ## Dimension Reduction by PCA --- -- A number of methods are available for extracting a lower dimensional representation of a data set that captures most important features. -- One approach is _principal component analysis_, or _PCA_. -- * Principal component analysis identifies a rotation of the data that produces uncorrelated scores. -- * Components are ordered by the amount they contribute to the overall variation in the data. -- * Sometimes the first few principal components capture most of the interesting variation. -- * The components are linear combinations of the original variables; they may not be easy to interpret. --- The first two principal components for the scaled crabs data: .hide-code[ ```r fit <- select(cr, 9 : 12) %>% mutate(across(everything(), scale)) %>% prcomp() cr_PC <- cbind(cr, fit$x) p <- ggplot(cr_PC, aes(PC1, PC2, color = sp)) + geom_point() + scale_color_manual(values = c(O = "orange", B = "blue")) library(ggrepel) p + geom_segment(aes(x = 0, y = 0, xend = PC1, yend = PC2), data = as.data.frame(fit$rotation), color = "black", arrow = arrow(length = unit(0.03, "npc"))) + geom_text_repel(aes(x = PC1, y = PC2, label = rownames(fit$rotation)), data = as.data.frame(fit$rotation), color = "black") ``` <img src="threeplus_files/figure-html/unnamed-chunk-71-1.png" style="display: block; margin: auto;" /> ] -- The two species are separated quite well by the first principal component alone. --- For the coffee data the first principal component also separates the varieties well: .hide-code[ ```r fit <- select(coffee, 4:14) %>% mutate(across(everything(), scale)) %>% prcomp() coffee_PC <- cbind(coffee, fit$x) ggplot(coffee_PC, aes(x = PC1, y = PC2, color = Type)) + geom_point(size = 3) + scale_colour_manual(values = c("grey", "red")) ``` <img src="threeplus_files/figure-html/unnamed-chunk-72-1.png" style="display: block; margin: auto;" /> ] --- The first principal component is a weighted combination of the original variables; there is some weight, positive or negative, on almost every variable, which makes the result hard to interpret. .hide-code[ ```r data.frame(variable = rownames(fit$rotation), weight = as.vector(fit$rotation[, 1])) %>% ggplot(aes(x = weight, y = variable)) + geom_col(width = 0.2) ``` <img src="threeplus_files/figure-html/unnamed-chunk-73-1.png" style="display: block; margin: auto;" /> ] --- layout: false ## Grand Tours .pull-left[ <img src="../img/fleatour.gif" style="display: block; margin: auto;" /> ] -- .pull-right[ The grand tour can be viewed as carrying out a sequence of rotations in high dimensional space and showing images of some of the coordinates. {{content}} ] -- This can be useful for discovering groups, outliers, and some lower-dimensional structures. {{content}} -- The rotations can be random or selected to optimize some criterion (guided tours). {{content}} -- The `tourr` package provides one implementation. {{content}} -- Good interactive interfaces do not seem to be readily available at the moment. --- ## Reading Chapter [_Visualizing associations among two or more quantitative variables_](https://clauswilke.com/dataviz/visualizing-associations.html) in [_Fundamentals of Data Visualization_](https://clauswilke.com/dataviz/).
//adapted from Emi Tanaka's gist at //https://gist.github.com/emitanaka/eaa258bb8471c041797ff377704c8505