---
title: "More Plots For A Numeric Variable"
output:
  html_document:
    toc: yes
---

```{r global_options, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE)
```

```{r, include = FALSE}
library(dplyr)
library(ggplot2)
library(lattice)
library(gridExtra)
```

## Grouped Bar Charts

_Grouped bar charts_ can be used to show a quantitative variable within
two classifications.

For the `barley` data from the `lattice` package, the `barchart`
function can show the average results for `year` within `site`:

```{r}
gbsy <- group_by(barley, site, year)
absy <- summarise(gbsy, avg_yield = mean(yield))
barchart(avg_yield ~ site, groups = year, data = absy,
         origin = 0, auto.key = TRUE)
```

Using `ggplot2` this can be done by

* assigning `site` to the `x` aesthetic;
* assigning `year` to the `fill` aesthetic;
* specifying `position = "dodge"`:

```{r}
ggplot(absy) +
    geom_col(aes(x = site, y = avg_yield, fill = year),
             position = "dodge")
```

* It is possible to use more than two or three classifications with a
  grouped bar chart, but it is usually not a good idea.
* The number of levels of the inner classification that can be shown
  effectively is limited.
* Dot plots are often a better option.

The bars for the inner classification can also be placed in front of
each other:

```{r}
ggplot(absy) +
    geom_col(aes(x = site, y = avg_yield, fill = year),
             position = "identity")
```

* This does not work well, as the taller bars cover the shorter ones.
* The bars need to be re-ordered to make this effective.
* For the `identity` position this can be achieved by arranging the rows
  in decreasing order of `avg_yield`:

```{r}
ggplot(arrange(absy, desc(avg_yield))) +
    geom_col(aes(x = site, y = avg_yield, fill = year),
             position = position_identity())
```

## Polar Area Diagrams

A classic, though now rarely used, visualization is a _polar area
chart_, or [_coxcomb diagram_](https://understandinguncertainty.org/coxcombs),
as introduced by Florence Nightingale:

![](https://understandinguncertainty.org/files/Coxcombs.jpg)

The basic plot can be viewed as a bar chart drawn in polar coordinates.

* The square root of the variable represented is used as the radius so
  that the areas of the wedges are proportional to the magnitudes.
* Specifying `width = 1` ensures that there is no gap between the
  bars/wedges.

```{r}
gbs <- group_by(barley, site)
abs <- summarise(gbs, avg_yield = mean(yield))
ggplot(abs) +
    geom_col(aes(y = sqrt(avg_yield), x = site, fill = site),
             width = 1, color = "black") +
    coord_polar()
```

The standard coxcomb diagram for a second classification positions the
wedges in front of each other:

```{r}
ggplot(arrange(absy, desc(avg_yield))) +
    geom_col(aes(y = sqrt(avg_yield), x = site, fill = year),
             width = 1, color = "black", position = "identity") +
    coord_polar()
```

As a visualization, an ordinary bar chart is generally more effective.
The only advantage of a polar representation is that it can reflect a
periodic feature, as in Nightingale's original use.

## Recreating The Nightingale Visualization

The data are available as the variable `Nightingale` in the `HistData`
package.

```{r}
library(HistData)
head(Nightingale)
```

The data set is in wide format, so it needs some tidying.

First, select only variables that might be useful.
```{r}
library(dplyr)
Night <- select(Nightingale, Date, Army, Disease, Wounds, Other)
head(Night)
```

Next, convert to long format with variables `cause` and `deaths`:

```{r}
library(tidyr)
## columns 3 to 5 are Disease, Wounds, and Other;
## gather() still works but is superseded by pivot_longer() in current tidyr
Night <- gather(Night, cause, deaths, 3 : 5)
head(Night)
```

Add a variable with the month of the year:

```{r, message = FALSE}
library(lubridate)
Night <- mutate(Night, Month = month(Date, label = TRUE))
head(Night)
```

Finally, add a variable to distinguish periods before and after April 1,
1855:

```{r}
Night <- mutate(Night,
                period = ifelse(Date < as.Date("1855-04-01"),
                                "before", "after"))
head(Night)
```

The pair of plots can now be created with

```{r, fig.height = 3}
p <- ggplot(arrange(Night, desc(deaths))) +
    geom_col(aes(y = deaths, x = Month, fill = cause),
             width = 1, color = "black", position = "identity") +
    scale_y_sqrt() +
    facet_grid(. ~ period) +
    coord_polar(start = pi) +
    scale_fill_manual(values = c(Wounds = "pink",
                                 Other = "darkgray",
                                 Disease = "lightblue"))
p
```

Some final theme adjustments:

```{r}
p + theme(axis.title = element_blank(),
          axis.text.y = element_blank(),
          axis.ticks = element_blank(),
          panel.grid.major = element_blank(),
          panel.grid.minor = element_blank(),
          panel.border = element_blank())
```

## Radar Charts

```{r, include = FALSE}
library(readr)
if (! file.exists("GLB.Ts+dSST.csv"))
    download.file("https://stat.uiowa.edu/~luke/data/GLB.Ts+dSST.csv",
                  "GLB.Ts+dSST.csv")
gast <- read_csv("GLB.Ts+dSST.csv", skip = 1)[1 : 13]
library(tidyr)
lgast <- gather(gast, Month, Temp, -Year, factor_key = TRUE)
## missing values may be coded as text, making the column character
if (is.character(lgast$Temp))
    lgast <- mutate(lgast, Temp = as.numeric(Temp))
library(ggplot2)
```

A polar coordinate transformation can also be used with a line chart.

This leads to a _radar chart_, also called a _spider web chart_.
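As a minimal sketch of this idea, assuming R's built-in `nottem` series (monthly average air temperatures at Nottingham, 1920 to 1939, in degrees Fahrenheit) in place of the temperature data used below, the average for each calendar month can be drawn as a line and the coordinates switched with `coord_polar()`. With the unmodified `coord_polar()` the segments are drawn as arcs, and nothing joins December back to January.

```{r}
library(ggplot2)

## nottem (datasets package): monthly average air temperatures at
## Nottingham, 1920-1939, in degrees Fahrenheit.
## Average the values for each calendar month; cycle() gives 1, ..., 12.
month_avg <- tapply(as.numeric(nottem), as.integer(cycle(nottem)), mean)
nott <- data.frame(Month = factor(month.abb, levels = month.abb),
                   Temp = as.numeric(month_avg))

## an ordinary line chart with the months in calendar order ...
p_nott <- ggplot(nott, aes(x = Month, y = Temp, group = 1)) +
    geom_line()

## ... drawn in polar coordinates; the line stops at December
p_nott + coord_polar()
```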
Using the global surface temperature data, the data can be treated as a
single time series and drawn as a single line showing the temperature
for each month, with the previous calendar year highlighted in red:

```{r}
lgast <- arrange(lgast, Year, Month)
library(lubridate)
past_year <- year(today()) - 1
lgast_last <- filter(lgast, Year == past_year)
p <- ggplot(lgast) +
    geom_path(aes(x = Month, y = Temp, group = 1, color = Year)) +
    geom_line(aes(x = Month, y = Temp, group = Year),
              data = lgast_last, color = "red")
p
```

The lines connecting December back to January are rendered more
naturally with a radar chart.

A slightly modified version of `coord_polar` is needed to make this work
properly. The definition is available in at least one package, but can
also be included directly:

```{r}
coord_radar <- function(theta = "x", start = 0, direction = 1) {
    theta <- match.arg(theta, c("x", "y"))
    r <- if (theta == "x") "y" else "x"
    ## like CoordPolar, but declared linear so that lines are drawn as
    ## straight segments between the transformed points
    ggproto("CoordRadar", CoordPolar, theta = theta, r = r, start = start,
            direction = sign(direction),
            is_linear = function(coord) TRUE)
}
```

The radar chart is then

```{r}
p + coord_radar()
```

* Radar charts can be useful for periodic data.
* Some fields use them heavily, so readers in those fields expect them.
* They do have drawbacks, as described in
  [this blog post](https://blog.scottlogic.com/2011/09/23/a-critique-of-radar-charts.html).

## Bubble Charts

A form of chart often seen in the popular press is the _bubble chart_.

* The bubble chart uses areas of circles to represent magnitudes.
* In online publications further information on each bubble is often
  provided through interactions, such as a mouse-over popup.
* Other chart forms are almost always better for encoding the magnitude
  information.
* It is also easy to get the encoding wrong:

  ![](img/shrinking-banks-orig.jpg)

  ([Corrected version](img/shrinking-banks-correct.jpg))

* `ggplot` bubble charts for the average `yield` values from the
  `barley` data and for the 2007 population sizes for the gapminder data:

```{r, echo = FALSE, message = FALSE}
## derived from
## http://stackoverflow.com/questions/38959093/packed-bubble-pie-charts-in-r
library(packcircles)
circles <- function(rad2) {
    stopifnot(all(rad2 > 0))
    ## the radius is the square root of the value, so circle areas are
    ## proportional to the values
    rad <- sqrt(rad2)
    n <- length(rad)
    lim <- n * max(rad)
    lims <- c(-lim, lim)
    ## use a fixed seed for the layout, then restore the global RNG state
    old.seed <- { runif(1); .Random.seed }
    set.seed(12345)
    v <- circleLayout(cbind(runif(n), runif(n), rad), lims, lims)
    assign(".Random.seed", old.seed, envir = globalenv())
    layoutDF <- as.data.frame(v$layout)
    names(layoutDF) <- c("x", "y", "radius")
    list(layout = layoutDF, data = circlePlotData(v$layout))
}

bubble_theme <- function() {
    list(coord_equal(),
         theme_bw(),
         theme(axis.text = element_blank(),
               axis.ticks = element_blank(),
               axis.title = element_blank()))
}

## barley yields
v <- circles(absy$avg_yield)
vv <- v$data
vv$year <- absy$year[vv$id]
p1 <- ggplot(vv) +
    geom_polygon(aes(x, y, group = id, fill = year)) +
    geom_text(aes(x = x, y = y, label = absy$site),
              data = v$layout, size = 1.5) +
    bubble_theme()

## gapminder populations for 2007
## (bubble_theme() already supplies coord_equal())
library(gapminder)
gm2007 <- filter(gapminder, year == 2007)
v <- circles(gm2007$pop)
vv <- v$data
vv$continent <- gm2007$continent[vv$id]
p2 <- ggplot(vv) +
    geom_polygon(aes(x, y, group = id, fill = continent)) +
    bubble_theme()

grid.arrange(p1, p2, nrow = 1)
```
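A common form of the encoding mistake is making the radius, rather than the area, proportional to the magnitude. The sketch below first checks the arithmetic for values in the ratio 1 : 2 : 4, then draws a point-based bubble chart of the barley averages using `ggplot2`'s `scale_size_area()`, which makes point area proportional to the value and maps a value of zero to size zero. Placing the bubbles on a site-by-year grid is an illustrative choice, not part of the plots above.

```{r}
library(ggplot2)
library(dplyr)
library(lattice)  # provides the barley data

vals <- c(1, 2, 4)

## radius proportional to the value: areas grow with the square of the value
area_wrong <- pi * vals^2
area_wrong / area_wrong[1]

## radius proportional to the square root of the value: areas match the values
area_right <- pi * sqrt(vals)^2
area_right / area_right[1]

## average yield for each site and year (6 sites x 2 years)
bar_avg <- summarise(group_by(barley, site, year),
                     avg_yield = mean(yield), .groups = "drop")

## scale_size_area() maps avg_yield to point area, with 0 mapped to size 0
p_bub <- ggplot(bar_avg, aes(x = site, y = year, size = avg_yield)) +
    geom_point() +
    scale_size_area(max_size = 10)
p_bub
```

The default `scale_size()` also scales area, but its range starts at a positive minimum size, so the smallest value still gets a visible point and areas are not proportional to the values; `scale_size_area()` avoids this.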