---
title: "DRY: Don't Repeat Yourself"
output:
  html_document:
    toc: yes
---

```{r global_options, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE)
```

```{r, include = FALSE}
library(lattice)
library(tidyverse)
set.seed(12345)
gbsy <- group_by(barley, site, year)
absy <- summarize(gbsy, avg_yield = mean(yield))
## The `year` variable in the summary is an unordered factor with the
## levels in the wrong order, so we need to fix that:
absy <- mutate(absy, year = ordered(year, rev(levels(year))))

## basic barley data slope graph
library(ggrepel)
p <- ggplot(absy, aes(x = year, y = avg_yield, group = site)) +
    geom_line()
p <- p + geom_text_repel(aes(label = paste0(site, ", ", round(avg_yield, 1))),
                         hjust = "outward", direction = "y")
basic_barley_slopes <- p
```

[Don't repeat yourself](https://en.wikipedia.org/wiki/Don't_repeat_yourself) ([DRY](https://web.archive.org/web/20160303190143/http://programmer.97things.oreilly.com/wiki/index.php/Don't_Repeat_Yourself)) is a valuable software design principle.

Some specific implications:

* avoid typing the same thing repeatedly;
* avoid using cut and paste;
* automate what you can.

`ggplot` seems to make following this principle a little challenging, but there are things `ggplot` lets you do. Some examples:

* Capture intermediate states of your plots in variables.
* Move common `aes` specifications to the initial `ggplot` call.

R allows you to define functions that abstract the generic operations from the details you want to vary.

* You can define a function that allows you to repeat an analysis or recreate a graph when the data is updated.
* You can try to make your function flexible enough to allow for different data sets with different variables.
* For `ggplot` you can try to create new components that play well with features like faceting.
* For `lattice` you can try to develop a panel function that works well in that framework.
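A small sketch of the two `ggplot` suggestions and of a graph-building function, using the built-in `mtcars` data purely for illustration. The names `p_base`, `p_fit`, `p_cyl`, and `scatter_fit` are made up for this example and are not used elsewhere in these notes:

```{r}
library(ggplot2)

## The shared x and y mapping is given once, in the initial ggplot() call;
## every layer added later inherits it.
p_base <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

## The intermediate plot is kept in a variable and extended in two
## different ways without retyping the common part.
p_fit <- p_base + geom_smooth(method = "lm", formula = y ~ x)
p_cyl <- p_base + facet_wrap(~ cyl)

## A function recreates the same graph for new or updated data
## that have `wt` and `mpg` columns.
scatter_fit <- function(data)
    ggplot(data, aes(x = wt, y = mpg)) +
        geom_point() +
        geom_smooth(method = "lm", formula = y ~ x)

scatter_fit(subset(mtcars, am == 1))  # manual-transmission cars only
```

If the data change, only the call to `scatter_fit` needs to be rerun; the plot specification itself is written once.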
I am trying to follow the two `ggplot` recommendations in my examples, not always successfully.

We can look at the barley yields slope graph as an example.

<!--
```{r, eval = FALSE}
p <- basic_barley_slopes
p + <>
```
-->

## Defining a Theme Function

Defining a `theme_slopechart` function to do the theme adjustments allows them to be reused easily:

```{r}
theme_slopechart <- function(toplabels = TRUE) {
    thm <- theme(panel.background = element_blank(),
                 panel.grid = element_blank(),
                 axis.ticks = element_blank(),
                 axis.text.y = element_blank(),
                 axis.title = element_blank(),
                 panel.border = element_blank())
    if (toplabels)
        list(thm, scale_x_discrete(position = "top"))
    else thm
}
p <- basic_barley_slopes ## from twonum.R
p + theme_slopechart()
```

* This function makes placing the labels on the top optional.
* Combining components like this has to use `list` instead of `+`: a theme and a scale cannot be combined with `+` on their own, but a list of components can be added to a plot in one step.

## Defining a Plot Construction Function

Abstracting the construction into a simple function allows us to vary some of the settings, such as the label size:

```{r}
barley_slopes <- function(data, textsize = 3) {
    p <- ggplot(data, aes(x = year, y = avg_yield, group = site)) +
        geom_line()
    p + geom_text_repel(aes(label = paste0(site, ", ", round(avg_yield, 1))),
                        size = textsize, hjust = "outward", direction = "y") +
        theme_slopechart()
}
barley_slopes(absy)
```

This is not a general slope chart function: the variable names `year`, `avg_yield`, and `site` are hard wired.
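A different way to remove the hard-wired names, not the one used in these notes, is to pass the column names as function arguments and forward them to `aes()` with rlang's embrace operator `{{ }}`. This sketch assumes a current ggplot2 release whose `aes()` is built on rlang's tidy evaluation. `slopechart_tidy` is a hypothetical name, plain `geom_text` stands in for `geom_text_repel` to avoid the extra dependency, and the site means are recomputed from `lattice::barley` with base R so the chunk runs on its own:

```{r}
library(ggplot2)

## Hypothetical helper: the columns are passed as arguments and forwarded
## to aes() with {{ }}, so nothing about the data is hard wired.
slopechart_tidy <- function(data, x, y, id, digits = 1, textsize = 3) {
    ggplot(data, aes(x = {{ x }}, y = {{ y }}, group = {{ id }})) +
        geom_line() +
        ## `digits` is an ordinary argument: aes() records the environment
        ## it was called from, so `digits` is found when labels are built.
        geom_text(aes(label = paste0({{ id }}, ", ", round({{ y }}, digits))),
                  size = textsize, hjust = "outward")
}

## Site averages for each year, with 1931 placed before 1932.
site_means <- aggregate(yield ~ site + year, data = lattice::barley, FUN = mean)
site_means$year <- factor(site_means$year, levels = c("1931", "1932"))
slopechart_tidy(site_means, year, yield, site)
```

Because the columns are ordinary arguments, the caller writes bare column names, much as in a `ggplot` call, and no extra `id` aesthetic is needed.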
To pull out the dependence on our variable names we can

* have the aesthetic mapping created outside our function;
* refer to the `y` variable as `..y..`;
* use a new aesthetic, say `id`, to specify the group and label:

```{r}
slopechart0 <- function(data, mapping, textsize = 3) {
    p <- ggplot(data, mapping) + geom_line(aes(group = ..id..))
    p + geom_text_repel(aes(label = paste0(..id.., ", ", round(..y.., 1))),
                        size = textsize, hjust = "outward", direction = "y") +
        theme_slopechart()
}
slopechart0(absy, aes(x = year, y = avg_yield, id = site))
```

* It would be nice to avoid creating the `id` aesthetic, but it seems necessary because `..group..` has already been converted to an integer.
* Allowing an option to specify the number of digits for rounding is possible but is tricky because of the non-standard evaluation of the `aes` arguments. (It can be done with a combination of `aes_` and `substitute`.)
* An alternative is to make adding the values optional.

To allow better interaction with faceting we can pull out the `theme_slopechart` call and also allow labels to be omitted by specifying `textsize = 0`:

```{r}
slopechart <- function(data, mapping, textsize = 3) {
    p <- ggplot(data, mapping) + geom_line(aes(group = ..id..))
    if (textsize > 0)
        p + geom_text_repel(aes(label = paste0(as.character(..id..), ", ",
                                               round(..y.., 1))),
                            size = textsize, hjust = "outward", direction = "y")
    else p
}
slopechart(absy, aes(x = year, y = avg_yield, id = site)) + theme_slopechart()
```

Using faceting and line types instead of labels:

```{r}
slopechart(barley, aes(x = year, y = yield, id = site, linetype = site),
           textsize = 0) +
    theme_slopechart() +
    facet_wrap(~ variety)
```

A more general approach would be to define a `geom_slopechart` that can be used at any layer level.
A simple version might be

```{r}
geom_slopechart <- function(textsize = 3) {
    list(geom_line(aes(group = ..id..)),
         geom_text_repel(aes(label = paste0(..id.., ", ", round(..y.., 1))),
                         size = textsize, hjust = "outward", direction = "y"))
}
ggplot(barley, aes(x = year, y = yield, id = site, linetype = site)) +
    geom_slopechart(textsize = 0) +
    theme_slopechart() +
    facet_wrap(~ variety)
```

This isn't quite right:

* it does not allow data or mapping to be specified;
* it does not make sure `x` is a factor;
* ...

The [_Extending ggplot2_ vignette](https://cran.r-project.org/package=ggplot2/vignettes/extending-ggplot2.html) in the `ggplot2` package provides some hints on how to do a more complete job.

As is, it does handle three `x` levels (here three years) reasonably:

```{r}
library(gapminder)
g1 <- filter(gapminder, year %in% c(1982, 1992, 2002))
m1 <- summarize(group_by(g1, continent, year), mean_gdpp = mean(gdpPercap))
m1 <- mutate(m1, year = factor(year))
ggplot(m1, aes(x = year, y = mean_gdpp, id = continent)) +
    geom_slopechart() +
    theme_slopechart()
```
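One possible next step, sketched under the same `id`-aesthetic convention, handles some of the shortcomings listed above. It accepts a `data` argument, drops the label layer when `textsize = 0` (ggplot2 skips `NULL` elements of a list added to a plot), and adds a `digits` argument. The name `geom_slopechart2` is hypothetical. The sketch uses `after_stat()`, the replacement for the `..var..` notation (assuming ggplot2 3.3.0 or later), and plain `geom_text` instead of `geom_text_repel`. It still neither accepts a separate `mapping` nor checks that `x` is a factor:

```{r}
library(ggplot2)

## Hypothetical revision of geom_slopechart(); the plot mapping must
## supply x, y, and an `id` aesthetic, as in the examples above.
geom_slopechart2 <- function(data = NULL, textsize = 3, digits = 1) {
    list(geom_line(aes(group = after_stat(id)), data = data),
         ## With textsize = 0 this element is NULL, which ggplot2 ignores.
         if (textsize > 0)
             geom_text(aes(label = paste0(after_stat(id), ", ",
                                          round(after_stat(y), digits))),
                       data = data, size = textsize, hjust = "outward"))
}

ggplot(lattice::barley, aes(x = year, y = yield, id = site, linetype = site)) +
    geom_slopechart2(textsize = 0) +
    facet_wrap(~ variety)
```

Dropping the text layer entirely, rather than drawing labels of size zero, mirrors what `slopechart` does with `textsize = 0`.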