---
title: "Color Issues"
output:
  html_document:
    toc: yes
    code_folding: show
    code_download: true
---

```{r setup, include = FALSE}
source(here::here("setup.R"))
options(htmltools.dir.version = FALSE)
options(conflicts.policy = "depends.ok")
knitr::opts_chunk$set(collapse = TRUE,
                      fig.height = 5, fig.width = 6, fig.align = "center")
library(lattice)
library(tidyverse)
theme_set(theme_minimal() +
          theme(text = element_text(size = 16)) +
          theme(panel.border = element_rect(color = "grey30", fill = NA)))
set.seed(12345)
```

Color is very effective when used well.

But using color well is not easy.

Some of the issues:

* Perception depends on context.

* Simple color assignments may not separate equally well.

* Effectiveness may vary with the medium (screen, projector, print).

* Some people do not perceive the full spectrum of colors.

* Grey scale printing.

* Some colors have cultural significance.

* Cultural significance may vary among cultures and with time.

An internet "controversy" in 2015:
[The Dress](https://en.wikipedia.org/wiki/The_dress)
(and a follow-up
[article](https://www.independent.co.uk/life-style/fashion/news/the-dress-actual-colour-brand-and-price-details-revealed-10074686.html)).

## Color Spaces

### RGB and HSV Color Spaces

Computer monitors and projectors work in terms of red, green, and blue light.

Amounts of red, green, and blue (and the alpha level) are stored as integers between 0 and 255 (8-bit bytes).

```{r}
cols <- c("red", "green", "blue", "yellow", "cyan", "magenta")
rgbcols <- col2rgb(cols); colnames(rgbcols) <- cols
rgbcols
```

Colors are often encoded in _hexadecimal_ form (base 16).

```{r}
rgb(1, 0, 0) ## pure red
rgb(0, 0, 1) ## pure blue
```

```{r}
rgb(255, 0, 0, maxColorValue = 255)
rgb(0, 0, 255, maxColorValue = 255)
```

Hue, saturation, value (HSV) is a simple transformation of RGB.

```{r}
rgb2hsv(rgbcols)
```

HSV is a little more convenient since it allows the hue to be controlled separately.
But saturation and value attributes are not particularly useful for specifying colors that work well perceptually.

A color wheel of fully saturated colors:

```{r, class.source = "fold-hide"}
wheel <- function(col, radius = 1, ...)
    pie(rep(1, length(col)), col = col, radius = radius, ...)
wheel(rainbow(6))
```

Removing saturation:

```{r, class.source = "fold-hide"}
library(colorspace)
wheel(desaturate(rainbow(6)))
```

Fully saturated yellow is brighter than red, which is brighter than blue.

### HCL Color Space

The _rainbow_ palette of the color wheel is often a default in visualization systems.

A [blog post](https://www.zeileis.org/news/endrainbow/) illustrates why this is a bad idea.

The rainbow hues are evenly spaced in the color spectrum, but chroma and luminance are not.

Luminance in particular is not monotone across the palette.

```{r, fig.width = 10, class.source = "fold-hide"}
rgb2hcl <- function(col) {
    ## ignores alpha
    col <- RGB(t(col[1 : 3, ]) / 255)
    col <- as(col, "polarLUV")
    col <- t(col@coords[, 3 : 1, drop = FALSE])
    rownames(col) <- tolower(rownames(col))
    col
}
col2hcl <- function(col) rgb2hcl(col2rgb(col))

pal <- function(col, border = "light gray", ...) {
    n <- length(col)
    plot(0, 0, type = "n", xlim = c(0, 1), ylim = c(0, 1),
         axes = FALSE, xlab = "", ylab = "", ...)
    rect((0 : (n - 1)) / n, 0, (1 : n) / n, 1, col = col, border = border)
}

par(mfrow = c(1, 2))
pal(rainbow(6), main = "Saturated Rainbow")
pal(desaturate(rainbow(6)), main = "Desaturated")
```

```{r, class.source = "fold-hide"}
specplot(rainbow(6), lwd = 4)
```

The hue, chroma, luminance
([HCL](https://en.wikipedia.org/wiki/HCL_color_space))
space allows separate control of:

* _Hue_, the color.

* _Chroma_, the amount of the color.

* _Luminance_, or perceived brightness.

HCL makes it easier to create perceptually uniform color palettes.
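
As a rough numerical check of the luminance claims above, the following sketch compares the luminance of the saturated rainbow with a set of `hcl()` colors that differ only in hue. It assumes the `colorspace` package is installed; the helper name `lum()` is introduced here purely for illustration.

```{r}
library(colorspace)

## Hypothetical helper: luminance (the L coordinate in polar LUV) of
## a vector of colors given as names or hex strings.
lum <- function(col) {
    hex <- rgb(t(col2rgb(col)), maxColorValue = 255)
    as(hex2RGB(hex), "polarLUV")@coords[, "L"]
}

## Saturated rainbow: luminance moves up and down across the palette.
lum_rainbow <- lum(rainbow(6))
round(lum_rainbow)

## hcl() with fixed chroma and luminance: only the hue changes.
lum_hcl <- lum(hcl(h = seq(0, 300, by = 60), c = 50, l = 70))
round(lum_hcl)
```

The second set of values should stay close to the requested luminance of 70, up to rounding from the 8-bit hex encoding.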
A palette with constant chroma, evenly spaced hues and evenly spaced luminance values: ```{r, fig.width = 10, class.source = "fold-hide"} rain6 <- hcl(seq(0, 360 * 5 / 6, len = 6), 50, seq(60, 80, len = 6)) par(mfrow = c(1, 2)) pal(rain6, main = "Uniform Rainbow") pal(desaturate(rain6), main = "Desaturated") ``` ```{r, class.source = "fold-hide"} specplot(rain6, lwd = 4) ``` For a fully saturated red, varying only chroma to reduce the amount of color: ```{r, class.source = "fold-hide"} red_hcl <- list(h = 12.17395, c = 179.04076, l = 53.24059) specplot(hcl(red_hcl$h, red_hcl$c * seq(0, 1, len = 10), red_hcl$l), lwd = 4) ``` For a given hue, not all combinations of chroma and luminance are possible. In particular, for low luminance values the available chroma range is limited. The [`ggplot` book](https://ggplot2-book.org/) contains this visualization of the HCL space. * Hue is mapped to angle. * Chroma is mapped to radius. * Luminance is mapped to facets. The origins with zero chroma are shades of grey. ```{r, out.width = 500, echo = FALSE} knitr::include_graphics(IMG("ggplot2_hclspace.png")) ``` HCL is a transformation of the [CIEluv](https://en.wikipedia.org/wiki/CIELUV) color space designed for perceptual uniformity. The definition of the luminance takes into account the light sensitivity of a standard human observer at various wave lengths. Light sensitivity for different wave lengths in daylight conditions (photopic vision) and under dark adapted conditions (scotopic vision): ```{r, echo = FALSE} knitr::include_graphics(IMG("light-sensitivity.jpg")) ``` ### Munsell Color Space Another color space, similar to HCL, is the [Munsell system](https://en.wikipedia.org/wiki/Munsell_color_system) developed in the early 1900s. This system uses a Hue, Value, Chroma encoding. The [`munsell` package](https://github.com/cwickham/munsell) provides an R interface and is used in `ggplot`. Munsell specifications are of the form `"H V/C"`, such as `5R 5/10`. 
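
A minimal sketch of turning such a specification into a color that R graphics functions accept, assuming the `munsell` package is installed. The specification is the example from the text, and the variable name `red_5r` is only illustrative.

```{r}
## Convert the Munsell specification "5R 5/10" (hue 5R, value 5,
## chroma 10) to a hex color string.
red_5r <- munsell::mnsl("5R 5/10")
red_5r
```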
Possible hues are

```{r, message = FALSE}
library(munsell, exclude = "desaturate")
mnsl_hues()
```

`V` should be an integer between 0 and 10.

`C` should be an even integer less than 24, but not all combinations are possible.

Adjusting colors in the value, chroma, and hue dimensions:

```{r munsell-blues, echo = FALSE}
my_blue <- "5PB 5/8"
plot_mnsl(c(
    lighter(my_blue, 2), my_blue, darker(my_blue, 2),
    munsell::desaturate(my_blue, 2), my_blue, saturate(my_blue, 2),
    rygbp(my_blue, 2), my_blue, pbgyr(my_blue, 2)))
```

```{r munsell-blues, eval = FALSE}
```

Creating scales:

```{r, warning=FALSE, fig.width = 8, fig.height = 4}
plot_mnsl(sapply(0 : 6, darker, col = "5PB 7/4")) +
    facet_wrap(~ num, nrow = 1)
```

Examining available colors:

```{r, warning = FALSE}
hue_slice("5R")
```

```{r}
value_slice(5)
```

Complementary colors:

```{r, warning=FALSE, fig.width = 8, fig.height = 4}
complement_slice("5R")
```

## Opponent Process Theory

The _Opponent Process Model_ of vision says that the brain divides the visual signal among three opposing contrast pairs:

* black and white;

* red and green;

* yellow and blue.

The black/white pair corresponds to luminance in HCL.

Hue and chroma in HCL span the two chromatic axes.

The luminance axis has higher resolution than the two chromatic axes.

The major form of color vision deficiency reflects an inability to distinguish differences along the red/green axis.

Impairment along the yellow/blue axis does occur as well but is much rarer.

## Contrast and Comparisons

Vision reacts to differences, not absolutes.

* Contrast is very important.

* Context is very important.

Small differences in shading or hue can be recognized when objects are contiguous but are much harder to see when they are separated.

![](https://openi.nlm.nih.gov/imgs/512/399/3852004/PMC3852004_1471-2105-14-S15-S12-2.png)

_Simultaneous brightness contrast_: a grey patch on a dark background looks lighter than the same grey patch on a light background.
```{r, class.source = "fold-hide"} plot(0, 0, type = "n", xlim = c(0, 1), ylim = c(0, 1), axes = FALSE, xlab = "", ylab = "") rect(0, 0, 0.5, 1, col = "lightgrey", border = NA) rect(0.5, 0, 1, 1, col = "darkgrey", border = NA) rect(0.2, 0.3, 0.3, 0.7, col = "grey", border = NA) rect(0.7, 0.3, 0.8, 0.7, col = "grey", border = NA) ``` An example we saw earlier: ![](`r IMG("chess1.png")`) ![](`r IMG("chess2.png")`) Some more are available [here](https://blog.revolutionanalytics.com/2018/08/luminance-illusion.html), including: ```{r, echo = FALSE} knitr::include_graphics(IMG("luminanim.gif")) ``` ```{r, eval = FALSE, echo = FALSE} plot(0, 0, type = "n", xlim = c(0, 1), ylim = c(0, 1), axes = FALSE, xlab = "", ylab = "") rect(0, 0, 0.5, 1, col = hcl(0), border = NA) rect(0.5, 0, 1, 1, col = hcl(120), border = NA) pink1 <- rgb(1, 192 / 255, 220 / 255) rect(0.2, 0.3, 0.3, 0.7, col = pink1, border = NA) rect(0.7, 0.3, 0.8, 0.7, col = pink1, border = NA) ``` Using luminance or grey scale alone does not work well for encoding categorical variables against a key. Grey scale can be effective for showing continuous transitions in pseudo-color images. ```{r, class.source = "fold-hide"} filled.contour(volcano, color.palette = grey.colors) ``` Grey scale is less effective for segmented maps, or choropleth maps; only a few levels can be accurately decoded. 
## Interactions with Size, Background and Proximity ```{r, echo = FALSE, eval = FALSE} library(latticeExtra) library(mapproj) data(USCancerRates) mapplot(rownames(USCancerRates) ~ log(rate.male), data = USCancerRates, ##colramp = grey.colors, border = NA, map = map("county", plot = FALSE, fill = TRUE, projection = "mercator")) ``` For small items more contrast and more saturated colors are needed: ```{r, fig.height = 7, class.source = "fold-hide"} x <- runif(6, 0.1, 0.9) y <- runif(6, 0.1, 0.9) cols <- c("red", "green", "blue", "yellow", "cyan", "magenta") f <- function(size = 1, black = FALSE) { plot(x, y, type = "n", xlim = c(0, 1), ylim = c(0, 1)) if (black) rect(0, 0, 1, 1, col = "black") text(x, y, cols, col = cols, cex = size) } opar <- par(mfrow = c(2, 2)) f(1) f(4) f(1, TRUE) f(4, TRUE) par(opar) ``` Variations in luminance are particularly helpful for seeing fine structure, such as small text or small symbols: ```{r, class.source = "fold-hide"} plot(0, type = "n", xlim = c(0, 1), ylim = c(0, 1), axes = FALSE, xlab = "", ylab = "") rect(0, 0, 1, 1, col = hcl(0)) ## defaults: c = 35, l = 85 qbf <- "The quick brown fox jumps ..." text(0.5, 0.3, label = qbf, col = hcl(180)) ## hue text(0.5, 0.5, label = qbf, col = hcl(0, c = 70)) ## chroma text(0.5, 0.7, label = qbf, col = hcl(0, l = 50)) ## luminance ``` Chrominance (hue and chroma) differences alone are not sufficient for small items. Ware recommends a luminance contrast of at least 3:1 for small text; 10:1 is preferable. 
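
A rough sketch of how such a ratio could be computed for the text colors used above. It uses the relative luminance and contrast ratio definitions from the WCAG 2 accessibility guidelines, which is an assumption: Ware's recommendation may be based on a somewhat different definition. The function names `rel_lum()` and `contrast_ratio_wcag()` are hypothetical helpers, not part of any package.

```{r}
## Relative luminance of sRGB colors following the WCAG 2 definition
## (an assumption; other definitions of luminance contrast exist).
rel_lum <- function(col) {
    v <- col2rgb(col) / 255
    lin <- ifelse(v <= 0.03928, v / 12.92, ((v + 0.055) / 1.055)^2.4)
    drop(c(0.2126, 0.7152, 0.0722) %*% lin)
}

## Contrast ratio of two colors: lighter over darker, each offset by 0.05.
contrast_ratio_wcag <- function(fg, bg) {
    l <- range(rel_lum(fg), rel_lum(bg))
    (l[2] + 0.05) / (l[1] + 0.05)
}

bg_col <- hcl(0)                              ## background used above
contrast_ratio_wcag(hcl(180), bg_col)         ## hue change only
contrast_ratio_wcag(hcl(0, c = 70), bg_col)   ## chroma change only
contrast_ratio_wcag(hcl(0, l = 50), bg_col)   ## luminance change
```

Under this definition black on white gives the maximum ratio of 21:1, and two identical colors give 1:1.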
Small areas also need variation in more than hue:

```{r, echo = FALSE}
plot(0, 0, type = "n", xlim = c(0, 1), ylim = c(0, 1),
     axes = FALSE, xlab = "", ylab = "")
pal0 <- function(col, bottom = 0, top = 1) {
    n <- length(col)
    rect((0 : (n - 1)) / n, bottom, (1 : n) / n, top,
         col = col, border = "lightgrey")
}
pal0(rainbow_hcl(8), 0, 0.2)
pal0(rainbow_hcl(8), 0.3, 0.31)
pal0(rainbow(8), 0.5, 0.7)
pal0(rainbow(8), 0.8, 0.81)
```

Contrasting borders can help for larger areas with similar luminance:

* outlines on text;

* borders on symbols;

* borders on regions, e.g. in [maps](https://colorbrewer2.org).

```{r, fig.width = 10, echo = FALSE}
N <- nrow(trees)
op <- palette(rainbow(N, end = 0.9))
f <- function(fg, main = "")
    with(trees,
         symbols(Height, Volume, circles = Girth / 16, inches = FALSE,
                 bg = 1 : N, fg = fg, main = main))
par(mfrow = c(1, 2))
f(NA, "No Borders")
f("gray30", "Grey Borders")
palette(op)
```

## Color Specification in R

```{r, include = FALSE}
ncols <- length(colors())
```

A large number of _named colors_ are available (currently `r ncols`).

Some examples:

```{r}
col2rgb("red")
col2rgb("forestgreen")
col2rgb("deepskyblue")
col2rgb("firebrick")
```

These will show some details:

```{r, eval = FALSE}
colors()
demo(colors)
```

The available named colors follow a widely used [standard](https://www.w3schools.com/colors/colors_x11.asp).

These colors include the 140 [_web colors_](https://htmlcolorcodes.com/color-names/) supported on modern browsers.

Individual colors can also be specified using `rgb()` or `hcl()` or as hexadecimal specifications.
```{r}
library(colorspace)
hex2RGB("#FF0000")
```

Using color spaces:

```{r}
rgb(1, 0, 0)
rgb(255, 0, 0, max = 255)
rgb2hsv(col2rgb("red"))
```

Converting to HCL:

```{r}
rgb2hcl <- function(col) {
    ## ignores alpha
    col <- RGB(t(col[1 : 3, ]) / 255)
    col <- as(col, "polarLUV")
    col <- t(col@coords[, 3 : 1, drop = FALSE])
    rownames(col) <- tolower(rownames(col))
    col
}
col2hcl <- function(col) rgb2hcl(col2rgb(col))
col2hcl("red")
col2hcl("green")
col2hcl("blue")
col2hcl("yellow")
col2hcl("cyan")
col2hcl("magenta")
hcl(12.17, 179.04, 53.24)
```

Color pickers can help:

* Google search: https://www.google.com/search?q=color+picker

* `colourPicker()` in the `colourpicker` package.

When a set of colors is needed to encode variable values it is usually best to use a suitable _palette_.

## Color Palettes

_Color palettes_ are collections of colors that work well together.

It is useful to distinguish three kinds of palettes:

* qualitative/categorical palettes;

* sequential palettes;

* diverging palettes.

Tools for selecting palettes include:

* [ColorBrewer](https://colorbrewer2.org); available in the `RColorBrewer` package.

* [HCL Wizard](https://hclwizard.org/); also available as `hclwizard` in the `colorspace` package.

A [blog post](https://flowingdata.com/2018/04/12/visualization-color-picker-based-on-perception-research/) with some further options.

Some [current US government work](https://designsystem.digital.gov/components/colors/) on color palettes; [more extensive notes](https://xdgov.github.io/data-design-standards/components/colors) and [code](https://github.com/uswds/uswds).

R color palette functions:

* `rainbow()`

* `heat.colors()`

* `terrain.colors()`

* `topo.colors()`

* `cm.colors()`

* `grey.colors()`

* `gray.colors()`

These all take the number of colors as an argument, as well as some additional optional arguments.

The `hcl.colors()` function provides access to the palettes defined in the `colorspace` package.
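
A minimal sketch of that interface, assuming R 3.6.0 or later, where `hcl.colors()` and `hcl.pals()` were added to `grDevices`. The palette name `"Blues 3"` and the variable name `blues5` are just examples.

```{r}
## Names of some of the available sequential palettes.
head(hcl.pals("sequential"))

## Five colors from the "Blues 3" sequential palette, as hex strings.
blues5 <- hcl.colors(5, palette = "Blues 3")
blues5
```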
`colorRampPalette()` can be used to create a _palette function_ that interpolates between a set of colors using * RGB space or Lab (similar to HCL) space; * linear or spline interpolation. ```{r} rwb <- colorRampPalette( c("red", "white", "blue")) rwb(5) ``` ```{r palette-function-example, eval = FALSE} filled.contour(volcano, color.palette = rwb, asp = 1) ``` ```{r palette-function-example, echo = FALSE} ``` With more perceptually comparable extremes (from the Blue-Red palette of HCL Wizard): ```{r palette-function-muted1, eval = FALSE} rwb1 <- colorRampPalette( c("#8E063B", "white", "#023FA5")) filled.contour(volcano, color.palette = rwb1, asp = 1) ``` ```{r palette-function-muted1, echo = FALSE} ``` An alternative uses the `muted` function from package `scales`: ```{r palette-function-muted2, eval = FALSE} rwb2 <- colorRampPalette( c(scales::muted("red"), "white", scales::muted("blue"))) filled.contour(volcano, color.palette = rwb2, asp = 1) ``` ```{r palette-function-muted2, echo = FALSE} ``` Most base and `lattice` functions allow a vector of colors to be specified. Some, like `filled.contour()` and `levelplot()` allow a palette function to be provided. `ggplot` provides a framework for specifying palette functions to use with `scale_color_xyz()` and `scale_fill_xyz()` functions. Packages like `colorspace` and `viridis` provide additional `scale_color_xyz()` and `scale_fill_xyz()` functions. ## RColorBrewer Palettes The available palettes: ```{r brewer-palletes, eval = FALSE} library(RColorBrewer) display.brewer.all() ``` Palettes in the first group are _sequential_. The second group are _qualitative_. The third group are _diverging_. ```{r brewer-palletes, echo = FALSE, fig.width = 8, fig.height = 8} ``` The `"Blues"` palette: ```{r blues-palette, eval = FALSE} display.brewer.pal(9, "Blues") ``` ```{r blues-palette, echo = FALSE} ``` As RGB values: ```{r} brewer.pal(9, "Blues") ``` The palettes are limited to a maximum number of levels. 
To obtain more levels you can interpolate.

```{r}
brewer.pal(10, "Blues")
pbrbl <- colorRampPalette(brewer.pal(9, "Blues"), interpolate = "spline")
pbrbl
pbrbl(10)
```

## Colorspace Palettes

The `colorspace` package provides a wide range of pre-defined palettes:

```{r, fig.width = 10}
library(colorspace)
hcl_palettes(plot = TRUE)
```

A particular number of colors from one of these palettes can be obtained with

```{r}
qualitative_hcl(4, palette = "Dark 3")
```

The functions `sequential_hcl()` and `diverging_hcl()` are analogous.

For use with `ggplot2` the package provides scale functions like `scale_fill_discrete_qualitative()` and `scale_color_continuous_sequential()`.

A [package vignette](https://cran.r-project.org/package=colorspace/vignettes/colorspace.html) provides more details and background.

## Viridis Palettes

```{r, message = FALSE, echo = FALSE}
n_col <- 128
img <- function(obj, nam) {
    image(seq_along(obj), 1, as.matrix(seq_along(obj)), col = obj,
          main = nam, ylab = "", xaxt = "n", yaxt = "n", bty = "n")
}
library(viridis)
par(mfrow = c(8, 1), mar = rep(1, 4))
img(rev(viridis(n_col)), "viridis")
img(rev(magma(n_col)), "magma")
img(rev(plasma(n_col)), "plasma")
img(rev(inferno(n_col)), "inferno")
img(rev(cividis(n_col)), "cividis")
img(rev(mako(n_col)), "mako")
img(rev(rocket(n_col)), "rocket")
img(rev(turbo(n_col)), "turbo")
```

These are provided in package `viridisLite`.

Palette functions are `viridis()`, `mako()`, etc.

They are also available via the `hcl.colors()` function.

For use in `ggplot` they can be specified in the viridis color scale functions.

## Palettes in R Graphics

```{r, echo = FALSE}
if (! file.exists("Playfair.dat"))
    download.file("http://www.stat.uiowa.edu/~luke/data/Playfair",
                  "Playfair.dat")
Playfair <- read.table("Playfair.dat")
Playfair$city <- rownames(Playfair)
rownames(Playfair) <- NULL
```

`ggplot` uses `scale_color_xyz()` or `scale_fill_xyz()`.
For discrete scales the choices for `xyz` include

* `hue` varies the hue (default for unordered factors);

* `grey` uses grey scale;

* `brewer` uses ColorBrewer palettes;

* `manual` allows explicit specification;

* `viridis_d` for an alternative palette family (default for ordered factors).

For continuous scales the choices for `xyz` include

* `gradient` interpolates between two colors, low and high;

* `gradient2` interpolates between three colors, low, medium, high;

* `gradientn` interpolates between a vector of colors;

* `distiller` for ColorBrewer palettes;

* `viridis_c` for an alternative palette family.

Others are available in packages such as `colorspace`.

The default qualitative and sequential discrete palettes:

```{r, fig.width = 10, fig.height = 4, class.source = "fold-hide"}
library(gapminder)
gap_2007 <- filter(gapminder, year == 2007) %>%
    slice_max(pop, n = 20)
p <- mutate(gap_2007, country = reorder(country, pop)) %>%
    ggplot(aes(x = gdpPercap, y = lifeExp, fill = continent)) +
    scale_size_area(max_size = 10) +
    scale_x_log10() +
    geom_point(size = 4, shape = 21) +
    guides(fill = guide_legend(override.aes = list(size = 4)))
p1 <- p + ggtitle("Hue")
p2 <- p + scale_fill_viridis_d() + ggtitle("Viridis")
library(patchwork)
p1 + p2
```

Discrete examples for `brewer`, `colorspace` and `manual`:

```{r, fig.width = 9, fig.height = 6, class.source = "fold-hide"}
p1 <- p + scale_fill_brewer(palette = "Set1") + ggtitle("Brewer Set1")
p2 <- p + scale_fill_brewer(palette = "Set2") + ggtitle("Brewer Set2")
p3 <- p + scale_fill_discrete_qualitative("Dark 3") +
    ggtitle("Colorspace Dark 3")
p4 <- p +
    scale_fill_manual(values = c(Africa = "red", Asia = "blue",
                                 Americas = "green", Europe = "grey")) +
    ggtitle("Manual")
(p1 + p2) / (p3 + p4)
```

The default for continuous scales is `gradient` from a dark blue to a light blue:

```{r, class.source = "fold-hide"}
V <- data.frame(x = rep(seq_len(nrow(volcano)), ncol(volcano)),
                y = rep(seq_len(ncol(volcano)), each = nrow(volcano)),
                z = as.vector(volcano))
p <- ggplot(V, aes(x, y, fill = z)) +
    geom_raster() +
    coord_fixed()
p
```

Some alternatives:

```{r alt-continuous, echo = FALSE, fig.width = 12, fig.height = 9}
```

```{r alt-continuous, eval = FALSE, class.source = "fold-hide"}
p1 <- p +
    scale_fill_gradient2(
        low = "red", mid = "white", high = "blue",
        midpoint = median(volcano)) +
    ggtitle("Red-White-Blue Gradient")
p2 <- p + scale_fill_viridis_c() + ggtitle("Viridis")
p3 <- p +
    scale_fill_gradientn(
        colors = terrain.colors(8)) +
    ggtitle("Terrain")
vbins <- seq(80, by = 20, length.out = 7)
nc <- length(vbins) - 1
p4 <- ggplot(mutate(V, z = fct_rev(cut(z, vbins))),
             aes(x, y, fill = z)) +
    geom_raster() +
    scale_fill_manual(values = rev(terrain.colors(nc))) +
    ggtitle("Discretized Terrain")
(p1 + p2) / (p3 + p4)
```

Discretizing a continuous range to a modest number of levels can make decoding values from a legend easier.

## Reduced Color Vision

Color vision deficiency affects about 10% of males and a smaller percentage of females.

The most common form is reduced ability to distinguish red and green.

Some web sites provide tools to simulate how a visualization would look to a color vision deficient viewer.

The R packages `dichromat`, `colorspace`, and `colorblindr` provide tools for simulating how colors would look to a color vision deficient viewer for three major types of color vision deficiency:

* deuteranomaly (green cone cells defective);

* protanomaly (red cone cells defective);

* tritanomaly (blue cone cells defective).

An article explaining the color vision impairment simulation is available [here](http://colorspace.r-forge.r-project.org/articles/color_vision_deficiency.html).

Using some tools from packages `colorspace` and `colorblindr` we can simulate what a plot would look like in grey scale and to someone with some of the major types of color impairment.
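
Before simulating whole plots, the same transformations can be tried on a plain vector of colors. This sketch assumes the `colorspace` package is installed and applies its `desaturate()`, `deutan()`, `protan()`, and `tritan()` functions to the six-color rainbow; the names `rb` and `sim` are only illustrative.

```{r}
rb <- rainbow(6)

## Each function maps a vector of hex colors to the simulated colors.
sim <- rbind(original    = rb,
             desaturated = colorspace::desaturate(rb),
             deutan      = colorspace::deutan(rb),
             protan      = colorspace::protan(rb),
             tritan      = colorspace::tritan(rb))
sim
```

Colors that end up nearly identical within a row are ones a viewer with that deficiency would be likely to confuse.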
A plot with the default discrete color palette: ```{r, class.source = "fold-hide"} p <- ggplot(gap_2007, aes(gdpPercap, lifeExp, color = continent)) + geom_point(size = 4) + scale_x_log10() + guides(color = guide_legend(override.aes = list(size = 4))) p ``` ```{r cvd-examples, fig.width = 9.5, fig.height = 6.5, class.source = "fold-hide"} library(colorblindr) library(colorspace) library(grid) color_check <- function(p) { p1 <- edit_colors(p + ggtitle("Desaturated"), desaturate) p2 <- edit_colors(p + ggtitle("deutan"), deutan) p3 <- edit_colors(p + ggtitle("protan"), protan) p4 <- edit_colors(p + ggtitle("tritan"), tritan) gridExtra::grid.arrange(p1, p2, p3, p4, nrow = 2) } color_check(p) ``` For the Viridis palette: ```{r, class.source = "fold-hide"} pv <- p + scale_color_viridis_d() pv ``` ```{r, fig.width = 9.5, fig.height = 6.5, class.source = "fold-hide"} color_check(pv) ``` The `swatchplot()` function in the `colorspace` package can be used with the `cvd = TRUE` argument to simulate how specific palettes work for different color vision deficiencies: ```{r} colorspace::swatchplot(rainbow(6), cvd = TRUE) ``` ```{r} colorspace::swatchplot(hcl.colors(6), cvd = TRUE) ``` ## Two Issues to Watch Out For ### Missing Values It is common for default settings to not assign a color for missing values. In a choropleth map with (made-up) data where one state's value is missing this might not be noticed. ```{r, class.source = "fold-hide"} m <- map_data("state") d <- data.frame(region = unique(m$region), val = ordered(sample(1 : 4, 49, replace = TRUE))) m <- left_join(m, d, "region") pm <- ggplot(m) + geom_polygon(aes(long, lat, group = group, fill = val)) + coord_map() + ggthemes::theme_map() dnm <- mutate(m, val = replace(val, region == "michigan", NA)) pm %+% dnm ``` Unless the viewer is very familiar with US geography. Or is from Michigan. 
In a scatterplot there are even fewer cues: ```{r, fig.width = 10, fig.height = 4, class.source = "fold-hide"} gnc <- mutate(gap_2007, continent = replace(continent, country == "China", NA)) pv + pv %+% gnc ``` Specifying `na.value = "red"`, or some other color, will make sure `NA` values are visible: ```{r, fig.width = 14, fig.height = 5, class.source = "fold-hide"} (pm %+% dnm + scale_fill_viridis_d(na.value = "red") + theme(legend.position = "top")) + (pv %+% gnc + scale_color_viridis_d(na.value = "red")) ``` Using outlines can also help: ```{r, fig.width = 14, fig.height = 5, class.source = "fold-hide"} p1 <- pm %+% dnm + geom_polygon(aes(long, lat, group = group), fill = NA, color = "black", linewidth = 0.1) + theme(legend.position = "top") p2 <- pv %+% gnc + geom_point(shape = 21, fill = NA, color = "black", size = 4) p1 + p2 ``` A final plot might handle missing values differently, but for initial explorations it is a good idea to make sure they are clearly visible. ### Aligning Diverging Palettes Diverging palettes are very useful for showing deviations above or below a baseline. ```{r, fig.height = 3, fig.width = 7, class.source = "fold-hide"} par(mfrow = c(1, 2)) RColorBrewer::display.brewer.pal(7, "PRGn") RColorBrewer::display.brewer.pal(6, "PRGn") ``` For a diverging palette to work properly, the palette base line needs to be aligned with the data baseline. How to do this will depend on the palette, but you do need to keep this in mind when using a diverging palette. 
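
For a continuous variable, one way to align the two baselines is to make the scale limits symmetric around the data baseline. A minimal sketch, where `sym_limits()` is a hypothetical helper and not a function from any package:

```{r}
## Hypothetical helper: limits symmetric around `baseline` that cover
## all of `x`, so the midpoint of the limits is the baseline.
sym_limits <- function(x, baseline = 0) {
    half <- max(abs(x - baseline), na.rm = TRUE)
    baseline + c(-half, half)
}

sym_limits(c(-2, 1, 3))
sym_limits(volcano, baseline = median(volcano))
```

The result could be passed as the `limits` argument of a continuous fill or color scale, so that the neutral color in the middle of a diverging palette falls at the baseline.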
Just using `scale_fill_brewer` is not enough when the value range is not symmetric around the baseline:

```{r, class.source = "fold-hide"}
m <- map_data("state")
d <- data.frame(region = unique(m$region),
                val = ordered(sample((1 : 6) - 3, 49, replace = TRUE)))
m <- left_join(m, d, "region")
p <- ggplot(m) +
    geom_polygon(aes(long, lat, group = group, fill = val)) +
    coord_map() +
    ggthemes::theme_map() +
    theme(legend.position = "right")
p + scale_fill_brewer(palette = "PRGn")
```

Setting the scale limits explicitly forces a 7-category symmetric scale that aligns the zero value with the middle color:

```{r, warning = FALSE}
lims <- -3 : 3
p + scale_fill_brewer(palette = "PRGn", limits = lims)
```

This shows a category in the legend for -3 that does not appear in the map.

This is often what you want.

But if you want to drop the -3 category, one option is to use a manual scale:

```{r}
vals <- RColorBrewer::brewer.pal(7, "PRGn")
names(vals) <- lims
p + scale_fill_manual(values = vals[-1])
```

## Bivariate Palettes

It is possible to encode two variables in a palette.

Some sample palettes:

```{r, echo = FALSE}
knitr::include_graphics(IMG("bivpal.png"))
```

Bivariate palettes are sometimes used in [bivariate choropleth maps](https://www.joshuastevens.net/cartography/make-a-bivariate-choropleth-map/).

Some recommendations from Cynthia Brewer are available [here](http://www.personal.psu.edu/cab38/ColorSch/Schemes.html).

A [discussion](https://junkcharts.typepad.com/junk_charts/2023/04/bivariate-choropleths.html) of a recent example.

Unless one variable is binary, and the palette is very well chosen, it is hard to decode a visualization using a bivariate palette without constantly referring to the key.

## Culture, Tradition, and Conventions

Colors can have different meanings in different cultures and at different times.
* A [visual representation](https://www.informationisbeautiful.net/visualizations/colours-in-cultures/).

* A [blog post](https://www.huffpost.com/entry/what-colors-mean-in-other_b_9078674) at the Huffington Post.

* A [similar post](https://www.shutterstock.com/blog/color-symbolism-and-meanings-around-the-world) at Shutterstock.

Conventions can also give colors particular meanings:

* red/green in traffic lights;

* red/green colors in microarray heatmaps;

* red states and blue states;

* pink for breast cancer;

* pink for girls, blue for boys;

* black for mourning.

### Traffic Lights

[Traffic lights](https://www.autoevolution.com/news/automotive-wiki-why-are-traffic-lights-red-yellow-and-green-42557.html) use red/green, even though this is a major axis of color vision deficiency.

The convention comes from railroads.

The red used generally contains some orange and the green contains some blue to help with red/green color vision deficiency.

Position provides an alternate encoding.

Orientations do vary.

### Microarray Heatmaps

- [Microarrays](https://en.wikipedia.org/wiki/DNA_microarray) are used for the analysis of gene-level changes and differences in bio-medical research.

- Dyes are used that result in genes with a high response appearing red and genes with a low response appearing green.

- In keeping with this physical characteristic of microarrays, a common visualization of the data is as a red/green heat map.

### Red States and Blue States

It is now standard in the US to refer to Republican-leaning states as red states and Democratic-leaning states as blue states.

This is a fairly recent convention, dating back to the 2000 presidential election.

Prior to 1980 it was somewhat more traditional to use red for more left-leaning Democrats.

A map of the 1960 election results uses these more traditional colors.
```{r, echo = FALSE}
knitr::include_graphics(IMG("e1960_ecmap.GIF"))
```

In 1996 the _New York Times_ used blue for Democrat, red for Republican, but the _Washington Post_ used the opposite color scheme.

The long, drawn out process of the 2000 election may have contributed to fixing the color scheme at the current convention.

## Notes

Points need more saturation and luminance than areas.

False color images may benefit from discretizing.

Bivariate encodings (e.g. `x = hue, y = luminance`) are possible but tricky and not often a good idea. They work best if at least one variable is binary.

Providing a second encoding, e.g. shape or position, can help for color vision deficient viewers and for photocopying.

In area plots and maps it is important to distinguish between baseline values and missing values.

If observed values only cover part of a possible range, it is sometimes appropriate to use a color coding that applies to the entire possible range.

For diverging palettes, some care may be needed to make sure the neutral color and the neutral value are properly aligned.

## References

> Few, Stephen. "Practical rules for using color in charts." Visual
> Business Intelligence Newsletter 11
> (2008): 25. ([PDF](http://www.perceptualedge.com/articles/visual_business_intelligence/rules_for_using_color.pdf))

> Harrower, M. A. and Brewer, C. A. (2003). ColorBrewer.org: An online
> tool for selecting color schemes for maps.
> _The Cartographic Journal_, 40, 27--37. [ColorBrewer web
> site](https://colorbrewer2.org). The `RColorBrewer` package provides
> an R interface.

> Ihaka, R. (2003). Colour for presentation graphics, in K. Hornik,
> F. Leisch, and A. Zeileis (eds.), [_Proceedings of the 3rd_
> _International Workshop on Distributed Statistical_
> _Computing_](https://www.r-project.org/conferences/DSC-2003/Proceedings/),
> Vienna,
> Austria. [PDF](https://www.r-project.org/conferences/DSC-2003/Proceedings/Ihaka.pdf).
> See also the `colorspace` package and
> [vignette](https://cran.r-project.org/package=colorspace/vignettes/hcl-colors.pdf).

> Lumley, T. (2006). Color coding and color blindness in statistical
> graphics. _ASA Statistical Computing & Graphics Newsletter_, 17(2),
> 4--7. [PDF](http://stat-computing.org/newsletter/issues/scgn-17-2.pdf).

> Munzner, T. (2014), _Visualization Analysis and Design_, Chapter 10.

> Lisa Charlotte Muth (2021). 4-part series of blog posts on choosing
> color scales. [Part
> 1](https://blog.datawrapper.de/which-color-scale-to-use-in-data-vis/);
> [Part
> 2](https://blog.datawrapper.de/quantitative-vs-qualitative-color-scales/);
> [Part
> 3](https://blog.datawrapper.de/diverging-vs-sequential-color-scales/);
> [Part
> 4](https://blog.datawrapper.de/classed-vs-unclassed-color-scales/).

> Lisa Charlotte Muth (2022). A detailed guide to colors in data vis
> style guides. [Blog
> post](https://blog.datawrapper.de/colors-for-data-vis-style-guides/).

> Treinish, Lloyd A. "Why Should Engineers and Scientists Be Worried
> About Color?." IBM Thomas J. Watson Research Center, Yorktown
> Heights, NY (2009): 46. ([pdf](https://www.researchgate.net/profile/Ahmed_Elhattab2/post/Please_suggest_some_good_3D_plot_tool_Software_for_surface_plot/attachment/5c05ba35cfe4a7645506948e/AS%3A699894335557644%401543879221725/download/Why+Should+Engineers+and+Scientists+Be+Worried+About+Color_.pdf))

> Ware, C. (2012), _Information Visualization: Perception for Design_,
> 3rd ed, Chapters 3 & 4.

> Zeileis, A., Murrell, P. and Hornik, K. (2009). Escaping RGBland:
> Selecting colors for statistical graphics, _Computational Statistics
> & Data Analysis_, 53(9), 3259--3270
> ([PDF](https://www.zeileis.org/papers/Zeileis+Hornik+Murrell-2009.pdf)).

> Achim Zeileis, Paul Murrell (2019). HCL-Based Color Palettes in
> `grDevices`. [R Blog
> post](https://developer.r-project.org/Blog/public/2019/04/01/hcl-based-color-palettes-in-grdevices/index.html).

> Achim Zeileis et al.
(2020). “colorspace: A Toolbox for Manipulating
> and Assessing Colors and Palettes.” Journal of Statistical Software,
> 96(1),
> 1--49. [doi:10.18637/jss.v096.i01](https://doi.org/10.18637/jss.v096.i01).

## Coloring Political Statements

POLITIFACT reviews the accuracy of statements by politicians and
publishes [summaries of the
results](https://www.politifact.com/personalities/michele-bachmann/).

A [2016
post](https://www.dailykos.com/stories/2016/8/7/1556666/-Three-lessons-from-the-rise-of-Donald-Trump)
on [Daily Kos](https://www.dailykos.com/) included a
[visualization](https://images.dailykos.com/images/283152/large/dataviz_robert_mann.png?1470323514)
of the results for a number of politicians.

Kaiser Fung posted a
[critique](https://junkcharts.typepad.com/junk_charts/2017/04/what-does-lying-politicians-have-in-common-with-rainbow-colors.html)
at [JunkCharts](https://junkcharts.typepad.com) and proposed an
alternative.

I scraped the data as of April 11, 2017, from POLITIFACT; they are
available [here](https://stat.uiowa.edu/~luke/data/polfac.dat).

```{r, class.source = "fold-hide"}
## download the data once and reuse the local copy afterwards
if (! file.exists("polfac.dat"))
    download.file("https://stat.uiowa.edu/~luke/data/polfac.dat",
                  "polfac.dat")
pft <- read.table("polfac.dat")
## convert counts to row proportions and reverse the column order
vcp <- prop.table(as.matrix(pft), 1)[, 6 : 1]
## read.table replaces spaces in column names by dots; undo that
colnames(vcp) <- gsub("\\.", " ", colnames(vcp))
head(vcp)
```

The Daily Kos chart is ordered by the percentage of statements that
are more false than true.
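As a minimal sketch of that ordering, the following uses a small made-up count matrix (`toy`, with hypothetical speakers `A`, `B`, and `C`; these are not the POLITIFACT data). It assumes that the columns run from least to most truthful, so that the first three columns are the grades that are more false than true.

```r
## Hypothetical counts: rows are speakers, columns are grades ordered
## from least to most truthful (an assumption about the layout).
toy <- rbind(A = c(1, 1, 2, 2, 2, 2),
             B = c(3, 3, 2, 1, 1, 0),
             C = c(0, 1, 1, 2, 3, 3))
colnames(toy) <- c("Pants on Fire", "False", "Mostly False",
                   "Half True", "Mostly True", "True")

## Convert counts to row proportions.
toy_p <- prop.table(toy, 1)

## Share of each speaker's statements that are more false than true.
more_false <- rowSums(toy_p[, 1 : 3])

## Sort rows by that share, smallest first.
toy_sorted <- toy_p[order(more_false), ]
rownames(toy_sorted)
```

Sorting by a single summary such as `more_false` gives the bars a meaningful order, which makes it easier to judge how well a palette separates the two halves of the scale.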
A function to produce a bar chart with a specified color palette:

```{r, class.source = "fold-hide"}
## lattice version
polbars <- function(col = cm.colors(6)) {
    barchart(vcp[order(rowSums(vcp[, 1 : 3])), ],
             auto.key = TRUE,
             par.settings = list(superpose.polygon = list(col = col)))
}

## ggplot version (this definition replaces the lattice one above)
gvcp <- as.data.frame(vcp) %>%
    rownames_to_column("Name") %>%
    pivot_longer(-1, names_to = "Grade", values_to = "prop") %>%
    mutate(Grade = fct_rev(fct_inorder(Grade)))
## order the names by the proportion of statements in the first three
## grade levels, up to and including "Half True"
nm <- mutate(gvcp, Grade = ordered(Grade)) %>%
    filter(Grade <= "Half True") %>%
    group_by(Name) %>%
    summarize(prop = sum(prop)) %>%
    arrange(desc(prop)) %>%
    pull(Name)
gvcp <- mutate(gvcp, Name = factor(Name, nm))
pvcp <- ggplot(gvcp, aes(Name, prop, fill = Grade)) +
    geom_col(position = "fill", width = 0.7) +
    coord_flip() +
    theme(legend.position = "top",
          plot.margin = margin(r = 50),
          legend.text = element_text(size = 10)) +
    scale_y_continuous(labels = scales::percent, expand = c(0, 0)) +
    guides(fill = guide_legend(title = NULL, nrow = 1, reverse = TRUE)) +
    labs(x = "", y = "")
polbars <- function(col = cm.colors(6))
    pvcp + scale_fill_manual(values = rev(col))
polbars()
```

The original Daily Kos chart seems to use a slightly modified version
of the Color Brewer `Spectral` palette, a diverging palette.

```{r, class.source = "fold-hide"}
library(RColorBrewer) ## provides brewer.pal
polbars(brewer.pal(6, "Spectral"))
dkcols <- brewer.pal(6, "Spectral")
dkcols[4] <- "lightgrey"
polbars(dkcols)
```

The JunkCharts plot uses another diverging palette, close to the
`Blue-Red` palette available in `hclwizard`.
```{r, class.source = "fold-hide"}
rwbcols <- c("#4A6FE3", "#8595E1", "#B5BBE3", "#E2E2E2",
             "#E6AFB9", "#E07B91", "#D33F6A")
polbars(rwbcols)
polbars(rev(rwbcols))
```

Another diverging palette:

```{r, class.source = "fold-hide"}
polbars(brewer.pal(7, "PiYG"))
```

A sequential palette:

```{r, class.source = "fold-hide"}
polbars(rev(brewer.pal(6, "Oranges")))
```

## Reading

Section [_Perception and Data
Visualization_](https://socviz.co/lookatdata.html#perception-and-data-visualization)
in [_Data Visualization_](https://socviz.co/).

Chapter [_Color
scales_](https://clauswilke.com/dataviz/color-basics.html) in
[_Fundamentals of Data Visualization_](https://clauswilke.com/dataviz/).

## Exercises

1. A color can be specified in hexadecimal notation. Given such a
   color specification you can find out what it looks like by using it
   in a simple plot, or by using the Google color picker. Which of the
   following best describes the color `#B22222`?

    a. a shade of green
    b. a shade of blue
    c. orange
    d. a shade of red

2. The following shows how to view the colors in the `RColorBrewer`
   palette named `Reds` with 7 colors:

    ```{r, fig.cap = ""}
    library(RColorBrewer)
    display.brewer.pal(7, "Reds")
    ```

    Which of the following `RColorBrewer` palettes is diverging?

    a. `Blues`
    b. `PuRd`
    c. `Set1`
    d. `RdGy`