---
title: "Final Notes"
output:
  html_document:
    toc: yes
    code_folding: show
    code_download: true
---

```{r setup, include = FALSE}
source(here::here("setup.R"))
options(htmltools.dir.version = FALSE)
library(ggplot2)
knitr::opts_chunk$set(collapse = TRUE, class.source = "fold-hide",
                      message = FALSE, fig.align = "center")
library(tidyverse)
theme_set(theme_minimal() +
          theme(text = element_text(size = 16)) +
          theme(panel.border = element_rect(color = "grey30", fill = NA)))
set.seed(12345)
```


## Some Other Topics

Some topics we did not have time to look at:

* Working with models ([Chapter 6 in Healy,
  2018](https://socviz.co/modeling.html); [Chapter 25 in
  R4DS](https://r4ds.had.co.nz/many-models.html)).

* [Visualizing missing
  values](http://naniar.njtierney.com/articles/naniar-visualisation.html).

* Visualizing uncertainty ([Chapter 16 of Wilke,
  2019](https://clauswilke.com/dataviz/visualizing-uncertainty.html)
  and [below](#visualizing-uncertainty-hurricanes)).

* Plot annotation, plot ensembles, and dashboards ([Part II of Wilke,
  2019](https://clauswilke.com/dataviz/proportional-ink.html);
  [Chapter 5 of Healy, 2018](https://socviz.co/workgeoms.html);
  [below](#plot-annotation-plot-ensembles-and-dashboards)).

* Data Science Ethics ([below](#data-science-ethics)).


## Visualizing Uncertainty: Hurricanes

All estimates from data are associated with some degree of uncertainty.

Effectively communicating that uncertainty in visualizations is
challenging and an active area of
[research](http://space.ucmerced.edu/chapter).

The _cone of uncertainty_: (From Cairo (2019); images from a [blog
post](http://www.thefunctionalart.com/2020/01/all-graphics-from-how-charts-lie-freely.html)
by the author.)

<!--
https://www.dropbox.com/sh/d1kb0jdrhkb43j9/AADTBfRvAh-mxmSxBRNZpLJja/5.CHAPTER5?dl=0&preview=PDF10.Tropicalstorm.pdf&subfolder_nav_tracking=1
-->
```{r, echo = FALSE}
knitr::include_graphics(IMG("PDF10.Tropicalstorm.png"))
```

The [NHC forecast cone](https://www.nhc.noaa.gov/aboutcone.shtml) is
designed so that two-thirds of historical official forecast errors over
a 5-year sample fall within the cone for a particular time point.
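
As a rough illustration of that design, the radius at a single forecast
time can be thought of as the two-thirds quantile of past track
errors. The sketch below uses simulated errors, not NHC data; the
gamma distribution, its parameters, and the name `hist_errors` are
assumptions made only for this example.

```r
set.seed(1)

# Simulated track errors (nautical miles) at one lead time;
# hypothetical data standing in for a 5-year sample of forecast errors.
hist_errors <- rgamma(500, shape = 2, scale = 40)

# Radius chosen so that about two-thirds of past errors fall inside it.
radius <- quantile(hist_errors, probs = 2 / 3, names = FALSE)

# Share of the simulated errors within the radius (close to 2/3).
inside <- mean(hist_errors <= radius)
inside
```

By construction, roughly one-third of past errors fall outside a
radius chosen this way.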

When published in the media, these visualizations are routinely
misinterpreted, something like this:

<!--
https://www.dropbox.com/sh/d1kb0jdrhkb43j9/AADTBfRvAh-mxmSxBRNZpLJja/5.CHAPTER5?dl=0&preview=PDF11.StormWRONGSize.pdf&subfolder_nav_tracking=1
-->

```{r, echo = FALSE}
knitr::include_graphics(IMG("PDF11.StormWRONGSize.png"))
```

A more effective representation might be something like this, showing
an _ensemble_ of possible tracks:

<!--
https://www.dropbox.com/sh/d1kb0jdrhkb43j9/AADTBfRvAh-mxmSxBRNZpLJja/5.CHAPTER5?dl=0&preview=PDF13.StormLines.pdf&subfolder_nav_tracking=1
-->

```{r, echo = FALSE}
knitr::include_graphics(IMG("PDF13.StormLines.png"))
```

An animated version may be more effective, if the presentation medium
permits.

Developing better visualizations for hurricane forecasting, especially
targeting the public, is an active area of research.


## Visualizing Uncertainty: Chocolate Bars

[Expert ratings](http://flavorsofcacao.com), on a scale from 0 to 5,
for chocolate bars manufactured in several countries:

```{r, echo = FALSE}
data(cacao, package = "dviz.supp")
library(colorspace)
countries <- c("U.S.A.", "Austria", "Belgium", "Canada", "Peru", "Switzerland")

col80 <- desaturate(darken("#0072B2", .2), .3)
col95 <- desaturate(lighten("#0072B2", .2), .3)
col99 <- desaturate(lighten("#0072B2", .4), .3)
colP <- col95
colM <- "#D55E00"

c1 <- filter(cacao, location %in% countries)
c1sums <- group_by(c1, location) %>%
    summarize(m = mean(rating),
              s = sd(rating),
              n = n()) %>%
    ungroup()

c1CI <- mutate(data.frame(level = c(0.8, 0.95, 0.99)),
               df = lapply(level,
                           function(lev)
                               with(c1sums, {
                                   h <- s * qt(1 - (1 - lev) / 2, n - 1) /
                                       sqrt(n)
                                   cbind(c1sums, data.frame(xmin = m - h,
                                                            xmax = m + h))
                               }))) %>%
    unnest("df")

ggplot(c1, aes(rating, reorder(location, rating))) +
    geom_point(position = position_jitter(height = 0.3, width = 0.05),
               size = 0.5, color = colP) +
    ##geom_point(aes(m, location), data = c1sums, size = 2.5, color = colM) +
    geom_segment(aes(x = m, xend = m,
                     y = as.integer(reorder(location, m)) - 0.3,
                     yend = as.integer(reorder(location, m)) + 0.3),
                 linewidth = 2, color = colM, data = c1sums) +
    ylab("") +
    ggtitle("Ratings for Chocolate Bars", "Bars are sample means.")
```

The standard deviations of the data distributions are comparable, but
the lengths of confidence intervals for the mean vary because of the
different sample sizes:

```{r, echo = FALSE}
p <- ggplot(filter(c1CI, level == 0.95),
            aes(m, reorder(location, m), xmin = xmin, xmax = xmax)) +
    geom_errorbarh(height = 0) +
    geom_point(size = 2.5, color = colM) +
    ylab("") +
    ggtitle("Confidence Intervals for the Mean", "Confidence level 95%")

p + scale_x_continuous(limits = c(1, 4), name = "mean rating")
```

The same plot with a reduced horizontal range:

```{r, echo = FALSE}
p + scale_x_continuous(limits = c(2.5, 3.8), name = "mean rating")
```
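
The effect of sample size on interval length can be checked directly
with the usual t-based interval for a mean, whose half-width is
$s \, t_{1 - \alpha/2,\, n - 1} / \sqrt{n}$. This is a minimal sketch:
the helper name `ci_half_width` is hypothetical, and the standard
deviation and sample sizes are made up rather than taken from the
chocolate data.

```r
# Half-width of a t-based confidence interval for a mean.
ci_half_width <- function(s, n, level = 0.95) {
    s * qt(1 - (1 - level) / 2, df = n - 1) / sqrt(n)
}

# Same standard deviation, very different (made-up) sample sizes.
n <- c(5, 25, 100, 400)
half <- ci_half_width(s = 0.5, n = n)
data.frame(n = n, half_width = round(half, 3))
```

With the standard deviation held fixed, the half-width shrinks
roughly like $1 / \sqrt{n}$, so the groups with the fewest ratings get
the longest intervals.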

A more elaborate display with confidence intervals at several levels:

```{r, echo = FALSE}
#| warning: false
## based on code for Wilke's Fig. 16.7
arrange(c1CI, desc(level)) %>%
    mutate(level = paste0(100 * level, "%"),
           location = reorder(location, m)) %>%
    ggplot(aes(m, location, xmin = xmin, xmax = xmax)) +
    geom_errorbarh(aes(size = level, color = level), height = 0) +
    geom_errorbarh(aes(color = level), height = 0.1) +
    geom_point(size = 2.5, color = colM) +
    scale_x_continuous(limits = c(2.5, 3.8), name = "mean rating") +
    scale_size_manual(name = "confidence level",
                      values = c(`80%` = 2.25, `95%` = 1.5, `99%` = 0.75),
                      guide = guide_legend(direction = "horizontal",
                                           title.position = "top",
                                           label.position = "bottom")) +
    scale_color_manual(name = "confidence level",
                       values = c(`80%` = col80, `95%` = col95, `99%` = col99),
                       guide = guide_legend(direction = "horizontal",
                                            title.position = "top",
                                            label.position = "bottom")) +

    theme(legend.position = c(1, 0.01), legend.justification = c(1, 0)) +
    ylab("") +
    ggtitle("Confidence Intervals for the Mean")
```

Confidence densities, or confidence distributions, as proposed in

> Adrian W. Bowman. Graphics for Uncertainty. J. R. Statist. Soc. A
> 182:1-16, 2018. [Link](https://rss.onlinelibrary.wiley.com/doi/full/10.1111/rssa.12379)

```{r, echo = FALSE}
## based on code for Wilke's Fig. 16.9 (e)
library(ungeviz)
ggplot(filter(c1CI, level == 0.95),
       aes(x = m, y = reorder(location, m))) +
    stat_confidence_density(aes(moe = xmax - m, fill = after_stat(ndensity)),
                            height = 0.7, confidence = 0.95, alpha = NA,
                            na.rm = TRUE) +
    geom_segment(aes(x = m, xend = m,
                     y = as.integer(reorder(location, m)) - 0.35,
                     yend = as.integer(reorder(location, m)) + 0.35),
                 linewidth = 2, color = colM) +
    scale_fill_gradient(low = "#81A7D600", high = "#345A7FD0") +
    scale_x_continuous(limits = c(2.5, 3.8), name = "mean rating") +
    ylab("")
```

One drawback of all of these methods:

> The least precise measurement draws the most attention.

These examples from Wilke's book use the [`ungeviz`
package](https://github.com/wilkelab/ungeviz) available on GitHub.

Another package providing some tools for uncertainty visualization is
the [`ggdist` package](https://mjskay.github.io/ggdist/).


## Visualizing Uncertainty: Old Cars

Using the very old `mtcars` data set to illustrate estimating a smooth
relationship:

```{r, message = FALSE, echo = FALSE}
p <- ggplot(mtcars, aes(disp, mpg)) +
    geom_point()

p + geom_smooth()
```

A default `geom_smooth` shows an estimate along with a point-wise
confidence band.

This may not give the best sense of the joint uncertainty: if the curve
is higher in some places it may need to be lower in others.

Showing an _ensemble_ of curves that are all plausible can be a better
choice.

```{r, echo = FALSE, message = FALSE}
mts <- lapply(seq_len(10),
              function(i) mutate(sample_frac(mtcars, 1, replace = TRUE),
                                 sample = i)) %>%
    bind_rows

p2 <- p +
    geom_smooth(color = NA) +
    geom_smooth(aes(group = sample),
                se = FALSE, linewidth = 0.3, color = "#3366FF", data = mts)
p2
```

This approach was shown earlier for visualizing possible hurricane paths.

This ensemble is generated using a _case-based bootstrap_.

These plots are called _ensemble plots_ (also spaghetti plots, for
obvious reasons).
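
A case-based bootstrap can be sketched in base R as follows: resample
the rows of the data with replacement, refit the smoother, and
evaluate each fit on a common grid. This is an illustration rather
than the code behind the figure above; the names `grid` and
`boot_curve`, the grid range, and the use of `loess` with its default
settings are assumptions.

```r
set.seed(12345)

# Common grid of displacement values on which to evaluate each curve.
grid <- data.frame(disp = seq(100, 400, length.out = 50))

# One bootstrap replicate: resample rows (cases) with replacement,
# refit the smoother, and predict on the grid. Predictions are NA
# wherever the grid extends beyond the resampled data.
boot_curve <- function() {
    idx <- sample(nrow(mtcars), replace = TRUE)
    fit <- loess(mpg ~ disp, data = mtcars[idx, ])
    predict(fit, newdata = grid)
}

# Each column of `curves` is one plausible curve.
curves <- replicate(10, boot_curve())

plot(mpg ~ disp, data = mtcars)
matlines(grid$disp, curves, lty = 1, col = "#3366FF")
```

Because whole rows are resampled, each refit sees a slightly different
data set, and the spread of the resulting curves reflects the joint
uncertainty in the fit.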

If animation is available, an alternative is to show the curves one at
a time.

```{r, cache = TRUE, message = FALSE, echo = FALSE}
library(gganimate)
animate(p2 +
        transition_states(sample, transition_length = 2, state_length = 1))
```

Again, a bootstrap is used to produce the estimates.

This is an example of a _hypothetical outcome plot_, or _HOP_, as
introduced in

> Hullman, Jessica, Paul Resnick, and Eytan Adar. "Hypothetical
> outcome plots outperform error bars and violin plots for inferences
> about reliability of variable ordering." PLOS ONE 10, no. 11 (2015).
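
The idea of one hypothetical outcome per frame can also be sketched
for the variable-ordering question in the paper's title. The estimates
and standard errors below are invented, and normal draws are used in
place of the bootstrap used for the curves above, so this is only an
illustration of the frame structure.

```r
set.seed(1)

# Hypothetical estimates and standard errors for two quantities A and B.
est <- c(A = 3.2, B = 3.0)
se  <- c(A = 0.15, B = 0.10)

# Each row is one hypothetical outcome; a HOP would show one row per frame.
n_frames <- 200
draws <- data.frame(frame = seq_len(n_frames),
                    A = rnorm(n_frames, est["A"], se["A"]),
                    B = rnorm(n_frames, est["B"], se["B"]))

# Share of frames in which A comes out ahead of B.
mean(draws$A > draws$B)
```

A viewer watching the frames sees A ahead of B in most of them but not
all, which conveys how reliable the ordering is.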


## Data Quality and Integrity

A visualization can accurately reflect data but still be misleading if
the data are faulty.

A [NY Times
article](https://www.nytimes.com/2021/05/03/health/covid-herd-immunity-vaccine.html)
from May 2021 shows a choropleth map of the estimated share of adults
who would "definitely" or "probably" get the COVID-19 vaccine.

Cutoffs (%): 49, 60, 65, 70, 75, 80, 91

```{r, echo = FALSE, out.width = 500}
knitr::include_graphics(IMG("map-1050.png"))
```

The map may accurately reflect the estimates, but the estimates have
obvious problems.

The data used for the map are available
[here](https://aspe.hhs.gov/pdf-report/vaccine-hesitancy).

Discussions on social media suggest that the state-level data may be
more reasonable:

<!-- ## nolint start: line_length -->
<!-- https://twitter.com/ct_bergstrom/status/1390509298388660231?s=11 -->

```{r, echo = FALSE, fig.width = 10, fig.height = 10}
library(tidyverse)
data <- read.csv("http://www.stat.uiowa.edu/~luke/data/VaccineHesitancy-2021-04-06.csv") %>%
    setNames(c("fips", "state", "Hesitant", "StronglyHesitant")) %>%
    mutate(across(3:4, parse_number)) %>%
    mutate(Willing = 100 - Hesitant - StronglyHesitant)

map <- usmap::us_map() %>%
    mutate(fips = as.numeric(fips))

map_data <- left_join(map, data, "fips")

ggplot(map_data,
       aes(x, y,
           group = group,
           fill = Willing)) +
    geom_polygon(color = "black") +
    coord_fixed() +
    ggthemes::theme_map() +
    scale_fill_distiller(palette = "Purples",
                         direction = 1,
                         labels = scales::label_percent(scale = 1),
                         guide = guide_colorbar(title.hjust = 0.5,
                                                title.position = "top")) +
    theme(legend.position = "top",
          legend.justification = "center",
          legend.title.align = 0.5,
          plot.title = element_text(hjust = 0.5),
          plot.subtitle = element_text(hjust = 0.5)) +
    labs(title = "Uneven Willingness to Get Vaccinated Could Affect Herd Immunity",
         subtitle = "In some parts of the United States, inoculation rates may not reach the threshold needed\nto prevent the coronavirus from spreading easily.",
         caption = "Data: https://aspe.hhs.gov/pdf-report/vaccine-hesitancy",
         fill = "Estimated share of adults who would\n\"definitely\" or \"probably\" get the vaccine")
```
<!-- ## nolint end -->

<!-- not being able to center a long legend title seems to be a
current ggplot bug -->

<div id="data-science-ethics"></div>

## Data Science Ethics

Some issues:

* Data misrepresentation

* Data falsification

* Data privacy

* Data scraping and terms of use

* Algorithmic bias

Some references:

<!-- https://arxiv.org/abs/1908.06166 -->

* [Data science
  ethics](https://mdsr-book.github.io/mdsr2e/ch-ethics.html) chapter
  in: Benjamin S. Baumer, Daniel T. Kaplan, and Nicholas J. Horton
  (2021)  
  [_Modern Data Science with R, 2nd edition_](https://mdsr-book.github.io/mdsr2e/).

* [Data science ethics](https://datasciencebox.org/02-ethics.html) section of
  the online book
  [Data Science in a Box](https://datasciencebox.org/index.html)
  by Mine Çetinkaya-Rundel.

* Alberto Cairo (2019) _How Charts Lie: Getting Smarter about Visual
  Information_, W. W. Norton & Company.

<div id="plot-annotation-plot-ensembles-and-dashboards"></div>


## Plot Annotation, Plot Ensembles, and Dashboards

Plot annotations can create popout and help focus the viewer's
attention.

They may be increasingly important as images are shared online
without context.

Here is an example for the `mpg` data:

<!-- ## nolint start: line_length -->
```{r, echo = FALSE}
#| warning: false
library(ggforce)
ggplot(mpg, aes(displ, hwy)) +
    geom_point() +
    geom_mark_hull(aes(filter = class == "2seater"),
                   fill = "blue",
                   description = "2-Seaters have high displacement values, but also high fuel efficiency for their displacement.") +
    geom_mark_rect(aes(filter = hwy > 40),
                   fill = "green",
                   description = "These are Volkswagens") +
    geom_mark_circle(aes(filter = hwy == 12),
                     fill = "red",
                     description = "Three pickups and an suv.")
```
<!-- ## nolint end -->


## Plot Ensembles: Coffee

It is often useful to use several graphics to present an analysis.

Collections of related graphs are sometimes called _ensemble graphics_.

Online presentations of analyses involving multiple visualizations
and, typically, some interactive features are also called
_dashboards_.

To aid the viewer it is usually best to design these visualizations
together, with common axis choices and color mappings.
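
One way to keep axes and colors consistent across panels is to define
the scales once and add the same objects to every plot. This is a
minimal sketch using the `mpg` data; the object names and the colors
are arbitrary choices made for illustration.

```r
library(ggplot2)

# One named palette, defined once and reused so each drive type gets
# the same color in every panel.
drv_cols <- c("4" = "grey60", f = "steelblue", r = "firebrick")
shared_colors <- scale_colour_manual(values = drv_cols)

# Common limits for panels that show the same variable.
hwy_axis <- scale_y_continuous(limits = range(mpg$hwy))

p1 <- ggplot(mpg, aes(displ, hwy, colour = drv)) +
    geom_point() + shared_colors + hwy_axis
p2 <- ggplot(mpg, aes(drv, hwy, colour = drv)) +
    geom_boxplot() + shared_colors + hwy_axis
```

Naming the palette entries ties each color to a level rather than to
its position, so the mapping stays the same even if a panel shows only
some of the levels.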

Fig 12.1 in Unwin (2015) provides a simple example:

<!-- ## nolint start: line_length -->
```{r, out.width= "65%", out.extra='style="float:right; padding:10px"'}
#| fig.width: 9
#| fig.height: 6.5
#| fig-retina: true
#| fig-alt: "A dashboard with three plots. A bar chart shows there are about 4 times as many Arabica samples as Robusta samples. A scatterplot of Caffeine against Fat content shows clear separation of the two groups. A parallel coordinates plot shows the 12 values measured on each group."

library(ggplot2)
library(GGally)
library(gridExtra)

coffee_thm <- theme(text = element_text(size = 14))

data(coffee, package = "pgmm")
coffee <- within(coffee, Type <- ifelse(Variety == 1,
                                        "Arabica", "Robusta"))
names(coffee) <- abbreviate(names(coffee), 8)
a <- ggplot(coffee, aes(x = Type)) + geom_bar(aes(fill = Type)) +
    scale_fill_manual(values = c("grey70", "red")) +
    guides(fill = "none") + ylab("") +
    coffee_thm
b <- ggplot(coffee, aes(x = Fat, y = Caffine, colour = Type)) +
    geom_point(size = 3) +
    scale_colour_manual(values = c("grey70", "red")) +
    coffee_thm
c <- ggparcoord(coffee[order(coffee$Type), ], columns = 3 : 14,
                groupColumn = "Type", scale = "uniminmax") +
    xlab("") + ylab("") +
    theme(legend.position = "none") +
    scale_colour_manual(values = c("grey", "red")) +
    theme(axis.ticks.y = element_blank(),
          axis.text.y = element_blank()) +
    coffee_thm
grid.arrange(arrangeGrob(a, b, ncol = 2, widths = c(1, 2)),
             c, nrow = 2)
```
<!-- ## nolint end -->

The data give the chemical composition of 43 coffee samples collected
from 29 countries around the world. Each sample is of either the
Arabica or the Robusta variety. Twelve of the thirteen chemical
constituents reported in the study are given. The omitted variable is
total chlorogenic acid; it is generally the sum of the chlorogenic,
neochlorogenic, and isochlorogenic acid values.

Source: Streuli, H. (1973). Der heutige Stand der Kaffeechemie. In
_Association Scientifique Internationale du Café, 6th International
Colloquium on Coffee Chemistry_, Bogotá, Colombia, pp. 61-72.


## Making a Point and Telling a Story

In a report, make sure each plot has a point and makes its point.

Make sure to think about:

* axis labels;

* titles and subtitles;

* captions;

* highlighting key features; <!-- gghighlight -->

* accessibility (e.g. color choice; alt-text).
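
Most of these items can be set in a single `labs()` call in `ggplot2`.
The sketch below is one possible way to do this for the `mpg` data;
the wording of the labels and of the alt text is made up for
illustration.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) +
    geom_point() +
    labs(x = "Engine displacement (litres)",
         y = "Highway miles per gallon",
         title = "Larger engines tend to use more fuel",
         subtitle = "Each point is one car in the mpg data",
         caption = "Data: ggplot2::mpg",
         alt = paste("Scatterplot of highway mileage against engine",
                     "displacement; mileage falls as displacement grows."))

# The alt text travels with the plot object.
get_alt_text(p)
```

In an R Markdown chunk the alt text can instead be supplied with the
`fig.alt` chunk option.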

It is often good to make sure a figure can stand on its own
without asking the reader to search the text for explanations.

Communicating with data is like telling a story, with a starting
point, a journey, and an end.

Sometimes a single visualization can capture the full story.

More often, several visualizations will be needed.

Often it is good to:

* start with a high level overview;

* show how to look at some particular cases, e.g. with a single plot;

* build up to a more complete analysis, e.g. with a multi-panel plot.

With multiple visualizations it is good to make sure that:

* each one works well on its own;

* they work well together (e.g. use consistent styling, colors).

There is a [chapter of Wilke,
2019](https://clauswilke.com/dataviz/telling-a-story.html) with more
advice on this.

A recent book-length treatment is

> Deborah Nolan and Sara Stoudt (2021) _Communicating with Data_,
> Oxford University Press.


## Wrapping Up

Some of the areas we covered:


### Visualization

Many different types of graphs.

* Strengths, weaknesses.
* Pitfalls.
* Scalability.
* Creating these graphs in R.

Perception

* Channels and mappings; relative effectiveness.
* Using these to assess and design visualizations.
* Effective use of color.

A little on interaction, animation.

Emphasis on techniques useful for exploration, scientific reporting.


### Data Technologies

Reading different data formats.

Scraping data from the web.

Cleaning data.

Rearranging data for analysis.

Merging data from several sources.


### Reproducible Research Tools

`rmarkdown` for integrating code and reporting.

Version control, `git`, `GitLab`.


## Learning More

Class notes will remain available, in some form, at the class web site.

Some books to look at:

* Alberto Cairo (2019) _How Charts Lie: Getting Smarter about Visual
  Information_, W. W. Norton & Company.

* Claus O. Wilke (2019) [_Fundamentals of Data
  Visualization_](https://clauswilke.com/dataviz/), O’Reilly,
  Inc. ([Book source on
  GitHub](https://github.com/clauswilke/dataviz); [supporting
  materials on GitHub](https://github.com/clauswilke/dviz.supp))

* Kieran Healy (2018) [_Data Visualization: A practical
  introduction_](https://socviz.co/), Princeton University Press.

* Winston Chang (2018) [_R Graphics Cookbook_, 2nd
  edition](https://r-graphics.org/), O’Reilly. ([Book source on
  GitHub](https://github.com/wch/rgcookbook))

Some blogs to check out:

* [Junk Charts](https://junkcharts.typepad.com/)

* [The Functional Art Blog](http://www.thefunctionalart.com/)

* [Flowing Data](https://flowingdata.com/)

Keep a critical eye out for good (and not so good) uses of data
visualization in the media.
