Visualizing Proportions

class: center, middle, title-slide

.title[
# Visualizing Proportions
]
.author[
### Luke Tierney
]
.institute[
### University of Iowa
]
.date[
### 2024-03-18
]

---

layout: true

## Categorical Data
---

Categorical data can be

* nominal, qualitative

* ordinal

For visualization, the main difference is that ordinal data suggests a
particular display order.

Purely categorical data can come in a range of formats.

The most common are

* raw data: individual observations;

* aggregated data: counts for each unique combination of levels;

* cross-tabulated data.

---

### Raw Data

.pull-left[
Raw data for a survey of individuals that records hair color, eye
color, and gender of 592 individuals might look like this:
]

.pull-right[

```r
head(raw)
##    Hair   Eye    Sex
## 1 Brown  Blue   Male
## 2 Brown Brown   Male
## 3 Brown Hazel   Male
## 4 Blond Green Female
## 5 Brown Brown Female
## 6 Brown Hazel   Male
```
]

---

### Aggregated Data

.pull-left[
One way to aggregate raw categorical data is to use `count` from `dplyr`:
]

.pull-right[

```r
library(dplyr)
agg <- count(raw, Hair, Eye, Sex)
head(agg)
##    Hair   Eye    Sex  n
## 1 Black Brown   Male 32
## 2 Black Brown Female 36
## 3 Black  Blue   Male 11
## 4 Black  Blue Female  9
## 5 Black Hazel   Male 10
## 6 Black Hazel Female  5
```
]

<!--
The `count_` function from `dplyr` allows the variables to use to be
read from the data:

```r
agg <- count_(raw, names(raw))
head(agg)
```

Apparently the "modern" way to do this is

```r
count(raw, !!! syms(names(raw)))
##     Hair   Eye    Sex  n
## 1  Black Brown   Male 32
## 2  Black Brown Female 36
## 3  Black  Blue   Male 11
## 4  Black  Blue Female  9
## 5  Black Hazel   Male 10
## 6  Black Hazel Female  5
## 7  Black Green   Male  3
## 8  Black Green Female  2
## 9  Brown Brown   Male 53
## 10 Brown Brown Female 66
## 11 Brown  Blue   Male 50
## 12 Brown  Blue Female 34
## 13 Brown Hazel   Male 25
## 14 Brown Hazel Female 29
## 15 Brown Green   Male 15
## 16 Brown Green Female 14
## 17   Red Brown   Male 10
## 18   Red Brown Female 16
## 19   Red  Blue   Male 10
## 20   Red  Blue Female  7
## 21   Red Hazel   Male  7
## 22   Red Hazel Female  7
## 23   Red Green   Male  7
## 24   Red Green Female  7
## 25 Blond Brown   Male  3
## 26 Blond Brown Female  4
## 27 Blond  Blue   Male 30
## 28 Blond  Blue Female 64
## 29 Blond Hazel   Male  5
## 30 Blond Hazel Female  5
## 31 Blond Green   Male  8
## 32 Blond Green Female  8
```
-->

---

### Cross-Tabulated Data

.pull-left[
Cross-tabulated data can be produced from aggregate data using `xtabs`:
]

.pull-right[

```r
xtabs(n ~ Hair + Eye + Sex, data = agg)
## , , Sex = Male
## 
##        Eye
## Hair    Brown Blue Hazel Green
##   Black    32   11    10     3
##   Brown    53   50    25    15
##   Red      10   10     7     7
##   Blond     3   30     5     8
## 
## , , Sex = Female
## 
##        Eye
## Hair    Brown Blue Hazel Green
##   Black    36    9     5     2
##   Brown    66   34    29    14
##   Red      16    7     7     7
##   Blond     4   64     5     8
```
]

---

.pull-left[
Cross-tabulated data can be produced from raw data using `table`:
]

.pull-right[

```r
xtb <- table(raw)
xtb
## , , Sex = Male
## 
##        Eye
## Hair    Brown Blue Hazel Green
##   Black    32   11    10     3
##   Brown    53   50    25    15
##   Red      10   10     7     7
##   Blond     3   30     5     8
## 
## , , Sex = Female
## 
##        Eye
## Hair    Brown Blue Hazel Green
##   Black    36    9     5     2
##   Brown    66   34    29    14
##   Red      16    7     7     7
##   Blond     4   64     5     8
```
]

---

.pull-left[
Both raw and aggregate data in this example are in _tidy_ form; the
cross-tabulated data is not.
{{content}}
]
--

Cross-tabulated data on `$p$` variables is arranged in a `$p$`-way array.
{{content}}
--

The cross-tabulated data can be converted to the tidy aggregate form
using `as.data.frame`:

.pull-right[

```r
class(xtb)
## [1] "table"
head(as.data.frame(xtb))
##    Hair   Eye  Sex Freq
## 1 Black Brown Male   32
## 2 Brown Brown Male   53
## 3   Red Brown Male   10
## 4 Blond Brown Male    3
## 5 Black  Blue Male   11
## 6 Brown  Blue Male   50
```
{{content}}
]
--

The variable `xtb` corresponds to the data set `HairEyeColor` in the
`datasets` package,

---

### Working With Categorical Variables

Categorical variables are usually represented as:

* character vectors

* factors.

Some advantages of factors:

* more control over ordering of levels

* levels are preserved when forming subsets

* levels can reflect possible values not present in the data

Most plotting and modeling functions will convert character vectors to
factors with levels ordered alphabetically.

---

Some standard R functions for working with factors include

* `factor` creates a factor from another type of variable
* `levels` returns the levels of a factor
* `reorder` changes level order to match another variable
* `relevel` moves a particular level to the first position as a base line
* `droplevels` removes levels not in the variable.

The `tidyverse` package `forcats` adds some more tools, including

* `fct_inorder` creates a factor with levels ordered by first appearance
* `fct_infreq` orders levels by decreasing frequency
* `fct_rev` reverses the levels
* `fct_recode` changes factor levels
* `fct_relevel` moves one or more levels
* `fct_c` merges two or more factors
* `fct_collapse` merge some factor levels

---
layout: true
## Bar Charts For Frequencies
---

### Basics

.pull-left[
A bar chart is often used to show the frequencies of a categorical
variable.
{{content}}
]
--

By default, `geom_bar` uses `stat = "count"` and maps its result to
the `y` aesthetic.
{{content}}
--

This is suitable for raw data:

.pull-right[
.hide-code[

```r
thm <- theme_minimal() +
    theme(text = element_text(size = 16))
ggplot(raw) +
    geom_bar(aes(x = Hair),
             fill = "deepskyblue3") +
    thm
```

<img src="proportions_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" />
]
]

---

.pull-left[
For a nominal variable it is often better to order the bars by
decreasing frequency:
]

.pull-right[
.hide-code[

```r
library(forcats)
ggplot(mutate(raw,
              Hair = fct_infreq(Hair))) +
    geom_bar(aes(x = Hair),
             fill = "deepskyblue3") +
    thm
```

<img src="proportions_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" />
]
]

---

.pull-left[
If the data have already been aggregated, then you need to either specify
`stat = "identity"` as well as the variable containing the counts as
the `y` aesthetic, or use `geom_col`:
]

.pull-right[
.hide-code[

```r
ggplot(agg) +
    geom_col(aes(x = Hair,
                 y = n),
             fill = "deepskyblue3") +
    thm
```

<img src="proportions_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" />
]
]

---

.pull-left[
For aggregated data, reordering can be based on the computed counts
using

```r
agg_ord <-
    mutate(agg,
           Hair = reorder(Hair, -n, sum))
```
{{content}}
]
--

* `-n` is used to order largest to smallest;
{{content}}
--

* the default summary used by `reorder` is `mean`; `sum` is better here.

.pull-right[
.hide-code[

```r
ggplot(agg_ord) +
    geom_col(aes(x = Hair,
                 y = n),
             fill = "deepskyblue3") +
    thm
```

<img src="proportions_files/figure-html/unnamed-chunk-13-1.png" style="display: block; margin: auto;" />
]
]

---

.pull-left.width-45[
### Adding a Grouping Variable

Mapping the `Eye` variable to `fill` in `ggplot` produces a _stacked
bar chart_.
{{content}}
]
--

An alternative, specified with `position = "dodge"`, is a _side by
side_ bar chart, or a _clustered_ bar chart.
{{content}}
--

For the side by side chart in particular it may be useful to also
reorder the `Eye` color levels.

.pull-right.width-55[
.hide-code[

```r
ecols <- c(Brown = "brown2",
           Blue = "blue2",
           Hazel = "darkgoldenrod3",
           Green = "green4")
agg_ord <-
    mutate(agg,
           Hair = reorder(Hair, -n, sum),
           Eye = reorder(Eye, -n, sum))
p1 <- ggplot(agg_ord) +
    geom_col(aes(x = Hair,
                 y = n,
                 fill = Eye)) +
    scale_fill_manual(values = ecols) +
    thm
p2 <- ggplot(agg_ord) +
    geom_col(aes(x = Hair,
                 y = n,
                 fill = Eye),
             position = "dodge") +
    scale_fill_manual(values = ecols) +
    thm
(p1 + guides(fill = "none")) | p2
```

<img src="proportions_files/figure-html/unnamed-chunk-14-1.png" style="display: block; margin: auto;" />
]
]

---

.pull-left.width-45[
Faceting can be used to bring in additional variables:
]

.pull-right.width-55[
.hide-code[

```r
p1 + facet_wrap(~ Sex)
```

<img src="proportions_files/figure-html/unnamed-chunk-15-1.png" style="display: block; margin: auto;" />
]
{{content}}
]
--

The counts shown here may not be the most relevant features for
understanding the joint distributions of these variables.

---
layout: true
## Pie Charts and Doughnut Charts
---

.pull-left.width-40[
_Pie charts_ go by many different names (from a [Twitter
thread](https://twitter.com/ElephantEating/status/1361039771414319106)):
]

.pull-right.width-60[
<img src="../img/pienames.jpeg" width="80%" style="display: block; margin: auto;" />
]

---

Pie charts can be viewed as stacked bar charts in polar coordinates:

.hide-code[

```r
hcols <- c(Black = "black", Brown = "brown4",
           Red = "brown1", Blond = "lightgoldenrod1")
p1 <- ggplot(agg_ord) +
    geom_col(aes(x = 1, y = n, fill = Hair), position = "fill") +
    coord_polar(theta = "y") +
    scale_fill_manual(values = hcols) +
    thm
p2 <- ggplot(agg_ord) +
    geom_col(aes(x = Hair, y = n, fill = Hair)) +
    scale_fill_manual(values = hcols) +
    thm
(p1 + guides(fill = "none")) | p2
```

<img src="proportions_files/figure-html/unnamed-chunk-17-1.png" style="display: block; margin: auto;" />
]

The axes and grid lines are not helpful for the pie chart and can be
removed with some _theme_ settings.

---

Using faceting we can also separately show the distributions for men
and women:

.hide-code[

```r
pie_thm <- thm +
    theme(axis.title = element_blank(),
          axis.text = element_blank(),
          axis.ticks = element_blank(),
          panel.grid.major = element_blank(),
          panel.grid.minor = element_blank(),
          panel.border = element_blank())

p3 <- p1 + facet_wrap(~ Sex) + pie_thm
p3
```

<img src="proportions_files/figure-html/unnamed-chunk-18-1.png" style="display: block; margin: auto;" />
]

---

_Doughnut charts_ are a variant that has recently become popular in
the media:

.hide-code[

```r
p4 <- p3 + xlim(0, 1.5)
p4
```

<img src="proportions_files/figure-html/unnamed-chunk-19-1.png" style="display: block; margin: auto;" />
]

---

The center is often used for annotation:

.hide-code[

```r
p4 + geom_text(aes(x = 0, y = 0, label = Sex), size = 5) +
    theme(strip.background = element_blank(),
          strip.text = element_blank())
```

<img src="proportions_files/figure-html/unnamed-chunk-20-1.png" style="display: block; margin: auto;" />
]

---

An alternative to the polar coordinates approach uses
`geom_arc_bar` and `stat_pie` from package `ggforce`:

.hide-code[

```r
library(ggforce)
arrange(agg_ord, desc(Hair)) |>
    ggplot(aes(x0 = 0, y0 = 0, r0 = 0, r = 1, amount = n, fill = Hair)) +
    geom_arc_bar(stat = "pie", color = NA) +
    coord_fixed() +
    scale_fill_manual(values = hcols) +
    pie_thm +
    facet_wrap(~ Sex)
```

<img src="proportions_files/figure-html/unnamed-chunk-21-1.png" style="display: block; margin: auto;" />
]

---

For doughnut charts:

.hide-code[

```r
arrange(agg_ord, desc(Hair)) |>
    ggplot(aes(x0 = 0, y0 = 0, r0 = 0.4, r = 1, amount = n, fill = Hair)) +
    geom_arc_bar(stat = "pie", color = NA) +
    geom_text(aes(x = 0, y = 0, label = Sex), size = 5) +
    coord_fixed() +
    scale_fill_manual(values = hcols) +
    pie_thm +
    theme(strip.background = element_blank(),
          strip.text = element_blank()) +
    facet_wrap(~ Sex)
```

<img src="proportions_files/figure-html/unnamed-chunk-22-1.png" style="display: block; margin: auto;" />
]

---
layout: true
## Some Notes
---

Pie charts are effective for judging part/whole relationships.

Pie charts can be effective for comparing proportions to

* one half
* one quarter

Pie charts are not very effective for comparing proportions to each other.

3D pie charts are popular and a very bad idea. An example
([Fig. 6.61](https://www.dropbox.com/s/tlehzi3kb6ikbsz/6.61.3DIllustration.png?dl=0))
from Andy Kirk's book (2016),
[_Data Visualization: A Handbook for Data Driven Design_](http://book.visualisingdata.com/home):

---

Pie charts are widely used for political data.

* With the right ordering, pie charts are very good at showing
  which coalitions of parties can form a majority.

* When no one candidate earns a majority of the votes, pie charts
  do not show which candidate has earned a plurality very well.

* Good orientation and factor ordering can help.

.hide-code[

```r
elect <- geofacet::election |>
    group_by(candidate) |>
    summarize(votes = sum(votes))

p1 <- ggplot(elect) +
    geom_col(aes(x = 1, y = votes, fill = candidate), position = "fill") +
    coord_polar(theta = "y", start = -1) +
    xlim(c(-0.5, 1.5)) +
    scale_fill_manual(values = c(Trump = scales::muted("red", 50, 80),
                                 Clinton = scales::muted("blue", 50, 70),
                                 Other = "grey")) +
    pie_thm

p2 <- mutate(elect,
             candidate = factor(candidate,
                                c("Clinton", "Other", "Trump"))) |>
    ggplot() +
    geom_col(aes(x = 1, y = votes, fill = candidate), position = "fill") +
    coord_polar(theta = "y") +
    xlim(c(-0.5, 1.5)) +
    scale_fill_manual(values = c(Trump = scales::muted("red", 50, 80),
                                 Clinton = scales::muted("blue", 50, 70),
                                 Other = "grey")) +
    pie_thm

p3 <- ggplot(elect) +
    geom_col(aes(x = candidate,
                 y = 100 * (votes / sum(votes)),
                 fill = candidate)) +
    scale_fill_manual(values = c(Trump = scales::muted("red", 50, 80),
                                 Clinton = scales::muted("blue", 50, 70),
                                 Other = "grey")) +
    labs(y = "percent") +
    thm +
    theme(axis.text.x = element_blank(),
          axis.title.x = element_blank())

(p1 + guides(fill = "none")) + (p2 + guides(fill = "none")) + p3
```

<img src="proportions_files/figure-html/unnamed-chunk-24-1.png" style="display: block; margin: auto;" />
]

---
layout: true
## Some Alternatives
---

### Stacked Bar Charts

.pull-left[
Stacked bar charts with equal heights, or filled bar charts, are an
alternative for representing part-whole relationships.
{{content}}
]
--

* Top and bottom proportions are easy to compare.
{{content}}
--

* Comparing proportions to one half and one quarter is harder.

.pull-right[
.hide-code[

```r
ggplot(agg) +
    geom_col(aes(x = Sex, y = n, fill = Hair), position = "fill") +
    scale_fill_manual(values = hcols) +
    thm
```

<img src="proportions_files/figure-html/unnamed-chunk-25-1.png" style="display: block; margin: auto;" />
]
]

---

### Waffle Charts

Another alternative is a _waffle chart_, sometimes also called a
_square pie chart_.

<img src="../img/waffle.png" width="50%" style="display: block; margin: auto;" />
--

The [`waffle`](https://github.com/hrbrmstr/waffle) package is one R
implementation of this idea.

Currently the development version on GitHub is needed for the
following examples.

---

Showing the counts:

.hide-code[

```r
library(waffle)
stopifnot(packageVersion("waffle") >= "1.0.1")
ggplot(arrange(agg, Hair), aes(values = n, fill = Hair)) +
    geom_waffle(n_rows = 18, flip = TRUE, color = "white", size = 0.33,
                na.rm = FALSE) +
    coord_equal() +
    facet_wrap(~ Sex) +
    scale_fill_manual(values = hcols) +
    theme_minimal() +
    theme_enhance_waffle()
```

<img src="proportions_files/figure-html/unnamed-chunk-27-1.png" style="display: block; margin: auto;" />
]

---

Showing the proportions:

.hide-code[

```r
round_pct <- function(n) {
    pct <- 100 * (n / sum(n))
    nn <- floor(pct)
    if (sum(nn) < 100) {
        rem <- pct - nn
        idx <- sort(order(rem), decreasing = TRUE)[seq_len(100 - sum(nn))]
        nn[idx] <- nn[idx] + 1
    }
    nn
}

group_by(agg, Sex) |>
    mutate(pct = round_pct(n)) |>
    ungroup() |>
    ggplot(aes(values = pct, fill = Hair)) +
    geom_waffle(n_rows = 10, flip = TRUE, color = "white", size = 0.33,
                na.rm = FALSE) +
    coord_equal() +
    facet_wrap(~ Sex) +
    scale_fill_manual(values = hcols) +
    theme_minimal() +
    theme_enhance_waffle()
```

<img src="proportions_files/figure-html/unnamed-chunk-29-1.png" style="display: block; margin: auto;" />
]

---
layout: true
## Population Pyramids
---

.pull-left[
Bar charts for two groups can be shown back to back.
]

.pull-right[
.hide-code[

```r
mutate(agg, Hair = reorder(Hair, n, sum)) |>
    ggplot(aes(x = ifelse(Sex == "Male", n, -n),
               y = Hair,
               fill = Sex)) +
    geom_col() +
    xlab("Count") +
    thm
```

<img src="proportions_files/figure-html/unnamed-chunk-31-1.png" style="display: block; margin: auto;" />
]
{{content}}
]
--

This is often used for showing age distributions by sex for
populations; the result is called a [_population
pyramid_](https://www.visualcapitalist.com/us-population-pyramid-1980-2050/).

---

.pull-left.width-40[
Age distribution data for many countries and years is available from a
[Census Bureau
website](https://www.census.gov/data-tools/demo/idb/).

Data files for 2020 for
[Germany](https://stat.uiowa.edu/~luke/data/germany-2020.csv) and
[Nigeria](https://stat.uiowa.edu/~luke/data/nigeria-2020.csv) are
available locally.

.hide-code[

```r
if (! file.exists("germany-2020.csv"))
    download.file("https://stat.uiowa.edu/~luke/data/germany-2020.csv",
                  "germany-2020.csv")
if (! file.exists("nigeria-2020.csv"))
    download.file("https://stat.uiowa.edu/~luke/data/nigeria-2020.csv",
                  "nigeria-2020.csv")

gm_pop <- read.csv("germany-2020.csv", skip = 1) |>
    filter(Age != "Total") |>
    mutate(Age = fct_inorder(Age))

ni_pop <- read.csv("nigeria-2020.csv", skip = 1) |>
    filter(Age != "Total") |>
    mutate(Age = fct_inorder(Age))
```
]
{{content}}
]

Combining the data sets allows a side by side comparison of the counts:

.pull-right.width-60[
.hide-code[

```r
library(tidyr)
pop2 <-
    bind_rows(mutate(gm_pop, Country = "Germany"),
              mutate(ni_pop, Country = "Nigeria")) |>
    select(Age,
           Male = Male.Population,
           Female = Female.Population, Country) |>
    pivot_longer(Male : Female,
                 names_to = "Sex",
                 values_to = "n")

ggplot(pop2) +
    geom_col(aes(x = ifelse(Sex == "Male", n, -n),
                 y = Age,
                 fill = Sex)) +
    facet_wrap(~ Country) +
    scale_x_continuous(
        labels = function(n) scales::comma(abs(n))) +
    xlab("Count") +
    thm +
    theme(legend.position = "top")
```

<img src="proportions_files/figure-html/unnamed-chunk-33-1.png" style="display: block; margin: auto;" />
]
]

---

.pull-left.width-40[
The different shapes are evident, but are harder to see than
they could be because of the difference in total population:

.hide-code[

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> Country </th>
   <th style="text-align:right;"> Population </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Germany </td>
   <td style="text-align:right;"> 80,159,662 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Nigeria </td>
   <td style="text-align:right;"> 214,028,302 </td>
  </tr>
</tbody>
</table>

]
{{content}}
]
--

Using a group mutate we can compute sex/age group percentages within
each country:

.pull-right.width-60[
.hide-code[

```r
group_by(pop2, Country) |>
    mutate(pct = 100 * n / sum(n)) |>
    ungroup() |>
    ggplot() +
    geom_col(aes(x = ifelse(Sex == "Male", pct, -pct),
                 y = Age,
                 fill = Sex)) +
    facet_wrap(~ Country) +
    scale_x_continuous(
        labels = function(x) scales::percent(abs(x / 100))) +
    xlab("Percent") +
    thm +
    theme(legend.position = "top")
```

<img src="proportions_files/figure-html/unnamed-chunk-35-1.png" style="display: block; margin: auto;" />
]
]

---
layout: false
## Multiple Categorical Variables

Visualizing the distribution of multiple categorical variables
involves visualizing counts and proportions.

Distributions can be viewed as

* joint distributions;

* conditional distributions.

When one variable (or several) can be viewed as a response and others
as predictors then it is common to focus on the conditional
distribution of the response given the predictors.

The most common approaches use variants of bar and area charts.

The resulting plots are often called [_mosaic
plots_](https://en.wikipedia.org/wiki/Mosaic_plot).

---
layout: true
## Two Data Sets
---

.pull-left.width-35[
### Hair and Eye Color

```r
HairEyeColorDF <-
    as.data.frame(HairEyeColor)
head(HairEyeColorDF)
##    Hair   Eye  Sex Freq
## 1 Black Brown Male   32
## 2 Brown Brown Male   53
## 3   Red Brown Male   10
## 4 Blond Brown Male    3
## 5 Black  Blue Male   11
## 6 Brown  Blue Male   50
```
]

.pull-right.width-65[
Marginal distributions of the variables:

.hide-code[

```r
p1 <- ggplot(HairEyeColorDF) +
    geom_col(aes(Sex, Freq), fill = "deepskyblue3") +
    thm
p2 <- mutate(HairEyeColorDF, Hair = reorder(Hair, -Freq, sum)) |>
    ggplot() +
    geom_col(aes(Hair, Freq), fill = "deepskyblue3") +
    thm
p3 <- ggplot(HairEyeColorDF) +
    geom_col(aes(Eye, Freq), fill = "deepskyblue3") +
    thm
p1 | p2 | p3
```

<img src="proportions_files/figure-html/unnamed-chunk-38-1.png" style="display: block; margin: auto;" />
]
]

---

### Arthritis Data

.pull-left[
The `vcd` package includes the data frame `Arthritis` with several
variables for 84 patients in a clinical trial for a treatment for
rheumatoid arthritis.
{{content}}
]
--

```r
data(Arthritis, package = "vcd")
head(Arthritis)
##   ID Treatment  Sex Age Improved
## 1 57   Treated Male  27     Some
## 2 46   Treated Male  29     None
## 3 77   Treated Male  30     None
## 4 17   Treated Male  32   Marked
## 5 36   Treated Male  46   Marked
## 6 23   Treated Male  58   Marked
```
{{content}}
--

* The `Improved` variable is the response.
{{content}}
--

* The predictors are `Treatment`, `Sex`, and `Age`.

.pull-right[
Counts for the categorical predictors:

```r
xtabs(~ Sex, Arthritis)
## Sex
## Female   Male 
##     59     25
```
{{content}}
]

```r
xtabs(~ Treatment, Arthritis)
## Treatment
## Placebo Treated 
##      43      41
```
{{content}}
--

```r
xtabs(~ Treatment + Sex, data = Arthritis)
##          Sex
## Treatment Female Male
##   Placebo     32   11
##   Treated     27   14
```

---

.pull-left.width-35[
Joint distribution of the predictors:
]

.pull-right.width-65[
.hide-code[

```r
ggplot(Arthritis) +
    geom_histogram(aes(x = Age),
                   binwidth = 10,
                   fill = "deepskyblue3",
                   color = "black") +
    facet_grid(Treatment ~ Sex) +
    thm
```

<img src="proportions_files/figure-html/unnamed-chunk-43-1.png" style="display: block; margin: auto;" />
]
]

---

.pull-left.width-35[
Conditional distribuiton of age, given sex and treatment:
]

.pull-right.width-65[
.hide-code[

```r
ggplot(Arthritis) +
    geom_histogram(aes(x = Age,
                       y = after_stat(density)),
                   binwidth = 10,
                   fill = "deepskyblue3",
                   color = "black") +
    facet_grid(Treatment ~ Sex) +
    thm
```

<img src="proportions_files/figure-html/unnamed-chunk-44-1.png" style="display: block; margin: auto;" />
]
]

---
layout: true
## Bar Charts
---

.pull-left.width-40[
### Hair and Eye Color

Default bar charts show the individual count or joint proportions.
{{content}}
]
--

For the hair-eye color aggregated data counts:

.pull-right.width-60[
.hide-code[

```r
ggplot(HairEyeColorDF) +
    geom_col(aes(x = Eye, y = Freq, fill = Sex)) +
    facet_wrap(~ Hair) +
    thm
```

<img src="proportions_files/figure-html/unnamed-chunk-45-1.png" style="display: block; margin: auto;" />
]
]

---

.pull-left.width-60[
Joint proportions:

.hide-code[

```r
ggplot(mutate(HairEyeColorDF, Prop = Freq / sum(Freq))) +
    geom_col(aes(x = Eye, y = Prop, fill = Sex)) +
    facet_wrap(~ Hair) +
    thm
```

<img src="proportions_files/figure-html/unnamed-chunk-46-1.png" style="display: block; margin: auto;" />
]
]

.pull-right.width-40[
* Differing frequencies of the hair colors are visible.

{{content}}
]
--

* Conditional distributions of eye color within hair color are
  harder to compare.

---

Showing conditional distributions requires computing proportions
within groups.

For the joint conditional distribution of sex and eye color given hair color:

.pull-left.width-60[

.hide-code[

```r
group_by(HairEyeColorDF, Hair) |>
    mutate(Prop = Freq / sum(Freq)) |>
    ungroup() |>
    ggplot() +
    geom_col(aes(x = Eye, y = Prop, fill = Sex)) +
    facet_wrap(~ Hair) +
    thm
```

<img src="proportions_files/figure-html/unnamed-chunk-47-1.png" style="display: block; margin: auto;" />
]
]
--

.pull-right.width-40[
* It is easier to compare the skewness of the eye color
  distributions for black, brown, and red hair.
{{content}}
]
--

* Assessing the proportion of females or males withing the different
  groups is possible but challenging since it requires relative
  length comparisons.

---

To more clearly see the that the proportion of females among subjects
with blond hair and blue eyes is higher than for other hair/eye color
combinations we can look at the conditional distribution of sex given
hair and eye color.

.pull-left[
.hide-code[

```r
group_by(HairEyeColorDF, Hair, Eye) |>
    mutate(Prop = Freq / sum(Freq)) |>
    ungroup() |>
    ggplot() +
    geom_col(aes(x = Eye,
                 y = Prop,
                 fill = Sex)) +
    facet_wrap(~ Hair, nrow = 1) +
    thm +
    theme(axis.text.x =
              element_text(angle = 45,
                           hjust = 1))
```

<img src="proportions_files/figure-html/unnamed-chunk-48-1.png" style="display: block; margin: auto;" />
]
]
--

.pull-right[
This plot can also be obtained using `position = "fill"`.

.hide-code[

```r
ggplot(HairEyeColorDF) +
    geom_col(aes(x = Eye,
                 y = Freq,
                 fill = Sex),
             position = "fill") +
    facet_wrap(~ Hair, nrow = 1) +
    thm +
    theme(axis.text.x =
              element_text(angle = 45,
                           hjust = 1))
```
]
{{content}}
]
--

One drawback: This visualization no longer shows that some of the
hair/eye color combinations are more common than others.

---

.pull-left[
### Arthritis Data

For the raw arthritis data, `geom_bar` computes the aggregate counts
and produces a stacked bar chart by default:
]

.pull-right[
.hide-code[

```r
p <- ggplot(Arthritis, aes(x = Sex,
                           fill = Improved)) +
    facet_wrap(~ Treatment)
p + geom_bar() +
    scale_fill_brewer(palette = "Blues") +
    thm
```

<img src="proportions_files/figure-html/unnamed-chunk-50-1.png" style="display: block; margin: auto;" />
]
]

---

.pull-left[
Specifying `position = "dodge"` produces a side-by-side plot:
]

.pull-right[
.hide-code[

```r
p + geom_bar(position = "dodge") +
    scale_fill_brewer(palette = "Blues") +
    thm
```

<img src="proportions_files/figure-html/unnamed-chunk-51-1.png" style="display: block; margin: auto;" />
]
{{content}}
]
--

There are no cases of male patients on placebo reporting `Some`
improvement, resulting in wider bars for the other options.

---

.pull-left[
One way to produce a zero height bar:

* aggregate with `count`, and

* use `complete` from `tidyr`

]

.pull-right[
.hide-code[

```r
library(tidyr)
comp_counts <-
    count(Arthritis,
          Treatment, Sex, Improved) |>
    complete(Treatment, Sex, Improved,
             fill = list(n = 0))
ggplot(comp_counts,
       aes(x = Sex, y = n, fill = Improved)) +
    geom_col(position = "dodge") +
    facet_wrap(~ Treatment) +
    scale_fill_brewer(palette = "Blues") +
    thm
```

<img src="proportions_files/figure-html/unnamed-chunk-52-1.png" style="display: block; margin: auto;" />
]
]

---

.pull-left[
Another option is to use the `preserve = "single"` option with
`position_dodge`.
]

.pull-right[
.hide-code[

```r
p + geom_bar(position =
                 position_dodge(
                     preserve = "single")) +
    scale_fill_brewer(palette = "Blues") +
    thm
```

<img src="proportions_files/figure-html/unnamed-chunk-53-1.png" style="display: block; margin: auto;" />
]
]

---

.pull-left[
Showing conditional distributions of `Improved` given different levels
of `Treatment` and `Sex`:
]

.pull-right[
.hide-code[

```r
group_by(comp_counts, Treatment, Sex) |>
    mutate(prop = n / sum(n)) |>
    ungroup() |>
    ggplot() +
    geom_col(aes(x = Sex,
                 y = prop,
                 fill = Improved),
             position = "dodge") +
    facet_wrap(~ Treatment) +
    scale_fill_brewer(palette = "Blues") +
    thm
```

<img src="proportions_files/figure-html/unnamed-chunk-54-1.png" style="display: block; margin: auto;" />
]
]

---

.pull-left[
Stacked bar charts with height one are another option to make
these conditional distributions easier to compare:
]

.pull-right[
.hide-code[

```r
p + geom_bar(position = "fill") +
    scale_fill_brewer(palette = "Blues") +
    thm
```

<img src="proportions_files/figure-html/unnamed-chunk-55-1.png" style="display: block; margin: auto;" />
]
]

---

.pull-left[
Ordering of variables affects which comparisons are easier.
{{content}}
]
--

* A researcher might want to emphasize the differential response among
  males and females.
{{content}}
--

* A patient might prefer to be able to focus on whether the treatment
  is effective for them:

.pull-right[
.hide-code[

```r
ggplot(Arthritis, aes(x = Treatment, fill = Improved)) +
    geom_bar(position = "fill") +
    scale_fill_brewer(palette = "Blues") +
    thm +
    facet_wrap(~ Sex)
```

<img src="proportions_files/figure-html/unnamed-chunk-56-1.png" style="display: block; margin: auto;" />
]
]

---

Some notes;

* The stacked bar chart is effective for two categories, and a few
  more if they are ordered.

* Providing a visual indication of uncertainty in the estimates is a
  challenge. The standard errors in this case are around 0.1.

* The proportions of each treatment group that are male or female
  could be encoded in the bar widths.

* The resulting plot is called a _spine plot_.

* Basic `ggplot2` does not seem to make this easy.

---
layout: true
## Spine Plots
---

_Spine plots_ are a special case of [_mosaic
plots_](https://en.wikipedia.org/wiki/Mosaic_plot), and can be seen as
a generalization of stacked bar plots.

For a spine plot the proportions for the categories of a predictor
variable are encoded in the bar widths.

The `ggmosaic` package provides support for mosaic plots in the
`ggplot` framework. (It can be a little rough around the edges.)

Spine plots are provided by the base graphics function `spineplot` and
the `vcd` function `spine`.

`vcd` plots are built on the `grid` graphics system, like `lattice`
and `ggplot2` graphics.

---

.pull-left[
A spine plot for the distribution of `Improved` given `Sex` in the
`Treated` group:
]

.pull-right[
.hide-code[

```r
library(ggmosaic)
filter(Arthritis, Treatment == "Treated") |>
    mutate(Improved = fct_rev(Improved)) |>
    ggplot() +
    geom_mosaic(aes(x = product(Sex),
                    fill = Improved)) +
    scale_fill_brewer(palette = "Blues",
                      direction = -1) +
    facet_wrap(~ Treatment) +
    thm + labs(x = "", y = "Improved")
```

<img src="proportions_files/figure-html/unnamed-chunk-57-1.png" style="display: block; margin: auto;" />
]
]

---

.pull-left[
Spine plots for `Treatment` groups using faceting:
]

.pull-right[
.hide-code[

```r
library(ggmosaic)
mutate(Arthritis,
       Improved = fct_rev(Improved)) |>
    ggplot() +
    geom_mosaic(aes(x = product(Sex),
                    fill = Improved)) +
    scale_fill_brewer(palette = "Blues",
                      direction = -1) +
    facet_wrap(~ Treatment) +
    thm + labs(x = "", y = "Improved")
```

<img src="proportions_files/figure-html/unnamed-chunk-58-1.png" style="display: block; margin: auto;" />
]
]

---

.pull-left[
Spine plots for the arthritis data, faceted on `Sex`:
]

.pull-right[
.hide-code[

```r
library(ggmosaic)
mutate(Arthritis,
       Improved = fct_rev(Improved)) |>
    ggplot() +
    geom_mosaic(aes(x = product(Treatment),
                    fill = Improved)) +
    scale_fill_brewer(palette = "Blues",
                      direction = -1) +
    facet_wrap(~ Sex) +
    thm + labs(x = "", y = "Improved")
```

<img src="proportions_files/figure-html/unnamed-chunk-59-1.png" style="display: block; margin: auto;" />
]
{{content}}
]
--
This no longer shows the Female/Male imbalance.

---

.pull-left[
For aggregate counts use the weight aesthetic:
]

.pull-right[
.hide-code[

```r
mutate(HairEyeColorDF, Sex = fct_rev(Sex)) |>
    ggplot() +
    geom_mosaic(aes(weight = Freq,
                    x = product(Hair),
                    fill = Sex)) +
    thm + labs(x = "Hair", y = "")
```

<img src="proportions_files/figure-html/unnamed-chunk-60-1.png" style="display: block; margin: auto;" />
]
]

---

.pull-left[
Spine plots of `Sex` within `Eye` color, faceted on `Hair` color:

.hide-code[

```r
mutate(HairEyeColorDF, Sex = fct_rev(Sex)) |>
    ggplot() +
    geom_mosaic(aes(weight = Freq,
                    x = product(Eye),
                    fill = Sex)) +
    thm + labs(x = "Eye", y = "") +
    facet_wrap(~ Hair,
               nrow = 1,
               scales = "free_x") +
    theme(legend.position = "top",
          axis.text.y = element_blank(),
          axis.text.x =
              element_text(angle = 45,
                           hjust = 1)) +
    scale_y_continuous(expand = c(0, 0))
```

<img src="proportions_files/figure-html/unnamed-chunk-61-1.png" style="display: block; margin: auto;" />
]
]

.pull-right[
The relative sizes of the groups on the `x` (eye color) axis are shown
within the facets.
{{content}}
]
--

The sizes of the faceted variable (hair color) groups are not reflected.
{{content}}
--

_Double decker plots_ try to address this.

---
layout: true
## Doubledecker Plots
---

_Doubledecker plots_ can be viewed as a generalization of spine plots
to multiple predictors.

Package `vcd` provides the `doubledecker` function.

This function can use a formula interface.

.pull-left[
.hide-code[

```r
arth_pal <-
    RColorBrewer::brewer.pal(3, "Blues")
arth_gp <- grid::gpar(fill = arth_pal)
vcd::doubledecker(Improved ~ Treatment + Sex,
                  data = Arthritis,
                  gp = arth_gp,
                  margins = c(2, 5, 4, 2))
```

<img src="proportions_files/figure-html/unnamed-chunk-62-1.png" style="display: block; margin: auto;" />
]
]

.pull-right[
.hide-code[

```r
vcd::doubledecker(Improved ~ Sex + Treatment,
                  data = Arthritis,
                  gp = arth_gp,
                  margins = c(2, 5, 4, 2))
```

<img src="proportions_files/figure-html/unnamed-chunk-63-1.png" style="display: block; margin: auto;" />
]
]

---

Using `ggmosaic`:

.pull-left[
.hide-code[

```r
mutate(Arthritis,
       Improved = fct_rev(Improved)) |>
    ggplot() +
    geom_mosaic(
        aes(x = product(Sex, Treatment),
            fill = Improved),
        divider = ddecker()) +
    scale_fill_brewer(palette = "Blues",
                      direction = -1) +
    thm +
    theme(axis.text.x =
              element_text(angle = 15,
                           hjust = 1)) +
    labs(x = "", y = "")
```

<img src="proportions_files/figure-html/unnamed-chunk-64-1.png" style="display: block; margin: auto;" />
]
]

.pull-right[
.hide-code[

```r
mutate(Arthritis,
       Improved = fct_rev(Improved)) |>
    ggplot() +
    geom_mosaic(
        aes(x = product(Treatment, Sex),
            fill = Improved),
        divider = ddecker()) +
    scale_fill_brewer(palette = "Blues",
                      direction = -1) +
    thm +
    theme(axis.text.x =
              element_text(angle = 15,
                           hjust = 1)) +
    labs(x = "", y = "")
```

<img src="proportions_files/figure-html/unnamed-chunk-65-1.png" style="display: block; margin: auto;" />
]
]

---
layout: true
## Mosaic Plots
---

_Mosaic plots_ recursively partition the axes to represent counts of
categorical variables as rectangles.

.pull-left[
* Base graphics provides `mosaicplot`;
{{content}}
]
--

* `vcd` provides `mosaic`.
{{content}}
--

Both support a formula interface.
{{content}}
--

A Mosaic plot for the predictors `Sex` and `Treatment`:

.pull-right[
.hide-code[

```r
vcd::mosaic(~ Sex + Treatment,
            data = Arthritis)
```

<img src="proportions_files/figure-html/unnamed-chunk-66-1.png" style="display: block; margin: auto;" />
]
]

---

.pull-left[
Adding `Improved` to the joint distribution:

.hide-code[

```r
vcd::mosaic(~ Sex + Treatment + Improved,
            data = Arthritis)
```

<img src="proportions_files/figure-html/unnamed-chunk-67-1.png" style="display: block; margin: auto;" />
]
]

.pull-right[
.hide-code[
Identifying `Improved` as the response:

```r
vcd::mosaic(Improved ~ Sex + Treatment,
            data = Arthritis)
```

<img src="proportions_files/figure-html/unnamed-chunk-68-1.png" style="display: block; margin: auto;" />
]
]

---

Matching the doubledecker plots:

.pull-left[
.hide-code[

```r
vcd::mosaic(
         Improved ~ Treatment + Sex,
         data = Arthritis,
         split_vertical = c(TRUE, TRUE, FALSE))
```

<img src="proportions_files/figure-html/unnamed-chunk-69-1.png" style="display: block; margin: auto;" />
]
]

.pull-right[
.hide-code[

```r
vcd::mosaic(
         Improved ~ Sex + Treatment,
         data = Arthritis,
         split_vertical = c(TRUE, TRUE, FALSE))
```

<img src="proportions_files/figure-html/unnamed-chunk-70-1.png" style="display: block; margin: auto;" />
]
]

---

Some variants using `ggmosaic`:

.pull-left[
.hide-code[

```r
ggplot(mutate(Arthritis, Sex = fct_rev(Sex))) +
    geom_mosaic(
        aes(x = product(Treatment,
                        Sex))) +
    coord_flip() +
    labs(x = "", y = "")
```

<img src="proportions_files/figure-html/unnamed-chunk-71-1.png" style="display: block; margin: auto;" />
]
]

.pull-right[
.hide-code[

```r
ggplot(mutate(Arthritis,
              Sex = fct_rev(Sex),
              Improved = fct_rev(Improved))) +
    geom_mosaic(aes(x = product(Improved,
                                Treatment,
                                Sex))) +
    coord_flip()
```

<img src="proportions_files/figure-html/unnamed-chunk-72-1.png" style="display: block; margin: auto;" />
]
]

---

A mosaic plot for all bivariate marginals:

.hide-code[

```r
pairs(xtabs(~ Sex + Treatment + Improved, data = Arthritis))
```

<img src="proportions_files/figure-html/unnamed-chunk-73-1.png" style="display: block; margin: auto;" />
]

---
layout: true
## Spinograms and CD Plots
---

<!-- building a spinogram from scratch, more or less:

```r
Arth <- mutate(Arthritis,
               AgeBin = cut(Age, seq(20, by = 10, len = 7)),
               Improved = fct_rev(Improved))
d <- filter(Arth, Treatment == "Treated") |>
    count(AgeBin) |>
    mutate(brk = (cumsum(lag(n, default = 0)) + 0.5 * n) / sum(n))
p <- ggplot(d)

p + geom_col(aes(x = AgeBin, y = n))

p2 <- p +
    geom_col(aes(x = 0.5, y = n, color = fct_rev(AgeBin)),
             position = "fill", fill = NA) +
    guides(color = "none") +
    scale_y_continuous(breaks = d$brk,
                       labels = d$AgeBin)
p2

p2 + coord_flip()
```

.pull-left[
_Spinograms_ and _CD plots_ show the conditional distribution of a
categorical variable given the value of a numeric variable.
{{content}}
]
--

* Spinograms use the same binning as a histogram and then create a
  spine plot.
{{content}}
--

* CD plots use a smoothing or density estimation approach.

.pull-right[
A spinogram for `Improved` against `Age`:

.hide-code[

```r
ArthT <- filter(Arthritis,
                Treatment == "Treated") |>
    mutate(Improved = fct_rev(Improved))
arthT_gp <-
    grid::gpar(fill = rev(arth_gp$fill))
vcd::spine(Improved ~ Age,
           data = ArthT,
           gp = arthT_gp,
           breaks = 5)
```

<img src="proportions_files/figure-html/unnamed-chunk-74-1.png" style="display: block; margin: auto;" />
]
]

---

.pull-left[
An analogous plot created with `ggmosaic` by binning the `Age` variable:
]

.pull-right[
.hide-code[

```r
Arth <-
    mutate(Arthritis,
           AgeBin = cut(Arthritis$Age,
                        seq(20, by = 10,
                            len = 7)),
           Improved = fct_rev(Improved))
filter(Arth, Treatment == "Treated") |>
    count(Improved, AgeBin) |>
    ggplot() +
    geom_mosaic(aes(weight = n,
                    x = product(AgeBin),
                    fill = Improved)) +
    scale_fill_brewer(palette = "Blues",
                      direction = -1) +
    theme_minimal() +
    theme(axis.title = element_blank())
```

<img src="proportions_files/figure-html/unnamed-chunk-75-1.png" style="display: block; margin: auto;" />
]
]

---

.pull-left.width-45[
A facet grid can be used to create spinograms for each of the
`Sex`/`Treatment` combinations:
]

.pull-right.width-55[
.hide-code[

```r
ggplot(count(Arth, Improved, Sex, Treatment, AgeBin)) +
    geom_mosaic(aes(weight = n,
                    x = product(AgeBin),
                    fill = Improved)) +
    scale_fill_brewer(palette = "Blues",
                      direction = -1) +
    theme_minimal() +
    facet_grid(Treatment ~ Sex) +
    theme(axis.title = element_blank()) +
    theme(axis.text.x = element_text(angle = 35,
                                     hjust = 1),
          axis.text.y = element_blank())
```

<img src="proportions_files/figure-html/unnamed-chunk-76-1.png" style="display: block; margin: auto;" />
]
]

---

A [spinogram](https://flowingdata.com/2021/08/02/decline-of-u-s-vaccination-rate-compared-against-europes/) in the media (NYT, August 2021):

---

Some plots in a [Twitter thread](https://twitter.com/jburnmurdoch/status/1503420660869214213?s=20&t=RZxOXiHZ0eXkV6WXMIuVVA):

.pull-left[
<img src="../img/covid-spinogram.png" width="90%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="../img/covid-mort.png" width="90%" style="display: block; margin: auto;" />
]

---

CD plots estimate the conditional density of the `x` variable given the
levels of `y`, weighted by the marginal proportions of `y` and use
these to estimate cumulative probabilities.

* The slice at a particular `x` level visualizes the conditional
  distribution of `y` given `x` at that level.

* `geom_density` with `position = stack` is one way to create a CD
  plot.

* The `cd_plot` function from the `vcd` package produces a CD plot
  using `grid` graphics.

* The `cdplot` function from the base `graphics` package provides the
  same plots using base graphics.

<!-- Building a CD plot in steps:
```r
d <- filter(Arthritis, Treatment == "Treated", Sex == "Female")

## use y = after_stat(count) so area is number of rows
p0 <- ggplot(d, aes(x = Age, y = after_stat(count))) +
    geom_density(bw = 5, fill = "grey")
p0

## separate count-weighted densities for each group with alpha blending
p1 <- ggplot(d, aes(x = Age, y = after_stat(count), fill = Improved)) +
    geom_density(bw = 5, alpha = 0.5) +
    scale_fill_brewer(palette = "Blues") +
    guides(fill = "none")
p1

## easier to see the separate densities with faceting
p1 + facet_wrap(~ Improved)

## stack the densities
ggplot(d, aes(x = Age, y = after_stat(count), fill = Improved)) +
    geom_density(position = "stack", bw = 5) +
    scale_fill_brewer(palette = "Blues") +
    guides(fill = "none")

## rescale to height 1 with position = "fill"
p2 <- ggplot(d, aes(x = Age, y = after_stat(count), fill = Improved)) +
    geom_density(position = "fill", bw = 5) +
    scale_fill_brewer(palette = "Blues") +
    guides(fill = "none")
p2

p2 + geom_vline(xintercept = 40, lty = 2)

p2 + geom_vline(xintercept = 60, lty = 2)
```
-->

---

.pull-left[
CD plots for the `Treated` group:
]

.pull-right[
.hide-code[

```r
filter(Arthritis, Treatment == "Treated") |>
    ggplot(aes(x = Age, fill = Improved)) +
    geom_density(position = "fill", bw = 5) +
    scale_fill_brewer(palette = "Blues") +
    facet_wrap(~ Sex, ncol = 1) +
    thm
```

<img src="proportions_files/figure-html/unnamed-chunk-80-1.png" style="display: block; margin: auto;" />
]
]

---

.pull-left[
CD plots for all combinations end up with one group of size one and
one of size zero, which produces a non-useful plot for one
combination:
]

.pull-right[
.hide-code[

```r
count(Arthritis, Treatment, Sex, Improved) |>
    complete(Treatment, Sex, Improved,
             fill = list(n = 0)) |>
    filter(n < 2)
## # A tibble: 2 × 4
##   Treatment Sex   Improved     n
##   <fct>     <fct> <ord>    <int>
## 1 Placebo   Male  Some         0
## 2 Placebo   Male  Marked       1

ggplot(Arthritis,
       aes(x = Age, fill = Improved)) +
    geom_density(position = "fill", bw = 5) +
    scale_fill_brewer(palette = "Blues") +
    facet_grid(Treatment ~ Sex) +
    thm
## Warning: Groups with fewer than two data points have been dropped.
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
```

<img src="proportions_files/figure-html/unnamed-chunk-81-1.png" style="display: block; margin: auto;" />
]
]

---
layout: true
## Uncertainty Representation
---

.pull-left[
Categorical data are often analyzed by fitting models representing
conditional independence structures.
{{content}}
]
--

* Plotting residuals from these models can help assess how well they
  fit.
{{content}}
--

* `vcd::mosaic` supports using color to represent magnitude of residuals
  for comparing to a simple independence model.
{{content}}
--

For the `Arthritis` data, observed counts and expected counts under an
independence model assuming `Treatment` and `Improved` are independent
can be visualized as mosaic plots:

.pull-right[
.hide-code[

```r
## there are easier ways do do this ...
v <- count(Arthritis, Treatment, Improved)
pT <- group_by(v, Treatment) |>
    summarize(n = sum(n)) |>
    mutate(pT = n / sum(n)) |>
    select(-n)
pI <- group_by(v, Improved) |>
    summarize(n = sum(n)) |>
    mutate(pI = n / sum(n)) |>
    select(-n)
v <- left_join(v, pT, "Treatment") |>
    left_join(pI, "Improved") |>
    mutate(p = pT * pI,
           Treatment = fct_rev(Treatment))

po <- ggplot(v) +
    geom_mosaic(aes(weight = n, x = product(Improved, Treatment),
                    fill = Improved)) +
    scale_fill_brewer(palette = "Blues") +
    guides(fill = "none") +
    labs(title = "Observed Proportions") +
    thm +
    coord_flip() +
    theme(axis.text.y = element_text(angle = 90, hjust = 0))

pe <- ggplot(v) +
    geom_mosaic(aes(weight = p, x = product(Improved, Treatment),
                    fill = Improved)) +
    scale_fill_brewer(palette = "Blues") +
    guides(fill = "none") +
    labs(title = "Expected Proportions") +
    thm +
    coord_flip() +
    theme(axis.text.y = element_text(angle = 90, hjust = 0))

po + pe
```

<img src="proportions_files/figure-html/unnamed-chunk-82-1.png" style="display: block; margin: auto;" />
]
]

---

.pull-left[
A plot for assessing the fit of the residuals between the observed and
expected data under a model assuming independence of `Treatment` and
`Improved` produces:
]

.pull-right[
.hide-code[

```r
vcd::mosaic(~ Treatment + Improved,
            data = Arthritis,
            gp = vcd::shading_max)
```

<img src="proportions_files/figure-html/unnamed-chunk-83-1.png" style="display: block; margin: auto;" />
]
]
---

.pull-left[
Another visualization of the residuals is the _association plot_
produced by `assoc`:
]

.pull-right[
.hide-code[

```r
vcd::assoc(~ Treatment + Improved,
           data = Arthritis,
           gp = vcd::shading_max)
```

<img src="proportions_files/figure-html/unnamed-chunk-84-1.png" style="display: block; margin: auto;" />
]
]

---
layout: true
## References
---

> The vignette [_Residual-Based Shadings in
> vcd_](https://cran.r-project.org/package=vcd/vignettes/residual-shadings.pdf)
> in the `vcd` package.

> Zeileis, Achim, David Meyer, and Kurt Hornik. "Residual-based
> shadings for visualizing (conditional) independence." Journal of
> Computational and Graphical Statistics 16, no. 3 (2007): 507-525.

> The vignette [_Working with categorical data with R and the vcd and
  vcdExtra
  packages_](https://www.datavis.ca/courses/VCD/vcd-tutorial.pdf)
  in the `vcdExtra` package.

Several other experimental mosaic plot implementations are available
for `ggplot`.

---
layout: true
## Some Other Visualizations
---

.pull-left[
### Tree Maps
{{content}}
]
--

Tree maps show hierarchically structured (or tree-tructured) data.
{{content}}

* Each branch is represented by a rectangle.
{{content}}
--

* Leaf node tiles have areas proportional to the value of a variable.
{{content}}
--

* Tiles are often colored to reflect the value of another variable.
{{content}}

The package `treemapify` provides a `ggplot`-based implementation.
{{content}}

The data set `G20` includes some variables on the G-20 member countries:

.pull-right.scroll-box-20[
.hide-code[

```r
library(treemapify)
select(G20, region,
       country, gdp_mil_usd, hdi) |>
    knitr::kable(format = "html") |>
    kableExtra::kable_styling(
                    full_width = FALSE)
```

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> region </th>
   <th style="text-align:left;"> country </th>
   <th style="text-align:right;"> gdp_mil_usd </th>
   <th style="text-align:right;"> hdi </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Africa </td>
   <td style="text-align:left;"> South Africa </td>
   <td style="text-align:right;"> 384315 </td>
   <td style="text-align:right;"> 0.629 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> North America </td>
   <td style="text-align:left;"> United States </td>
   <td style="text-align:right;"> 15684750 </td>
   <td style="text-align:right;"> 0.937 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> North America </td>
   <td style="text-align:left;"> Canada </td>
   <td style="text-align:right;"> 1819081 </td>
   <td style="text-align:right;"> 0.911 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> North America </td>
   <td style="text-align:left;"> Mexico </td>
   <td style="text-align:right;"> 1177116 </td>
   <td style="text-align:right;"> 0.775 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> South America </td>
   <td style="text-align:left;"> Brazil </td>
   <td style="text-align:right;"> 2395968 </td>
   <td style="text-align:right;"> 0.730 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> South America </td>
   <td style="text-align:left;"> Argentina </td>
   <td style="text-align:right;"> 474954 </td>
   <td style="text-align:right;"> 0.811 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Asia </td>
   <td style="text-align:left;"> China </td>
   <td style="text-align:right;"> 8227037 </td>
   <td style="text-align:right;"> 0.699 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Asia </td>
   <td style="text-align:left;"> Japan </td>
   <td style="text-align:right;"> 5963969 </td>
   <td style="text-align:right;"> 0.912 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Asia </td>
   <td style="text-align:left;"> South Korea </td>
   <td style="text-align:right;"> 1155872 </td>
   <td style="text-align:right;"> 0.909 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Asia </td>
   <td style="text-align:left;"> India </td>
   <td style="text-align:right;"> 1824832 </td>
   <td style="text-align:right;"> 0.554 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Asia </td>
   <td style="text-align:left;"> Indonesia </td>
   <td style="text-align:right;"> 878198 </td>
   <td style="text-align:right;"> 0.629 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Eurasia </td>
   <td style="text-align:left;"> Russia </td>
   <td style="text-align:right;"> 2021960 </td>
   <td style="text-align:right;"> 0.788 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Eurasia </td>
   <td style="text-align:left;"> Turkey </td>
   <td style="text-align:right;"> 794468 </td>
   <td style="text-align:right;"> 0.722 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Europe </td>
   <td style="text-align:left;"> European Union </td>
   <td style="text-align:right;"> 16414483 </td>
   <td style="text-align:right;"> 0.876 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Europe </td>
   <td style="text-align:left;"> Germany </td>
   <td style="text-align:right;"> 3400579 </td>
   <td style="text-align:right;"> 0.920 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Europe </td>
   <td style="text-align:left;"> France </td>
   <td style="text-align:right;"> 2608699 </td>
   <td style="text-align:right;"> 0.893 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Europe </td>
   <td style="text-align:left;"> United Kingdom </td>
   <td style="text-align:right;"> 2440505 </td>
   <td style="text-align:right;"> 0.875 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Europe </td>
   <td style="text-align:left;"> Italy </td>
   <td style="text-align:right;"> 2014079 </td>
   <td style="text-align:right;"> 0.881 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Middle East </td>
   <td style="text-align:left;"> Saudi Arabia </td>
   <td style="text-align:right;"> 727307 </td>
   <td style="text-align:right;"> 0.782 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Oceania </td>
   <td style="text-align:left;"> Australia </td>
   <td style="text-align:right;"> 1541797 </td>
   <td style="text-align:right;"> 0.938 </td>
  </tr>
</tbody>
</table>

]
]

---

.pull-left[
A simple tree with only one level, the individual countries:

<center>
<div class="nomnoml html-widget html-fill-item" id="htmlwidget-b69fb3f0b4ecbc67e40d" style="width:252px;height:360px;"></div>
<script type="application/json" data-for="htmlwidget-b69fb3f0b4ecbc67e40d">{"x":{"code":"\n#fill: #FEFEFF\n#lineWidth: 1\n#zoom: 4\n#direction: right\n\n#padding: 25\n#fontsize: 18\n#linewidth: 2\n#direction: down\n\n[Root] -> [Germany]\n[Root] -> [France]\n[Root] -> [Japan]\n[Root] -> [China]\n[Root] -> [...]","svg":false,"png":null},"evals":[],"jsHooks":[]}</script>
</center>

]

.pull-right[
A corresponding tree map based on `gdp_mil_usd`:

.hide-code[

```r
ggplot(G20, aes(area = gdp_mil_usd)) +
    geom_treemap() +
    geom_treemap_text(aes(label = country),
                      color = "white")
```

<img src="proportions_files/figure-html/unnamed-chunk-88-1.png" style="display: block; margin: auto;" />
]
]

---

.pull-left[
A tree grouping by region:

<center>
<div class="nomnoml html-widget html-fill-item" id="htmlwidget-a1652eca84cc2b469639" style="width:252px;height:360px;"></div>
<script type="application/json" data-for="htmlwidget-a1652eca84cc2b469639">{"x":{"code":"\n#fill: #FEFEFF\n#lineWidth: 1\n#zoom: 4\n#direction: right\n\n#padding: 25\n#fontsize: 18\n#linewidth: 2\n#direction: down\n\n[Root] -> [Europe]\n[Root] -> [Asia]\n[Root] -> [...]\n[Europe] -> [Germany]\n[Europe] -> [France]\n[Europe] -> [....]\n[Asia] -> [Japan]\n[Asia] -> [China]\n[Asia] -> [.....]\n","svg":false,"png":null},"evals":[],"jsHooks":[]}</script>
</center>
]

.pull-right[
A corresponding tree map:

.hide-code[

```r
ggplot(G20, aes(area = gdp_mil_usd,
                subgroup = region)) +
    geom_treemap() +
    geom_treemap_text(aes(label = country),
                      color = "white") +
    geom_treemap_subgroup_border(
        color = "red") +
    geom_treemap_subgroup_text(color = "red")
```

<img src="proportions_files/figure-html/unnamed-chunk-90-1.png" style="display: block; margin: auto;" />
]
]

---

.pull-left[
A tree map showing GDP values for the G-20 members, grouped by region,
with fill mapped to the country's Human Development Index:
]

.pull-right[
.hide-code[

```r
ggplot(G20, aes(area = gdp_mil_usd,
                fill = hdi,
                subgroup = region)) +
    geom_treemap() +
    geom_treemap_text(aes(label = country),
                      color = "white") +
    geom_treemap_subgroup_border() +
    geom_treemap_subgroup_text(
        color = "lightgrey")
```

<img src="proportions_files/figure-html/unnamed-chunk-91-1.png" style="display: block; margin: auto;" />
]
]

---

.pull-left[
A treemap representing the distribution of eye color within hair color:
]

.pull-right[
.hide-code[

```r
group_by(agg, Eye, Hair) |>
    summarize(n = sum(n)) |>
    ungroup() |>
    ggplot(aes(area = n,
               subgroup = Hair)) +
    geom_treemap(aes(fill = Eye),
                 color = "white") +
    geom_treemap_subgroup_text() +
    geom_treemap_subgroup_border(
        color = "black", size = 6) +
    geom_treemap_text(aes(label = Eye),
                      color = "grey90") +
    scale_fill_manual(values = ecols) +
    guides(fill = "none")
```

<img src="proportions_files/figure-html/unnamed-chunk-92-1.png" style="display: block; margin: auto;" />
]
]

---

.pull-left[
A treemap representing proportions for `Improved` within `Treatment`
within `Sex` for the Arthritis data:
]

.pull-right[
.hide-code[

```r
count(Arth, Treatment, Improved, Sex) |>
    ggplot(aes(area = n,
               subgroup = Sex, fill = Improved,
               subgroup2 = Treatment)) +
    geom_treemap() +
    geom_treemap_subgroup_text() +
    scale_fill_brewer(palette = "Blues",
                      direction = -1) +
    geom_treemap_subgroup_border() +
    geom_treemap_subgroup2_text(place = "top",
                                size = 20)
```

<img src="proportions_files/figure-html/unnamed-chunk-93-1.png" style="display: block; margin: auto;" />
]
]

---

### Alluvial plots

These are also known as

* _parallel sets_, or
--

* _Sankey diagrams_.

They can be viewed as a parallel coordinates plot for categorical data.

Several implementations are available, including:

* `geom_parallel_sets` from `ggforce`;
--

* `geom_sankey` from [`ggsankey`](https://github.com/davidsjoberg/ggsankey);
--

* `geom_alluvium` from `ggalluvial`.

---

.pull-left[
Hair/Eye color using the `ggforce` package:

.hide-code[

```r
pal <- RColorBrewer::brewer.pal(3, "Set1")
HDF <- mutate(HairEyeColorDF,
              Sex = fct_rev(Sex))
library(ggforce)
sHDF <- gather_set_data(HDF, 3 : 1)
sHDF <- mutate(sHDF, x = fct_inorder(as.factor(x))) #**** simplify this?
ggplot(sHDF, aes(x, id = id,
                 split = y,
                 value = Freq)) +
    geom_parallel_sets(aes(fill = Sex),
                       alpha = 0.5,
                       axis.width = 0.1) +
    geom_parallel_sets_axes(
        axis.width = 0.1) +
    geom_parallel_sets_labels(
        colour = 'white') +
    scale_fill_manual(
        values = c(Male = pal[2],
                   Female = pal[1])) +
    theme_void() + guides(fill = "none")
```

<img src="proportions_files/figure-html/unnamed-chunk-95-1.png" style="display: block; margin: auto;" />
]
]

.pull-right[
Arthritis data with `ggforce`:

.hide-code[

```r
sArth <- mutate(Arth,
                Improved = factor(Improved,
                                  ordered = FALSE)) |>
    count(Improved, Treatment, Sex) |>
    gather_set_data(3 : 1)
sArth <- mutate(sArth,
                x = fct_inorder(factor(x)),
                Sex = fct_rev(Sex))
ggplot(sArth, aes(x,
                  id = id,
                  split = y,
                  value = n)) +
    geom_parallel_sets(aes(fill = Sex),
                       alpha = 0.5,
                       axis.width = 0.1) +
    geom_parallel_sets_axes(axis.width = 0.1) +
    geom_parallel_sets_labels(
        colour = 'white') +
    scale_fill_manual(
        values = c(Male = pal[2],
                   Female = pal[1])) +
    theme_void() + guides(fill = "none")
```

<img src="proportions_files/figure-html/unnamed-chunk-97-1.png" style="display: block; margin: auto;" />
]
]

---

.pull-left[
### Stream Graphs
{{content}}
]
--

[Stream
  graphs](https://www.visualisingdata.com/2010/08/making-sense-of-streamgraphs/)
  are a generalization of stacked bar charts plotted against a numeric
  variable.
{{content}}
--

In some cases the origins of the bars are shifted to improve some
aspect of the overall visualization.
{{content}}
--

An early example is the [Baby Name
Voyager](https://www.bewitched.com/namevoyager.html).
(A more recent variant is also
[available](https://namerology.com/baby-name-grapher/).)
{{content}}
--

A [NY Times
visualization](https://archive.nytimes.com/www.nytimes.com/interactive/2008/02/23/movies/20080223_REVENUE_GRAPHIC.html?_r=0)
of movie box office results is another example. ([Blog post with a
static
version](https://flowingdata.com/2008/02/25/ebb-and-flow-of-box-office-receipts-over-past-20-years/)).
{{content}}
--

Some R implementations on GitHub:
{{content}}
--

* [`ggTimeSeries`](https://github.com/AtherEnergy/ggTimeSeries)
* [`streamgraph`](https://hrbrmstr.github.io/streamgraph/) (uses D3)
* [`ggstream`](https://github.com/davidsjoberg/ggstream).

.pull-right[
A stream graph for movie genres (these are not mutually exclusive):

.hide-code[

```r
## install with: remotes::install_github("hrbrmstr/streamgraph")
library(streamgraph)
library(tidyverse)
genres <- c("Action", "Animation", "Comedy",
            "Drama", "Documentary", "Romance")
mymovies <- select(ggplot2movies::movies,
                   year, one_of(genres))
mymovies_long <- pivot_longer(
    mymovies, -year,
    names_to = "genre",
    values_to = "value")
movie_counts <- count(mymovies_long,
                      year, genre)
streamgraph(movie_counts, "genre", "n", "year")
```

<div id="htmlwidget-e18944d90b916087016f" class="streamgraph html-widget" style="width:432px;height:360px;"></div>
<div id="htmlwidget-e18944d90b916087016f-legend" style="width:432" class="streamgraph html-widget-legend"><center><label style='padding-right:5px' for='htmlwidget-e18944d90b916087016f-select'></label><select id='htmlwidget-e18944d90b916087016f-select' style='visibility:hidden;'></select></center></div>
<script type="application/json" data-for="htmlwidget-e18944d90b916087016f">{"x":{"data":{"key":["Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance","Action","Animation","Comedy","Documentary","Drama","Romance"],"value":[1,1,1,1,1,1,9,9,9,9,9,9,3,3,3,3,3,3,13,13,13,13,13,13,9,9,9,9,9,9,5,5,5,5,5,5,9,9,9,9,9,9,16,16,16,16,16,16,28,28,28,28,28,28,9,9,9,9,9,9,37,37,37,37,37,37,42,42,42,42,42,42,17,17,17,17,17,17,17,17,17,17,17,17,12,12,12,12,12,12,24,24,24,24,24,24,30,30,30,30,30,30,26,26,26,26,26,26,22,22,22,22,22,22,34,34,34,34,34,34,32,32,32,32,32,32,54,54,54,54,54,54,54,54,54,54,54,54,49,49,49,49,49,49,37,37,37,37,37,37,41,41,41,41,41,41,52,52,52,52,52,52,43,43,43,43,43,43,53,53,53,53,53,53,52,52,52,52,52,52,48,48,48,48,48,48,50,50,50,50,50,50,79,79,79,79,79,79,94,94,94,94,94,94,83,83,83,83,83,83,109,109,109,109,109,109,184,184,184,184,184,184,288,288,288,288,288,288,346,346,346,346,346,346,412,412,412,412,412,412,421,421,421,421,421,421,482,482,482,482,482,482,464,464,464,464,464,464,484,484,484,484,484,484,484,484,484,484,484,484,463,463,463,463,463,463,484,484,484,484,484,484,503,503,503,503,503,503,521,521,521,521,521,521,507,507,507,507,507,507,446,446,446,446,446,446,426,426,426,426,426,426,375,375,375,375,375,375,432,432,432,432,432,432,439,439,439,439,439,439,478,478,478,478,478,478,486,486,486,486,486,486,491,491,491,491,491,491,518,518,518,518,518,518,513,513,513,513,513,513,539,539,539,539,539,539,501,501,501,501,501,501,522,522,522,522,522,522,497,497,497,497,497,497,556,556,556,556,556,556,528,528,528,528,528,528,495,495,495,495,495,495,478,478,478,478,478,478,466,466,466,466,466,466,513,513,513,513,513,513,503,503,503,503,503,503,517,517,517,517,517,517,530,530,530,530,530,530,579,579,579,579,579,579,594,594,594,594,594,594,651,651,651,651,651,651,625,625,625,625,625,625,586,586,586,586,586,586,646,646,646,646,646,646,637,637,637,637,637,637,634,634,634,634,634,634,625,625,625,625,625,625,619,619,619,619,619,619,665,665,665,665,665,665,617,617,617,617,617,617,609,609,609,609,609,609,632,632,632,632,632,632,681,681,681,681,681,681,661,661,661,661,661,661,689,689,689,689,689,689,698,698,698,698,698,698,749,749,749,749,749,749,792,792,792,792,792,792,792,792,792,792,792,792,957,957,957,957,957,957,944,944,944,944,944,944,944,944,944,944,944,944,899,899,899,899,899,899,888,888,888,888,888,888,948,948,948,948,948,948,1016,1016,1016,1016,1016,1016,1199,1199,1199,1199,1199,1199,1248,1248,1248,1248,1248,1248,1390,1390,1390,1390,1390,1390,1568,1568,1568,1568,1568,1568,1705,1705,1705,1705,1705,1705,1927,1927,1927,1927,1927,1927,2048,2048,2048,2048,2048,2048,2121,2121,2121,2121,2121,2121,2168,2168,2168,2168,2168,2168,2158,2158,2158,2158,2158,2158,1945,1945,1945,1945,1945,1945,349,349,349,349,349,349],"date":["1893-01-01","1893-01-01","1893-01-01","1893-01-01","1893-01-01","1893-01-01","1894-01-01","1894-01-01","1894-01-01","1894-01-01","1894-01-01","1894-01-01","1895-01-01","1895-01-01","1895-01-01","1895-01-01","1895-01-01","1895-01-01","1896-01-01","1896-01-01","1896-01-01","1896-01-01","1896-01-01","1896-01-01","1897-01-01","1897-01-01","1897-01-01","1897-01-01","1897-01-01","1897-01-01","1898-01-01","1898-01-01","1898-01-01","1898-01-01","1898-01-01","1898-01-01","1899-01-01","1899-01-01","1899-01-01","1899-01-01","1899-01-01","1899-01-01","1900-01-01","1900-01-01","1900-01-01","1900-01-01","1900-01-01","1900-01-01","1901-01-01","1901-01-01","1901-01-01","1901-01-01","1901-01-01","1901-01-01","1902-01-01","1902-01-01","1902-01-01","1902-01-01","1902-01-01","1902-01-01","1903-01-01","1903-01-01","1903-01-01","1903-01-01","1903-01-01","1903-01-01","1904-01-01","1904-01-01","1904-01-01","1904-01-01","1904-01-01","1904-01-01","1905-01-01","1905-01-01","1905-01-01","1905-01-01","1905-01-01","1905-01-01","1906-01-01","1906-01-01","1906-01-01","1906-01-01","1906-01-01","1906-01-01","1907-01-01","1907-01-01","1907-01-01","1907-01-01","1907-01-01","1907-01-01","1908-01-01","1908-01-01","1908-01-01","1908-01-01","1908-01-01","1908-01-01","1909-01-01","1909-01-01","1909-01-01","1909-01-01","1909-01-01","1909-01-01","1910-01-01","1910-01-01","1910-01-01","1910-01-01","1910-01-01","1910-01-01","1911-01-01","1911-01-01","1911-01-01","1911-01-01","1911-01-01","1911-01-01","1912-01-01","1912-01-01","1912-01-01","1912-01-01","1912-01-01","1912-01-01","1913-01-01","1913-01-01","1913-01-01","1913-01-01","1913-01-01","1913-01-01","1914-01-01","1914-01-01","1914-01-01","1914-01-01","1914-01-01","1914-01-01","1915-01-01","1915-01-01","1915-01-01","1915-01-01","1915-01-01","1915-01-01","1916-01-01","1916-01-01","1916-01-01","1916-01-01","1916-01-01","1916-01-01","1917-01-01","1917-01-01","1917-01-01","1917-01-01","1917-01-01","1917-01-01","1918-01-01","1918-01-01","1918-01-01","1918-01-01","1918-01-01","1918-01-01","1919-01-01","1919-01-01","1919-01-01","1919-01-01","1919-01-01","1919-01-01","1920-01-01","1920-01-01","1920-01-01","1920-01-01","1920-01-01","1920-01-01","1921-01-01","1921-01-01","1921-01-01","1921-01-01","1921-01-01","1921-01-01","1922-01-01","1922-01-01","1922-01-01","1922-01-01","1922-01-01","1922-01-01","1923-01-01","1923-01-01","1923-01-01","1923-01-01","1923-01-01","1923-01-01","1924-01-01","1924-01-01","1924-01-01","1924-01-01","1924-01-01","1924-01-01","1925-01-01","1925-01-01","1925-01-01","1925-01-01","1925-01-01","1925-01-01","1926-01-01","1926-01-01","1926-01-01","1926-01-01","1926-01-01","1926-01-01","1927-01-01","1927-01-01","1927-01-01","1927-01-01","1927-01-01","1927-01-01","1928-01-01","1928-01-01","1928-01-01","1928-01-01","1928-01-01","1928-01-01","1929-01-01","1929-01-01","1929-01-01","1929-01-01","1929-01-01","1929-01-01","1930-01-01","1930-01-01","1930-01-01","1930-01-01","1930-01-01","1930-01-01","1931-01-01","1931-01-01","1931-01-01","1931-01-01","1931-01-01","1931-01-01","1932-01-01","1932-01-01","1932-01-01","1932-01-01","1932-01-01","1932-01-01","1933-01-01","1933-01-01","1933-01-01","1933-01-01","1933-01-01","1933-01-01","1934-01-01","1934-01-01","1934-01-01","1934-01-01","1934-01-01","1934-01-01","1935-01-01","1935-01-01","1935-01-01","1935-01-01","1935-01-01","1935-01-01","1936-01-01","1936-01-01","1936-01-01","1936-01-01","1936-01-01","1936-01-01","1937-01-01","1937-01-01","1937-01-01","1937-01-01","1937-01-01","1937-01-01","1938-01-01","1938-01-01","1938-01-01","1938-01-01","1938-01-01","1938-01-01","1939-01-01","1939-01-01","1939-01-01","1939-01-01","1939-01-01","1939-01-01","1940-01-01","1940-01-01","1940-01-01","1940-01-01","1940-01-01","1940-01-01","1941-01-01","1941-01-01","1941-01-01","1941-01-01","1941-01-01","1941-01-01","1942-01-01","1942-01-01","1942-01-01","1942-01-01","1942-01-01","1942-01-01","1943-01-01","1943-01-01","1943-01-01","1943-01-01","1943-01-01","1943-01-01","1944-01-01","1944-01-01","1944-01-01","1944-01-01","1944-01-01","1944-01-01","1945-01-01","1945-01-01","1945-01-01","1945-01-01","1945-01-01","1945-01-01","1946-01-01","1946-01-01","1946-01-01","1946-01-01","1946-01-01","1946-01-01","1947-01-01","1947-01-01","1947-01-01","1947-01-01","1947-01-01","1947-01-01","1948-01-01","1948-01-01","1948-01-01","1948-01-01","1948-01-01","1948-01-01","1949-01-01","1949-01-01","1949-01-01","1949-01-01","1949-01-01","1949-01-01","1950-01-01","1950-01-01","1950-01-01","1950-01-01","1950-01-01","1950-01-01","1951-01-01","1951-01-01","1951-01-01","1951-01-01","1951-01-01","1951-01-01","1952-01-01","1952-01-01","1952-01-01","1952-01-01","1952-01-01","1952-01-01","1953-01-01","1953-01-01","1953-01-01","1953-01-01","1953-01-01","1953-01-01","1954-01-01","1954-01-01","1954-01-01","1954-01-01","1954-01-01","1954-01-01","1955-01-01","1955-01-01","1955-01-01","1955-01-01","1955-01-01","1955-01-01","1956-01-01","1956-01-01","1956-01-01","1956-01-01","1956-01-01","1956-01-01","1957-01-01","1957-01-01","1957-01-01","1957-01-01","1957-01-01","1957-01-01","1958-01-01","1958-01-01","1958-01-01","1958-01-01","1958-01-01","1958-01-01","1959-01-01","1959-01-01","1959-01-01","1959-01-01","1959-01-01","1959-01-01","1960-01-01","1960-01-01","1960-01-01","1960-01-01","1960-01-01","1960-01-01","1961-01-01","1961-01-01","1961-01-01","1961-01-01","1961-01-01","1961-01-01","1962-01-01","1962-01-01","1962-01-01","1962-01-01","1962-01-01","1962-01-01","1963-01-01","1963-01-01","1963-01-01","1963-01-01","1963-01-01","1963-01-01","1964-01-01","1964-01-01","1964-01-01","1964-01-01","1964-01-01","1964-01-01","1965-01-01","1965-01-01","1965-01-01","1965-01-01","1965-01-01","1965-01-01","1966-01-01","1966-01-01","1966-01-01","1966-01-01","1966-01-01","1966-01-01","1967-01-01","1967-01-01","1967-01-01","1967-01-01","1967-01-01","1967-01-01","1968-01-01","1968-01-01","1968-01-01","1968-01-01","1968-01-01","1968-01-01","1969-01-01","1969-01-01","1969-01-01","1969-01-01","1969-01-01","1969-01-01","1970-01-01","1970-01-01","1970-01-01","1970-01-01","1970-01-01","1970-01-01","1971-01-01","1971-01-01","1971-01-01","1971-01-01","1971-01-01","1971-01-01","1972-01-01","1972-01-01","1972-01-01","1972-01-01","1972-01-01","1972-01-01","1973-01-01","1973-01-01","1973-01-01","1973-01-01","1973-01-01","1973-01-01","1974-01-01","1974-01-01","1974-01-01","1974-01-01","1974-01-01","1974-01-01","1975-01-01","1975-01-01","1975-01-01","1975-01-01","1975-01-01","1975-01-01","1976-01-01","1976-01-01","1976-01-01","1976-01-01","1976-01-01","1976-01-01","1977-01-01","1977-01-01","1977-01-01","1977-01-01","1977-01-01","1977-01-01","1978-01-01","1978-01-01","1978-01-01","1978-01-01","1978-01-01","1978-01-01","1979-01-01","1979-01-01","1979-01-01","1979-01-01","1979-01-01","1979-01-01","1980-01-01","1980-01-01","1980-01-01","1980-01-01","1980-01-01","1980-01-01","1981-01-01","1981-01-01","1981-01-01","1981-01-01","1981-01-01","1981-01-01","1982-01-01","1982-01-01","1982-01-01","1982-01-01","1982-01-01","1982-01-01","1983-01-01","1983-01-01","1983-01-01","1983-01-01","1983-01-01","1983-01-01","1984-01-01","1984-01-01","1984-01-01","1984-01-01","1984-01-01","1984-01-01","1985-01-01","1985-01-01","1985-01-01","1985-01-01","1985-01-01","1985-01-01","1986-01-01","1986-01-01","1986-01-01","1986-01-01","1986-01-01","1986-01-01","1987-01-01","1987-01-01","1987-01-01","1987-01-01","1987-01-01","1987-01-01","1988-01-01","1988-01-01","1988-01-01","1988-01-01","1988-01-01","1988-01-01","1989-01-01","1989-01-01","1989-01-01","1989-01-01","1989-01-01","1989-01-01","1990-01-01","1990-01-01","1990-01-01","1990-01-01","1990-01-01","1990-01-01","1991-01-01","1991-01-01","1991-01-01","1991-01-01","1991-01-01","1991-01-01","1992-01-01","1992-01-01","1992-01-01","1992-01-01","1992-01-01","1992-01-01","1993-01-01","1993-01-01","1993-01-01","1993-01-01","1993-01-01","1993-01-01","1994-01-01","1994-01-01","1994-01-01","1994-01-01","1994-01-01","1994-01-01","1995-01-01","1995-01-01","1995-01-01","1995-01-01","1995-01-01","1995-01-01","1996-01-01","1996-01-01","1996-01-01","1996-01-01","1996-01-01","1996-01-01","1997-01-01","1997-01-01","1997-01-01","1997-01-01","1997-01-01","1997-01-01","1998-01-01","1998-01-01","1998-01-01","1998-01-01","1998-01-01","1998-01-01","1999-01-01","1999-01-01","1999-01-01","1999-01-01","1999-01-01","1999-01-01","2000-01-01","2000-01-01","2000-01-01","2000-01-01","2000-01-01","2000-01-01","2001-01-01","2001-01-01","2001-01-01","2001-01-01","2001-01-01","2001-01-01","2002-01-01","2002-01-01","2002-01-01","2002-01-01","2002-01-01","2002-01-01","2003-01-01","2003-01-01","2003-01-01","2003-01-01","2003-01-01","2003-01-01","2004-01-01","2004-01-01","2004-01-01","2004-01-01","2004-01-01","2004-01-01","2005-01-01","2005-01-01","2005-01-01","2005-01-01","2005-01-01","2005-01-01"]},"markers":null,"annotations":null,"offset":"silhouette","interactive":true,"interpolate":"cardinal","palette":"Spectral","text":"black","tooltip":"black","x_tick_interval":10,"x_tick_units":"year","x_tick_format":"%Y","y_tick_count":5,"y_tick_format":",g","top":20,"right":40,"bottom":30,"left":50,"legend":false,"legend_label":"","fill":"brewer","label_col":"black","x_scale":"date","sort":true,"order":"none"},"evals":[],"jsHooks":[]}</script>
]
]

---
layout: false
## Reading

Chapters [_Visualizing
  proportions_](https://clauswilke.com/dataviz/visualizing-proportions.html)
  and [_Visualizing nested
  proportions_](https://clauswilke.com/dataviz/nested-proportions.html)
  in [_Fundamentals of Data
  Visualization_](https://clauswilke.com/dataviz/).

---
layout: false

## Interactive Tutorial

An interactive [`learnr`](https://rstudio.github.io/learnr/) tutorial
for these notes is [available](../tutorials/proportions.Rmd).

You can run the tutorial with

```r
STAT4580::runTutorial("proportions")
```

You can install the current version of the `STAT4580` package with

```r
remotes::install_gitlab("luke-tierney/STAT4580")
```

You may need to install the `remotes` package from CRAN first.

---
layout: true
## Exercises
---

1) Figure A shows a bar char of the flights leaving NYC airports in
2013 for each day of the week. Figure B shows the market share of five
major internet browsers in 2015.

For which of these bar charts would it be better to reorder the
categories so the bars are ordered from largest to smallest?
    
*   a. Yes for Figure A. No for Figure B.
*   b. No for Figure A. Yes for Figure B.
*   c. Yes for both.
*   d. No for both.

---

2) Consider the stacked bar chart `p1` and the spine plot `p2` for the
hair and eye color data produced by the following code:

```r
library(dplyr)
library(ggplot2)
library(ggmosaic)
ecols <- c(Brown = "brown2", Blue = "blue2",
           Hazel = "darkgoldenrod3", Green = "green4")
HairEyeColorDF <- as.data.frame(HairEyeColor)
p0 <- ggplot(HairEyeColorDF) +
    scale_fill_manual(values = ecols) +
    theme_minimal()

p1 <- p0 + geom_col(aes(x = Hair, y = Freq / sum(Freq), fill = Eye))

p2 <- p0 + geom_mosaic(aes(x = product(Hair), fill = Eye, weight = Freq))
```

Use the two plots to answer: Which hair color has the highest
proportion of individuals with green eyes?

*   a. Black
*   b. Brown
*   c. Red
*   d. Blond

Which plot makes it easiest to answer this question?

---

3) Use the plots of the previous question to answer: The proportion of
individuals with red hair is closest to:

*   a. 5%
*   b. 8%
*   c. 12%
*   c. 20%

Which plot makes it easiest to answer this question?

<!--
pareto chart

- pie charts
  - bar charts
  - stacked bar charts
  - grouped bar charts
  - population pyramids
  - waffle charts, square pie charts

- joint and conditional distributions
  - spine plots
  - spinograms?
  - double decker plots
  - cd_plots
  - mosaic plots
  - tree maps
  - sankey diagrams, alluvial charts, parallel sets
  - chord diagrams
  - stream graphs
-->

//adapted from Emi Tanaka's gist at //https://gist.github.com/emitanaka/eaa258bb8471c041797ff377704c8505