---
#############################################################
# #
# In RStudio click on "Run Document" to run this tutorial #
# #
#############################################################
title: "Visualizing Distributions"
author: "Luke Tierney"
output: learnr::tutorial
runtime: shiny_prerendered
---
```{r setup, include = FALSE}
library(learnr)
library(tidyverse)
knitr::opts_chunk$set(echo = FALSE, comment = "", warning = FALSE)
<<load-packages>>
<<prepare-data>>
```
```{r stop_when_browser_closes, context = "server"}
# stop the app when the browser is closed (or, unfortunately, refreshed)
session$onSessionEnded(stopApp)
```
## Life Expectancy by Continent
These exercises will use `gapminder` data:
```{r load-packages, echo = TRUE, eval = FALSE}
library(ggplot2)
library(dplyr)
library(gapminder)
```
In particular, you will look at distributions of life expectancy
values for the continents in two years covered by the data set, 1952
and 2007. Oceania is dropped since only two countries are included,
and converting `year` to a factor will simplify some of the plots a
bit.
```{r prepare-data, echo = TRUE, eval = FALSE}
gap <- filter(gapminder,
continent != "Oceania",
year %in% c(1952, 2007)) %>%
mutate(year = factor(year))
```
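As a quick check of the preparation step, the sketch below counts countries per continent in one year and then confirms what remains after filtering. It rebuilds `gap` so that it runs on its own; nothing here is part of the exercises.

```r
library(dplyr)
library(gapminder)

# Countries per continent in a single year; Oceania has only two.
gapminder %>%
  filter(year == 2007) %>%
  count(continent)

# Rebuild the filtered data used in this tutorial.
gap <- gapminder %>%
  filter(continent != "Oceania", year %in% c(1952, 2007)) %>%
  mutate(year = factor(year))

# Rows per continent and year, and the levels of the new factor.
count(gap, continent, year)
levels(gap$year)
```

Because `year` is now a factor, mapping it to `color` or `fill` gives a discrete legend with one entry per year rather than a continuous color bar.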
## Density Plots for Two Years
Create a density plot of the combined life expectancy values for the
two years, showing the densities for the two years in different
colors.
```{r density-year, exercise = TRUE}
```
```{r density-year-solution}
ggplot(gap, aes(x = lifeExp, color = year)) +
geom_density()
```
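For a sense of what a density plot draws, the sketch below computes a kernel density estimate for each year with base R's `density()`. It assumes default settings throughout and is meant only as an illustration of the idea, not as a reproduction of the exact curve `geom_density()` draws.

```r
library(dplyr)
library(gapminder)

gap <- gapminder %>%
  filter(continent != "Oceania", year %in% c(1952, 2007)) %>%
  mutate(year = factor(year))

# One kernel density estimate per year.
dens <- lapply(split(gap$lifeExp, gap$year), density)

# Each estimate is a grid of (x, y) points; a Riemann sum over the
# equally spaced grid shows that the area under each curve is close to 1.
area <- sapply(dens, function(d) sum(d$y) * diff(d$x[1:2]))
area
```

Since each curve has area close to one, the heights of the two densities are comparable even if the two years had different numbers of observations.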
The plot would look better if the `x` axis range were a little wider.
Use `xlim()` to make the `x` axis range go from 20 to 90 years.
```{r density-year-wide, exercise = TRUE}
```
```{r density-year-wide-solution}
ggplot(gap, aes(x = lifeExp, color = year)) +
geom_density() +
xlim(c(20, 90))
```
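Setting limits with `xlim()` sets the limits of the `x` scale, and by default observations outside the scale limits are dropped before the density is computed. A small check, sketched here with a rebuilt copy of `gap`, confirms that a range of 20 to 90 years does not discard any of the data.

```r
library(dplyr)
library(gapminder)

gap <- gapminder %>%
  filter(continent != "Oceania", year %in% c(1952, 2007)) %>%
  mutate(year = factor(year))

# Observed range of life expectancy in the two selected years.
range(gap$lifeExp)

# TRUE if every observation lies within the limits of 20 and 90.
all(gap$lifeExp >= 20 & gap$lifeExp <= 90)
```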
You can also fill the areas under the curves. Reducing the alpha level
makes the fills semi-transparent, so the region where the two curves
overlap stays visible.
```{r density-year-fill, exercise = TRUE}
```
```{r density-year-fill-solution}
ggplot(gap, aes(x = lifeExp, fill = year)) +
geom_density(alpha = 0.8) +
xlim(c(20, 90))
```
Instead of using color you can place the densities for the two years
in different facets.
```{r density-facet, exercise = TRUE}
```
```{r density-facet-solution}
ggplot(gap, aes(x = lifeExp)) +
geom_density() +
facet_wrap(~year) +
xlim(c(20, 90))
```
Placing the two density plots in a column can make it easier to see
how features have changed between the two years.
```{r density-column, exercise = TRUE}
```
```{r density-column-solution}
ggplot(gap,
aes(x = lifeExp)) +
geom_density() +
facet_wrap(~year, ncol = 1) +
xlim(c(20, 90))
```
## Separating by Continent
Use `geom_boxplot` to show box plots for each continent and year
combination. Start with `continent` on the `y` axis and `year` mapped
to the fill color.
```{r box-continent-year, exercise = TRUE}
```
```{r box-continent-year-solution}
ggplot(gap, aes(x = lifeExp, y = continent, fill = year)) +
geom_boxplot()
```
Try reversing the mapping of `year` and `continent`.
Another option is to use violin plots. Give that a try.
```{r violin-continent-year, exercise = TRUE}
```
```{r violin-continent-year-solution}
ggplot(gap, aes(x = lifeExp, y = continent, fill = year)) +
geom_violin()
```
## Density Plots for Continents and Years
You can show the densities for all continents in the data with faceting,
and use fill color to identify the years within each continent.
```{r density-continent-year, exercise = TRUE}
```
```{r density-continent-year-solution}
ggplot(gap,
aes(x = lifeExp, fill = year)) +
geom_density() +
facet_wrap(~ continent, ncol = 1) +
xlim(c(20, 90))
```
An alternative is to use density ridges.
```{r ridges-continent-year, exercise = TRUE}
```
```{r ridges-continent-year-solution}
library(ggridges)
ggplot(gap,
aes(x = lifeExp,
y = continent,
group = interaction(continent, year),
fill = year)) +
geom_density_ridges()
```
## Exercises
### Exercise 1
Consider the code
```{r galton-hist-bins, exercise = TRUE}
library(ggplot2)
data(Galton, package = "HistData")
ggplot(Galton, aes(x = parent)) +
geom_histogram(---, fill = "grey", color = "black")
```
```{r galton-hist-bins-question, echo = FALSE}
question(
paste("Which of the following replacements for `---` produces a",
"histogram with bins that are one inch wide and start at",
"integer values?"),
answer("`binwidth = 1`"),
answer("`binwidth = 1, center = 66.5`", correct = TRUE),
answer("`binwidth = 2, center = 66`"),
answer("`center = 66`"),
random_answer_order = TRUE,
allow_retry = TRUE
)
```
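The arithmetic linking `binwidth` and `center` to the bin edges can be sketched in a few lines. The helper `bin_edges()` below is hypothetical and written only for illustration. It assumes that `center` is the midpoint of one bin, so the edges sit half a bin width away from it and repeat every `binwidth`. The values used are deliberately different from those in the question.

```r
# Hypothetical helper: the bin edges implied by a bin width and the
# center of one bin, for a few bins on either side (indexed by k).
bin_edges <- function(binwidth, center, k = -2:2) {
  center - binwidth / 2 + k * binwidth
}

# Bins of width 2 with one bin centered at 1.
bin_edges(binwidth = 2, center = 1)

# Bins of width 2 with one bin centered at 0.
bin_edges(binwidth = 2, center = 0)
```

With a width of 2, centering a bin at 1 puts the edges on even numbers, while centering a bin at 0 puts them on odd numbers.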
### Exercise 2
Consider the code
```{r density-fiddles, exercise = TRUE}
library(ggplot2)
ggplot(faithful, aes(x = eruptions)) + geom_density(---)
```
```{r density-fiddles-question, echo = FALSE}
question(
paste("Which of the following replacements for `---` produces a density",
"plot with the area under the density in light blue and no black border?"),
answer("`color = \"lightblue\"`"),
answer("`fill = \"black\", color = \"lightblue\"`"),
answer("`fill = \"lightblue\", color = NA`", correct = TRUE),
answer("`fill = NA, color = \"black\"`"),
random_answer_order = TRUE,
allow_retry = TRUE
)
```
### Exercise 3
Consider the code
```{r violin-fiddles, exercise = TRUE}
library(ggplot2)
library(gapminder)
p <- ggplot(gapminder, aes(y = continent, x = lifeExp))
```
```{r violin-fiddles-question, echo = FALSE}
question(
paste("Which of the following produces violin plots without trimming",
"at the smallest and largest observations, and including a line",
"at the median?"),
answer("`p + geom_violin(trim = FALSE)`"),
answer("`p + geom_violin(trim = TRUE, show_median = TRUE)`"),
answer("`p + geom_violin(trim = FALSE, draw_quantiles = 0.5)`",
correct = TRUE),
answer("`p + geom_violin(trim = TRUE, show_quantiles = 0.5)`"),
random_answer_order = TRUE,
allow_retry = TRUE
)
```
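When checking a median line on a violin plot, it can help to have the sample medians at hand. The sketch below computes them with `dplyr`; the column name `median_lifeExp` is chosen here for illustration. Depending on how the plot computes its quantiles, the drawn line may differ slightly from these values.

```r
library(dplyr)
library(gapminder)

# Sample median of life expectancy for each continent, over all years.
medians <- gapminder %>%
  group_by(continent) %>%
  summarize(median_lifeExp = median(lifeExp))

medians
```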
### Exercise 4
Density ridges can also show quantiles, but the details of how to
request this are different. Consider this code:
```{r density-ridges-fiddles, exercise = TRUE}
library(ggplot2)
library(ggridges)
library(gapminder)
ggplot(gapminder, aes(x = lifeExp, y = year, group = year)) +
geom_density_ridges(---)
```
```{r density-ridges-fiddles-question, echo = FALSE}
question(
paste("Which of the following replacements for `---` produces density",
"ridges with lines showing the locations of the medians?"),
answer("`quantiles = 0.5`"),
answer("`quantile_lines = TRUE, quantiles = 0.5`", correct = TRUE),
answer("`quantile_lines = TRUE`"),
answer("`draw_quantiles = 0.5`"),
random_answer_order = TRUE,
allow_retry = TRUE
)
```