---
#############################################################
#                                                           #
#  In RStudio click on "Run Document" to run this tutorial  #
#                                                           #
#############################################################
title: "Visualizing Distributions"
author: "Luke Tierney"
output: learnr::tutorial
runtime: shiny_prerendered
---

```{r setup, include = FALSE}
library(learnr)
library(tidyverse)
knitr::opts_chunk$set(echo = FALSE, comment = "", warning = FALSE)
<>
<>
```

```{r stop_when_browser_closes, context = "server"}
# stop the app when the browser is closed (or, unfortunately, refreshed)
session$onSessionEnded(stopApp)
```

## Life Expectancy by Continent

These exercises use the `gapminder` data:

```{r load-packages, echo = TRUE, eval = FALSE}
library(ggplot2)
library(dplyr)
library(gapminder)
```

In particular, you will look at distributions of life expectancy
values for the continents in two years covered by the data set, 1952
and 2007.

Oceania is dropped since only two countries are included, and
converting `year` to a factor will simplify some of the plots a bit.

```{r prepare-data, echo = TRUE, eval = FALSE}
gap <- filter(gapminder, continent != "Oceania", year %in% c(1952, 2007)) %>%
    mutate(year = factor(year))
```

## Density Plots for Two Years

Create a density plot of the combined life expectancy values for the
two years, showing the densities for the two years in different
colors.

```{r density-year, exercise = TRUE}

```

```{r density-year-solution}
ggplot(gap, aes(x = lifeExp, color = year)) +
    geom_density()
```

The plot would look a bit better if the `x` axis range were a little
wider. Use `xlim()` to make the `x` axis range go from 20 to 90 years.

```{r density-year-wide, exercise = TRUE}

```

```{r density-year-wide-solution}
ggplot(gap, aes(x = lifeExp, color = year)) +
    geom_density() +
    xlim(c(20, 90))
```

You can also fill the areas under the curves. Reducing the alpha
level may help.

```{r density-year-fill, exercise = TRUE}

```

```{r density-year-fill-solution}
ggplot(gap, aes(x = lifeExp, fill = year)) +
    geom_density(alpha = 0.8) +
    xlim(c(20, 90))
```

Instead of using color you can place the densities for the two years
in different facets.

```{r density-facet, exercise = TRUE}

```

```{r density-facet-solution}
ggplot(gap, aes(x = lifeExp)) +
    geom_density() +
    facet_wrap(~year) +
    xlim(c(20, 90))
```

Placing the two density plots in a column can make it easier to see
how features have changed between the two years.

```{r density-column, exercise = TRUE}

```

```{r density-column-solution}
ggplot(gap, aes(x = lifeExp)) +
    geom_density() +
    facet_wrap(~year, ncol = 1) +
    xlim(c(20, 90))
```

## Separating by Continent

Use `geom_boxplot` to show box plots for each continent and year
combination. Start with `continent` on the `y` axis and `year` mapped
to the fill color.

```{r box-continent-year, exercise = TRUE}

```

```{r box-continent-year-solution}
ggplot(gap, aes(x = lifeExp, y = continent, fill = year)) +
    geom_boxplot()
```

Try reversing the mapping of `year` and `continent`.

Another option is to use violin plots. Give that a try.

```{r violin-continent-year, exercise = TRUE}

```

```{r violin-continent-year-solution}
ggplot(gap, aes(x = lifeExp, y = continent, fill = factor(year))) +
    geom_violin()
```

## Density Plots for Continents and Years

You can show the densities for all continents in the data with
faceting, and use fill color to identify the years within each
continent.

```{r density-continent-year, exercise = TRUE}

```

```{r density-continent-year-solution}
ggplot(gap, aes(x = lifeExp, fill = factor(year))) +
    geom_density() +
    facet_wrap(~ continent, ncol = 1) +
    xlim(c(20, 90))
```

An alternative is to use density ridges.

```{r ridges-continent-year, exercise = TRUE}

```

```{r ridges-continent-year-solution}
library(ggridges)
ggplot(gap, aes(x = lifeExp, y = continent,
                group = interaction(continent, factor(year)),
                fill = factor(year))) +
    geom_density_ridges()
```

## Exercises

### Exercise 1

Consider the code

```{r galton-hist-bins, exercise = TRUE}
library(ggplot2)
data(Galton, package = "HistData")
ggplot(Galton, aes(x = parent)) +
    geom_histogram(---, fill = "grey", color = "black")
```

```{r galton-hist-bins-question, echo = FALSE}
question(
    paste("Which of the following replacements for `---` produces a",
          "histogram with bins that are one inch wide and start at",
          "whole integers?"),
    answer("`binwidth = 1`"),
    answer("`binwidth = 1, center = 66.5`", correct = TRUE),
    answer("`binwidth = 2, center = 66`"),
    answer("`center = 66`"),
    random_answer_order = TRUE,
    allow_retry = TRUE
)
```

### Exercise 2

Consider the code

```{r density-fiddles, exercise = TRUE}
library(ggplot2)
ggplot(faithful, aes(x = eruptions)) +
    geom_density(---)
```

```{r density-fiddles-question, echo = FALSE}
question(
    paste("Which of the following replacements for `---` produces a density",
          "plot with the area under the density in blue and no black border?"),
    answer("`color = \"lightblue\"`"),
    answer("`fill = \"black\", color = \"lightblue\"`"),
    answer("`fill = \"lightblue\", color = NA`", correct = TRUE),
    answer("`fill = NA, color = \"black\"`"),
    random_answer_order = TRUE,
    allow_retry = TRUE
)
```

### Exercise 3

Consider the code

```{r violin-fiddles, exercise = TRUE}
library(ggplot2)
library(gapminder)
p <- ggplot(gapminder, aes(y = continent, x = lifeExp))
```

```{r violin-fiddles-question, echo = FALSE}
question(
    paste("Which of the following produces violin plots without trimming",
          "at the smallest and largest observations, and including a line",
          "at the median?"),
    answer("`p + geom_violin(trim = FALSE)`"),
    answer("`p + geom_violin(trim = TRUE, show_median = TRUE)`"),
    answer("`p + geom_violin(trim = FALSE, draw_quantiles = 0.5)`",
           correct = TRUE),
    answer("`p + geom_violin(trim = TRUE, show_quantiles = 0.5)`"),
    random_answer_order = TRUE,
    allow_retry = TRUE
)
```

### Exercise 4

Density ridges can also show quantiles, but the details of how to
request this are different. Consider this code:

```{r density-ridges-fiddles, exercise = TRUE}
library(ggplot2)
library(ggridges)
library(gapminder)
ggplot(gapminder, aes(x = lifeExp, y = year, group = year)) +
    geom_density_ridges(---)
```

```{r density-ridges-fiddles-question, echo = FALSE}
question(
    paste("Which of the following replacements for `---` produces density",
          "ridges with lines showing the locations of the medians?"),
    answer("`quantiles = 0.5`"),
    answer("`quantile_lines = TRUE, quantiles = 0.5`", correct = TRUE),
    answer("`quantile_lines = TRUE`"),
    answer("`draw_quantiles = 0.5`"),
    random_answer_order = TRUE,
    allow_retry = TRUE
)
```
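
One way to check where median lines in a ridge plot should fall is to
compute the medians directly. The sketch below is shown as static code
and assumes the `dplyr` and `gapminder` packages loaded earlier in this
tutorial are installed. The names `year_medians` and `median_lifeExp`
are illustrative and are not part of the exercise.

```r
library(dplyr)
library(gapminder)

# Median life expectancy for each year in the full gapminder data.
# These are the values a median line on each ridge should mark.
# `year_medians` and `median_lifeExp` are illustrative names.
year_medians <- gapminder %>%
    group_by(year) %>%
    summarize(median_lifeExp = median(lifeExp))

year_medians
```

Comparing these values with the positions of the lines in the plot is a
quick way to confirm that the lines drawn are medians rather than some
other set of quantiles.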