---
#############################################################
#                                                           #
#  In RStudio click on "Run Document" to run this tutorial  #
#                                                           #
#############################################################
title: "Visualizing Proportions"
author: "Luke Tierney"
output: learnr::tutorial
runtime: shiny_prerendered
---

```{r setup, include = FALSE}
library(learnr)
library(tidyverse)
library(palmerpenguins)
knitr::opts_chunk$set(echo = FALSE, comment = "", warning = FALSE)
```

```{r stop_when_browser_closes, context = "server"}
# stop the app when the browser is closed (or, unfortunately, refreshed)
session$onSessionEnded(stopApp)
```

## More Palmer Penguins

To examine the Palmer penguins data further, start with a bar chart of the number of birds on each island, and use a stacked bar chart to show the breakdown by sex.

```{r island-sex-bar, exercise = TRUE}

```

```{r island-sex-bar-solution}
ggplot(penguins) +
    geom_bar(aes(x = island, fill = sex))
```

The value of `sex` is missing for a number of birds. If we want to focus on comparing the numbers of males and females, it may make sense to omit these birds from the sample. One way to do this is `na.omit()`, which drops every row that has a missing value in any variable.

To consider both island and species, use faceting: for example, show a bar chart of the counts per species in separate facets for the islands. Again, start with a stacked bar chart to show the breakdown by sex.

```{r species-sex-island-bar, exercise = TRUE}

```

```{r species-sex-island-bar-solution}
na.omit(penguins) %>%
    ggplot() +
    geom_bar(aes(x = species, fill = sex)) +
    facet_wrap(~ island, nrow = 1)
```

An alternative to the stacked bar chart is a clustered, or side-by-side, bar chart. You can create one using `position = "dodge"`. Try this out, and think about which approach makes it easier to compare the proportions of males and females.
```{r species-sex-island-bar-dodge, exercise = TRUE}

```

```{r species-sex-island-bar-dodge-solution}
na.omit(penguins) %>%
    ggplot() +
    geom_bar(aes(x = species, fill = sex), position = "dodge") +
    facet_wrap(~ island, nrow = 1)
```

A stacked bar chart showing the counts per island broken down by species is produced by:

```{r island-species-stack, exercise = TRUE}
ggplot(penguins) +
    geom_bar(aes(x = island, fill = species))
```

Modify this to produce a clustered bar chart.

```{r island-species-dodge, exercise = TRUE}

```

```{r island-species-dodge-solution}
ggplot(penguins) +
    geom_bar(aes(x = island, fill = species), position = "dodge")
```

This may look a little odd: not every species was sampled on every island, and `position = "dodge"` divides each island's slot among only the species present there, so the bars on islands with fewer species are wider. One option is to use `position_dodge()` with `preserve = "single"`, which gives all bars the same width.

```{r island-species-preserve, exercise = TRUE}

```

```{r island-species-preserve-solution}
ggplot(penguins) +
    geom_bar(aes(x = island, fill = species),
             position = position_dodge(preserve = "single"))
```

Another option is to compute the counts with `count()`, add explicit zero counts for the missing island/species combinations using `complete()`, and plot the result with `geom_col()`.

```{r island-species-complete, exercise = TRUE}

```

```{r island-species-complete-solution}
count(penguins, island, species) %>%
    complete(island, species, fill = list(n = 0)) %>%
    ggplot() +
    geom_col(aes(x = island, y = n, fill = species), position = "dodge")
```

## Exercises

### Exercise 1

Figure A shows a bar chart of the number of flights leaving NYC airports in 2013 for each day of the week. Figure B shows the market share of five major internet browsers in 2015.
```{r, message = FALSE, echo = FALSE, fig.height = 4, fig.width = 8}
library(lubridate)
library(dplyr)
library(nycflights13)
library(ggplot2)
library(patchwork)

thm <- theme_minimal() + theme(text = element_text(size = 15))

p1 <- mutate(flights,
             date = make_date(year, month, day),
             wday = wday(date, label = TRUE, abbr = TRUE)) %>%
    ggplot(aes(x = wday)) +
    geom_bar(fill = "deepskyblue3") +
    scale_y_continuous(expand = c(0, 0)) +
    labs(title = "Flights from NYC in 2013",
         subtitle = "By Day of the Week",
         caption = "Figure A",
         x = NULL, y = "Number of Flights") +
    thm

browsers2015 <- data.frame(Browser = c("Opera", "Safari", "Firefox", "Chrome", "IE"),
                           share = c(2, 22, 21, 27, 29))
p2 <- ggplot(browsers2015, aes(x = Browser, y = share)) +
    geom_col(fill = "deepskyblue3") +
    scale_y_continuous(expand = c(0, 0)) +
    labs(title = "Browser Market Share",
         subtitle = "2015",
         caption = "Figure B",
         x = NULL, y = "Percent") +
    thm

p1 | p2
```

```{r reorder-bars-question}
question(
    paste("For which of these bar charts would it be better to reorder the",
          "categories so the bars are ordered from largest to smallest?"),
    answer("Yes for Figure A. No for Figure B."),
    answer("No for Figure A. Yes for Figure B.", correct = TRUE),
    answer("Yes for both."),
    answer("No for both."),
    random_answer_order = TRUE,
    allow_retry = TRUE
)
```

### Exercise 2

Consider the following stacked bar chart and spine plot of the `HairEyeColor` data:

```{r}
library(dplyr)
library(ggplot2)
library(ggmosaic)
library(patchwork)

ecols <- c(Brown = "brown2", Blue = "blue2",
           Hazel = "darkgoldenrod3", Green = "green4")
HairEyeColorDF <- as.data.frame(HairEyeColor)

p0 <- ggplot(HairEyeColorDF) +
    scale_fill_manual(values = ecols) +
    theme_minimal()
p1 <- p0 + geom_col(aes(x = Hair, y = Freq / sum(Freq), fill = Eye))
p2 <- p0 + geom_mosaic(aes(x = product(Hair), fill = Eye, weight = Freq))

p1 + p2
```

```{r bar-spine-question}
question(
    "Which hair color has the highest proportion of individuals with green eyes?",
    answer("Black"),
    answer("Brown"),
    answer("Red", correct = TRUE),
    answer("Blond"),
    random_answer_order = TRUE,
    allow_retry = TRUE
)
```

### Exercise 3

Use the plots from the previous exercise to answer:

```{r red-hair-question}
question(
    "The proportion of individuals with red hair is closest to:",
    answer("5%"),
    answer("8%"),
    answer("12%", correct = TRUE),
    answer("20%"),
    random_answer_order = TRUE,
    allow_retry = TRUE
)
```
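After answering the questions, you can check your readings of the two plots numerically. The sketch below is an illustration, not part of the original exercises. It assumes the dplyr package (already loaded by the tutorial's `tidyverse` setup) and uses the `HairEyeColor` table that ships with R. It computes the proportion of individuals with each hair color, which sets the bar widths in the spine plot, and the proportion of each eye color within each hair color, which sets the segment heights within each bar. The object names `hair_props` and `eye_given_hair` are hypothetical.

```r
# Sketch only: assumes dplyr is installed; HairEyeColor is a built-in dataset.
library(dplyr)

hair_eye <- as.data.frame(HairEyeColor)

# Marginal proportion of each hair color, summing over eye color and sex.
# These proportions correspond to the bar widths in the spine plot.
hair_props <- hair_eye %>%
    group_by(Hair) %>%
    summarize(n = sum(Freq)) %>%
    mutate(prop = n / sum(n))

# Proportion of each eye color within each hair color.
# ".groups = 'drop_last'" keeps the data grouped by Hair, so
# n / sum(n) is computed separately for each hair color.
eye_given_hair <- hair_eye %>%
    group_by(Hair, Eye) %>%
    summarize(n = sum(Freq), .groups = "drop_last") %>%
    mutate(prop = n / sum(n)) %>%
    ungroup()

hair_props
filter(eye_given_hair, Eye == "Green")
```

Computing both tables from the same counts makes the link explicit: the spine plot shows conditional proportions within each hair color, while its bar widths carry the marginal proportions that the stacked bar chart shows as bar heights.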