## General Issues

• Make sure your file names and file references use identical spelling, including upper/lower case. Your code will fail on a case-sensitive file system if you don’t.

• Make sure to commit your work to your local repository and push your commits to GitHub. We can only see what is on GitHub, not what is on your computer. You can check what we see by viewing your repository in the GitHub web interface.

• Include your name and the date in the header of your .Rmd file using the author: and date: fields. You can use inline R code to have the date computed when the document is rendered. Your header should look something like this:

```yaml
---
title: "Assignment 3"
author: "Fred Frog"
date: "`r Sys.Date()`"
output: html_document
---
```

• Any graph you show should be discussed in your narrative.

• Any code you show should be discussed in your narrative.

• If you do not need to discuss a piece of code in the narrative, use the chunk option echo = FALSE to avoid showing it.

• Your .Rmd file, and possibly supporting .R files, contain the code for your analysis.

• If you need to update your code, or if a collaborator needs to update your code, that work will be done in your .Rmd file.

• You should make sure the code in your .Rmd file is readable.

• Following the coding standards helps with this.

• Please indent by 4 spaces for each level. I find this the most readable option.

• If you read a data file in your code, make sure that you read it in a way that will work for someone else using your repository. If you want to read from a local file:

• Read the file with a relative path name, assuming your working directory will be the directory containing your Rmd file.
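The advice above on file names and data paths can be sketched together in base R. This is a minimal illustration run in a temporary directory that stands in for a cloned repository; the file name data/mydata.csv, its contents, and the helper case_mismatches() are all hypothetical. The relative path is resolved against the working directory, which knitr by default sets to the directory containing the .Rmd file when rendering. The helper flags references that match an existing file only when upper/lower case is ignored.

```r
# Stand-in for a cloned repository (illustration only): a temporary
# directory with a data/ subdirectory holding a hypothetical CSV file.
repo <- file.path(tempdir(), "assignment-repo")
dir.create(file.path(repo, "data"), recursive = TRUE, showWarnings = FALSE)
write.csv(data.frame(x = 1:3, y = c(2.5, 3.1, 4.7)),
          file.path(repo, "data", "mydata.csv"), row.names = FALSE)

# Read with a path relative to the working directory. When knitting, this is
# the directory containing the .Rmd file, so the same code works in any clone.
old <- setwd(repo)
dat <- read.csv(file.path("data", "mydata.csv"))
setwd(old)
# An absolute path such as "C:/Users/fred/Desktop/mydata.csv" (hypothetical)
# would exist only on one computer.

# Hypothetical helper: referenced names that match a file in `dir` only when
# case is ignored. References like these fail on case-sensitive file systems.
case_mismatches <- function(refs, dir = ".") {
    actual <- list.files(dir)
    exact <- refs %in% actual
    ignoring_case <- tolower(refs) %in% tolower(actual)
    refs[!exact & ignoring_case]
}
case_mismatches(c("mydata.csv", "MyData.csv", "other.csv"),
                dir = file.path(repo, "data"))
```

Here the last call returns "MyData.csv"; "other.csv" is simply missing and is not reported. On a case-insensitive file system (the default on Windows and macOS) reading "MyData.csv" would still succeed, which is why this kind of mistake often goes unnoticed until someone runs the code on Linux.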

## 1. Life Expectancy Distribution by Continent

The subset of the data for years since 1990 can be extracted using the filter function from dplyr:

```r
library(gapminder)
library(dplyr)
# Keep only the observations from 1990 onward
gap1990 <- filter(gapminder, year >= 1990)
```
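The gapminder data record each country every five years from 1952 through 2007, so this filter keeps only four of the twelve years. The base R lines below rebuild that year grid by hand (rather than reading it from the package) to show which years survive the condition year >= 1990:

```r
# Years in the gapminder data: 1952, 1957, ..., 2007
years <- seq(1952, 2007, by = 5)
length(years)          # 12
years[years >= 1990]   # 1992 1997 2002 2007
```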

A faceted display using ggplot and facet_wrap:

```r
library(ggplot2)
ggplot(gap1990, aes(x = lifeExp)) +
    geom_density() +
    facet_wrap(~continent)
```

## 2. Boxplots of Life Expectancy by Continent

Boxplots for the same data:

```r
ggplot(gap1990) +
    geom_boxplot(aes(x = continent, y = lifeExp))
```

## 3. Ridgeline Plots of Life Expectancy

Density ridges for the 12 years show that overall life expectancy distributions have shifted upwards.

```r
library(ggridges)  # provides geom_density_ridges()
ggplot(gapminder) +
    geom_density_ridges(aes(x = lifeExp, y = year, group = year))
## Picking joint bandwidth of 3.88
```

The distribution shape has changed from skewed right in 1952 to skewed left in 2007. Adding lines at the medians emphasises this shift:

```r
ggplot(gapminder) +
    geom_density_ridges(aes(x = lifeExp, y = year, group = year),
                        quantile_lines = TRUE, quantiles = 2)
## Picking joint bandwidth of 3.88
```
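In geom_density_ridges(), quantile_lines = TRUE draws vertical lines at quantiles, and an integer quantiles value is the number of equal-probability groups each distribution is cut into. So quantiles = 2 gives a single cut at probability 0.5, the median. The base R sketch below computes those cut probabilities; cut_probs() is a hypothetical helper for illustration, not part of ggridges:

```r
# Hypothetical helper: interior cut probabilities that split a distribution
# into n groups of equal probability.
cut_probs <- function(n) seq(0, 1, length.out = n + 1)[-c(1, n + 1)]
cut_probs(2)   # 0.5, the median
cut_probs(4)   # 0.25 0.50 0.75, the quartiles

# For any sample, the 0.5 quantile is the median.
x <- c(3, 1, 4, 1, 5)   # arbitrary toy values
unname(quantile(x, cut_probs(2))) == median(x)   # TRUE
```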

Separating the distributions by continent shows some striking differences:

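```r
# Order the continent levels by decreasing mean life expectancy; draw one
# ridge per year/continent combination, with scale > 1 letting ridges overlap.
ggplot(mutate(gapminder, continent = reorder(continent, -lifeExp))) +
    geom_density_ridges(aes(x = lifeExp, y = year,
                            group = interaction(year, continent),
                            fill = continent),
                        scale = 1.3, alpha = 0.8)
## Picking joint bandwidth of 2.24
```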

Life expectancy is highest among European countries, with a steady increase over the years and consistently low variability among countries. Variability in life expectancy among countries in the Americas has decreased and overall levels have increased, but they remain below those for Europe. Life expectancy among countries in Asia has improved overall, but variability among the countries remains substantially higher than among European countries. Variability among African countries has increased, with some at life expectancy levels comparable to the Americas but the bulk remaining quite a bit lower.

## 4. Find a Better Visualization

The original:

Some issues:

• The white bars are supposed to represent the numbers, but they do not use a zero base line: the bar for Obama’s 79% should be nearly twice as long as the bar for Trump’s 40%.
• The blue and red bars are distracting at best, misleading at worst. They could represent the complementary proportion, but the lengths are wrong relative to the white bars and to each other.
• The placement of the GMA logo adds to the confusion.
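The arithmetic behind the first two points, using the two approval values quoted above:

```r
appr <- c(Obama = 79, Trump = 40)
# With a zero base line, bar length is proportional to the value.
appr[["Obama"]] / appr[["Trump"]]   # 1.975: nearly twice as long
# Complementary proportions the blue and red bars could represent.
100 - appr                          # Obama 21, Trump 60
```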

A simple bar chart with a zero base line:

```r
d <- data.frame(pres = c("Obama", "Carter", "Clinton",
                         "G.W. Bush", "Reagan", "G.H.W. Bush", "Trump"),
                appr = c(79, 78, 68, 65, 58, 56, 40),
                party = c("D", "D", "D", "R", "R", "R", "R"),
                year = c(2009, 1977, 1993, 2001, 1981, 1989, 2017))
# Order the bars by approval rating
d <- mutate(d, pres = reorder(pres, appr))

p <- ggplot(d, aes(x = pres, y = appr, fill = party)) +
    geom_col() +
    coord_flip()
p
```
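reorder() turns the character variable pres into a factor whose levels are sorted by the mean of appr within each level (each president has a single value here), from smallest to largest. ggplot2 places the first level nearest the origin, so after coord_flip() the lowest approval is at the bottom and the highest at the top. A base R check of the level order, with negation reversing it as in reorder(continent, -lifeExp) in Section 3:

```r
pres <- c("Obama", "Carter", "Clinton", "G.W. Bush",
          "Reagan", "G.H.W. Bush", "Trump")
appr <- c(79, 78, 68, 65, 58, 56, 40)
# Increasing mean approval: "Trump" first, "Obama" last
levels(reorder(pres, appr))
# Negating the variable gives decreasing order: "Obama" first
levels(reorder(pres, -appr))
```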

• In recent years it has become common to show Democrats in blue and Republicans in red.

• The default colors are close to red and blue, but they are assigned opposite to the current convention, with the reddish color used for Democrats.

This can be changed using scale_fill_manual:

```r
p + scale_fill_manual(values = c(R = "red", D = "blue"))
```

• Pure colors are very intense when used in larger areas.

• Pure warm colors, like red, are more intense than pure cool colors, like blue.

We can reduce the saturation and the value in the HSV color representation to obtain less intense colors; this is commonly used in red state/blue state maps:

```r
# Hue 0 is red and hue 2/3 is blue; saturation 0.6 and value 0.8 for both
myred <- hsv(0, 0.6, 0.8)
myblue <- hsv(2 / 3, 0.6, 0.8)
p + scale_fill_manual(values = c(R = myred, D = myblue))
```
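Lowering the saturation moves a color toward gray, and lowering the value darkens it. Comparing the pure and muted colors as hex codes and as RGB triples on the 0 to 255 scale makes the change concrete; hsv(), col2rgb(), and rgb2hsv() are base R (grDevices) functions:

```r
# Pure colors: full saturation (s = 1) and full value (v = 1)
pure <- c(red = hsv(0, 1, 1), blue = hsv(2 / 3, 1, 1))
# Muted versions as above: saturation 0.6, value 0.8
muted <- c(red = hsv(0, 0.6, 0.8), blue = hsv(2 / 3, 0.6, 0.8))
pure                      # "#FF0000" and "#0000FF"
col2rgb(muted)            # red column about (204, 82, 82)
rgb2hsv(col2rgb(muted))   # recovers s near 0.6 and v near 0.8
```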

Some enhancements:

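```r
p + scale_fill_manual(values = c(R = myred, D = myblue)) +
    theme_void() +
    # President names inside the bars, left-aligned near the base line
    geom_text(aes(y = 3, label = pres),
              size = 8, hjust = "left", color = "white") +
    # Approval values inside the bars, right-aligned near the bar ends
    geom_text(aes(y = appr - 3, label = appr),
              size = 8, hjust = "right", color = "white")
```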

Some notes:

• A dot chart is a reasonable alternative in this case.

• Horizontal bar charts are the norm in these settings since they allow horizontal labels of reasonable size.

• Party is a nominal or categorical attribute, not a numeric attribute.
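A minimal sketch of the dot chart alternative mentioned above, reusing the approval data and the muted colors from this section; the point size and axis label are arbitrary choices. Because each dot encodes approval by position rather than by length, the axis need not start at zero. Party is mapped to color with a manual scale, matching its role as a categorical attribute:

```r
library(ggplot2)

d <- data.frame(pres = c("Obama", "Carter", "Clinton", "G.W. Bush",
                         "Reagan", "G.H.W. Bush", "Trump"),
                appr = c(79, 78, 68, 65, 58, 56, 40),
                party = c("D", "D", "D", "R", "R", "R", "R"))
# Order presidents by approval so the highest appears at the top
d$pres <- reorder(d$pres, d$appr)

myred <- hsv(0, 0.6, 0.8)
myblue <- hsv(2 / 3, 0.6, 0.8)

# Dot chart: approval encoded by horizontal position
p_dot <- ggplot(d, aes(x = appr, y = pres, color = party)) +
    geom_point(size = 4) +
    scale_color_manual(values = c(R = myred, D = myblue)) +
    labs(x = "Approval (%)", y = NULL)
p_dot
```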