Files and Folders

It is essential that you name your folders and files exactly as specified. We run checks like

cd HW2
Rscript -e 'rmarkdown::render("hw2.Rmd")'

from the top of a clone of your repository. If the folders and files are not named exactly as specified, these checks will fail and your work will not be graded.
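A quick check like the following, run from the top of your repository clone before pushing, can catch naming problems early. This is only a minimal sketch: the helper name check_hw_files is hypothetical and not part of the course tooling, and HW2 and hw2.Rmd are just the names used in the example above.

```r
# Hypothetical helper: report whether the expected folder and file exist.
check_hw_files <- function(dir = "HW2", file = "hw2.Rmd", root = ".") {
    # list.files() returns names exactly as stored, so these comparisons
    # are case-sensitive even on case-insensitive file systems.
    dir_ok <- dir %in% list.files(root)
    file_ok <- dir_ok && file %in% list.files(file.path(root, dir))
    if (!file_ok)
        message("Expected file not found: ", file.path(dir, file))
    file_ok
}

check_hw_files()
```

Comparing against the output of list.files() rather than calling file.exists() is a deliberate choice here, since file.exists("HW2/hw2.rmd") can succeed on a case-insensitive file system even though the render check on another machine would fail.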

Rmarkdown Usage and Coding Style

Make sure you are using Rmarkdown properly, with explanatory text surrounding short code chunks. In particular, you should not have just one big code chunk.

Your rendered HTML page should be a report with text supporting numerical and graphical results. Code only needs to be visible if you are explaining how to do something (which is a goal of the class notes).

Your Rmarkdown code and your R code should be readable, and the R code should follow the coding standards. This makes maintaining your code and document easier.

Name and Date

Make sure your Rmarkdown file header contains a name: field with your name. A date: field with an appropriate date is also helpful.

Your header should look something like this:

title: "HW1"
output: html_document
name: "Your Name"
date: "February 1, 2019"

You can also use one of these as the date line to produce the current date when the document is knit:

date: "`r Sys.Date()`"
date: "`r format(Sys.Date(), '%B %e, %Y')`"
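To see what these format strings produce, you can try them on a fixed date in the console. This is a small sketch: the date is arbitrary, and the month name you get depends on your locale.

```r
d <- as.Date("2019-02-01")

# Default ISO format, the same form Sys.Date() prints
iso <- format(d)

# Long format; %e is the day of the month without a leading zero,
# padded with a space for single-digit days
long <- format(d, "%B %e, %Y")

# Optional: collapse the extra space that %e adds for single-digit days
long_trimmed <- gsub(" +", " ", long)

iso
long_trimmed
```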

Handling Data Files

If your Rmarkdown document makes use of an external data file you need to make sure it can be accessed when someone you give your repository to renders your file. There are several options:

Relying on retrieving a file from the network means the file may change or be removed. In some cases this will be what you want; in others it will not.
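One possible way to reduce the dependence on the network, sketched here as an assumption rather than a course requirement, is to download the file once and keep a local copy alongside your Rmarkdown file. The function name get_data_file and the URL in the usage comment are hypothetical.

```r
# Hypothetical helper: download a data file only if no local copy exists,
# so later renders keep using the same local file.
get_data_file <- function(url, dest) {
    if (!file.exists(dest))
        download.file(url, dest, mode = "wb")
    dest
}

# Usage (placeholder URL, not a real data source):
# path <- get_data_file("https://example.com/data.csv", "data.csv")
# d <- read.csv(path)
```

If the local copy is committed to the repository, anyone rendering the document gets the same data even if the network version later changes.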

3. Find a Better Visualization

The use of a non-zero baseline in the visualization

is misleading since the viewer's attention is drawn to the length of the bars, which suggest a much larger relative change than is actually present in the data. Using a zero baseline accurately reflects the relative changes:

You could also use a dot plot or a line plot. For a dot plot, starting the value axis at the origin is not as important, but doing so still makes sense, as the comparison of primary interest is the relative change (ratio data).
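The baseline choices discussed above can be sketched with ggplot2. The data frame below contains made-up values purely for illustration; it is not the data from the original visualization.

```r
library(ggplot2)

# Made-up values for illustration only
d <- data.frame(group = c("A", "B", "C"), value = c(95, 100, 104))

# Bar chart: geom_col() draws bars from zero, so bar lengths are
# proportional to the values
p_bar <- ggplot(d) + geom_col(aes(x = group, y = value), width = 0.2)

# Dot plot: by default the value axis is fitted to the range of the data
p_dot <- ggplot(d) + geom_point(aes(x = value, y = group))

# Dot plot with the value axis extended to include zero
p_dot_zero <- p_dot + expand_limits(x = 0)
```

Printing p_dot and p_dot_zero side by side shows how much the apparent differences depend on where the axis starts.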

Whether the numbers themselves make sense is also worth considering: the values seem very high.

4. Average Life Expectancies

The subset of the data for years since 1990 can be extracted with the base function subset or with the filter function from dplyr:

# Requires library(dplyr) and library(gapminder)
gap1990 <- filter(gapminder, year >= 1990)
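For comparison, here is a sketch of the base R version using subset. The object name gap1990_base is chosen only to keep it distinct from the dplyr result.

```r
library(gapminder)

# Base R equivalent of the dplyr filter() call
gap1990_base <- subset(gapminder, year >= 1990)

# The data are recorded every five years, so it is worth checking
# which years were kept
unique(gap1990_base$year)
```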

The dplyr functions group_by and summarize can then be used to compute the average life expectancies for each continent:

s <- summarize(group_by(gap1990, continent),
               avg_lifeExp = mean(lifeExp))
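The same computation can also be written as a single pipeline, which some readers find easier to follow than nested calls. This is an alternative sketch, not the form used in these notes; the name s_pipe is hypothetical.

```r
library(dplyr)
library(gapminder)

# Filter, group, and summarize in one left-to-right pipeline
s_pipe <- gapminder %>%
    filter(year >= 1990) %>%
    group_by(continent) %>%
    summarize(avg_lifeExp = mean(lifeExp))
```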

One way to display this nicely in an Rmarkdown document is to use kable from the knitr package:

knitr::kable(s, digits = 2)

| continent | avg_lifeExp |
|:----------|------------:|
| Africa    |       53.84 |
| Americas  |       71.69 |
| Asia      |       68.63 |
| Europe    |       76.07 |
| Oceania   |       78.90 |

The kableExtra package allows table formatting to be customized in lots of ways. There are a number of other packages for making nice-looking tables.
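As a small illustration of this kind of customization, the sketch below applies kable_styling from kableExtra to the summary table. The data frame is rebuilt from the values shown above so that the example stands alone, and the styling options are one arbitrary choice among many.

```r
library(knitr)
library(kableExtra)

# Summary values copied from the table above
s <- data.frame(
    continent = c("Africa", "Americas", "Asia", "Europe", "Oceania"),
    avg_lifeExp = c(53.84, 71.69, 68.63, 76.07, 78.90)
)

# Build an HTML table, then make it narrower with striped rows
tbl <- kable(s, digits = 2, format = "html")
tbl <- kable_styling(tbl, full_width = FALSE,
                     bootstrap_options = c("striped", "condensed"))
```

In an Rmarkdown document with HTML output, ending a chunk with tbl displays the styled table.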

A dot plot and a bar chart:

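library(ggplot2)
library(gridExtra)  # provides grid.arrange()
# Dot plot (p1) and bar chart (p2) of the continent averages
p1 <- ggplot(s) + geom_point(aes(x = avg_lifeExp, y = continent))
p2 <- ggplot(s) +
    geom_bar(aes(x = continent, y = avg_lifeExp),
             stat = "identity", width = 0.2)
# Show the two plots side by side
grid.arrange(p1, p2, nrow = 1)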

The bar chart emphasizes the ratio comparisons: average life expectancy for Africa is about 2/3 of the value for Oceania; the relative differences among Asia, Europe, the Americas, and Oceania are much smaller.

Without Africa and Oceania, the relative differences are small, and a common way to express comparisons is to say that average life expectancy in Europe is about 4 years higher than in the Americas. This comparison is made easier by a dot plot.
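The two kinds of comparison described above, ratios and differences, can be checked directly from the summary values. This sketch re-enters the rounded averages from the table so that it runs on its own; the variable names are hypothetical.

```r
# Rounded averages copied from the summary table
life <- c(Africa = 53.84, Americas = 71.69, Asia = 68.63,
          Europe = 76.07, Oceania = 78.90)

# Ratio comparison: Africa relative to Oceania (about 2/3)
ratio_africa <- life[["Africa"]] / life[["Oceania"]]
round(ratio_africa, 2)   # 0.68

# Difference comparison: Europe minus the Americas, in years
diff_europe <- life[["Europe"]] - life[["Americas"]]
round(diff_europe, 2)    # 4.38
```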