## Add a Directory HW4

Add a directory HW4 to your repository. This is where you will put the rest of your work for this assignment.

Be sure to commit and push your changes.

## 1. Life Expectancy and GDP Per Capita

Using the gapminder data from the gapminder package, create a set of scatter plots of life expectancy against GDP per capita for each of the years 1977, 1987, 1997, and 2007. In these plots, use appropriate aesthetics to encode the continent and population size. The scale_size_area function may be useful.
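As a rough illustration of what scale_size_area does, the sketch below maps a size aesthetic on a small made-up data frame and then asks ggplot2 to make the area of each point proportional to the value. The data frame `toy` and its column names are hypothetical and are not part of the gapminder data; adapting the idea to the assignment is left to you.

```r
library(ggplot2)

# Made-up toy data; the column names here are hypothetical
toy <- data.frame(
  x = c(1, 2, 3, 4),
  y = c(2, 3, 5, 4),
  pop = c(10, 40, 90, 160),
  group = c("a", "a", "b", "b")
)

# Map size and color to variables, then make point area proportional to pop
p <- ggplot(toy, aes(x = x, y = y, size = pop, color = group)) +
  geom_point() +
  scale_size_area()
p
```

With scale_size_area a value of zero is drawn with zero area, which is usually what you want for counts such as population.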

## 2. Fuel Economy

The mpg data set in the ggplot2 package provides data on fuel economy and various other characteristics of vehicles tested by the EPA.

• Make a scatter plot of the city fuel economy level against the engine displacement.

• In a second plot use color or symbol shape to reflect the number of cylinders in each vehicle’s engine. It is best to treat the number of cylinders as a categorical variable.

• In a third plot use color or symbol shape to reflect whether the vehicle has a manual or an automatic transmission. You will need to extract this from the trans variable; using the substr function is one way to do this.

• Using a bar chart show how the split between manual and automatic transmissions varies between vehicles with different numbers of cylinders.

In each case comment on what you see in the plots.
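The two hints above, treating a number as a category and pulling a transmission type out of a longer string, can be sketched in base R on a few made-up values. The vectors below are toy values written in the style of the mpg columns, not the data itself, and the names `trans_type` and `cyl_f` are hypothetical.

```r
# Toy values in the style of the trans column (not the real data)
trans <- c("auto(l5)", "manual(m5)", "auto(av)", "manual(m6)")

# The first character is enough to tell the two transmission types apart
first <- substr(trans, 1, 1)
trans_type <- ifelse(first == "a", "automatic", "manual")

# factor() turns a numeric variable into a categorical one
cyl <- c(4, 6, 8, 4)
cyl_f <- factor(cyl)

# Cross-tabulate the two categorical variables
table(cyl_f, trans_type)
```

This is only one possible approach; other string functions would work as well.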

## 3. Fuel Economy Again

The mpg data set is rather old: the newest model is from 2008. Newer data is available from the EPA. A compressed CSV file for the years 1984–2018 is available locally and can be downloaded and read in with

    library(readr)
    if (! file.exists("vehicles.csv.zip"))
        "vehicles.csv.zip")
    newmpg <- read_csv("vehicles.csv.zip", guess_max = 100000)

Please do not commit the vehicles.csv.zip file to your repository as it is quite large.

The data set contains over 80 variables. Read the documentation for the data to identify the variables that correspond to the variables you used from the mpg data set. Then recreate your plots from the previous problem for vehicles from the model years 2009 to the present.

Comment on any interesting features and any difference you see compared to the older data.

The fct_recode function from package forcats may be useful. (Examples of the use of fct_recode are in the Factors chapter of R for Data Science.)
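A minimal sketch of fct_recode on made-up data may help show the calling convention. The level codes below are hypothetical and are not taken from the EPA file; check the data documentation for the actual values.

```r
library(forcats)

# Hypothetical codes, not taken from the EPA data
codes <- factor(c("A", "M", "AV", "M", "A"))

# New level names go on the left, existing levels on the right;
# giving two old levels the same new name merges them
type <- fct_recode(codes, automatic = "A", automatic = "AV", manual = "M")
levels(type)
```

Levels that are not mentioned in the call are left unchanged.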

## Submit Your Work

Write up your work in an Rmarkdown document called hw4.Rmd in your HW4 folder, and commit it to your local repository. You can commit the hw4.html as well but you do not need to.

Submit your work by pushing your local repository changes to your remote repository.

## Make Sure Your Work Is Reproducible

One of the goals of using git, GitHub, and Rmarkdown is for you to practice creating a framework that you can hand to someone else to reproduce your analysis. This means, among other things, that

• you should not make use of files from your computer outside of your repository;

• you should not rely on being able to change the working directory to your home directory;

• you should assume that your code might be run on a case-sensitive file system.
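One way to check the case of a file name even on a case-insensitive system is to compare it against the names that list.files reports, since that comparison is done on the stored names. The sketch below uses a throwaway directory under tempdir(); the directory name `case-demo` is a hypothetical choice for illustration.

```r
# Work in a throwaway directory so nothing in the repository is touched
d <- file.path(tempdir(), "case-demo")
dir.create(d, showWarnings = FALSE)
file.create(file.path(d, "hw4.Rmd"))

# list.files() reports names as stored, so %in% compares them case-sensitively
files <- list.files(d)
exact <- "hw4.Rmd" %in% files
wrong_case <- "HW4.rmd" %in% files
c(exact = exact, wrong_case = wrong_case)
```

By contrast, file.exists may report TRUE for a wrongly cased name on Windows or a Mac, which is why code that works there can fail on Linux.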

If you are working on Windows or a Mac, a good test is to go to our Linux systems, check out your repository, and check that your hw4.Rmd file can be rendered successfully to produce hw4.html. You can do this using the RStudio menus or from the R command line with

    rmarkdown::render("hw4.Rmd")

Make sure your working directory is HW4 for this.

You can also run this command in a shell in your HW4 directory:

    Rscript -e 'rmarkdown::render("hw4.Rmd")'

Make sure you use the right combination of single and double quotes.