Guidelines

You will submit your homework as an R Markdown (.Rmd) file by committing to your git repository and pushing to GitLab. We will knit this file to produce the .html output file (you do not need to submit the .html, but you should make sure that it can be produced successfully).

We will review both your .Rmd file and the .html file. To receive full credit:

• You must submit your .Rmd file on time. It must be named exactly as specified, and it must knit without errors to produce a .html file.

• The .html file should read as a well-written report, with all results and graphs supported by text explaining what they are and, when appropriate, what conclusions can be drawn. Your report should not contain any extraneous material, such as leftovers from a template.

• The R code in your .Rmd file must be clear, readable, and follow the coding standards.

• The text in your .Rmd file must be readable and use R Markdown properly, as shown in the class template file.

Create a new folder called HW2 in your repository. Use exactly this spelling, with uppercase letters. You can do this in the RStudio IDE, with R’s dir.create function, or in a shell.
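If you take the dir.create route, a minimal sketch might look like the following. It assumes your R working directory is the top level of your repository, which you can check with getwd().

```r
# Create the HW2 folder only if it does not already exist.
# Assumption: the working directory is the top level of your repository.
if (!dir.exists("HW2")) {
    dir.create("HW2")
}

# Confirm that the folder exists and is spelled exactly "HW2".
dir.exists("HW2")
```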

In this folder, create a new R Markdown file called hw2.Rmd. Again, use exactly this spelling. RStudio will give you a template, or you can use the one available here. Commit your new file to your repository. (If you are using git in a shell, you will need to use git add before git commit.)

In this file present your answers to the following problems. Your presentation should follow the pattern and guidelines in the class template file.

1. Average Infant Mortality Rate by Continent in 2015

The dslabs package contains a somewhat larger version of the GapMinder data as the data frame gapminder. In this problem you will use the gapminder data from dslabs to create a table of the average infant mortality rate, in deaths per 1000, for each continent for the year 2015, the most recent year with reasonably complete data in the data set.

Your solution report should show a nicely formatted table along with a description of what is being shown. Your solution should not show any code. Be sure to use an appropriate number of digits in your table.

You can use data(gapminder, package = "dslabs") to make the data available as a variable in your workspace. If you are working on your own computer you may need to install the dslabs package first (do not do this in your .Rmd file). You can use the dplyr function filter for selecting the subset of data for 2015, and group_by and summarize for computing the averages for each continent. As in the previous assignment, you can use knitr::kable and kableExtra::kable_styling to produce a nicely formatted table for your report.

As there are missing values, you may want to use na.rm = TRUE in computing means.
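The filter, group_by, and summarize pattern described above can be sketched on a small made-up data frame. The data frame toy and its columns region, year, and rate are hypothetical stand-ins, not the gapminder variables, and the numbers are invented for illustration only.

```r
library(dplyr)

# Made-up data for illustration only; these are not gapminder values.
toy <- data.frame(
    region = c("A", "A", "A", "B", "B", "B"),
    year   = c(2014, 2015, 2015, 2015, 2015, 2014),
    rate   = c(10.0, 12.5, NA, 30.2, 28.4, 31.0)
)

# Keep one year, then average within each group, skipping missing values.
toy_avg <- toy %>%
    filter(year == 2015) %>%
    group_by(region) %>%
    summarize(avg_rate = mean(rate, na.rm = TRUE))

# digits = 1 matches the precision of the inputs.
# format = "html" is set explicitly here (an assumption of this sketch)
# so that the result is an HTML table wherever the code is run.
knitr::kable(toy_avg, format = "html", digits = 1,
             col.names = c("Region", "Average rate")) %>%
    kableExtra::kable_styling(full_width = FALSE)
```

Without na.rm = TRUE, the average for region A would be NA, because one of its two 2015 values is missing.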

2. Average Infant Mortality Rate by Continent Over the Years

In this problem you will use the same data as in the previous problem to create a line graph of the average infant mortality rate against years, with a separate line for each continent.

Your solution report should show the graph along with a description of what is being shown. Comment on any interesting features you see. Your solution should not show any code.

To compute averages for each year and continent you will need to group by both year and continent. For the graph, mapping the color aesthetic in geom_line to continent will produce separate lines, along with a legend.

Again, you may want to use na.rm = TRUE in computing means and also with geom_line.
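As a rough sketch of grouping by two variables and drawing one line per group, the following uses a small made-up data frame. The names toy, region, year, and rate are hypothetical, and the values are invented.

```r
library(dplyr)
library(ggplot2)

# Made-up data for illustration only: two observations per region and year.
toy <- data.frame(
    region = rep(c("A", "B"), each = 6),
    year   = rep(rep(2000:2002, each = 2), times = 2),
    rate   = c(50, 54, 45, NA, 40, 42,
               20, 22, 18, 20, NA, 15)
)

# Grouping by both variables gives one average per year and region.
toy_avg <- toy %>%
    group_by(year, region) %>%
    summarize(avg_rate = mean(rate, na.rm = TRUE), .groups = "drop")

# Mapping color to the grouping variable draws a separate line for each
# group and adds a legend.
p <- ggplot(toy_avg, aes(x = year, y = avg_rate)) +
    geom_line(aes(color = region), na.rm = TRUE) +
    labs(x = "Year", y = "Average rate", color = "Region")
p
```

Here the summary has six rows, one for each combination of three years and two regions.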

3. Mauna Loa Atmospheric CO$$_2$$ Concentration

The NOAA Earth System Research Laboratory provides data on monthly mean carbon dioxide (CO$$_2$$) measured at the Mauna Loa Observatory in Hawaii. Measurements are in parts per million. In this problem you will read in the monthly data, compute yearly averages, and show the results as a line graph.

Use the data file available at

https://www.stat.uiowa.edu/~luke/data/co2-2021.csv

Your solution report should show the graph, explain what data the graph is showing, and point out any important features you see in the graph. Your solution should not show any code.

In your code you will need to do the following:

• Read the data file. You can use read.csv to read directly from the URL. There are some missing values, coded as ***. You can read the data with read.csv and then convert the character columns to numeric. Since the missing value code is known, you can also call read.csv with na.strings = "***" to have the missing values taken care of during reading.

• The data are in wide format. You need to convert them to a long, or tidy, format with three variables: year, month, and co2. You can do this using pivot_longer.

• You can compute yearly averages using group_by and summarize. Since there are missing values, you may want to call the mean function with na.rm = TRUE to compute means based on the available data.

• You can create the plot using geom_line.

The data processing steps needed here are similar to the ones for the global average surface temperatures example.
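The steps above can be sketched on a tiny made-up data set read from a string instead of the URL. The column names year, Jan, Feb, and Mar and all of the values are assumptions made for illustration; the layout of the real file may differ.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Made-up wide-format data; the real file's columns and values may differ.
csv_text <- "year,Jan,Feb,Mar
2001,***,371.5,372.1
2002,372.4,373.0,***
2003,374.7,375.6,376.1"

# na.strings turns the "***" code into NA, so the month columns are
# read as numeric. For the assignment, the URL would replace text = ...
wide <- read.csv(text = csv_text, na.strings = "***")

# One row per year and month, with the measurement in a co2 column.
long <- wide %>%
    pivot_longer(-year, names_to = "month", values_to = "co2")

# Yearly averages based on the months that are available.
yearly <- long %>%
    group_by(year) %>%
    summarize(avg_co2 = mean(co2, na.rm = TRUE))

ggplot(yearly, aes(x = year, y = avg_co2)) +
    geom_line() +
    labs(x = "Year", y = "Average CO2 (ppm)")
```

Reading with na.strings avoids the separate step of converting character columns to numeric after the fact.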

Create an HTML File and Commit Your Work

You can create an HTML file in RStudio using the Knit button on the editor toolbar. You can also use the R command

rmarkdown::render("hw2.Rmd")

with your working directory set to HW2.

Commit your changes to your hw2.Rmd file to your local git repository. You do not need to commit your HTML file.

Submit your work by pushing your local repository changes to your remote repository on the UI GitLab site. After doing this, it is a good idea to check your repository on the UI GitLab site to make sure everything has been submitted successfully.