## Guidelines

You will submit your homework as an R Markdown (.Rmd) file by committing to your git repository and pushing to GitLab. We will knit this file to produce the .html output file (you do not need to submit the .html, but you should make sure that it can be produced successfully).

We will review both your .Rmd file and the .html file. To receive full credit:

• You must submit your .Rmd file on time. It must be named exactly as specified, and it must knit without errors to produce a .html file.

• The .html file should read as a well written report, with all results and graphs supported by text explaining what they are and, when appropriate, what conclusions can be drawn. Your report should not contain any extraneous material, such as leftovers from a template.

• The R code in your .Rmd file must be clear, readable, and follow the coding standards.

• The text in your .Rmd file must be readable and use R Markdown properly, as shown in the class template file.

Create a new folder called HW8 in your repository. Use exactly this spelling with upper case letters. You can do this in the RStudio IDE, with R’s dir.create function, or using a shell.
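If you choose the dir.create route, a minimal sketch might look like the following. It assumes your working directory is the top level of your repository; the existence check is an optional addition that avoids a warning if the folder is already there.

```r
# Assumes the working directory is the top level of your repository.
# Create the folder only if it does not already exist.
if (!dir.exists("HW8")) {
  dir.create("HW8")
}
```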

In this folder, create a new R Markdown file called hw8.Rmd. Again, use exactly this spelling. RStudio will give you a template, or you can use the one available here. Commit your new file to your repository. (If you are using git in a shell, you will need to use git add before git commit.)

In this file present your answers to the following problems. Your presentation should follow the pattern and guidelines in the class template file.

## 1. Abrasion Loss in Rubber Samples

Data was collected in an experiment to investigate how the resistance of rubber to abrasion is affected by the hardness of the rubber and its tensile strength. For each of 30 rubber samples the hardness and tensile strength were measured, the sample was subjected to an abrasion test, and the amount of rubber lost was recorded.

The data can be read with

read.csv("http://www.stat.uiowa.edu/~luke/data/abrasion.csv")

The variables are hardness (in degrees Shore), tensile.strength (in kg per square meter), and abrasion.loss (in grams per hour).

• Create a scatterplot matrix for the data and comment on any features you see. [You can use the base function pairs(), the function splom() from package lattice, or the function ggpairs() from package GGally. Make sure to take into account that abrasion.loss is the response variable.]

• Create a faceted coplot of abrasion loss against tensile strength conditioned on 4 levels of hardness. You can use cut_number(hardness, 4) to determine the conditioning facets. Describe the relationship between abrasion loss and the two explanatory factors that this reveals. Showing the muted full data in the panels may help. Showing a smooth or linear regression fit may help as well.
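The plotting techniques mentioned above can be sketched on a different data set. The example below uses the built-in mtcars data rather than the abrasion data, treating mpg as a stand-in response and wt and hp as stand-in explanatory variables; the column name hp_bin and the styling choices are assumptions made for illustration, not requirements.

```r
library(ggplot2)

# Illustration on the built-in mtcars data, not the homework data.
# Placing the response (mpg) last puts it on the vertical axis of the
# bottom row of the scatterplot matrix.
pairs(mtcars[, c("wt", "hp", "mpg")])

# Split the conditioning variable into 4 bins with roughly equal counts.
cars <- transform(mtcars, hp_bin = cut_number(hp, 4))

p <- ggplot(cars, aes(x = wt, y = mpg)) +
  # Muted full data: this layer's data has no hp_bin column, so its
  # points are repeated in every facet.
  geom_point(data = cars[, c("wt", "mpg")], colour = "grey80") +
  # Points belonging to the current facet, drawn on top.
  geom_point() +
  # A linear fit within each facet.
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~ hp_bin, nrow = 1)
print(p)
```

The grey background layer makes it easier to see where each panel's points sit relative to the whole data set.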

## 2. Arrival and Departure Delays

This problem uses the New York City 2013 flights data to explore how arrival delay at the destination is related to departure delay. It provides some practice in creating useful scatter plots for a large data set.

• To start, since there are over 300,000 flights in the data set it is useful to work with a sample. Use sample_frac() from dplyr to select a 10% sample of the rows of the flights data frame.

• Create a scatterplot of arrival delay against departure delay for the sample and comment on what you see. Use appropriate point size and alpha levels to illustrate your conclusions.

• Now focus on the flights in the sample with departure delays of at most 30 minutes. Create a scatterplot of arrival delay against departure delay for this data set. Use an appropriate combination of point size, alpha level, and possibly jittering, to illustrate your conclusions. This plot may need different settings than the previous one. Comment on what you see. You may find it helpful to use an additional plot or two to illustrate your findings.

• Add marginal density plots and 2D density contours to your plot for departure delays of at most 30 minutes. Comment on what you see and whether these additions are helpful. [ggMarginal() in package ggExtra adds marginal density plots. The number of contours shown by geom_density_2d() can be changed with the bins argument.]
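As a rough sketch of the sampling and over-plotting tools mentioned in this problem, the example below works on simulated data rather than the flights data. The data frame big, its columns x and y, and the particular size, alpha, and bins values are all assumptions chosen for illustration; suitable settings for the flights data may differ.

```r
library(dplyr)
library(ggplot2)

set.seed(1)  # make the simulated data and the sample reproducible

# Simulated stand-in for a large data set; x and y are hypothetical columns.
big <- tibble(x = rnorm(50000), y = x + rnorm(50000, sd = 0.5))

# Keep a random 10% of the rows.
small <- sample_frac(big, 0.1)

p <- ggplot(small, aes(x, y)) +
  # Small, semi-transparent points reduce over-plotting.
  geom_point(size = 0.5, alpha = 0.2) +
  # 2D density contours; bins controls how many contour levels are used.
  geom_density_2d(bins = 6, colour = "red")
print(p)
```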

## 3. Wind Speed, Time of Day, and Departure Delays

This problem again uses the New York City 2013 flights data.

The weather table provides hourly weather data for the three NYC airports for the year 2013. In this question you will look at the relationship between wind speed, time of day, and departure delays.

• Use a join operation to merge the weather data into the flights table. You will need multiple variables for your key. The flights data help page suggests one possibility in the documentation for the time_hour variable. Check that your key is a proper primary key for the weather table (it uniquely identifies rows and has no missing values).

• To screen out seasonal effects, consider only flights departing in June, July, and August. Remove the rows where wind speed or departure delay is missing. [Using drop_na() from tidyr is one way to do this.] Also remove rows where wind speed is greater than 30 mph. [There are only a few departures with wind speeds above 30 mph, and dropping them leads to better plot aspect ratios.]

• A scatter plot of dep_delay against wind_speed is not very helpful because of over-plotting and outliers. As there are only a modest number of distinct levels for wind speed, one option is to compute average delays at each wind speed level and plot the result. It can also help to add a smooth. [Since the number of cases contributing to each average varies, it is a good idea to compute the number of cases as n = n() in the summarize() step and create the smooth with geom_smooth(aes(weight = n)).] Construct such a plot and comment on what you see.

• Time of day might also affect average departure delays. Compute average departure delays for each hour and wind_speed value combination and construct a coplot of average departure delay against hour, conditioned on wind_speed using 6 bins for wind_speed. Again, it is helpful to include a smooth computed with appropriate weights. Comment on what you see.
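The key check, join, and weighted summary described in this problem can be sketched on toy data. Everything below is simulated: the tables obs and events, and the columns site, hour, wind, and delay, are hypothetical stand-ins and not the actual weather and flights variables. The sketch uses a linear fit instead of the default smooth only because the toy summary has very few points.

```r
library(dplyr)
library(ggplot2)

set.seed(42)

# Toy "weather-like" table: one row per (site, hour) combination.
obs <- expand.grid(site = c("A", "B"), hour = 0:23,
                   stringsAsFactors = FALSE) %>%
  mutate(wind = sample(seq(0, 25, by = 5), n(), replace = TRUE))

# Toy "flights-like" table: many rows per (site, hour) combination.
events <- tibble(
  site  = sample(c("A", "B"), 2000, replace = TRUE),
  hour  = sample(0:23, 2000, replace = TRUE),
  delay = rexp(2000, rate = 1 / 10)
)

# Key check 1: no key combination occurs more than once.
dups <- obs %>% count(site, hour) %>% filter(n > 1)
# Key check 2: no missing values in the key columns.
n_missing <- obs %>% filter(is.na(site) | is.na(hour)) %>% nrow()

# Merge the per-hour values into the larger table.
joined <- left_join(events, obs, by = c("site", "hour"))

# Average delay at each wind level, keeping the group size for weighting.
avg <- joined %>%
  group_by(wind) %>%
  summarize(avg_delay = mean(delay), n = n())

p <- ggplot(avg, aes(wind, avg_delay)) +
  geom_point(aes(size = n)) +
  # Weight each average by the number of cases behind it.
  geom_smooth(aes(weight = n), method = "lm", formula = y ~ x, se = FALSE)
print(p)
```

If dups has zero rows and n_missing is zero, the key columns behave as a primary key for the toy table, and the join leaves the number of rows in events unchanged.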

## Create an HTML File and Commit Your Work

You can create an HTML file in RStudio using the Knit tab on the editor window. You can also use the R command

rmarkdown::render("hw8.Rmd")

with your working directory set to HW8.

Commit your changes to your hw8.Rmd file to your local git repository. You do not need to commit your HTML file.

Submit your work by pushing your local repository changes to your remote repository on the UI GitLab site. After doing this, it is a good idea to check your repository on the UI GitLab site to make sure everything has been submitted successfully.