Guidelines

You will submit your homework as an R Markdown (.Rmd) file by committing to your git repository and pushing to GitLab. We will knit this file to produce the .html output file (you do not need to submit the .html, but you should make sure that it can be produced successfully).

We will review both your .Rmd file and the .html file. To receive full credit:

Create a new folder called HW5 in your repository. Use exactly this spelling with upper case letters. You can do this in the RStudio IDE, with R’s dir.create function, or using a shell.

In this folder, create a new R Markdown file called hw5.Rmd. Again, use exactly this spelling. RStudio will give you a template, or you can use the one available here. Commit your new file to your repository. (If you are using git in a shell, you will need to use git add before git commit.)

In this file present your answers to the following problems. Your presentation should follow the pattern and guidelines in the class template file.
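
The folder in the first step can be created from R with dir.create(). This is a minimal sketch, assuming your working directory is the top level of your local repository (an assumption; adjust the path if it is not):

```r
# Assumes the working directory is the top level of the repository.
# dir.create() warns if the folder already exists, so check first.
if (!dir.exists("HW5")) {
    dir.create("HW5")
}

# Confirm that the folder is there, spelled exactly as required
dir.exists("HW5")
```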

1. New York City Airport Names

This problem refers to the data provided in the nycflights13 package. Airport codes for the three New York City airports can be computed from the origin variable in the flights table using unique(). Use filter() and select() on the airports table to create a table containing the airport codes and the airport names for these three airports, and show the result as a nicely formatted table.
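
As an illustration of the same pattern applied to a different question, the sketch below uses unique() to collect the carrier codes of flights bound for Honolulu and then looks up the carrier names in the airlines table. The choice of destination, the variable names hnl_carriers and hnl_airlines, and the use of knitr::kable() for formatting are assumptions made for this example, not requirements of the problem.

```r
library(dplyr)
library(nycflights13)

# Carrier codes appearing on flights to Honolulu (HNL)
hnl_carriers <- unique(filter(flights, dest == "HNL")$carrier)

# Keep the matching rows of the airlines table and only two columns
hnl_airlines <- airlines |>
    filter(carrier %in% hnl_carriers) |>
    select(carrier, name)

# One way to show a data frame as a formatted table in R Markdown
knitr::kable(hnl_airlines)
```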

2. Average and Median Departure Delays

Continuing with the nycflights13 data, use the flights table to compute the average and median departure delays for each of the three New York City airports, omitting missing values. Present the results as a nicely formatted table and comment on the results.
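
Grouped summaries that omit missing values can be sketched on a different data set. The example below uses the built-in airquality data, whose Ozone variable contains missing values, and computes the mean and median for each month. The data set, the column names mean_ozone and median_ozone, and the use of knitr::kable() are assumptions chosen for illustration.

```r
library(dplyr)

# Mean and median ozone for each month; na.rm = TRUE drops missing values
ozone_summary <- airquality |>
    group_by(Month) |>
    summarize(mean_ozone = mean(Ozone, na.rm = TRUE),
              median_ozone = median(Ozone, na.rm = TRUE))

# Format the summary as a table, rounding to one decimal place
knitr::kable(ozone_summary, digits = 1)
```

Without na.rm = TRUE, any group containing a missing value would have NA as its mean and median.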

3. Air Time Distributions

Use density plots to compare the distributions of the air time (as recorded in the air_time variable) for flights originating from each of the three New York City airports. What differences do you see?

There are several options for displaying the densities:

Consider all three approaches and comment on their advantages and disadvantages.

The default bandwidth used by geom_density() and geom_density_ridges() may be too narrow; a larger bandwidth of, say, 50 may be better. The bw argument can be used to specify a different bandwidth. These examples specify a narrower bandwidth for the barley data:

library(ggplot2)
data(barley, package = "lattice")

# Faceted density plots, one panel per site, with bandwidth 1
ggplot(barley, aes(x = yield)) +
    geom_density(bw = 1) +
    facet_wrap(~site, ncol = 1)

library(ggridges)

# Ridgeline plot of the same densities, again with bandwidth 1
ggplot(barley, aes(x = yield, y = site)) +
    geom_density_ridges(aes(height = after_stat(density)),
                        stat = "density", bw = 1)
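
To judge how far a chosen bandwidth departs from the default, it can help to look at the default itself. As far as I know, geom_density() picks its bandwidth with the rule implemented in bw.nrd0() unless bw is given, and does so separately for each group or panel. The sketch below, which assumes that behavior, computes the value for each site in the barley data so it can be compared with the bw = 1 used above. The name default_bw is hypothetical.

```r
data(barley, package = "lattice")

# Rule-of-thumb bandwidth computed separately for each site's yields
default_bw <- tapply(barley$yield, barley$site, bw.nrd0)

round(default_bw, 2)
```

The same check on air_time for each origin gives a reference point when choosing a larger bandwidth for the flights data.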

4. Highway Fuel Economy Over the Years, Revisited

In Problem 4 of Assignment 4 you created a strip plot showing highway fuel economy values for each of the years from 2000 through 2022. Compare your result to three other options:

Comment on the advantages and disadvantages of each approach in this case.

Create an HTML File and Commit Your Work

You can create an HTML file in RStudio using the Knit button in the editor window. You can also use the R command

rmarkdown::render("hw5.Rmd")

with your working directory set to HW5.
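
If your working directory is the top level of the repository instead, one option is to pass the path to the file. This is a sketch under that assumption. The variable names rmd and out are hypothetical, and the file.exists() guard is there only so the snippet does nothing when the file is absent.

```r
# Path to the homework file relative to the top level of the repository
rmd <- file.path("HW5", "hw5.Rmd")

if (file.exists(rmd)) {
    # render() returns the path of the output file it created;
    # by default the output is written next to the input file
    out <- rmarkdown::render(rmd)
    file.exists(out)
}
```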

Commit your changes to your hw5.Rmd file to your local git repository. You do not need to commit your HTML file.

Submit your work by pushing your local repository changes to your remote repository on the UI GitLab site. After doing this, it is a good idea to check your repository on the UI GitLab site to make sure everything has been submitted successfully.