You will submit your homework as an R Markdown (.Rmd) file by committing to your git repository and pushing to GitLab. We will knit this file to produce the .html output file. You do not need to submit the .html file, but you should make sure that it can be produced successfully.
We will review both your .Rmd file and the .html file. To receive full credit:
You must submit your .Rmd file on time. It must be named exactly as specified, and it must knit without errors to produce a .html file.
Your .html file should read as a well-written report, with all results and graphs supported by text explaining what they are and, when appropriate, what conclusions can be drawn. Your report should not contain any extraneous material, such as leftovers from a template.
The R code in your .Rmd file must be clear, readable, and follow the coding standards.
The text in your .Rmd file must be readable and use R Markdown properly, as shown in the class template file.
Create a new folder called HW5 in your repository. Use exactly this spelling, with upper-case letters. You can do this in the RStudio IDE, with R’s dir.create function, or using a shell.
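A minimal sketch of the dir.create route follows. It assumes your R working directory is the top level of your repository. The showWarnings = FALSE argument only silences the warning that appears if the folder already exists.

```r
# Create the HW5 folder in the current working directory
# (assumed here to be the top level of your repository).
dir.create("HW5", showWarnings = FALSE)

# Check that the folder now exists.
dir.exists("HW5")
```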
In this folder, create a new R Markdown file called hw5.Rmd. Again, use exactly this spelling. RStudio will give you a template, or you can use the one available here. Commit your new file to your repository. (If you are using git in a shell, you will need to use git add before committing.)
In this file present your answers to the following problems. Your presentation should follow the pattern and guidelines in the class template file.
This problem refers to the data provided in the nycflights13 package. Airport codes for the three New York City airports can be computed from the origin variable in the flights table. Use these codes together with select() on the airports table to create a table containing the airport codes and the airport names for these three airports, and show the result as a nicely formatted table.
Continuing with the nycflights13 data, use the flights table to compute the average and median departure delays for each of the three New York City airports, omitting missing values. Present the results as a nicely formatted table and comment on the results.
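The general pattern of a grouped summary that omits missing values can be illustrated on a different dataset: base R's airquality data, which has missing Ozone readings. This sketch assumes the dplyr and knitr packages are installed. knitr::kable() is one common way to produce a formatted table, but the class template may use a different approach. The variable names are illustrative.

```r
library(dplyr)

# Mean and median Ozone for each month; na.rm = TRUE drops the missing readings.
ozone_summary <- airquality %>%
    group_by(Month) %>%
    summarize(mean_ozone = mean(Ozone, na.rm = TRUE),
              median_ozone = median(Ozone, na.rm = TRUE))

# Render the summary as a formatted table, rounded to one decimal place.
knitr::kable(ozone_summary, digits = 1)
```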
Use density plots to compare the distributions of the air time (as recorded in the air_time variable) for flights originating from each of the three New York City airports. What differences do you see?
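There are several options for displaying the densities: superimposing them in a single plot, using alpha to distinguish the distributions; using faceting, with one panel per airport; or using a ridgeline plot from the ggridges package. Consider all three approaches and comment on their advantages and disadvantages.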
The default bandwidth used by geom_density_ridges() may be too narrow; a larger bandwidth of, say, 50 may be better. The bw argument can be used to specify a different bandwidth. These examples specify a narrower bandwidth for the barley data from the lattice package:
    library(ggplot2)
    data(barley, package = "lattice")

    # Faceted density plots: one panel per site, bandwidth 1
    ggplot(barley, aes(x = yield)) +
        geom_density(bw = 1) +
        facet_wrap(~site, ncol = 1)

    # Ridgeline plot of the same densities, also with bandwidth 1
    library(ggridges)
    ggplot(barley, aes(x = yield, y = site)) +
        geom_density_ridges(aes(height = after_stat(density)),
                            stat = "density", bw = 1)
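For the same barley data, the remaining approach superimposes all the densities in a single panel. A sketch of it follows. Here fill = site gives each site its own colour, and alpha = 0.3 makes the filled areas semi-transparent so that overlapping curves stay visible. The value 0.3 is an arbitrary choice.

```r
library(ggplot2)
data(barley, package = "lattice")

# Superimposed densities: one semi-transparent curve per site in a single panel.
p <- ggplot(barley, aes(x = yield, fill = site)) +
    geom_density(alpha = 0.3, bw = 1)
p
```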
Comment on the advantages and disadvantages of each approach in this case.
You can create an HTML file in RStudio using the Knit tab on the editor window. You can also use the R command
with your working directory set to
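One way to knit from the R console is rmarkdown::render(). This is not necessarily the exact command intended above. The sketch assumes that the rmarkdown package and pandoc are available (RStudio bundles pandoc) and that the working directory is the HW5 folder. render() returns the path of the output file invisibly.

```r
# Knit hw5.Rmd to HTML from the console, if the file is in the working directory.
if (file.exists("hw5.Rmd")) {
    out <- rmarkdown::render("hw5.Rmd")
    print(out)  # path to the generated .html file
} else {
    message("hw5.Rmd not found; use setwd() to move to the HW5 folder first.")
}
```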
Commit the changes to your hw5.Rmd file to your local git repository. You do not need to commit your HTML file.
Submit your work by pushing your local repository changes to your remote repository on the UI GitLab site. After doing this, it is a good idea to check your repository on the UI GitLab site to make sure everything has been submitted successfully.