You will submit your homework as an R Markdown (.Rmd) file by committing to your git repository and pushing to GitLab. We will knit this file to produce the .html output file (you do not need to submit the .html, but you should make sure that it can be produced successfully).
We will review both your .Rmd file and the .html file. To receive full credit:
You must submit your .Rmd file on time. It must be named exactly as specified, and it must knit without errors to produce a .html file.
The .html file should read as a well-written report, with all results and graphs supported by text explaining what they are and, when appropriate, what conclusions can be drawn. Your report should not contain any extraneous material, such as leftovers from a template.
The R code in your .Rmd file must be clear, readable, and follow the coding standards.
The text in your .Rmd file must be readable and use R Markdown properly, as shown in the class template file.
Create a new folder called HW6 in your repository. Use exactly this spelling with upper case letters. You can do this in the RStudio IDE, with R’s dir.create function, or using a shell.
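As a minimal sketch of the dir.create route: the repo_root variable below is a hypothetical stand-in for the top level of your repository, and it points at a temporary directory so the example can be run without touching real files.

```r
# Hypothetical stand-in for the top level of your repository; a temporary
# directory is used so that running this sketch does not touch real files.
repo_root <- file.path(tempdir(), "my-repo")
dir.create(repo_root, showWarnings = FALSE)

# Create the homework folder with the exact required spelling.
hw_dir <- file.path(repo_root, "HW6")
if (!dir.exists(hw_dir)) {
  dir.create(hw_dir)
}

# Confirm that the folder exists and is spelled exactly as required.
dir.exists(hw_dir)
basename(hw_dir)
```

In your own repository, with the working directory set to its top level, the single call dir.create("HW6") should be enough.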
In this folder, create a new R Markdown file called hw6.Rmd. Again use exactly this spelling. RStudio will give you a template, or you can use the one available here. Commit your new file to your repository. (If you are using git in a shell you will need to use git add before git commit.)
In this file present your answers to the following problems. Your presentation should follow the pattern and guidelines in the class template file.
The data set heights in package dslabs contains self-reported heights for a number of female and male students. You can load the data set with
data(heights, package = "dslabs")
Construct a density plot showing the densities of the height distributions for males and for females. Also construct an eCDF plot showing the empirical cumulative distributions for the heights of the two groups. (You can do this using stat_ecdf and mapping x to height and color to sex.)
Comment on what features are easier to see in one plot or the other.
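The two plot types can be sketched on simulated data. This is only an illustration of geom_density and stat_ecdf from ggplot2; the toy data frame and its column names (value, group) are assumptions made up for the example and are not the heights data.

```r
library(ggplot2)

# Simulated toy data (NOT the dslabs heights data): two groups with
# different centers, used only to illustrate the two plot types.
set.seed(1)
toy <- data.frame(
  value = c(rnorm(200, mean = 10, sd = 2), rnorm(200, mean = 12, sd = 2)),
  group = rep(c("A", "B"), each = 200)
)

# Density plot: one density curve per group.
p_density <- ggplot(toy, aes(x = value, color = group)) +
  geom_density()

# eCDF plot: one empirical cumulative distribution curve per group.
p_ecdf <- ggplot(toy, aes(x = value, color = group)) +
  stat_ecdf()

p_density
p_ecdf
```

Both plots use the same aesthetic mapping; only the layer changes, which makes it easy to compare what each display reveals.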
For the EPA data used in the last two assignments compute the average highway gas mileage and average city gas mileage for each manufacturer’s vehicles for the year 2024 in the data set. Select the manufacturers with the top five average highway gas mileage values and show the results as a nicely formatted table. The rows should be arranged in descending order of the average highway gas mileage value. [You can compute the results using filter, grouped summarize, slice_max and arrange.]
Do not commit the file to your repository as it is quite large. Use the approach shown in Assignment 4 instead.
For the nycflights13 data identify the top four destinations with the most flights to them from New York City in 2013. For each of these four destinations find the proportion of flights that originate from each of the three New York City airports. Show the results as a faceted bar chart, with one panel for each of the four destinations. [After computing the top destinations with count and slice_max you can use filter or semi_join to select the flights to those destinations, and then find the proportions with a count followed by a grouped mutate.]
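The count, slice_max, semi_join, and grouped mutate pattern from the hint can be sketched on a small made-up table. The toy_flights table, its airport codes, and the choice of the top two (rather than four) destinations are assumptions for illustration only; the native pipe requires R 4.1 or later.

```r
library(dplyr)

# Toy flight records (hypothetical values, not the nycflights13 data).
toy_flights <- tibble(
  origin = c("AAA", "AAA", "BBB", "AAA", "BBB", "BBB", "BBB", "AAA"),
  dest   = c("XXX", "XXX", "XXX", "YYY", "YYY", "YYY", "ZZZ", "XXX")
)

# Top two destinations by number of flights.
top_dests <- toy_flights |>
  count(dest) |>
  slice_max(n, n = 2)

# Keep only flights to those destinations, count by destination and origin,
# then turn the counts into within-destination proportions.
props <- toy_flights |>
  semi_join(top_dests, by = "dest") |>
  count(dest, origin) |>
  group_by(dest) |>
  mutate(prop = n / sum(n)) |>
  ungroup()

props
```

Because the mutate is grouped by destination, the proportions sum to one within each destination, which is what a faceted bar chart with one panel per destination would display.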
For the nycflights13 data find the destinations for which there are only flights in the months June, July, and August from the three New York City airports. Present the result in a nicely formatted table that shows the three-letter airport code, the airport name from the airports table, and the number of flights to each of these destinations. [One approach: After using filter to select the summer flights and another filter to select the non-summer flights you can use anti_join to find the summer flights with destinations only flown to in summer and then semi_join to find the corresponding entries in the airports table. You can then bring in counts from a counts table with a left_join.]
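The anti_join, semi_join, and left_join steps described in the hint can be sketched on made-up tables. The toy_flights and toy_airports tables and all of their values are assumptions for illustration; only the column names faa and name are borrowed from the hint's description of the airports table.

```r
library(dplyr)

# Toy tables with hypothetical values (not the nycflights13 data).
toy_flights <- tibble(
  month = c(1, 6, 7, 7, 8, 12, 6),
  dest  = c("XXX", "XXX", "YYY", "YYY", "ZZZ", "XXX", "ZZZ")
)
toy_airports <- tibble(
  faa  = c("XXX", "YYY", "ZZZ"),
  name = c("Airport X", "Airport Y", "Airport Z")
)

summer     <- filter(toy_flights, month %in% 6:8)
non_summer <- filter(toy_flights, !(month %in% 6:8))

# Summer flights whose destination never appears outside the summer months.
summer_only <- anti_join(summer, non_summer, by = "dest")

# Flight counts for those destinations.
dest_counts <- count(summer_only, dest)

# Matching airport entries, with the counts attached.
result <- toy_airports |>
  semi_join(summer_only, by = c("faa" = "dest")) |>
  left_join(dest_counts, by = c("faa" = "dest"))

result
```

Here XXX is dropped because it also has flights in January and December, while YYY and ZZZ appear only in the summer months.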
For the nycflights13 data identify the destination airports at an altitude of more than 5,000 feet and compute how many flights there were to each from New York City in 2013. Present the results as a nicely formatted table.
You can create an HTML file in RStudio using the Knit tab on the editor window. You can also use the R command
with your working directory set to HW6.
Commit your changes to your hw6.Rmd file to your local git repository. You do not need to commit your HTML file.
Submit your work by pushing your local repository changes to your remote repository on the UI GitLab site. After doing this, it is a good idea to check your repository on the UI GitLab site to make sure everything has been submitted successfully.