## Add a Directory HW6

Add a directory HW6 to your repository. This is where you will put the rest of your work for this assignment.

Be sure to commit and push your changes.

## 1. Fleet City Gas Mileage

For the EPA fuel efficiency data used in HW4:

• Identify the 5 manufacturers with the highest average city gas mileage for their non-electric vehicles in 2018.

• For these manufacturers, show graphically how the average city gas mileage of their non-electric vehicles has changed from 2009 through 2018.

## 2. Arrival Delays and Cancellations

Using the flights data set in the nycflights13 package, examine when you should leave if you want to make it to your destination on time:

• Find the average arrival delay for each departure hour and each of the three airports and plot the results. Which departure times seem to be more favorable? The hour variable contains the scheduled departure hour; you could also compute it as sched_dep_time %/% 100.

• Find the proportion of flights scheduled to leave in each departure hour and from each airport that are canceled. Assume that canceled flights are the ones where both departure time and arrival time are missing. Plot the results and comment on what you see.

• Do your conclusions change if you consider flights of more and less than 1000 miles separately?
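
As a starting point, the first two bullets above could be sketched roughly as follows. This is a minimal sketch rather than a complete answer: it assumes the dplyr and ggplot2 packages are installed alongside nycflights13, and the names avg_delay and cancel_prop are made up for illustration. The line plot is only one of several reasonable displays.

```r
library(dplyr)
library(ggplot2)
library(nycflights13)

# Average arrival delay for each origin airport and scheduled departure hour.
# na.rm = TRUE drops flights with no recorded arrival delay; a group in which
# every delay is missing would produce NaN.
avg_delay <- flights %>%
  group_by(origin, hour) %>%
  summarize(avg_arr_delay = mean(arr_delay, na.rm = TRUE), .groups = "drop")

# Proportion of canceled flights, treating a flight as canceled when both
# dep_time and arr_time are missing (the definition given in the assignment).
cancel_prop <- flights %>%
  mutate(canceled = is.na(dep_time) & is.na(arr_time)) %>%
  group_by(origin, hour) %>%
  summarize(prop_canceled = mean(canceled), n = n(), .groups = "drop")

# One possible plot: a line per airport.
ggplot(avg_delay, aes(x = hour, y = avg_arr_delay, color = origin)) +
  geom_line() +
  geom_point()
```

Keeping the group size n in cancel_prop makes it easier to spot hours with very few scheduled flights, where a proportion can be misleading.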

## 3. Departure Delays and Wind Speed

This problem uses the data in the nycflights13 package.

The weather table provides hourly weather data for the three NYC airports for the year 2013. In this question you will look at the relationship between wind speed and departure delays.

• Examine the wind_speed variable in the weather table. There are some extreme values that do not seem plausible. Replace these with NA values.

• Use a join operation to merge the weather data into the flights table. You will need multiple variables for your key. The flights data help page suggests one possibility in the documentation for the time_hour variable. Check that your key is a proper primary key for the weather table (it uniquely identifies rows and has no missing values).

• A scatter plot of dep_delay against wind_speed is not very helpful because of over-plotting and outliers. As there are only a modest number of distinct levels for wind speed, one option is to compute average delays at each wind speed and plot the result. Construct such a plot and comment on what you see.

• The number of departures used to compute the averages at each wind speed varies quite a bit. You can encode the number of departures represented by each point in the point size. Construct such a plot; does it change your conclusion?
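
The steps in this problem might be outlined along the following lines. This is a hedged sketch under several assumptions: dplyr and ggplot2 are installed, the key is taken to be origin plus time_hour (you still need to verify it), and max_plausible is a hypothetical placeholder cutoff that you should replace after inspecting the data yourself. All object names are illustrative.

```r
library(dplyr)
library(ggplot2)
library(nycflights13)

# Check the candidate key: count duplicates and missing key values.
key_dups <- weather %>% count(origin, time_hour) %>% filter(n > 1)
key_missing <- weather %>% filter(is.na(origin) | is.na(time_hour))
nrow(key_dups)     # should be 0 for a proper primary key
nrow(key_missing)  # should be 0 for a proper primary key

# Hypothetical cutoff in mph; pick your own after looking at the distribution.
max_plausible <- 100
weather_clean <- weather %>%
  mutate(wind_speed = ifelse(wind_speed > max_plausible, NA, wind_speed))

# Left join keeps every flight, adding wind_speed where a match exists.
flights_wx <- flights %>%
  left_join(select(weather_clean, origin, time_hour, wind_speed),
            by = c("origin", "time_hour"))

# Average departure delay and number of flights at each wind speed.
delay_by_wind <- flights_wx %>%
  filter(!is.na(wind_speed)) %>%
  group_by(wind_speed) %>%
  summarize(avg_dep_delay = mean(dep_delay, na.rm = TRUE),
            n = n(), .groups = "drop")

# Point size encodes how many flights each average is based on.
ggplot(delay_by_wind, aes(x = wind_speed, y = avg_dep_delay, size = n)) +
  geom_point()
```

If the key is a proper primary key, the left join cannot add rows, so comparing nrow(flights_wx) with nrow(flights) is a quick sanity check on the merge.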

Write up your work in an Rmarkdown document called hw6.Rmd in your HW6 folder, and commit it to your local repository. You can commit the hw6.html file as well but you do not need to.

## Make Sure Your Work Is Reproducible

One of the goals of using git, GitHub, and Rmarkdown is for you to practice creating a framework that you can hand to someone else to reproduce your analysis. This means, among other things, that

• you should not make use of files from your computer outside of your repository;

• you should not rely on being able to change the working directory to your home directory;

• you should assume that your code might be run on a case-sensitive file system.

If you are working on Windows or a Mac, a good test is to go to our Linux systems, check out your repository, and check that your hw6.Rmd file can be rendered successfully to produce hw6.html. You can do this using the RStudio menus or from the R command line with
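
One way to catch case mismatches before moving to a case-sensitive system is to compare a file name against the exact directory listing. The helper below is a small illustrative sketch; exists_exact_case is a made-up name, not a function from any package.

```r
# file.exists() can return TRUE for "HW6.rmd" on a case-insensitive file
# system even when the file is really named "hw6.Rmd". Matching against
# list.files() compares the names exactly, including case.
exists_exact_case <- function(path) {
  basename(path) %in% list.files(dirname(path), all.files = TRUE)
}

# Example use with a relative path, run from the HW6 directory:
exists_exact_case("hw6.Rmd")
```

This only checks the final component of the path; directory names along the way would need the same treatment.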

    rmarkdown::render("hw6.Rmd")

Make sure your working directory is HW6 for this.

You can also run this command in a shell in your HW6 directory:

    Rscript -e 'rmarkdown::render("hw6.Rmd")'

Make sure you use the right combination of single and double quotes.