You will submit your homework as an R Markdown (.Rmd) file by committing to your git repository and pushing to GitLab. We will knit this file to produce the .html output file (you do not need to submit the .html file, but you should make sure that it can be produced successfully).
We will review both your .Rmd file and the .html file. To receive full credit:
You must submit your .Rmd file on time. It must be named exactly as specified, and it must knit without errors to produce a .html file.
The .html file should read as a well-written report, with all results and graphs supported by text explaining what they are and, when appropriate, what conclusions can be drawn. Your report should not contain any extraneous material, such as leftovers from a template.
The R code in your .Rmd file must be clear, readable, and follow the coding standards.
The text in your .Rmd file must be readable and use R Markdown properly, as shown in the class template file.
Create a new folder called HW7 in your repository. Use exactly this spelling with upper case letters. You can do this in the RStudio IDE, with R’s dir.create function, or using a shell.
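As a minimal sketch of the dir.create route, the following assumes your R working directory is the top level of your repository; adjust the path if it is not.

```r
# Create the homework folder only if it does not already exist.
# Assumption: the working directory is the top level of your repository.
if (! dir.exists("HW7"))
    dir.create("HW7")

# Check the spelling as stored on disk; list.files() reports the actual
# name, which matters on file systems that ignore case.
"HW7" %in% list.files()
```

The final line should print TRUE when the folder exists with exactly the required spelling.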
In this folder, create a new R Markdown file called hw7.Rmd. Again use exactly this spelling. RStudio will give you a template, or you can use the one available here. Commit your new file to your repository. (If you are using git in a shell you will need to use git add before git commit.)
In this file present your answers to the following problems. Your presentation should follow the pattern and guidelines in the class template file.
For an article on the 2020 presidential election results in Iowa and surrounding states you are asked to suggest a plot that shows the proportion of votes going to each of the candidates. The plot should primarily allow comparisons of the proportions going to the candidates in different states, but should also reflect the varying vote totals in these states. Some possible choices are:
A stacked bar chart with states mapped to the x axis and candidates mapped to fill color.
A filled bar chart with the same mappings but the bars scaled to have height one to make comparing proportions easier.
A spine plot with states on the x axis and fill color mapped to candidate.
Show the plots, and explain which of these plots would be the better choice and why.
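To make the difference between the stacked and filled charts concrete, here is a small sketch that computes within-state proportions in base R. The data frame toy and its vote counts are invented for illustration and are not taken from election2020.csv.

```r
# Toy data: invented vote counts for two hypothetical states.
toy <- data.frame(
    state = rep(c("A", "B"), each = 3),
    candidate = rep(c("Biden", "Other", "Trump"), times = 2),
    votes = c(600, 50, 350, 100, 10, 90)
)

# Total votes per state, repeated on every row of that state.
toy$total <- ave(toy$votes, toy$state, FUN = sum)

# Within-state proportion of the vote for each candidate.
toy$prop <- toy$votes / toy$total

toy
```

In a stacked bar chart the height of each bar is the state total (the total column), while in a filled bar chart each segment has height prop and every bar has height one.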
The data can be assembled by this code:
if (! file.exists("election2020.csv"))
    download.file("http://www.stat.uiowa.edu/~luke/data/election2020.csv",
                  "election2020.csv")
library(dplyr)
library(ggplot2)
election2020 <- read.csv("election2020.csv")
state_abb <- data.frame(state = state.name, abb = state.abb)
election <- left_join(election2020, state_abb, "state")
nearby_states <- c("IA", "IL", "WI", "MN", "SD", "MO", "NE")
election_nearby <- filter(election, abb %in% nearby_states) |>
    mutate(candidate = factor(candidate, c("Biden", "Other", "Trump")))
The plots can be created by filling in the --- in this code:
p <- ggplot(election_nearby, aes(x = state, y = votes, fill = candidate)) +
    scale_fill_manual(values = c(Trump = scales::muted("red"),
                                 Biden = scales::muted("blue"),
                                 Other = "grey")) +
    labs(x = "") +
    theme_minimal()
p_bar <- p + geom_col(---)
p_fill <- p + geom_col(---)
library(ggmosaic)
p_spine <- p +
    geom_mosaic(aes(---))
The ggplotly function in the plotly package allows you to add tooltips to points in a plot created with ggplot2. There is an example in the Interaction section of the notes on ggplot.
The code below produces a plot of life expectancy against GDP per capita for four years for the gapminder data. Modify this code to show the country name in a tooltip with a white background.
library(dplyr)
library(ggplot2)
library(gapminder)
gap <- filter(gapminder, year %% 10 == 7 & year >= 1977)
p <- ggplot(gap, aes(x = gdpPercap, y = lifeExp,
                     color = continent,
                     size = pop,
                     text = country)) +
    geom_point() +
    scale_size_area(max_size = 8) +
    scale_x_log10() +
    guides(size = "none") +
    theme_bw() +
    facet_wrap(~year)
p
This problem uses the data in the nycflights13 package.
The airports table contains longitude and latitude for each airport. This can be used to construct a map. A map of the locations of a few airports can be constructed using:
library(ggplot2)
library(dplyr)
library(nycflights13)
ap <- filter(airports, faa %in% c("ATL", "DEN", "JFK", "MSP", "ORD", "SFO"))
ggplot(ap, aes(x = lon, y = lat)) +
    borders("state") +
    geom_point(size = 3) +
    coord_map() +
    theme_void()
For the first three months of 2013, compute the number of flights, the average arrival delay, and the proportion of canceled flights to each of the destinations. Assume a flight is canceled if its departure time and arrival time are both missing.
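The cancellation rule can be illustrated on a toy table. This is only a sketch: the rows in toy_flights are invented, although the column names match those in the flights table, and base R is used to keep it self-contained (dplyr's group_by and summarize can express the same computation).

```r
# Toy flights: invented values; NA marks a missing time or delay.
toy_flights <- data.frame(
    dest = c("ORD", "ORD", "ORD", "SFO", "SFO"),
    dep_time = c(900, NA, NA, 1200, NA),
    arr_time = c(1100, NA, 1300, 1500, NA),
    arr_delay = c(10, NA, NA, 30, NA)
)

# A flight counts as canceled only if BOTH times are missing.
toy_flights$canceled <-
    is.na(toy_flights$dep_time) & is.na(toy_flights$arr_time)

# Number of flights to each destination.
n_flights <- table(toy_flights$dest)

# The mean of a logical vector is the proportion of TRUE values.
prop_canceled <- tapply(toy_flights$canceled, toy_flights$dest, mean)

# Average arrival delay, ignoring flights with a missing delay.
avg_delay <- tapply(toy_flights$arr_delay, toy_flights$dest,
                    mean, na.rm = TRUE)
```

Note that the third toy row has a missing departure time but a recorded arrival time, so under the stated rule it is not counted as canceled.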
Focus on the top 50 destinations in terms of the number of flights from NYC during the first three months of 2013.
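Selecting the busiest destinations amounts to sorting by flight count and keeping the first few. A minimal sketch with invented counts, keeping the top three rather than the top 50:

```r
# Invented flight counts per destination, for illustration only.
counts <- c(ORD = 120, SFO = 95, ATL = 150, DEN = 60, MSP = 40)

# Sort from most to fewest flights and keep the first three names.
top_dest <- names(head(sort(counts, decreasing = TRUE), 3))

top_dest
```

The resulting vector of destination codes can then be used to filter a per-destination summary table, for example with %in%.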
Create a map with a point at each of these destinations, and encode the proportion of canceled flights in the point’s size. Comment on what you see.
In addition to the location and proportion of canceled flights, whether the average arrival delay is more or less than 20 minutes could be encoded using color or shape. Try both approaches, and comment on what you see and on the advantages and disadvantages of each approach.
You can create an HTML file in RStudio using the Knit tab on the editor window. You can also use the R command
rmarkdown::render("hw7.Rmd")
with your working directory set to HW7.
Commit your changes to your hw7.Rmd file to your local git repository. You do not need to commit your HTML file.
Submit your work by pushing your local repository changes to your remote repository on the UI GitLab site. After doing this, it is a good idea to check your repository on the UI GitLab site to make sure everything has been submitted successfully.