General Issues
Make sure you name your files as requested, including matching
the specified use of upper and lower case. This matters on file systems
that are case-sensitive.
Make sure to commit your work to your local repository and push
your commits to GitLab. We can only see what is on GitLab, not what is
on your computer. You can check what we see by going to the GitLab web
interface.
Include your name and the date in the YAML header of your
.Rmd file using the author: and date: fields.
Your HTML file should be a report of your findings.
Any graph you show should be discussed in your
narrative.
Any code you show should be discussed in your narrative.
If you do not need to discuss a piece of code in the narrative,
use echo = FALSE to avoid showing it.
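As a minimal sketch of the last point, code can be hidden by default for the whole document and then shown only for the chunks the narrative actually discusses. The chunk labels in the comments (delay-table, setup-data) are hypothetical names used for illustration.

```r
# Hide code by default for every chunk in the document.
# This would typically go in a setup chunk near the top of the .Rmd file.
knitr::opts_chunk$set(echo = FALSE)

# A chunk whose code is discussed in the narrative can override the default
# in its chunk header, for example:  {r delay-table, echo = TRUE}
# A single chunk can also be hidden on its own with:  {r setup-data, echo = FALSE}

# Confirm the current default.
knitr::opts_chunk$get("echo")
```

Setting the default once keeps the report focused on findings. Whether to hide by default or chunk by chunk is a matter of preference.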
1. New York City Airport Names
The names and airport codes for the three New York City airports in
the nycflights13 data are shown in the following table:
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(nycflights13)
nyc_faa <- unique(flights$origin)
tbl <- select(airports, faa, name) |> filter(faa %in% nyc_faa)
names(tbl) <- c("Code", "Name")
kbl <- knitr::kable(tbl, format = "html")
kableExtra::kable_styling(kbl, full_width = FALSE)
| Code | Name |
|------|------|
| EWR | Newark Liberty Intl |
| JFK | John F Kennedy Intl |
| LGA | La Guardia |
3. Air Time Distributions
Four possible visualizations without much fine tuning:
library(ggplot2)
library(ggridges)
library(patchwork)
thm <- theme_minimal() + theme(text = element_text(size = 16))
p0 <- ggplot(flights, aes(x = air_time)) + thm
p1 <- p0 +
    geom_density(aes(color = origin), bw = 50) +
    ggtitle("Color")
p2 <- p0 +
    geom_density(aes(fill = origin), alpha = 0.4, bw = 50) +
    ggtitle("Fill with Alpha Blending")
p3 <- p0 +
    geom_density(bw = 50) + facet_wrap(~ origin, ncol = 1) +
    ggtitle("Facets")
p4 <- p0 +
    geom_density_ridges(aes(y = origin, height = after_stat(density)),
                        stat = "density", bw = 50) +
    scale_y_discrete(limits = c("LGA", "JFK", "EWR")) +
    ggtitle("Ridgeline")
(p1 | p2) / (p3 | p4)  ## + plot_layout(guides = "collect")
## Warning: Removed 9430 rows containing non-finite outside the scale range
## (`stat_density()`).
## Removed 9430 rows containing non-finite outside the scale range
## (`stat_density()`).
## Removed 9430 rows containing non-finite outside the scale range
## (`stat_density()`).
## Removed 9430 rows containing non-finite outside the scale range
## (`stat_density()`).
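The warnings are raised once per panel because stat_density() drops rows where air_time is not finite. The following sketch, assuming those rows are simply flights with a missing (NA) air_time, counts them and removes them before plotting. The names n_missing and flights_complete are illustrative and not part of the original notes.

```r
library(dplyr)
library(nycflights13)

# Count the rows with no recorded air time; this should agree with the
# number of rows reported as removed in the warning.
n_missing <- sum(is.na(flights$air_time))
n_missing

# Drop those rows up front so the plotting code runs without the warning.
flights_complete <- filter(flights, !is.na(air_time))
nrow(flights) - nrow(flights_complete)
```

Passing flights_complete to ggplot() in place of flights should give the same plots without the warnings. Filtering explicitly also makes it visible in the report that some flights were excluded.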

Neither of the single-panel views works particularly well in this case.
For the plot using fill with alpha blending, the
overlap is too large to allow the densities to be distinguished easily.
The plot mapping origin to color works
somewhat better, but the lines are still hard to follow. The faceted plot
and the ridgeline plot are visually quite similar, and both work fairly
well.
Flights out of La Guardia are mostly short, with very few taking
over 300 minutes. Somewhat more long flights originate from Newark, and
considerably more long flights originate from JFK.
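The comparison of long flights can be checked numerically. This is a rough sketch that uses the 300-minute threshold mentioned in the text. The object name long_flights and its column names are illustrative choices.

```r
library(dplyr)
library(nycflights13)

# Share of flights with more than 300 minutes of air time, by origin.
# Rows with a missing air_time are dropped first.
long_flights <- flights |>
    filter(!is.na(air_time)) |>
    group_by(origin) |>
    summarize(n_flights = n(),
              n_over_300 = sum(air_time > 300),
              pct_over_300 = 100 * mean(air_time > 300)) |>
    arrange(pct_over_300)
long_flights
```

A table like this complements the density plots: the plots show the overall shape, while the percentages put a number on the size of the upper tail.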
4. Self-Reported Heights
Density plots and empirical cumulative distribution plots for the two
groups:
library(patchwork)
data(heights, package = "dslabs")
thm <- theme_minimal() + theme(text = element_text(size = 16))
p1 <- ggplot(heights, aes(x = height, fill = sex)) +
    geom_density(alpha = 0.6) +
    thm + theme(legend.position = "top")
p2 <- ggplot(heights, aes(x = height, color = sex)) +
    stat_ecdf() +
    thm + theme(legend.position = "top")
p1 | p2

The density plots make it easier to identify general characteristics
of the distribution: close to symmetric, not too far from
bell-shaped.
Reading off percentiles, such as medians or quartiles, is easier in
the eCDF plots.
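To illustrate why percentiles are easy to read from an eCDF, the sketch below computes the quartiles for each group directly and then evaluates each group's eCDF at its median. The names quartiles and p_at_median are illustrative.

```r
data(heights, package = "dslabs")

# Quartiles of height within each group.
quartiles <- tapply(heights$height, heights$sex, quantile,
                    probs = c(0.25, 0.5, 0.75))
quartiles

# The eCDF evaluated at a group's median is at least 0.5, which is why the
# median can be read off where the curve crosses the 0.5 level.
p_at_median <- sapply(split(heights$height, heights$sex),
                      function(x) ecdf(x)(median(x)))
p_at_median
```

The same reading works for the quartiles at the 0.25 and 0.75 levels. On a density plot, these values would have to be judged from areas under the curve.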