--- title: "Olympic Medal Standings" output: html_document: toc: yes --- ```{r global_options, include = FALSE} knitr::opts_chunk$set(collapse = TRUE) ``` ```{r, include = FALSE} library(dplyr) library(ggplot2) library(lattice) library(gridExtra) set.seed(12345) ``` The PyeongChang 2018 Winter Olympics are complete and final medal counts are available. A web site showing the medal counts More details are available at The web site shows a table; it might be nice to show the results graphically. The Wikipedia page (at the time of writing) contains the standings data in a form that is easy to scrape. Using the `rvest` and `xml2` package we can extract the table from the page: ```{r} library(rvest) ## URL <- "https://en.wikipedia.org/wiki/2018_Winter_Olympics_medal_table" ## Permanent link: URL <- "https://en.wikipedia.org/w/index.php?title=2018_Winter_Olympics_medal_table&oldid=1062731624" page <- read_html(URL) page_tables <- html_table(page, fill = TRUE) length(page_tables) mtable <- page_tables[[4]] ``` The table is unusually clean; only rank might need adjusting (or could be dropped), and the name of the second column needs fixing. ```{r} head(mtable) str(mtable) library(dplyr) mtable <- mutate(mtable, Rank = as.numeric(Rank)) colnames(mtable)[[2]] colnames(mtable)[[2]] <- "NOC" ``` It is also useful to drop the redundant `Total` variable and the total row (which had it's `Rank` converted to `NA`): ```{r} mtable <- select(mtable, -Total) mtable <- filter(mtable, ! is.na(Rank)) head(mtable) ``` A tidy form would have a variable for `medal` type and medal `count`: ```{r} library(tidyr) mtable_tidy <- gather(mtable, medal, count, Gold : Bronze) mtable_tidy <- pivot_longer(mtable, Gold : Bronze, names_to = "medal", values_to = "count") head(mtable_tidy) ``` A first graph: ```{r} library(ggplot2) ggplot(mtable_tidy) + geom_col(aes(x = NOC, y = count, fill = medal)) + coord_flip() ``` An easy way to match the country order on the web site: ```{r} ggplot(mutate(mtable_tidy, NOC = factor(NOC, rev(mtable$NOC)))) + geom_col(aes(x = NOC, y = count, fill = medal)) + coord_flip() ``` Reordering the medal type levels makes the reason for the country ordering easier to see: ```{r} p <- ggplot(mutate(mtable_tidy, NOC = factor(NOC, rev(mtable$NOC)), medal = factor(medal, c("Bronze", "Silver", "Gold")))) + geom_col(aes(x = NOC, y = count, fill = medal)) + coord_flip() p ``` Finally, we can choose more reasonable colors. The Hex code for `Bronze`, for example, is from : ```{r} p + scale_fill_manual(values = c("Gold" = "#D4AF37", "Silver" = "#C0C0C0", "Bronze" = "#CD7F32")) ``` The pages on the site can also be used to extract the information, but has to be assembled from the individual day results. If this is done, the the code can be written to also obtain results for indivisual sports, for example from pages like