--- title: "Time Series Plots" output: html_document: toc: yes code_folding: show code_download: true --- ```{r setup, include = FALSE} source(here::here("setup.R")) knitr::opts_chunk$set(collapse = TRUE, message = FALSE, fig.height = 5, fig.width = 6, fig.align = "center") options(htmltools.dir.version = FALSE) set.seed(12345) library(ggplot2) library(lattice) library(tidyverse) library(gridExtra) library(patchwork) theme_set(theme_minimal() + theme(text = element_text(size = 16), panel.border = element_rect(color = "grey30", fill = NA))) ``` ## Time Series Data A lot of data is collected over time: - economic performace measures; - weather, climate data; - medical information. The objectives can be - understanding the past -- how we got where we are; - understanding patterns in variation; - forecasting what will happen next. Even if time is not of primary interest, data are often collected with a time stamp. Examining the behavior over time can reveal interesting features that may hint at confounding variables: - temperature varying with time of day; - humidity varying over the year. Time series data can be collected at regularly spaced intervals: - yearly, e.g. the `gapminder` data - monthly, e.g. the `river` data - daily, e.g. high temperatures. Data might be missing - irregularly, e.g. faulty instrument, holidays; - regularly, e.g. weekends. Time series can also be recorded at irregular times: - blood pressure at doctor visits; Variables recorded over time can be numerical or categorical. ## Time Series Objects There are a number of specialized object classes for dealing with time series. A number of packages provide `plot` or `autoplot` methods and other utilities for these objects. Some of the most useful: | Class | Package | Features | | ---------- | ---------- | -------------------------------------------------- | |`ts`, `mts`| `base` | regular series; methods for `plot` and `lines`. | | | `forecast` | methods for `autoplot` and `autolayer`. | |`zoo` | `zoo` | irregular and regular series; `autoplot` support. | | | `broom` |`tidy` for converting to tidy data frames. | ## `autoplot` and `autolayer` Many data structures provide methods for `autoplot` and `autolayer`. From the `autoplot` help page: > `autoplot` uses `ggplot2` to draw a particular plot for an object of > a particular class in a single command. This defines the S3 generic > that other classes and packages can extend. From the `autolayer` help page: > `autolayer` uses `ggplot2` to draw a particular layer for an object > of a particular class in a single command. This defines the S3 > generic that other classes and packages can extend. ## Example: River Data Creating a `ts` object for the `river` data: ```{r, include = FALSE} op <- options(max.print = 12) ``` ```{r} river <- scan("https://www.stat.uiowa.edu/~luke/data/river.dat") riverTS <- ts(river) riverTS ``` ```{r, include = FALSE} options(op) ``` Using `tidy` from package `broom` to create a data frame from the `ts` object: ```{r} riverDF <- broom::tidy(riverTS) head(riverDF, 3) ``` Change to more convenient names: ```{r} riverDF <- rename(riverDF, month = index, flow = value) head(riverDF, 3) ``` ## Basic Plots for a Single Numeric Time Series Numeric time series are usually plotted as a _line chart_. This is what the `autoplot` method for `ts` objects provided by package `forecast` does. For the river flow data: ```{r river-ts-autoplot, echo = FALSE} library(forecast) p <- autoplot(riverTS) p ``` ```{r river-ts-autoplot, eval = FALSE} ``` Using symbols in addition to lines is sometimes useful: ```{r river-ts-autoplot-syms, echo = FALSE} p + geom_point() ``` ```{r river-ts-autoplot-syms, eval = FALSE} ``` ## Aspect Ratio Aspect ratio has a significant effect on how well we perceive slope differences. - Banking to 45 degrees: choose an aspect ratio so the magnitude of the important slopes is close to 45 degrees. - This usually requires the horizontal axis to be longer than the vertical axis. Controlling the aspect ratio: - For `base` graphics use the `asp` argument to `plot`. - `lattice` uses the `aspect` argument. - For `ggplot` use the `ratio` argument to `coord_fixed`. ```{r, fig.height = 3, fig.width = 10} p + coord_fixed(ratio = 4) ``` Lattice supports several aspect ratio strategies: - "fill" to avoid unused white space; - "xy" to use a 45 degree banking rule; - "iso" for isometric scales, e.g. distances. Some work may be needed to avoid excess white space around figures. Using multiple panels can help. In an interactive setting, zoom and pan support can help. Multiple panels: ```{r river-ts-panels, echo = FALSE} mutate(riverDF, period = cut_width(month, center = 21.5, width = 43)) %>% ggplot(aes(month, flow)) + geom_line() + facet_wrap(~ period, scales = "free_x", ncol = 1) ``` ```{r river-ts-panels, eval = FALSE} ``` Adding zoom and pan support using [`plotly`](https://plotly.com/r/): ```{r, include = FALSE} library(plotly) ``` ```{r, , fig.height = 4, fig.width = 10} library(plotly) pp <- p + geom_line() + coord_fixed(ratio = 4) ggplotly(pp) ``` There is not always a single best aspect ratio. The `co2` data set in the `datasets` package contains monthly concentrations of CO2 for the years 1959 to 1997 recorded at the Mauna Loa observatory. The `co2` data is stored as an object of class `ts`: ```{r} str(co2) ``` `plot` and `xyplot` have methods for `ts` objects that simplify time series plotting. Two aspect ratios: ```{r , fig.height = 4.5, fig.width = 5.5} xyplot(co2) ``` ```{r, fig.height = 4, fig.width = 10} xyplot(co2, aspect = "xy") ``` Is there a slowdown in the increase? A graph with more recent data is [available](https://gml.noaa.gov/ccgg/trends/). The full data is [available](https://gml.noaa.gov/webdata/ccgg/trends/co2/co2_annmean_mlo.txt) as well. `autoplot` does not yet provide an easy way to bank to 45 degrees; you need to experiment or calculate an appropriate aspect ratio yourself. The default `autoplot`: ```{r co2-autoplot-default, echo = FALSE} library(forecast) autoplot(co2) ``` ```{r co2-autoplot-default, eval = FALSE} ``` Aspect ratio adjusted with `coord_fixed`: ```{r, fig.height = 4, fig.width = 10} r45 <- median(diff(time(co2))) / median(abs(diff(co2))) autoplot(co2) + coord_fixed(ratio = r45) ``` ## Updated Mauna Loa CO2 Data Downloading and reading in the data: ```{r, message = FALSE} library(lubridate) url <- "https://gml.noaa.gov/webdata/ccgg/trends/co2/co2_mm_mlo.txt" if (! file.exists("co2new.dat") || now() > file.mtime("co2new.dat") + weeks(4)) download.file(url, "co2new.dat") co2new <- read.table("co2new.dat") ``` Clean up the variable names: ```{r} head(co2new) names(co2new) <- c("year", "month", "decimal_date", "average", "deseason", "ndays", "stdd", "umm") ``` An initial look: ```{r} ggplot(co2new, aes(decimal_date, average)) + geom_line() ``` Missing values are encoded as `-9.99` or `-99.99`. If there are any such values they need to be replaced with R's missing value representation: ```{r} filter(co2new, average < -9) co2new <- mutate(co2new, average = ifelse(average < -9, NA, average)) ``` This produces a reasonable plot: ```{r, class.source = "fold-hide"} ggplot(co2new, aes(decimal_date, average)) + geom_line() ``` The plot would show gaps for the missing values. Missing values could also be interpolated with `na.interp` from package `forecast`. Converting the new data to a `ts` object makes adding the original data a little easier: ```{r} co2newTS <- ts(co2new$average, start = c(1958, 3), frequency = 12) autoplot(co2newTS) + autolayer(co2) ``` ## Seasonal Decomposition of Time Series by Loess (STL) For time series with a strong seasonal component it can be useful to look at a _Seasonal Decomposition of Time Series by Loess_, or (_STL_). The methodology was suggested by Clevaland and coworkers. The `stl` function in the `base` package computes such a decomposition. It requires a series without missing values. ```{r} co2newTS_no_NA <- ts(co2new$average, start = c(1958, 3), frequency = 12) co2stl <- stl(co2newTS_no_NA, s.window = 21) head(co2stl$time.series) ``` Package `forecast` provides an `autoplot` method for `stl` objects that shows the original data and the three components in separate panels: ```{r co2-stl-plot, eval = FALSE} autoplot(co2stl) ``` ```{r co2-stl-plot, echo = FALSE, fig.height = 7} ``` ## Some Alternative Visualizations Bar charts are sometimes used when comparison to a base line is important; for example for [illustrating climate change](`r IMG("CaG_GlobalTempAnom_1.jpg")`). Calendar plots, or calendar heatmaps, are sometimes useful for displaying daily data where day of the week/weekend effects might be important. One example is available [here](`r IMG("calendarHeatmap.png")`). A [blog post](https://dominikkoch.github.io/Calendar-Heatmap/) shows how to build a calendar plot with `ggplot`. ## GDP Growth Data Bar charts are sometimes used for economic time series, in particular growth rate data. Some GDP growth rate data is available on this [web page](https://www.multpl.com/us-real-gdp-growth-rate/table/by-quarter): Read the data from the web page: ```{r, cache = TRUE} library(rvest) gdpurl <- "http://www.multpl.com/us-real-gdp-growth-rate/table/by-quarter" gdppage <- read_html(gdpurl) gdptbl <- html_table(gdppage)[[1]] head(gdptbl) ``` Clean the data: ```{r, message = FALSE} library(dplyr) library(lubridate) names(gdptbl) <- c("Date", "Value") gdptbl <- mutate(gdptbl, Value = as.numeric(sub("%", "", Value)), Date = mdy(Date)) head(gdptbl) ``` Data prior to 1947 is annual, so drop earlier data: ```{r} gdptbl <- filter(gdptbl, Date >= make_date(1948, 1, 1)) ``` Rearrange to put `Date` in increasing order: ```{r} gdptbl <- arrange(gdptbl, Date) ``` Create and plot a time series: ```{r gdp-plot, eval = FALSE} library(ggplot2) library(forecast) gdpts <- ts(gdptbl$Value, start = 1948, frequency = 4) autoplot(gdpts) + geom_hline(yintercept = 0, linetype = 2) ``` ```{r gdp-plot, echo = FALSE} ``` A bar chart with color used to distinguish positive and negative growth can be created from the original table `gdptbl`: ```{r gdp-bar, eval = FALSE, dpi = 300} library(ggplot2) mutate(gdptbl, growth = ifelse(Value >= 0, "pos", "neg")) %>% ggplot(aes(x = Date, y = Value, fill = growth)) + geom_col() + theme(legend.position = "top") + scale_fill_manual( values = c(pos = "blue", neg = "red")) ``` ```{r gdp-bar, echo = FALSE} ``` The bar chart can also be created from the time series object: ```{r gdp-bar-ts, eval = FALSE, dpi = 300} data.frame(rate = as.numeric(gdpts), time = as.numeric(time(gdpts))) %>% mutate(growth = ifelse(rate >= 0, "pos", "neg")) %>% ggplot(aes(time, rate, fill = growth)) + geom_col() + theme(legend.position = "top") + scale_fill_manual( values = c(pos = "blue", neg = "red")) ``` ```{r gdp-bar-ts, echo = FALSE} ``` ## Airline Flights From NYC The `nycflights13` data set contains data on all flight leaving New York City in 2013. Similar data is available for other cities and years. We can look at the number of departures for each day: ```{r} library(nycflights13) library(ggplot2) nf <- count(flights, year, month, day) ``` It will be useful to have a combined `date` field in our data frame. The `lubridate` function `make_date` function can be used for this: ```{r} library(lubridate) nf <- mutate(nf, date = make_date(year, month, day)) ``` A line plot of the number of flights: ```{r flights-1, echo = FALSE} ggplot(nf, aes(date, n)) + geom_line() ``` ```{r flights-1, eval = FALSE} ``` Weekday variation is clearly visible. Breaking out the days of the week into separate facets may help: ```{r flights-2, echo = FALSE} nnf <- mutate(nf, wd = wday(date, label = TRUE)) nnf_thm <- theme(axis.text.x = element_text(angle = -45, hjust = 0), plot.margin = margin(r = 30)) ggplot(nnf, aes(date, n)) + geom_line() + facet_wrap(~ wd) + nnf_thm ``` ```{r flights-2, eval = FALSE} ``` July 4 in 2013: ```{r} as.character(wday(ymd("2013-07-04"), label = TRUE)) ``` Identifying when some major holidays occur may also be useful: ```{r flights-3, echo = FALSE, fig.width = 7.5} holidays <- data.frame(Holiday = c("4th of July", "Thanksgiving"), date = ymd(c("2013-07-04", "2013-11-28"))) hlayer <- geom_vline(aes(xintercept = date, color = Holiday), linetype = 2, linewidth = 1, data = holidays) ggplot(nf, aes(date, n)) + hlayer + geom_line() ``` ```{r flights-3, eval = FALSE} ``` Holidays in the faceted plot: ```{r flights-holiday-facet-1, echo = FALSE, fig.width = 7.5} ggplot(nnf, aes(date, n)) + hlayer + geom_line() + facet_wrap(~ wd) + nnf_thm ``` ```{r flights-holiday-facet-1, eval = FALSE} ``` This marks the holiday weeks. Holidays in the faceted plot: ```{r flights-holiday-facet-2, echo = FALSE, fig.width = 7.5} hlayer2 <- geom_vline(aes(xintercept = date, color = Holiday), linewidth = 1, data = mutate(holidays, wd = wday(date, TRUE))) ggplot(nnf, aes(date, n)) + hlayer2 + geom_line() + facet_wrap(~ wd) + nnf_thm ``` This shows only the holiday days. ```{r flights-holiday-facet-2, eval = FALSE} ``` Holidays in the faceted plot: ```{r flights-holiday-facet-3, echo = FALSE, fig.width = 7.5} ggplot(nnf, aes(date, n)) + hlayer + hlayer2 + geom_line() + facet_wrap(~ wd) + nnf_thm ``` Show holidays as solid lines, days in holiday weeks as dashed lines. ```{r flights-holiday-facet-3, eval = FALSE} ``` A simple calendar plot: ```{r flights-cal-0, echo = FALSE} monthweek <- function(d, w) ceiling((d - w) / 7) + 1 nf <- mutate(nf, wd = wday(date, label = TRUE), wd = fct_rev(wd), mw = monthweek(day, wday(date))) p <- ggplot(nf, aes(x = as.character(mw), y = wd, fill = n)) + geom_tile(color = "white") + facet_wrap(~ month(date, label = TRUE)) p ``` This uses `geom_tile` and faceting. ```{r flights-cal-0, eval = FALSE} ``` Some theme and color adjustments: ```{r flights-cal-1, echo = FALSE} p <- p + scale_fill_gradient(low = "white") + ylab("") + xlab("Week of Month") + theme(panel.grid.major = element_blank(), panel.border = element_blank()) p ``` ```{r flights-cal-1, eval = FALSE} ``` This layout with days on the vertical axis works well with a single row, which could be extended to a multi-year plot: ```{r, fig.height = 2, fig.width = 10} p + facet_wrap(~ month(date, TRUE), nrow = 1) ``` A more standard layout: ```{r flights-cal-3, echo = FALSE, fig.width = 8} nf2 <- mutate(nf, wd = wday(date, label = TRUE), mw = factor(monthweek(day, wday(date))), mw = fct_rev(mw)) ggplot(nf2, aes(x = wd, y = mw, fill = n)) + geom_tile(color = "white") + facet_wrap(~ month(date, TRUE)) + scale_fill_gradient(low = "white") + ylab("") + xlab("Week of Month") + theme(panel.grid.major = element_blank(), axis.text.x = element_text(angle = 45, hjust = 1), panel.border = element_blank()) ``` ```{r flights-cal-3, eval = FALSE, fig.width = 8} ``` ## Reading Chapters [_Visualizing time series and other functions of an independent variable_](https://clauswilke.com/dataviz/time-series.html) and [_Visualizing trends_](https://clauswilke.com/dataviz/visualizing-trends.html) in [_Fundamentals of Data Visualization_](https://clauswilke.com/dataviz/). ## Exercises 1. A useful feature of the `lattice` package is the ability to easily request an aspect ratio that is _banked to 45 degrees_ by specifying `aspect = "xy"` in the arguments to `lattice` plotting functions. For example, This code produces a reasonable aspect ratio for a plot of an unemployment time series: ```{r} library(ggplot2) library(forecast) library(lattice) unemp_ts <- ts(economics$unemploy, start = c(1967, 7), frequency = 12) xyplot(unemp_ts, aspect = "xy") ``` Which of the following plots most closely matches the aspect ratio in the `lattice` plot? a. `autoplot(unemp_ts, aspect = 200)` b. `autoplot(unemp_ts) + coord_fixed(ratio = 0.0007)` c. `autoplot(unemp_ts) + coord_fixed(asp = 0.0007)` d. `autoplot(unemp_ts) + coord_fixed(70)`