class: center, middle, title-slide .title[ # Time Series Plots ] .author[ ### Luke Tierney ] .institute[ ### University of Iowa ] .date[ ### 2023-05-05 ] --- layout: true <link rel="stylesheet" href="stat4580.css" type="text/css" /> <style type="text/css"> .remark-code { font-size: 85%; } </style> ## Time Series Data --- A lot of data is collected over time: -- - economic performace measures; -- - weather, climate data; -- - medical information. -- The objectives can be -- - understanding the past -- how we got where we are; -- - understanding patterns in variation; -- - forecasting what will happen next. --- Even if time is not of primary interest, data are often collected with a time stamp. -- Examining the behavior over time can reveal interesting features that may hint at confounding variables: -- - temperature varying with time of day; -- - humidity varying over the year. --- Time series data can be collected at regularly spaced intervals: -- - yearly, e.g. the `gapminder` data -- - monthly, e.g. the `river` data -- - daily, e.g. high temperatures. -- Data might be missing -- - irregularly, e.g. faulty instrument, holidays; -- - regularly, e.g. weekends. -- Time series can also be recorded at irregular times: -- - blood pressure at doctor visits; -- Variables recorded over time can be numerical or categorical. --- layout: false ## Time Series Objects There are a number of specialized object classes for dealing with time series. -- A number of packages provide `plot` or `autoplot` methods and other utilities for these objects. -- Some of the most useful: | Class | Package | Features | | ---------- | ---------- | -------------------------------------------------- | |`ts`, `mts`| `base` | regular series; methods for `plot` and `lines`. | | | `forecast` | methods for `autoplot` and `autolayer`. | |`zoo` | `zoo` | irregular and regular series; `autoplot` support. | | | `broom` |`tidy` for converting to tidy data frames. | --- ## `autoplot` and `autolayer` Many data structures provide methods for `autoplot` and `autolayer`. -- From the `autoplot` help page: > `autoplot` uses `ggplot2` to draw a particular plot for an object of > a particular class in a single command. This defines the S3 generic > that other classes and packages can extend. -- From the `autolayer` help page: > `autolayer` uses `ggplot2` to draw a particular layer for an object > of a particular class in a single command. This defines the S3 > generic that other classes and packages can extend. --- layout: true ## Example: River Data --- Creating a `ts` object for the `river` data: ```r river <- scan("https://www.stat.uiowa.edu/~luke/data/river.dat") riverTS <- ts(river) riverTS ## Time Series: ## Start = 1 ## End = 128 ## Frequency = 1 ## [1] 8.95361 9.49409 10.19430 10.95660 11.07770 10.98170 10.48970 10.27540 ## [9] 10.16620 9.23484 8.56537 8.51064 ## [ reached getOption("max.print") -- omitted 116 entries ] ``` --- Using `tidy` from package `broom` to create a data frame from the `ts` object: ```r riverDF <- broom::tidy(riverTS) head(riverDF, 3) ## # A tibble: 3 × 2 ## index value ## <dbl> <dbl> ## 1 1 8.95 ## 2 2 9.49 ## 3 3 10.2 ``` -- Change to more convenient names: ```r riverDF <- rename(riverDF, month = index, flow = value) head(riverDF, 3) ## # A tibble: 3 × 2 ## month flow ## <dbl> <dbl> ## 1 1 8.95 ## 2 2 9.49 ## 3 3 10.2 ``` --- layout: true ## Basic Plots for a Single Numeric Time Series --- Numeric time series are usually plotted as a _line chart_. -- This is what the `autoplot` method for `ts` objects provided by package `forecast` does. -- For the river flow data: .pull-left[ <img src="timeseries_files/figure-html/river-ts-autoplot-1.png" style="display: block; margin: auto;" /> ] -- .pull-right[ ```r library(forecast) p <- autoplot(riverTS) p ``` ] --- Using symbols in addition to lines is sometimes useful: -- .pull-left[ <img src="timeseries_files/figure-html/river-ts-autoplot-syms-1.png" style="display: block; margin: auto;" /> ] -- .pull-right[ ```r p + geom_point() ``` ] --- layout: true ## Aspect Ratio --- Aspect ratio has a significant effect on how well we perceive slope differences. -- - Banking to 45 degrees: choose an aspect ratio so the magnitude of the important slopes is close to 45 degrees. -- - This usually requires the horizontal axis to be longer than the vertical axis. --- Controlling the aspect ratio: -- - For `base` graphics use the `asp` argument to `plot`. -- - `lattice` uses the `aspect` argument. -- - For `ggplot` use the `ratio` argument to `coord_fixed`. -- ```r p + coord_fixed(ratio = 4) ``` <img src="timeseries_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" /> --- Lattice supports several aspect ratio strategies: -- - "fill" to avoid unused white space; -- - "xy" to use a 45 degree banking rule; -- - "iso" for isometric scales, e.g. distances. -- Some work may be needed to avoid excess white space around figures. -- Using multiple panels can help. -- In an interactive setting, zoom and pan support can help. --- Multiple panels: -- .pull-left[ <img src="timeseries_files/figure-html/river-ts-panels-1.png" style="display: block; margin: auto;" /> ] -- .pull-right[ ```r mutate(riverDF, period = cut_width(month, center = 21.5, width = 43)) %>% ggplot(aes(month, flow)) + geom_line() + facet_wrap(~ period, scales = "free_x", ncol = 1) ``` ] --- Adding zoom and pan support using [`plotly`](https://plotly.com/r/): -- ```r library(plotly) pp <- p + geom_line() + coord_fixed(ratio = 4) ggplotly(pp) ```
--- There is not always a single best aspect ratio. -- The `co2` data set in the `datasets` package contains monthly concentrations of CO2 for the years 1959 to 1997 recorded at the Mauna Loa observatory. -- The `co2` data is stored as an object of class `ts`: ```r str(co2) ## Time-Series [1:468] from 1959 to 1998: 315 316 316 318 318 ... ``` -- `plot` and `xyplot` have methods for `ts` objects that simplify time series plotting. --- Two aspect ratios: .pull-left[ ```r xyplot(co2) ``` <img src="timeseries_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" /> ] .pull-right[ ```r xyplot(co2, aspect = "xy") ``` <img src="timeseries_files/figure-html/unnamed-chunk-11-1.png" style="display: block; margin: auto;" /> Is there a slowdown in the increase? A graph with more recent data is [available](https://gml.noaa.gov/ccgg/trends/). The full data is [available](https://gml.noaa.gov/webdata/ccgg/trends/co2/co2_annmean_mlo.txt) as well. ] --- `autoplot` does not yet provide an easy way to bank to 45 degrees; you need to experiment or calculate an appropriate aspect ratio yourself. The default `autoplot`: .pull-left[ <img src="timeseries_files/figure-html/co2-autoplot-default-1.png" style="display: block; margin: auto;" /> ] .pull-right[ ```r library(forecast) autoplot(co2) ``` ] --- Aspect ratio adjusted with `coord_fixed`: ```r r45 <- median(diff(time(co2))) / median(abs(diff(co2))) autoplot(co2) + coord_fixed(ratio = r45) ``` <img src="timeseries_files/figure-html/unnamed-chunk-12-1.png" style="display: block; margin: auto;" /> --- layout: true ## Updated Mauna Loa CO2 Data --- Downloading and reading in the data: ```r library(lubridate) url <- "https://gml.noaa.gov/webdata/ccgg/trends/co2/co2_mm_mlo.txt" if (! file.exists("co2new.dat") || now() > file.mtime("co2new.dat") + weeks(4)) download.file(url, "co2new.dat") co2new <- read.table("co2new.dat") ``` -- Clean up the variable names: ```r head(co2new) ## V1 V2 V3 V4 V5 V6 V7 V8 ## 1 1958 3 1958.203 315.70 314.43 -1 -9.99 -0.99 ## 2 1958 4 1958.288 317.45 315.16 -1 -9.99 -0.99 ## 3 1958 5 1958.370 317.51 314.71 -1 -9.99 -0.99 ## 4 1958 6 1958.455 317.24 315.14 -1 -9.99 -0.99 ## 5 1958 7 1958.537 315.86 315.18 -1 -9.99 -0.99 ## 6 1958 8 1958.622 314.93 316.18 -1 -9.99 -0.99 names(co2new) <- c("year", "month", "decimal_date", "average", "deseason", "ndays", "stdd", "umm") ``` --- An initial look: ```r ggplot(co2new, aes(decimal_date, average)) + geom_line() ``` <img src="timeseries_files/figure-html/unnamed-chunk-15-1.png" style="display: block; margin: auto;" /> --- Missing values are encoded as `-9.99` or `-99.99`. If there are any such values they need to be replaced with R's missing value representation: ```r filter(co2new, average < -9) ## [1] year month decimal_date average deseason ## [6] ndays stdd umm ## <0 rows> (or 0-length row.names) co2new <- mutate(co2new, average = ifelse(average < -9, NA, average)) ``` --- .pull-left[ This produces a reasonable plot: .hide-code[ ```r ggplot(co2new, aes(decimal_date, average)) + geom_line() ``` <img src="timeseries_files/figure-html/unnamed-chunk-17-1.png" style="display: block; margin: auto;" /> ] ] -- .pull-right[ The plot would show gaps for the missing values. {{content}} ] -- Missing values could also be interpolated with `na.interp` from package `forecast`. --- Converting the new data to a `ts` object makes adding the original data a little easier: ```r co2newTS <- ts(co2new$average, start = c(1958, 3), frequency = 12) autoplot(co2newTS) + autolayer(co2) ``` <img src="timeseries_files/figure-html/unnamed-chunk-18-1.png" style="display: block; margin: auto;" /> --- layout: true ## Seasonal Decomposition of Time Series by Loess (STL) --- For time series with a strong seasonal component it can be useful to look at a _Seasonal Decomposition of Time Series by Loess_, or (_STL_). -- The methodology was suggested by Clevaland and coworkers. -- The `stl` function in the `base` package computes such a decomposition. -- It requires a series without missing values. -- ```r co2newTS_no_NA <- ts(co2new$average, start = c(1958, 3), frequency = 12) co2stl <- stl(co2newTS_no_NA, s.window = 21) head(co2stl$time.series) ## seasonal trend remainder ## Mar 1958 1.1151386 314.9331 -0.3482842 ## Apr 1958 2.2812033 315.0057 0.1630790 ## May 1958 2.7700266 315.0783 -0.3383163 ## Jun 1958 2.2252316 315.1509 -0.1360934 ## Jul 1958 0.8621872 315.2178 -0.2199841 ## Aug 1958 -1.0841494 315.2847 0.7294176 ``` --- .pull-left[ Package `forecast` provides an `autoplot` method for `stl` objects that shows the original data and the three components in separate panels: ```r autoplot(co2stl) ``` ] .pull-right[ <img src="timeseries_files/figure-html/co2-stl-plot-1.png" style="display: block; margin: auto;" /> ] --- layout: false ## Some Alternative Visualizations Bar charts are sometimes used when comparison to a base line is important; for example for [illustrating climate change](../img/CaG_GlobalTempAnom_1.jpg). <!-- original at https://www.climate.gov/sites/default/files/CaG_GlobalTempAnom_1.jpg Data is available from https://www.climate.gov/maps-data/dataset/global-temperature-anomalies-graphing-tool --> Calendar plots, or calendar heatmaps, are sometimes useful for displaying daily data where day of the week/weekend effects might be important. One example is available [here](../img/calendarHeatmap.png). A [blog post](https://dominikkoch.github.io/Calendar-Heatmap/) shows how to build a calendar plot with `ggplot`. --- layout: true ## GDP Growth Data --- Bar charts are sometimes used for economic time series, in particular growth rate data. -- Some GDP growth rate data is available on this [web page](https://www.multpl.com/us-real-gdp-growth-rate/table/by-quarter): -- Read the data from the web page: ```r library(rvest) gdpurl <- "http://www.multpl.com/us-real-gdp-growth-rate/table/by-quarter" gdppage <- read_html(gdpurl) gdptbl <- html_table(gdppage)[[1]] head(gdptbl) ## # A tibble: 6 × 2 ## Date `Value\n\n\n\nValue` ## <chr> <chr> ## 1 Dec 31, 2022 0.88% ## 2 Sep 30, 2022 1.94% ## 3 Jun 30, 2022 1.80% ## 4 Mar 31, 2022 3.68% ## 5 Dec 31, 2021 5.72% ## 6 Sep 30, 2021 4.96% ``` --- Clean the data: ```r library(dplyr) library(lubridate) names(gdptbl) <- c("Date", "Value") gdptbl <- mutate(gdptbl, Value = as.numeric(sub("%", "", Value)), Date = mdy(Date)) head(gdptbl) ## # A tibble: 6 × 2 ## Date Value ## <date> <dbl> ## 1 2022-12-31 0.88 ## 2 2022-09-30 1.94 ## 3 2022-06-30 1.8 ## 4 2022-03-31 3.68 ## 5 2021-12-31 5.72 ## 6 2021-09-30 4.96 ``` --- .pull-left[ Data prior to 1947 is annual, so drop earlier data: ```r gdptbl <- filter(gdptbl, Date >= make_date(1948, 1, 1)) ``` {{content}} ] -- Rearrange to put `Date` in increasing order: ```r gdptbl <- arrange(gdptbl, Date) ``` {{content}} -- Create and plot a time series: ```r library(ggplot2) library(forecast) gdpts <- ts(gdptbl$Value, start = 1948, frequency = 4) autoplot(gdpts) + geom_hline(yintercept = 0, linetype = 2) ``` -- .pull-right[ <img src="timeseries_files/figure-html/gdp-plot-1.png" style="display: block; margin: auto;" /> ] --- .pull-left[ A bar chart with color used to distinguish positive and negative growth can be created from the original table `gdptbl`: ```r library(ggplot2) mutate(gdptbl, growth = ifelse(Value >= 0, "pos", "neg")) %>% ggplot(aes(x = Date, y = Value, fill = growth)) + geom_col() + theme(legend.position = "top") + scale_fill_manual( values = c(pos = "blue", neg = "red")) ``` ] .pull-right[ <img src="timeseries_files/figure-html/gdp-bar-1.png" style="display: block; margin: auto;" /> ] --- .pull-left[ The bar chart can also be created from the time series object: ```r data.frame(rate = as.numeric(gdpts), time = as.numeric(time(gdpts))) %>% mutate(growth = ifelse(rate >= 0, "pos", "neg")) %>% ggplot(aes(time, rate, fill = growth)) + geom_col() + theme(legend.position = "top") + scale_fill_manual( values = c(pos = "blue", neg = "red")) ``` ] .pull-right[ <img src="timeseries_files/figure-html/gdp-bar-ts-1.png" style="display: block; margin: auto;" /> ] --- layout: true ## Airline Flights From NYC --- The `nycflights13` data set contains data on all flight leaving New York City in 2013. Similar data is available for other cities and years. -- We can look at the number of departures for each day: ```r library(nycflights13) library(ggplot2) nf <- count(flights, year, month, day) ``` -- It will be useful to have a combined `date` field in our data frame. -- The `lubridate` function `make_date` function can be used for this: ```r library(lubridate) nf <- mutate(nf, date = make_date(year, month, day)) ``` --- A line plot of the number of flights: .pull-left[ <img src="timeseries_files/figure-html/flights-1-1.png" style="display: block; margin: auto;" /> ] -- .pull-right-width-40[ ```r ggplot(nf, aes(date, n)) + geom_line() ``` ] -- .pull-right-width-40[ Weekday variation is clearly visible. ] --- Breaking out the days of the week into separate facets may help: .pull-left[ <img src="timeseries_files/figure-html/flights-2-1.png" style="display: block; margin: auto;" /> ] -- .pull-right[ ```r nnf <- * mutate(nf, wd = wday(date, label = TRUE)) nnf_thm <- theme(axis.text.x = element_text(angle = -45, hjust = 0), plot.margin = margin(r = 30)) ggplot(nnf, aes(date, n)) + geom_line() + * facet_wrap(~ wd) + nnf_thm ``` {{content}} ] -- July 4 in 2013: ```r as.character(wday(ymd("2013-07-04"), label = TRUE)) ## [1] "Thu" ``` --- Identifying when some major holidays occur may also be useful: .pull-left[ <img src="timeseries_files/figure-html/flights-3-1.png" style="display: block; margin: auto;" /> ] .pull-right[ ```r holidays <- data.frame(Holiday = c("4th of July", "Thanksgiving"), date = ymd(c("2013-07-04", "2013-11-28"))) hlayer <- geom_vline(aes(xintercept = date, color = Holiday), linetype = 2, linewidth = 1, data = holidays) ggplot(nf, aes(date, n)) + hlayer + geom_line() ``` ] --- .pull-left[ Holidays in the faceted plot: <img src="timeseries_files/figure-html/flights-holiday-facet-1-1.png" style="display: block; margin: auto;" /> ] .pull-right[ ```r ggplot(nnf, aes(date, n)) + hlayer + geom_line() + facet_wrap(~ wd) + nnf_thm ``` {{content}} ] -- This marks the holiday weeks. --- .pull-left[ Holidays in the faceted plot: <img src="timeseries_files/figure-html/flights-holiday-facet-2-1.png" style="display: block; margin: auto;" /> -- ] .pull-right[ This shows only the holiday days. ```r hlayer2 <- geom_vline(aes(xintercept = date, color = Holiday), linewidth = 1, data = mutate(holidays, wd = wday(date, TRUE))) ggplot(nnf, aes(date, n)) + hlayer2 + geom_line() + facet_wrap(~ wd) + nnf_thm ``` ] --- .pull-left[ Holidays in the faceted plot: <img src="timeseries_files/figure-html/flights-holiday-facet-3-1.png" style="display: block; margin: auto;" /> ] .pull-right[ Show holidays as solid lines, days in holiday weeks as dashed lines. ```r ggplot(nnf, aes(date, n)) + hlayer + hlayer2 + geom_line() + facet_wrap(~ wd) + nnf_thm ``` ] --- A simple calendar plot: .pull-left[ <img src="timeseries_files/figure-html/flights-cal-0-1.png" style="display: block; margin: auto;" /> ] -- .pull-right[ This uses `geom_tile` and faceting. ```r monthweek <- function(d, w) ceiling((d - w) / 7) + 1 nf <- mutate(nf, wd = wday(date, label = TRUE), wd = fct_rev(wd), mw = monthweek(day, wday(date))) p <- ggplot(nf, aes(x = as.character(mw), y = wd, fill = n)) + * geom_tile(color = "white") + * facet_wrap(~ month(date, label = TRUE)) p ``` ] --- Some theme and color adjustments: .pull-left[ <img src="timeseries_files/figure-html/flights-cal-1-1.png" style="display: block; margin: auto;" /> ] .pull-right[ ```r p <- p + scale_fill_gradient(low = "white") + ylab("") + xlab("Week of Month") + theme(panel.grid.major = element_blank(), panel.border = element_blank()) p ``` ] --- This layout with days on the vertical axis works well with a single row, which could be extended to a multi-year plot: ```r p + facet_wrap(~ month(date, TRUE), nrow = 1) ``` <img src="timeseries_files/figure-html/unnamed-chunk-27-1.png" style="display: block; margin: auto;" /> --- A more standard layout: .pull-left[ <img src="timeseries_files/figure-html/flights-cal-3-1.png" style="display: block; margin: auto;" /> ] .pull-right[ ```r nf2 <- mutate(nf, wd = wday(date, label = TRUE), mw = factor(monthweek(day, wday(date))), mw = fct_rev(mw)) ggplot(nf2, aes(x = wd, y = mw, fill = n)) + geom_tile(color = "white") + facet_wrap(~ month(date, TRUE)) + scale_fill_gradient(low = "white") + ylab("") + xlab("Week of Month") + theme(panel.grid.major = element_blank(), axis.text.x = element_text(angle = 45, hjust = 1), panel.border = element_blank()) ``` ] --- ## Reading Chapters [_Visualizing time series and other functions of an independent variable_](https://clauswilke.com/dataviz/time-series.html) and [_Visualizing trends_](https://clauswilke.com/dataviz/visualizing-trends.html) in [_Fundamentals of Data Visualization_](https://clauswilke.com/dataviz/).
//adapted from Emi Tanaka's gist at //https://gist.github.com/emitanaka/eaa258bb8471c041797ff377704c8505