---
title: "Plots for Multiple Time Series"
output:
html_document:
toc: yes
code_folding: show
code_download: true
---
```{r setup, include = FALSE}
source(here::here("setup.R"))
knitr::opts_chunk$set(collapse = TRUE, message = FALSE,
fig.height = 5, fig.width = 6, fig.align = "center")
options(htmltools.dir.version = FALSE)
set.seed(12345)
library(ggplot2)
library(lattice)
library(tidyverse)
library(gridExtra)
library(forecast)
theme_set(theme_minimal() +
theme(text = element_text(size = 16),
panel.border = element_rect(color = "grey30", fill = NA)))
```
## Multiple Time Series
Some questions for multiple time series:
- How do the series behave individually?
- How are the series related?
- Does one series lead or lag the others?
Some options for plotting multiple series:
- separate panels in a trellis display;
- multiple series in a single plot - this will require standardizing
if the scales vary substantially;
- a multivatiate plot with time represented by connecting line
segments or animation.
Another option for two time series:
- use a separate `y` axis for each.
This is usually a bad idea.
Some data structures for multiple time series:
* `mts` objects created by `ts()`.
- All series have to be numeric.
- `autoplot` methods may be sufficient.
* Regular data frames.
- The series will have to be plotted in separate layers,
- or the data frames will need converting into long form.
## Crimean War Casualties
The `HistData` package contains the data used by Florence Nightingale
to call attention to terrible sanitary conditions for the Britich army
in Crimea.
```{r}
data(Nightingale, package = "HistData")
head(select(Nightingale, -Date))
```
A plot of the `Disease`, `Wounds`, and `Other`:
```{r night-ts-auto, echo = FALSE, fig.width = 7.5}
nDWO <- select(Nightingale,
Disease, Wounds, Other) %>%
ts(start = c(1854, 4), frequency = 12)
autoplot(nDWO)
```
```{r night-ts-auto, eval = FALSE}
```
Using the rate variables produces a very similar pattern:
```{r night-rate-ts-auto, echo = FALSE, fig.width = 7.5}
nDWO.rate <- select(Nightingale,
ends_with("rate")) %>%
ts(start = c(1854, 4), frequency = 12)
autoplot(nDWO.rate)
```
```{r night-rate-ts-auto, eval = FALSE}
```
The army size does not vary enough over the time where the casualties
are high to affect the pattern much:
```{r night-army-ts-auto, echo = FALSE}
nArmy <- select(Nightingale, Army) %>%
ts(start = c(1854, 4), frequency = 12)
autoplot(nArmy)
```
```{r night-army-ts-auto, eval = FALSE}
```
Instead of showing three series in one plot you can also use faceting.
```{r night-auto-facet, echo = FALSE}
autoplot(nDWO, facets = TRUE)
```
```{r night-auto-facet, eval = FALSE}
```
To create the plot of `Disease`, `Wounds`, and `Other` using basic
`ggplot` it is best to first convert the data into long form.
You can create a long version of the data, using either `pivot_longer`
or `broom::tidy`.
Using `broom:tidy` on the `mts` object:
```{r night-ggplot-broom, echo = FALSE, fig.width = 7.5}
broom::tidy(nDWO) %>%
rename(Date = index,
Deaths = value,
Cause = series) %>%
ggplot(aes(x = Date,
y = Deaths,
color = Cause)) +
geom_line()
```
```{r night-ggplot-broom, eval = FALSE}
```
Using `pivot_longer` on the original data frame:
```{r night-ggplot-pivot-longer, echo = FALSE, fig.width = 7.5}
select(Nightingale,
Date, Disease, Wounds, Other) %>%
pivot_longer(-Date,
names_to = "Cause",
values_to = "Deaths") %>%
ggplot(aes(x = Date,
y = Deaths,
color = Cause)) +
geom_line()
```
```{r night-ggplot-pivot-longer, eval = FALSE}
```
Using faceting instead of a single plot:
```{r night-ggplot-facet, echo = FALSE}
select(Nightingale,
Date, Disease, Wounds, Other) %>%
pivot_longer(-Date,
names_to = "Cause",
values_to = "Deaths") %>%
ggplot(aes(x = Date, y = Deaths)) +
geom_line() +
facet_wrap(~ Cause, ncol = 1)
```
```{r night-ggplot-facet, eval = FALSE}
```
To focus on the shape of the curves it is often useful to allow the
vertical axis ranges to adjust to the data:
```{r night-ggplot-facet-free-y, echo = FALSE}
select(Nightingale,
Date, Disease, Wounds, Other) %>%
pivot_longer(-Date,
names_to = "Cause",
values_to = "Deaths") %>%
ggplot(aes(x = Date, y = Deaths)) +
geom_line() +
facet_wrap(~ Cause, ncol = 1,
scales = "free_y")
```
```{r night-ggplot-facet-free-y, eval = FALSE}
```
This is particularly useful when the ranges are very different, for
example if the `Army` series is included:
```{r night-ggplot-facet-free-y-army, echo = FALSE}
select(Nightingale,
Army, Date, Disease, Wounds, Other) %>%
pivot_longer(-Date,
names_to = "series",
values_to = "Deaths") %>%
ggplot(aes(Date, Deaths)) +
geom_line() +
facet_wrap(~ series, ncol = 1,
scales = "free_y")
```
```{r night-ggplot-facet-free-y-army, eval = FALSE}
```
## Some Notes
Using `autoplot` is often a quick way to get useful plots.
Using basic `ggplot` graphics is a bit more work but provides more
flexibility.
## Many Time Series
Annual flow for dams in Italy:
```{r italy-dams-single, echo = FALSE, fig.width = 8}
data(hydroSIMN, package = "nsRFA")
annualflows <- mutate(annualflows,
cod = factor(cod))
ggplot(annualflows,
aes(x = anno, y = dato,
group = cod,
color = factor(cod))) +
geom_line()
```
Clearly showing all series in a single plot is not helpful!
```{r italy-dams-single, eval = FALSE}
```
With small multiples some patterns can be seen:
```{r italy-dams-facet, echo = FALSE, fig.width = 9, fig.height = 9}
ggplot(annualflows,
aes(x = anno, y = dato,
group = cod)) +
geom_line() +
facet_wrap(~ cod) +
theme(axis.text.x =
element_text(angle = 45,
hjust = 1))
```
```{r italy-dams-facet, eval = FALSE}
```
Small multiples can also be arranged in ways that convey spatial information:
```{r unemp-geofacet, echo = FALSE, fig.width = 9, fig.height = 9}
library(geofacet)
ggplot(state_unemp, aes(year, rate)) +
geom_line() +
facet_geo(~ state,
grid = "us_state_grid2",
label = "name") +
scale_x_continuous(labels = function(x) paste0("'", substr(x, 3, 4))) +
labs(title = "Seasonally Adjusted US Unemployment Rate 2000-2016",
caption = "Data Source: bls.gov",
x = "Year",
y = "Unemployment Rate (%)") +
theme(strip.text.x = element_text(size = 6),
axis.text.y = element_text(size = 6),
axis.text.x = element_blank())
```
```{r unemp-geofacet, eval = FALSE}
```
## The Fall and Rise of Income Inequality
A [2015 post](
http://www.npr.org/sections/money/2015/02/11/384988128/the-fall-and-rise-of-u-s-inequality-in-2-graphs)
on the NPR Planet Money site uses two graphs to show transitions in
income inequality.
The page includes a link to the
[data source](http://topincomes.parisschoolofeconomics.eu/).
This has been unavailable for some time.
I tried to obtain the same data but may not have done it quite right.
The closest source I could find is .
The data is available locally:
```{r}
income <- read.table("https://www.stat.uiowa.edu/~luke/data/nprincome.dat")
```
Plots over time:
```{r income-single, echo = FALSE, fig.width = 7.5}
select(income, -year) %>%
ts(start = 1913) %>%
autoplot()
```
```{r income-single, eval = FALSE}
```
The ranges are too different for this to be effective.
One alternative: standardize the series to one, or 100%, at some point.
Here the initial value in 1913 is used:
```{r income-single-rel, echo = FALSE, fig.width = 7.5}
select(income, -year) %>%
transmute(U01rel = U01 / U01[1],
L90rel = L90 / L90[1]) %>%
ts(start = 1913) %>%
autoplot()
```
```{r income-single-rel, eval = FALSE}
```
Faceted plots, with free $y$ axes, are another option:
```{r income-facet, echo = FALSE}
select(income, -year) %>%
ts(start = 1913) %>%
autoplot(facets = TRUE)
```
```{r income-facet, eval = FALSE}
```
With two series another option is a _connected scatter plot_.
```{r income-connected-1, echo = FALSE}
library(ggrepel)
ggplot(income, aes(U01, L90)) +
geom_path(size = 2) +
labs(x = "Average income for the top 1%",
y = "Average income for the bottom 90%")
```
```{r income-connected-1, eval = FALSE}
```
This simple form does not show the direction or scale of time.
Color or annotation can be used to show direction and scale of time:
```{r income-connected-2, echo = FALSE, fig.width = 7.5, warning = FALSE}
library(ggrepel)
ggplot(income, aes(U01, L90, color = year)) +
geom_path(size = 2) +
geom_text_repel(
aes(label =
ifelse(year %% 5 == 0 |
year == max(year),
year,
"")),
color = "grey50") +
labs(x = "Average income for the top 1%",
y = "Average income for the bottom 90%")
```
```{r income-connected-2, eval = FALSE}
```
Animated is another option:
```{r income-gganim, echo = FALSE, cache = TRUE, message = FALSE}
library(gganimate)
p <- ggplot(income,
aes(x = U01, y = L90,
color = year)) +
geom_path(size = 2,
color = "lightgrey",
data = mutate(income,
year = NULL)) +
geom_path(size = 2) +
labs(title = "Year: {frame_along}",
x = "Average income for the top 1%",
y = "Average income for the bottom 90%") +
transition_reveal(year)
animate(p, height = 350, width = 500,
fps = 5, end_pause = 10)
```
```{r income-gganim, eval = FALSE}
```
## Stack Loss Data
Operational data obtained From 21 days of operation of a plant for the
oxidation of ammonia (NH3) to nitric acid (HNO3).
The dependent variable, `stack.loss` is 10 times the percentage of the
ingoing ammonia to the plant that escapes from the absorption column
unabsorbed.
```{r stackloss-splom, echo = FALSE}
library(lattice)
splom(stackloss)
```
```{r stackloss-splom, eval = FALSE}
```
The data are collected over time:
```{r stackloss-splom-line, echo = FALSE, fig.height = 7}
library(dplyr)
sl <- mutate(stackloss,
intex = 1 : nrow(stackloss))
cpal <- colorRampPalette(c("#132B43", "#56B1F7"))
cols <- cpal(nrow(stackloss) - 1)
pfun <- function(x, y, ...) {
n <- length(x)
lsegments(x[-n], y[-n], x[-1], y[-1],
col = cols, lwd = 3)
}
splom(stackloss, panel = pfun)
```
```{r stackloss-splom-line, eval = FALSE}
```
Randomly reordering the indices can help calibrate whether there is a
time effect:
```{r stackloss-splom-line-rnd, echo = FALSE, fig.height = 7}
splom(sample_n(stackloss, nrow(stackloss)),
panel = pfun)
```
```{r stackloss-splom-line-rnd, eval = FALSE}
```
This is sometimes called _graphical inference_.
> Hadley Wickham, Dianne Cook, Heike Hofmann, and Andreas Buja,
> "Graphical Inference for Infovis," _IEEE Transactions on Visualization
> and Computer Graphics_, Vol. 16, No. 6, November/December 2010.
Picking one plot and embedding the true order among 19 random orderings:
```{r stack-loss-embed, echo = FALSE}
N <- 20
k <- sample(N, 1)
ord <- seq_len(nrow(stackloss))
ss <- function(i) {
if (i == k)
sl <- stackloss
else
sl <- sample_n(stackloss, nrow(stackloss))
mutate(sl, order = ord, sample = i)
}
d <- lapply(1 : N, ss)
d <- do.call(rbind, d)
library(ggplot2)
ggplot(d, aes(x = Acid.Conc., y = stack.loss, color = order)) +
geom_path(size = 0.8) + facet_wrap(~ sample) +
guides(color = "none")
```
True order: plot `r k`.
The code:
```{r stack-loss-embed, eval = FALSE}
```
## Lynx-Hare Data
Trapping data on Canadian lynx and snowshoe hare:
```{r lynxhare-facets, echo = FALSE}
lynxhare <- read.table("https://www.stat.uiowa.edu/~luke/data/LynxHare.txt")
names(lynxhare) <- c("year", "hare", "lynx")
lhts <- ts(select(lynxhare, -year), start = 1845)
autoplot(lhts, facets = TRUE)
```
Cycle patterns are similar.
```{r lynxhare-facets, eval = FALSE}
```
Lynx cycles often lag behind:
```{r lynxhare-single, echo = FALSE}
autoplot(lhts)
```
```{r lynxhare-single, eval = FALSE}
```
## Things to Look For
Changes in data definition.
* Administrative definitions, such as employment/unemployment can
change over time.
* Administrative entities, such as countries and states, can
change over time.
* Instruments can change (satelite imagery replacing surface
measurements).
Outliers need not be globally outlying: departures from local
patterns may also be important.
Level shifts and changes in volatility.
Shocks affecting level or volatility.
## Australian Airline Data
[Weekly passenger loads](https://www.otexts.org/fpp/2/1), in
thousands, on Ansett Airlines between Australia’s two largest cities
(available as `melsyd` in package `fpp`):
```{r australian-airlines, echo = FALSE, message = FALSE}
data(melsyd, package = "fpp")
autoplot(melsyd, facets = TRUE)
```
Some features:
* Strike in 1989.
* Increase in passenger load starting in second half of 1991.
* Dip in economy, rise in business in 1992 corresponding to an experiment
with fewer economy/more business seats.
```{r australian-airlines, eval = FALSE}
```
## Weekly SP 500 Closing Returns
Weekly closing returns of the SP 500 from 2003 to September, 2012:
```{r SP500, echo = FALSE, message = FALSE}
library(xts)
library(astsa)
autoplot(sp500w)
```
Some features:
* Increased volatility at the start of the financial crisis.
* Volatility seems to be decreasing, but not very fast.
```{r SP500, eval = FALSE}
```
## Forecasting
Studying the past may help predict the future.
Forecasting a time series usually involves choosing a model and
running the model forward.
Accuracy of forecasts decreases rapidly the farther ahead the forecast
is made.
An example from the `forecast` package: `WWWusage` is a time series of
the numbers of users connected to the Internet.
```{r WWWusage, eval = FALSE}
library(forecast)
fit <- ets(WWWusage)
fc <- forecast(WWWusage, model = fit)
autoplot(fc)
```
```{r WWWusage, echo = FALSE, message = FALSE}
```
## Labeling Trend Lines
For trend lines using a standard guide can be a little awkward.
Average life expectancy against year for the continents in the
`gapminder` data:
```{r trend-lines-guide, echo = FALSE, message = FALSE, fig.width = 7.5}
library(gapminder)
d <- group_by(gapminder,
continent, year) %>%
summarize(avgLifeExp = mean(lifeExp)) %>%
ungroup()
p <- ggplot(d,
aes(x = year,
y = avgLifeExp,
color = continent)) +
geom_line(size = 1)
p
```
```{r trend-lines-guide, eval = FALSE}
```
With the legend on the right, a small improvement is to order the
levels to match the order of the right hand curve values:
```{r trend-lines-guide-reorder, echo = FALSE}
d_last <- filter(d, year == max(year)) %>%
mutate(continent =
reorder(continent, -avgLifeExp))
p %+%
mutate(d,
continent =
factor(continent,
levels(d_last$continent)))
```
```{r trend-lines-guide-reorder, eval = FALSE}
```
_Direct labeling_ is often a better choice.
One option is to use `geom_text` at the ends of the lines.
This may need some additional adjustments to margins and clipping
behavior:
```{r trend-lines-direct, echo = FALSE, fig.width = 7.5}
p <- p + guides(color = "none")
d_last <- filter(d, year == max(year))
p + geom_text(aes(label = continent),
data = d_last,
size = 5,
hjust = -0.1) +
coord_cartesian(clip = "off",
expand = FALSE) +
theme(plot.margin =
margin(30, 60, 10, 10))
```
```{r trend-lines-direct, eval = FALSE}
```
Another option is to use a second axis:
```{r trend-lines-sec-axis, echo = FALSE, fig.width = 7.5}
p + scale_y_continuous(
sec.axis = dup_axis(
breaks = d_last$avgLifeExp,
labels = d_last$continent,
name = NULL)) +
scale_x_continuous(expand = c(0, 0))
```
```{r trend-lines-sec-axis, eval = FALSE}
```
It is also possible to place the labels along the lines.
The [`geomtextpath`
package](https://allancameron.github.io/geomtextpath/) provides
support for this.
```{r trend-lines-text-path, echo = FALSE}
library(geomtextpath)
ggplot(d, aes(x = year,
y = avgLifeExp,
color = continent)) +
geom_textline(aes(label = continent),
linewidth = 1,
vjust = -0.5,
hjust = 0.95) +
guides(color = "none")
```
```{r trend-lines-text-path, eval = FALSE}
```
## Other Approaches
A [talk from
2019](http://www.visualisingdata.com/2019/04/talk-slides-design-of-time/)
describes a wide range of options for visualizing temporal data.
## Reading
Chapters [_Visualizing time series and other functions of an
independent
variable_](https://clauswilke.com/dataviz/time-series.html) and
[_Visualizing
trends_](https://clauswilke.com/dataviz/visualizing-trends.html) in
[_Fundamentals of Data
Visualization_](https://clauswilke.com/dataviz/).
## Exercises
1. Creating a time series or multiple time series object and using
autoplot can be an efficient way to obtain line plots of time
series data. The following code uses the `weather` table in the
`nycflights13` package to create a data frame containing minimum
and maximum temperatures for each month in 2013:
```{r}
library(tidyverse)
library(nycflights13)
library(lubridate)
library(forecast)
temps <- group_by(weather, year, month) %>%
summarize(max_temp = max(temp, na.rm = TRUE),
min_temp = min(temp, na.rm = TRUE)) %>%
ungroup()
```
Which of the following expressions produces a line plot showing
the minimum and maximum temperature series (and no other series)?
a. `autoplot(temps)`
b. `autoplot(temps$max_temp, temps$min_temp)`
c. `ts(temps) %>% autoplot()`
d. `select(temps, max_temp, min_temp) %>% ts() %>% autoplot()`