Multiple Time Series
Some questions for multiple time series:
How do the series behave individually?
How are the series related?
Does one series lead or lag the others?
Some options for plotting multiple series:
separate panels in a trellis display;
multiple series in a single plot - this will require standardizing if the scales vary substantially;
a multivatiate plot with time represented by connecting line segments or animation.
Another option for two time series:
use a separate y
axis for each.
This is usually a bad idea.
Some data structures for multiple time series:
Crimean War Casualties
The HistData
package contains the data used by Florence Nightingale to call attention to terrible sanitary conditions for the Britich army in Crimea.
data(Nightingale, package = "HistData")
head(select(Nightingale, -Date))
## Month Year Army Disease Wounds Other Disease.rate Wounds.rate Other.rate
## 1 Apr 1854 8571 1 0 5 1.4 0.0 7.0
## 2 May 1854 23333 12 0 9 6.2 0.0 4.6
## 3 Jun 1854 28333 11 0 6 4.7 0.0 2.5
## 4 Jul 1854 28722 359 0 23 150.0 0.0 9.6
## 5 Aug 1854 30246 828 1 30 328.5 0.4 11.9
## 6 Sep 1854 30290 788 81 70 312.2 32.1 27.7
A plot of the Disease
, Wounds
, and Other
:
nDWO <- select(Nightingale,
Disease, Wounds, Other) %>%
ts(start = c(1854, 4), frequency = 12)
autoplot(nDWO)
Using the rate variables produces a very similar pattern:
nDWO.rate <- select(Nightingale,
ends_with("rate")) %>%
ts(start = c(1854, 4), frequency = 12)
autoplot(nDWO.rate)
The army size does not vary enough over the time where the casualties are high to affect the pattern much:
nArmy <- select(Nightingale, Army) %>%
ts(start = c(1854, 4), frequency = 12)
autoplot(nArmy)
Instead of showing three series in one plot you can also use faceting.
autoplot(nDWO, facets = TRUE)
To create the plot of Disease
, Wounds
, and Other
using basic ggplot
it is best to first convert the data into long form.
You can create a long version of the data, using either pivot_longer
or broom::tidy
.
Using broom:tidy
on the mts
object:
broom::tidy(nDWO) %>%
rename(Date = index,
Deaths = value,
Cause = series) %>%
ggplot(aes(x = Date,
y = Deaths,
color = Cause)) +
geom_line()
Using pivot_longer
on the original data frame:
select(Nightingale,
Date, Disease, Wounds, Other) %>%
pivot_longer(-Date,
names_to = "Cause",
values_to = "Deaths") %>%
ggplot(aes(x = Date,
y = Deaths,
color = Cause)) +
geom_line()
Using faceting instead of a single plot:
select(Nightingale,
Date, Disease, Wounds, Other) %>%
pivot_longer(-Date,
names_to = "Cause",
values_to = "Deaths") %>%
ggplot(aes(x = Date, y = Deaths)) +
geom_line() +
facet_wrap(~ Cause, ncol = 1)
To focus on the shape of the curves it is often useful to allow the vertical axis ranges to adjust to the data:
select(Nightingale,
Date, Disease, Wounds, Other) %>%
pivot_longer(-Date,
names_to = "Cause",
values_to = "Deaths") %>%
ggplot(aes(x = Date, y = Deaths)) +
geom_line() +
facet_wrap(~ Cause, ncol = 1,
scales = "free_y")
This is particularly useful when the ranges are very different, for example if the Army
series is included:
select(Nightingale,
Army, Date, Disease, Wounds, Other) %>%
pivot_longer(-Date,
names_to = "series",
values_to = "Deaths") %>%
ggplot(aes(Date, Deaths)) +
geom_line() +
facet_wrap(~ series, ncol = 1,
scales = "free_y")
Some Notes
Using autoplot
is often a quick way to get useful plots.
Using basic ggplot
graphics is a bit more work but provides more flexibility.
Many Time Series
Annual flow for dams in Italy:
Clearly showing all series in a single plot is not helpful!
data(hydroSIMN, package = "nsRFA")
annualflows <- mutate(annualflows,
cod = factor(cod))
ggplot(annualflows,
aes(x = anno, y = dato,
group = cod,
color = factor(cod))) +
geom_line()
With small multiples some patterns can be seen:
ggplot(annualflows,
aes(x = anno, y = dato,
group = cod)) +
geom_line() +
facet_wrap(~ cod) +
theme(axis.text.x =
element_text(angle = 45,
hjust = 1))
Small multiples can also be arranged in ways that convey spatial information:
library(geofacet)
ggplot(state_unemp, aes(year, rate)) +
geom_line() +
facet_geo(~ state,
grid = "us_state_grid2",
label = "name") +
scale_x_continuous(labels = function(x) paste0("'", substr(x, 3, 4))) +
labs(title = "Seasonally Adjusted US Unemployment Rate 2000-2016",
caption = "Data Source: bls.gov",
x = "Year",
y = "Unemployment Rate (%)") +
theme(strip.text.x = element_text(size = 6),
axis.text.y = element_text(size = 6),
axis.text.x = element_blank())
The Fall and Rise of Income Inequality
A 2015 post on the NPR Planet Money site uses two graphs to show transitions in income inequality.
The page includes a link to the data source .
This has been unavailable for some time.
I tried to obtain the same data but may not have done it quite right.
The closest source I could find is https://wid.world/data/ .
The data is available locally:
income <- read.table("https://www.stat.uiowa.edu/~luke/data/nprincome.dat")
Plots over time:
select(income, -year) %>%
ts(start = 1913) %>%
autoplot()
The ranges are too different for this to be effective.
One alternative: standardize the series to one, or 100%, at some point.
Here the initial value in 1913 is used:
select(income, -year) %>%
transmute(U01rel = U01 / U01[1],
L90rel = L90 / L90[1]) %>%
ts(start = 1913) %>%
autoplot()
Faceted plots, with free \(y\) axes, are another option:
select(income, -year) %>%
ts(start = 1913) %>%
autoplot(facets = TRUE)
With two series another option is a connected scatter plot .
library(ggrepel)
ggplot(income, aes(U01, L90)) +
geom_path(linewidth = 2) +
labs(x = "Average income for the top 1%",
y = "Average income for the bottom 90%")
This simple form does not show the direction or scale of time.
Color or annotation can be used to show direction and scale of time:
library(ggrepel)
ggplot(income, aes(U01, L90, color = year)) +
geom_path(size = 2) +
geom_text_repel(
aes(label =
ifelse(year %% 5 == 0 |
year == max(year),
year,
"")),
color = "grey50") +
labs(x = "Average income for the top 1%",
y = "Average income for the bottom 90%")
Animated is another option:
library(gganimate)
p <- ggplot(income,
aes(x = U01, y = L90,
color = year)) +
geom_path(size = 2,
color = "lightgrey",
data = mutate(income,
year = NULL)) +
geom_path(size = 2) +
labs(title = "Year: {frame_along}",
x = "Average income for the top 1%",
y = "Average income for the bottom 90%") +
transition_reveal(year)
animate(p, height = 350, width = 500,
fps = 5, end_pause = 10)
Stack Loss Data
Operational data obtained From 21 days of operation of a plant for the oxidation of ammonia (NH3) to nitric acid (HNO3).
The dependent variable, stack.loss
is 10 times the percentage of the ingoing ammonia to the plant that escapes from the absorption column unabsorbed.
library(lattice)
splom(stackloss)
The data are collected over time:
library(dplyr)
sl <- mutate(stackloss,
intex = seq_len(nrow(stackloss)))
cpal <- colorRampPalette(c("#132B43", "#56B1F7"))
cols <- cpal(nrow(stackloss) - 1)
pfun <- function(x, y, ...) {
n <- length(x)
lsegments(x[-n], y[-n], x[-1], y[-1],
col = cols, lwd = 3)
}
splom(stackloss, panel = pfun)
Randomly reordering the indices can help calibrate whether there is a time effect:
splom(sample_n(stackloss, nrow(stackloss)),
panel = pfun)
This is sometimes called graphical inference .
Hadley Wickham, Dianne Cook, Heike Hofmann, and Andreas Buja, “Graphical Inference for Infovis,” IEEE Transactions on Visualization and Computer Graphics , Vol. 16, No. 6, November/December 2010.
Picking one plot and embedding the true order among 19 random orderings:
True order: plot 4.
The code:
N <- 20
k <- sample(N, 1)
ord <- seq_len(nrow(stackloss))
ss <- function(i) {
if (i == k)
sl <- stackloss
else
sl <- sample_n(stackloss, nrow(stackloss))
mutate(sl, order = ord, sample = i)
}
d <- lapply(1 : N, ss)
d <- do.call(rbind, d)
library(ggplot2)
ggplot(d, aes(x = Acid.Conc., y = stack.loss, color = order)) +
geom_path(size = 0.8) + facet_wrap(~ sample) +
guides(color = "none")
Lynx-Hare Data
Trapping data on Canadian lynx and snowshoe hare:
Cycle patterns are similar.
lynxhare <- read.table("https://www.stat.uiowa.edu/~luke/data/LynxHare.txt")
names(lynxhare) <- c("year", "hare", "lynx")
lhts <- ts(select(lynxhare, -year), start = 1845)
autoplot(lhts, facets = TRUE)
Lynx cycles often lag behind:
autoplot(lhts)
Things to Look For
Changes in data definition.
Administrative definitions, such as employment/unemployment can change over time.
Administrative entities, such as countries and states, can change over time.
Instruments can change (satelite imagery replacing surface measurements).
Outliers need not be globally outlying: departures from local patterns may also be important.
Level shifts and changes in volatility.
Shocks affecting level or volatility.
Australian Airline Data
Weekly passenger loads , in thousands, on Ansett Airlines between Australia’s two largest cities (available as melsyd
in package fpp
):
Some features:
Strike in 1989.
Increase in passenger load starting in second half of 1991.
Dip in economy, rise in business in 1992 corresponding to an experiment with fewer economy/more business seats.
data(melsyd, package = "fpp")
autoplot(melsyd, facets = TRUE)
Weekly SP 500 Closing Returns
Weekly closing returns of the SP 500 from 2003 to September, 2012:
Some features:
Increased volatility at the start of the financial crisis.
Volatility seems to be decreasing, but not very fast.
library(xts)
library(astsa)
autoplot(sp500w)
Forecasting
Studying the past may help predict the future.
Forecasting a time series usually involves choosing a model and running the model forward.
Accuracy of forecasts decreases rapidly the farther ahead the forecast is made.
An example from the forecast
package: WWWusage
is a time series of the numbers of users connected to the Internet.
library(forecast)
fit <- ets(WWWusage)
fc <- forecast(WWWusage, model = fit)
autoplot(fc)
Labeling Trend Lines
For trend lines using a standard guide can be a little awkward.
Average life expectancy against year for the continents in the gapminder
data:
library(gapminder)
d <- group_by(gapminder,
continent, year) %>%
summarize(avgLifeExp = mean(lifeExp)) %>%
ungroup()
p <- ggplot(d,
aes(x = year,
y = avgLifeExp,
color = continent)) +
geom_line(size = 1)
p
With the legend on the right, a small improvement is to order the levels to match the order of the right hand curve values:
d_last <- filter(d, year == max(year)) %>%
mutate(continent =
reorder(continent, -avgLifeExp))
p %+%
mutate(d,
continent =
factor(continent,
levels(d_last$continent)))
Direct labeling is often a better choice.
One option is to use geom_text
at the ends of the lines.
This may need some additional adjustments to margins and clipping behavior:
p <- p + guides(color = "none")
d_last <- filter(d, year == max(year))
p + geom_text(aes(label = continent),
data = d_last,
size = 5,
hjust = -0.1) +
coord_cartesian(clip = "off",
expand = FALSE) +
theme(plot.margin =
margin(30, 60, 10, 10))
Another option is to use a second axis:
p + scale_y_continuous(
sec.axis = dup_axis(
breaks = d_last$avgLifeExp,
labels = d_last$continent,
name = NULL)) +
scale_x_continuous(expand = c(0, 0))
It is also possible to place the labels along the lines.
The geomtextpath
package provides support for this.
library(geomtextpath)
ggplot(d, aes(x = year,
y = avgLifeExp,
color = continent)) +
geom_textline(aes(label = continent),
linewidth = 1,
vjust = -0.5,
hjust = 0.95) +
guides(color = "none")
Bump Charts
Top 10 countries in terms of GDP per capita in 2007:
top_gdp <- filter(gapminder,
continent == "Europe",
year == max(year)) %>%
slice_max(gdpPercap, n = 10) %>%
pull(country)
d <- filter(gapminder,
country %in% top_gdp) %>%
group_by(year) %>%
mutate(
rank = min_rank(desc(gdpPercap))) %>%
ungroup()
d_last <- filter(d, year == max(year))
ggplot(d, aes(year,
gdpPercap,
color = country)) +
geom_line()
The dominant feature is the overall trend.
One option: show deviations from the overall trend.
Another option: show ranks within years.
This is called a bump chart .
A simple bump chart:
ggplot(d, aes(year, rank, color = country)) +
geom_line(size = 1) +
scale_y_reverse(breaks = 1:10) +
geom_text(aes(label = country),
data = d_last,
size = 5,
hjust = -0.1) +
coord_cartesian(clip = "off",
expand = FALSE) +
theme(plot.margin =
margin(30, 90, 10, 10)) +
guides(color = "none")
The ggbump
package provides geom_bump
for drawing smooth bump chart lines:
library(ggbump)
ggplot(d, aes(year, rank, color = country)) +
geom_bump(size = 1) +
scale_y_reverse(breaks = 1:10) +
geom_text(aes(label = country),
data = d_last,
size = 5,
hjust = -0.1) +
coord_cartesian(clip = "off",
expand = FALSE) +
theme(plot.margin =
margin(30, 90, 10, 10)) +
guides(color = "none")
Other Approaches
A talk from 2019 describes a wide range of options for visualizing temporal data.
Exercises
Creating a time series or multiple time series object and using autoplot can be an efficient way to obtain line plots of time series data. The following code uses the weather
table in the nycflights13
package to create a data frame containing minimum and maximum temperatures for each month in 2013:
library(tidyverse)
library(nycflights13)
library(lubridate)
library(forecast)
temps <- group_by(weather, year, month) %>%
summarize(max_temp = max(temp, na.rm = TRUE),
min_temp = min(temp, na.rm = TRUE)) %>%
ungroup()
Which of the following expressions produces a line plot showing the minimum and maximum temperature series (and no other series)?
autoplot(temps)
autoplot(temps$max_temp, temps$min_temp)
ts(temps) %>% autoplot()
select(temps, max_temp, min_temp) %>% ts() %>% autoplot()
---
title: "Plots for Multiple Time Series"
output:
  html_document:
    toc: yes
    code_folding: show
    code_download: true
---

<link rel="stylesheet" href="stat4580.css" type="text/css" />
<style type="text/css"> .remark-code { font-size: 85%; } </style>
```{r setup, include = FALSE}
source(here::here("setup.R"))
knitr::opts_chunk$set(collapse = TRUE, message = FALSE,
                      fig.height = 5, fig.width = 6, fig.align = "center")

options(htmltools.dir.version = FALSE)

set.seed(12345)
library(ggplot2)
library(lattice)
library(tidyverse)
library(gridExtra)
library(forecast)
theme_set(theme_minimal() +
          theme(text = element_text(size = 16),
                panel.border = element_rect(color = "grey30", fill = NA)))
```


## Multiple Time Series

Some questions for multiple time series:

- How do the series behave individually?

- How are the series related?

- Does one series lead or lag the others?

Some options for plotting multiple series:

- separate panels in a trellis display;

- multiple series in a single plot - this will require standardizing
  if the scales vary substantially;

- a multivatiate plot with time represented by connecting line
  segments or animation.

Another option for two time series:

- use a separate `y` axis for each.

This is usually a bad idea.

Some data structures for multiple time series:

* `mts` objects created by `ts()`.

    - All series have to be numeric.
    - `autoplot` methods may be sufficient.

* Regular data frames.

    - The series will have to be plotted in separate layers,
    - or the data frames will need converting into long form.


## Crimean War Casualties

The `HistData` package contains the data used by Florence Nightingale
to call attention to terrible sanitary conditions for the Britich army
in Crimea.

```{r}
data(Nightingale, package = "HistData")
head(select(Nightingale, -Date))
```

A plot of the `Disease`, `Wounds`, and `Other`:

```{r night-ts-auto, echo = FALSE, fig.width = 7.5}
nDWO <- select(Nightingale,
               Disease, Wounds, Other) %>%
    ts(start = c(1854, 4), frequency = 12)

autoplot(nDWO)
```
```{r night-ts-auto, eval = FALSE}
```

Using the rate variables produces a very similar pattern:

```{r night-rate-ts-auto, echo = FALSE, fig.width = 7.5}
nDWO.rate <- select(Nightingale,
                    ends_with("rate")) %>%
    ts(start = c(1854, 4), frequency = 12)

autoplot(nDWO.rate)
```
```{r night-rate-ts-auto, eval = FALSE}
```

The army size does not vary enough over the time where the casualties
are high to affect the pattern much:

```{r night-army-ts-auto, echo = FALSE}
nArmy <- select(Nightingale, Army) %>%
    ts(start = c(1854, 4), frequency = 12)

autoplot(nArmy)
```
```{r night-army-ts-auto, eval = FALSE}
```

Instead of showing three series in one plot you can also use faceting.

```{r night-auto-facet, echo = FALSE}
autoplot(nDWO, facets = TRUE)
```
```{r night-auto-facet, eval = FALSE}
```

To create the plot of `Disease`, `Wounds`, and `Other` using basic
`ggplot` it is best to first convert the data into long form.

You can create a long version of the data, using either `pivot_longer`
or `broom::tidy`.

Using `broom:tidy` on the `mts` object:

```{r night-ggplot-broom, echo = FALSE, fig.width = 7.5}
broom::tidy(nDWO) %>%
    rename(Date = index,
           Deaths = value,
           Cause = series) %>%
    ggplot(aes(x = Date,
               y = Deaths,
               color = Cause)) +
    geom_line()
```
```{r night-ggplot-broom, eval = FALSE}
```

Using `pivot_longer` on the original data frame:

```{r night-ggplot-pivot-longer, echo = FALSE, fig.width = 7.5}
select(Nightingale,
       Date, Disease, Wounds, Other) %>%
    pivot_longer(-Date,
                 names_to = "Cause",
                 values_to = "Deaths") %>%
    ggplot(aes(x = Date,
               y = Deaths,
               color = Cause)) +
    geom_line()
```
```{r night-ggplot-pivot-longer, eval = FALSE}
```

Using faceting instead of a single plot:

```{r night-ggplot-facet, echo = FALSE}
select(Nightingale,
       Date, Disease, Wounds, Other) %>%
    pivot_longer(-Date,
                 names_to = "Cause",
                 values_to = "Deaths") %>%
    ggplot(aes(x = Date, y = Deaths)) +
    geom_line() +
    facet_wrap(~ Cause, ncol = 1)
```
```{r night-ggplot-facet, eval = FALSE}
```

To focus on the shape of the curves it is often useful to allow the
vertical axis ranges to adjust to the data:

```{r night-ggplot-facet-free-y, echo = FALSE}
select(Nightingale,
       Date, Disease, Wounds, Other) %>%
    pivot_longer(-Date,
                 names_to = "Cause",
                 values_to = "Deaths") %>%
    ggplot(aes(x = Date, y = Deaths)) +
    geom_line() +
    facet_wrap(~ Cause, ncol = 1,
               scales = "free_y")
```
```{r night-ggplot-facet-free-y, eval = FALSE}
```

This is particularly useful when the ranges are very different, for
example if the `Army` series is included:

```{r night-ggplot-facet-free-y-army, echo = FALSE}
select(Nightingale,
       Army, Date, Disease, Wounds, Other) %>%
    pivot_longer(-Date,
                 names_to = "series",
                 values_to = "Deaths") %>%
    ggplot(aes(Date, Deaths)) +
    geom_line() +
    facet_wrap(~ series, ncol = 1,
               scales = "free_y")
```
```{r night-ggplot-facet-free-y-army, eval = FALSE}
```


## Some Notes

Using `autoplot` is often a quick way to get useful plots.

Using basic `ggplot` graphics is a bit more work but provides more
flexibility.


## Many Time Series

Annual flow for dams in Italy:

```{r italy-dams-single, echo = FALSE, fig.width = 8}
data(hydroSIMN, package = "nsRFA")
annualflows <- mutate(annualflows,
                      cod = factor(cod))

ggplot(annualflows,
       aes(x = anno, y = dato,
           group = cod,
           color = factor(cod))) +
    geom_line()
```
Clearly showing all series in a single plot is not helpful!

```{r italy-dams-single, eval = FALSE}
```

With small multiples some patterns can be seen:

```{r italy-dams-facet, echo = FALSE, fig.width = 9, fig.height = 9}
ggplot(annualflows,
       aes(x = anno, y = dato,
           group = cod)) +
    geom_line() +
    facet_wrap(~ cod) +
    theme(axis.text.x =
              element_text(angle = 45,
                           hjust = 1))
```
```{r italy-dams-facet, eval = FALSE}
```

Small multiples can also be arranged in ways that convey spatial information:

```{r unemp-geofacet, echo = FALSE, fig.width = 9, fig.height = 9}
library(geofacet)
ggplot(state_unemp, aes(year, rate)) +
    geom_line() +
    facet_geo(~ state,
              grid = "us_state_grid2",
              label = "name") +
    scale_x_continuous(labels = function(x) paste0("'", substr(x, 3, 4))) +
    labs(title = "Seasonally Adjusted US Unemployment Rate 2000-2016",
         caption = "Data Source: bls.gov",
         x = "Year",
         y = "Unemployment Rate (%)") +
    theme(strip.text.x = element_text(size = 6),
          axis.text.y = element_text(size = 6),
          axis.text.x = element_blank())
```
```{r unemp-geofacet, eval = FALSE}
```


## The Fall and Rise of Income Inequality

A [2015
post](https://www.npr.org/sections/money/2015/02/11/384988128/the-fall-and-rise-of-u-s-inequality-in-2-graphs)
on the NPR Planet Money site uses two graphs to show transitions in
income inequality.

The page includes a link to the
<!-- this times out in the link check -->
[data source](http://topincomes.parisschoolofeconomics.eu/).

This has been unavailable for some time.

I tried to obtain the same data but may not have done it quite right.

The closest source I could find is <https://wid.world/data/>.

The data is available locally:

```{r}
income <- read.table("https://www.stat.uiowa.edu/~luke/data/nprincome.dat")
```

Plots over time:

```{r income-single, echo = FALSE, fig.width = 7.5}
select(income, -year) %>%
    ts(start = 1913) %>%
    autoplot()
```

```{r income-single, eval = FALSE}
```

The ranges are too different for this to be effective.

One alternative: standardize the series to one, or 100%, at some point.

Here the initial value in 1913 is used:

```{r income-single-rel, echo = FALSE, fig.width = 7.5}
select(income, -year) %>%
    transmute(U01rel = U01 / U01[1],
              L90rel = L90 / L90[1]) %>%
    ts(start = 1913) %>%
    autoplot()
```
```{r income-single-rel, eval = FALSE}
```

Faceted plots, with free $y$ axes, are another option:

```{r income-facet, echo = FALSE}
select(income, -year) %>%
    ts(start = 1913) %>%
    autoplot(facets = TRUE)
```
```{r income-facet, eval = FALSE}
```

With two series another option is a _connected scatter plot_.

```{r income-connected-1, echo = FALSE}
library(ggrepel)
ggplot(income, aes(U01, L90)) +
    geom_path(linewidth = 2) +
    labs(x = "Average income for the top 1%",
         y = "Average income for the bottom 90%")
```
```{r income-connected-1, eval = FALSE}
```

This simple form does not show the direction or scale of time.

Color or annotation can be used to show direction and scale of time:

```{r income-connected-2, echo = FALSE, fig.width = 7.5, warning = FALSE}
library(ggrepel)
ggplot(income, aes(U01, L90, color = year)) +
    geom_path(size = 2) +
    geom_text_repel(
        aes(label =
                ifelse(year %% 5 == 0 |
                       year == max(year),
                       year,
                       "")),
        color = "grey50") +
    labs(x = "Average income for the top 1%",
         y = "Average income for the bottom 90%")
```
```{r income-connected-2, eval = FALSE}
```

Animated is another option:

<!-- **** could look into getting closer to the original NPR version -->

```{r income-gganim, echo = FALSE, cache = TRUE, message = FALSE}
library(gganimate)
p <- ggplot(income,
            aes(x = U01, y = L90,
                color = year)) +
    geom_path(size = 2,
              color = "lightgrey",
              data = mutate(income,
                            year = NULL)) +
    geom_path(size = 2) +
    labs(title = "Year: {frame_along}",
         x = "Average income for the top 1%",
         y = "Average income for the bottom 90%") +
    transition_reveal(year)
animate(p, height = 350, width = 500,
        fps = 5, end_pause = 10)
```
```{r income-gganim, eval = FALSE}
```


## Stack Loss Data

Operational data obtained From 21 days of operation of a plant for the
oxidation of ammonia (NH3) to nitric acid (HNO3).

The dependent variable, `stack.loss` is 10 times the percentage of the
ingoing ammonia to the plant that escapes from the absorption column
unabsorbed.

```{r stackloss-splom, echo = FALSE}
library(lattice)
splom(stackloss)
```
```{r stackloss-splom, eval = FALSE}
```

The data are collected over time:

```{r stackloss-splom-line, echo = FALSE, fig.height = 7}
library(dplyr)
sl <- mutate(stackloss,
             intex = seq_len(nrow(stackloss)))
cpal <- colorRampPalette(c("#132B43", "#56B1F7"))
cols <- cpal(nrow(stackloss) - 1)
pfun <- function(x, y, ...) {
    n <- length(x)
    lsegments(x[-n], y[-n], x[-1], y[-1],
              col = cols, lwd = 3)
}
splom(stackloss, panel = pfun)
```
```{r stackloss-splom-line, eval = FALSE}
```

Randomly reordering the indices can help calibrate whether there is a
time effect:

```{r stackloss-splom-line-rnd, echo = FALSE, fig.height = 7}
splom(sample_n(stackloss, nrow(stackloss)),
      panel = pfun)
```
```{r stackloss-splom-line-rnd, eval = FALSE}
```

This is sometimes called _graphical inference_.

> Hadley Wickham, Dianne Cook, Heike Hofmann, and Andreas Buja,
> "Graphical Inference for Infovis," _IEEE Transactions on Visualization
> and Computer Graphics_, Vol. 16, No. 6, November/December 2010.

Picking one plot and embedding the true order among 19 random orderings:

```{r stack-loss-embed, echo = FALSE}
N <- 20
k <- sample(N, 1)

ord <- seq_len(nrow(stackloss))
ss <- function(i) {
    if (i == k)
        sl <- stackloss
    else
        sl <- sample_n(stackloss, nrow(stackloss))
    mutate(sl, order = ord, sample = i)
}

d <- lapply(1 : N, ss)
d <- do.call(rbind, d)

library(ggplot2)
ggplot(d, aes(x = Acid.Conc., y = stack.loss, color = order)) +
    geom_path(size = 0.8) + facet_wrap(~ sample) +
    guides(color = "none")
```
True order: plot `r k`.

The code:

```{r stack-loss-embed, eval = FALSE}
```


## Lynx-Hare Data

Trapping data on Canadian lynx and snowshoe hare:

<!--
http://people.whitman.edu/~hundledr/courses/M250F03/M250.html
http://people.whitman.edu/~hundledr/courses/M250F03/LynxHare.txt
-->

```{r lynxhare-facets, echo = FALSE}
lynxhare <- read.table("https://www.stat.uiowa.edu/~luke/data/LynxHare.txt")
names(lynxhare) <- c("year", "hare", "lynx")
lhts <- ts(select(lynxhare, -year), start = 1845)
autoplot(lhts, facets = TRUE)
```

Cycle patterns are similar.

```{r lynxhare-facets, eval = FALSE}
```

Lynx cycles often lag behind:

```{r lynxhare-single, echo = FALSE}
autoplot(lhts)
```
```{r lynxhare-single, eval = FALSE}
```


## Things to Look For

Changes in data definition.

* Administrative definitions, such as employment/unemployment can
  change over time.

* Administrative entities, such as countries and states, can
  change over time.

* Instruments can change (satelite imagery replacing surface
  measurements).

Outliers need not be globally outlying: departures from local
patterns may also be important.

Level shifts and changes in volatility.

Shocks affecting level or volatility.


## Australian Airline Data

[Weekly passenger loads](https://otexts.com/fpp2/graphics.html), in
thousands, on Ansett Airlines between Australia’s two largest cities
(available as `melsyd` in package `fpp`):

```{r australian-airlines, echo = FALSE, message = FALSE}
data(melsyd, package = "fpp")
autoplot(melsyd, facets = TRUE)
```

Some features:

* Strike in 1989.

* Increase in passenger load starting in second half of 1991.

* Dip in economy, rise in business in 1992 corresponding to an experiment
  with fewer economy/more business seats.

```{r australian-airlines, eval = FALSE}
```


## Weekly SP 500 Closing Returns

Weekly closing returns of the SP 500 from 2003 to September, 2012:

```{r SP500, echo = FALSE, message = FALSE}
library(xts)
library(astsa)
autoplot(sp500w)
```

Some features:

* Increased volatility at the start of the financial crisis.

* Volatility seems to be decreasing, but not very fast.

```{r SP500, eval = FALSE}
```


## Forecasting

Studying the past may help predict the future.

Forecasting a time series usually involves choosing a model and
running the model forward.

Accuracy of forecasts decreases rapidly the farther ahead the forecast
is made.

An example from the `forecast` package: `WWWusage` is a time series of
the numbers of users connected to the Internet.

```{r WWWusage, eval = FALSE}
library(forecast)
fit <- ets(WWWusage)
fc <- forecast(WWWusage, model = fit)
autoplot(fc)
```

```{r WWWusage, echo = FALSE, message = FALSE}
```


## Labeling Trend Lines

For trend lines using a standard guide can be a little awkward.

Average life expectancy against year for the continents in the
`gapminder` data:

```{r trend-lines-guide, echo = FALSE, message = FALSE, fig.width = 7.5}
library(gapminder)
d <- group_by(gapminder,
              continent, year) %>%
    summarize(avgLifeExp = mean(lifeExp)) %>%
    ungroup()

p <- ggplot(d,
            aes(x = year,
                y = avgLifeExp,
                color = continent)) +
    geom_line(size = 1)
p
```
```{r trend-lines-guide, eval = FALSE}
```

With the legend on the right, a small improvement is to order the
levels to match the order of the right hand curve values:

```{r trend-lines-guide-reorder, echo = FALSE}
d_last <- filter(d, year == max(year)) %>%
    mutate(continent =
               reorder(continent, -avgLifeExp))

p %+%
    mutate(d,
           continent =
               factor(continent,
                      levels(d_last$continent)))
```

```{r trend-lines-guide-reorder, eval = FALSE}
```

_Direct labeling_ is often a better choice.

One option is to use `geom_text` at the ends of the lines.

This may need some additional adjustments to margins and clipping
behavior:

```{r trend-lines-direct, echo = FALSE, fig.width = 7.5}
p <- p + guides(color = "none")

d_last <- filter(d, year == max(year))

p + geom_text(aes(label = continent),
              data = d_last,
              size = 5,
              hjust = -0.1) +
    coord_cartesian(clip = "off",
                    expand = FALSE) +
    theme(plot.margin =
              margin(30, 60, 10, 10))
```
```{r trend-lines-direct, eval = FALSE}
```

Another option is to use a second axis:
<!--
Adapted from https://wilkelab.org/SDS375/slides/redundant-coding.html#16
-->

```{r trend-lines-sec-axis, echo = FALSE, fig.width = 7.5}
p + scale_y_continuous(
        sec.axis = dup_axis(
            breaks = d_last$avgLifeExp,
            labels = d_last$continent,
            name = NULL)) +
    scale_x_continuous(expand = c(0, 0))
```
```{r trend-lines-sec-axis, eval = FALSE}
```

It is also possible to place the labels along the lines.

The [`geomtextpath`
package](https://allancameron.github.io/geomtextpath/) provides
support for this.

```{r trend-lines-text-path, echo = FALSE}
library(geomtextpath)
ggplot(d, aes(x = year,
              y = avgLifeExp,
              color = continent)) +
    geom_textline(aes(label = continent),
                  linewidth = 1,
                  vjust = -0.5,
                  hjust = 0.95) +
    guides(color = "none")
```

```{r trend-lines-text-path, eval = FALSE}
```


## Bump Charts

Top 10 countries in terms of GDP per capita in 2007:

```{r, class.source = "fold-hide"}
top_gdp <- filter(gapminder,
                  continent == "Europe",
                  year == max(year)) %>%
    slice_max(gdpPercap, n = 10) %>%
    pull(country)
d <- filter(gapminder,
            country %in% top_gdp) %>%
    group_by(year) %>%
    mutate(
        rank = min_rank(desc(gdpPercap))) %>%
    ungroup()
d_last <- filter(d, year == max(year))

ggplot(d, aes(year,
              gdpPercap,
              color = country)) +
    geom_line()
```

The dominant feature is the overall trend.

One option: show deviations from the overall trend.

Another option: show ranks within years.

This is called a _bump chart_.

A simple bump chart:

```{r, class.source = "fold-hide"}
ggplot(d, aes(year, rank, color = country)) +
    geom_line(size = 1) +
    scale_y_reverse(breaks = 1:10) +
    geom_text(aes(label = country),
              data = d_last,
              size = 5,
              hjust = -0.1) +
    coord_cartesian(clip = "off",
                    expand = FALSE) +
    theme(plot.margin =
              margin(30, 90, 10, 10)) +
    guides(color = "none")
```

The `ggbump` package provides `geom_bump` for drawing smooth bump
chart lines:
```{r, class.source = "fold-hide"}
library(ggbump)
ggplot(d, aes(year, rank, color = country)) +
    geom_bump(size = 1) +
    scale_y_reverse(breaks = 1:10) +
    geom_text(aes(label = country),
              data = d_last,
              size = 5,
              hjust = -0.1) +
    coord_cartesian(clip = "off",
                    expand = FALSE) +
    theme(plot.margin =
              margin(30, 90, 10, 10)) +
    guides(color = "none")
```


## Other Approaches

A [talk from
2019](https://www.visualisingdata.com/2019/04/talk-slides-design-of-time/)
describes a wide range of options for visualizing temporal data.


## Reading

Chapters [_Visualizing time series and other functions of an
  independent
  variable_](https://clauswilke.com/dataviz/time-series.html) and
  [_Visualizing
  trends_](https://clauswilke.com/dataviz/visualizing-trends.html) in
  [_Fundamentals of Data
  Visualization_](https://clauswilke.com/dataviz/).


## Exercises

1. Creating a time series or multiple time series object and using
    autoplot can be an efficient way to obtain line plots of time
    series data. The following code uses the `weather` table in the
    `nycflights13` package to create a data frame containing minimum
    and maximum temperatures for each month in 2013:

    ```{r}
    library(tidyverse)
    library(nycflights13)
    library(lubridate)
    library(forecast)

    temps <- group_by(weather, year, month) %>%
        summarize(max_temp = max(temp, na.rm = TRUE),
                  min_temp = min(temp, na.rm = TRUE)) %>%
        ungroup()
    ```

    Which of the following expressions produces a line plot showing
    the minimum and maximum temperature series (and no other series)?

    a. `autoplot(temps)`
    b. `autoplot(temps$max_temp, temps$min_temp)`
    c. `ts(temps) %>% autoplot()`
    d. `select(temps, max_temp, min_temp) %>% ts() %>% autoplot()`


<!--
Lynx-Hare Data Variants
https://gist.github.com/mages/1f0f0d5bbe50af81cc19
https://github.com/stan-dev/example-models/tree/master/knitr/lotka-volterra

HaresLynxObservations <- "Year   Hares.x.1000    Lynx.x.1000
1900    30      4
1901    47.2    6.1
1902    70.2    9.8
1903    77.4    35.2
1904    36.3    59.4
1905    20.6    41.7
1906    18.1    19
1907    21.4    13
1908    22      8.3
1909    25.4    9.1
1910    27.1    7.4
1911    40.3    8
1912    57      12.3
1913    76.6    19.5
1914    52.3    45.7
1915    19.5    51.1
1916    11.2    29.7
1917    7.6     15.8
1918    14.6    9.7
1919    16.2    10.1
1920    24.7    8.6"

hl.obs <- read.table(text=HaresLynxObservations, header=TRUE)
names(hl.obs) <- c("Year", "Hares", "Lynx") ## Population in 000's
library(lattice)
xyplot(Hares+Lynx~ Year, data = hl.obs,type = "l", auto.key = TRUE)

https://opendata.stackexchange.com/questions/6767/where-can-i-find-biological-time-series-data?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa
-->
