---
title: "Time Series Plots"
output:
  html_document:
    toc: yes
---
```{r global_options, include=FALSE}
knitr::opts_chunk$set(collapse=TRUE)
```
```{r, include = FALSE}
library(lattice)
library(tidyverse)
library(gridExtra)
set.seed(12345)
```
A lot of data is collected over time:

- economic performance measures;
- weather, climate data;
- medical information.

The objectives can be:

- understanding the past -- how we got where we are;
- understanding patterns in variation;
- forecasting what will happen next.

Even if time is not of primary interest, data are often collected with
a time stamp.

Examining the behavior over time can reveal interesting features that
may hint at confounding variables:

- temperature varying with time of day;
- humidity varying over the year.

Time series data can be collected at regularly spaced intervals:

- yearly, e.g. the `gapminder` data;
- monthly, e.g. the `river` data;
- daily, e.g. high temperatures.

Data might be missing:

- irregularly, e.g. faulty instrument, holidays;
- regularly, e.g. weekends.

Time series can also be recorded at irregular times:

- blood pressure at doctor visits.

Variables recorded over time can be numerical or categorical.
## Basic Plots for a Single Numeric Time Series
Numeric time series are usually plotted as a _line chart_.
```{r}
river <- scan("https://www.stat.uiowa.edu/~luke/data/river.dat")
plot(river, type = "l")
```
Using symbols in addition to lines is sometimes useful:
```{r}
plot(river, type = "b")
```
Using `ggplot`:
```{r}
library(ggplot2)
p <- ggplot(NULL, aes(y = river, x = seq_along(river)))
p + geom_line() + geom_point()
```
Using lattice:
```{r}
library(lattice)
xyplot(river ~ seq_along(river), type = c("l", "p"))
```
## Aspect Ratio
Aspect ratio has a significant effect on how well we perceive slope
differences.

- Banking to 45 degrees: choose an aspect ratio so the magnitude of
  the important slopes is close to 45 degrees.
- This usually requires the horizontal axis to be longer than the
  vertical axis.

Controlling the aspect ratio:

- For `base` graphics use the `asp` argument to `plot`.
- `lattice` uses the `aspect` argument.
- For `ggplot` use the `ratio` argument to `coord_fixed`.
```{r}
plot(river, asp = 5, type = "l")
xyplot(river ~ seq_along(river), aspect = 0.2, type = "l")
p + geom_line() + coord_fixed(ratio = 5)
```
- Lattice supports several aspect ratio strategies:
    - `"fill"` to avoid unused white space;
    - `"xy"` to use a 45 degree banking rule;
    - `"iso"` for isometric scales, e.g. distances.
- Some work may be needed to avoid excess white space around figures.
- Using multiple panels can help.
- In an interactive setting, zoom and pan support can help.
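The strategies can be compared directly on the `river` series (a minimal sketch; `eval = FALSE` to avoid cluttering the output with near-duplicate plots):

```{r, eval = FALSE}
## "fill" stretches the panel to the available space; "xy" banks the
## typical slope to 45 degrees; a numeric value fixes the
## height/width ratio directly.
xyplot(river ~ seq_along(river), type = "l", aspect = "fill")
xyplot(river ~ seq_along(river), type = "l", aspect = "xy")
xyplot(river ~ seq_along(river), type = "l", aspect = 0.2)
```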
Multiple panels in `lattice` with overlap:
```{r}
xyplot(river ~ seq_along(river) |
           equal.count(seq_along(river), 3, overlap = 0.1),
       type = "l", aspect = "xy", strip = FALSE,
       scales = list(relation = "sliced"),
       xlab = "months")
```
Multiple panels in `ggplot`:
```{r}
nbr <- 3
dr <- data.frame(flow = river,
                 time = seq_along(river),
                 period = cut(seq_along(river), nbr))
ggplot(dr, aes(time, flow)) +
    geom_line() + ## coord_fixed(ratio = 5) +
    facet_wrap(~ period, nrow = nbr, scales = "free_x")
```
Adding zoom and pan support using [`plotly`](https://plot.ly/r/):
```{r, include = FALSE}
library(plotly)
```
```{r}
library(plotly)
pp <- p + geom_line() + coord_fixed(ratio = 5)
ggplotly(pp)
```
There is not always a single best aspect ratio.
The `co2` data set in the `datasets` package contains monthly
concentrations of CO2 for the years 1959 to 1997 recorded at the Mauna
Loa observatory.
The `co2` data is stored as an object of class `ts`:
```{r}
str(co2)
attributes(co2)
```
`plot` and `xyplot` have methods for `ts` objects that simplify time
series plotting.
Two aspect ratios:
```{r}
xyplot(co2)
```
```{r, fig.height = 2}
xyplot(co2, aspect = "xy")
```
Is there a slowdown in the increase? A graph with more recent data is
[available](https://www.esrl.noaa.gov/gmd/ccgg/trends/full.html). The
full data is
[available](ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt)
as well.
## Updated Mauna Loa CO2 Data
Downloading and reading in the data:
```{r, message = FALSE}
library(lubridate)
url <- "ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt"
if (! file.exists("co2new.dat") ||
    now() > file.mtime("co2new.dat") + weeks(4))
    download.file(url, "co2new.dat")
co2new <- read.table("co2new.dat")
```
An initial look:
```{r}
head(co2new)
names(co2new) <- c("year", "month", "decimal_date", "average",
                   "interpolated", "trend", "ndays")
plot(co2new$average)
head(co2new)
```
Missing values are encoded as `-99.99`. These need to be replaced with
R's missing value representation:
```{r}
filter(co2new, average < -90)
co2new <- mutate(co2new, average = ifelse(average < -90, NA, average))
```
```{r}
plot(co2new$average, type = "l")
```
The plot shows gaps for the missing values. The `interpolated` series
fills the gaps:
```{r}
plot(co2new$average, type = "l")
lines(co2new$interpolated, col = "red")
```
Converting the new data to a `ts` object makes adding the original
data a little easier:
```{r}
avts <- ts(co2new$average, start = c(1958, 3), frequency = 12)
plot(avts)
lines(co2, col = "red")
```
- For time series with a strong seasonal component it can be useful to
  look at a _Seasonal Decomposition of Time Series by Loess_, or
  _STL_.
- The methodology was suggested by Cleveland and coworkers.
- The `stl` function computes such a decomposition; it requires a
  series without missing values.
```{r}
avits <- ts(co2new$interpolated, start = c(1958, 3), frequency = 12)
co2stl <- stl(avits, s.window = 21)
head(co2stl$time.series)
plot(co2stl)
```
## Time Series Objects
There are a number of specialized object classes for dealing with time
series. Some of the most useful:

Class         Package           Features
------------  ----------------  ------------------------------------------------
`ts`, `mts`   `base` package    regular series
`zoo`         `zoo` package     irregular and regular series; `autoplot` support

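As a small illustration of what `zoo` adds (a sketch with made-up observation dates): a `zoo` series carries an ordered index, so irregularly spaced observations are handled naturally:

```{r, eval = FALSE}
library(zoo)
## Three blood pressure readings at irregular doctor visits (made-up data).
visits <- as.Date(c("2013-01-03", "2013-02-17", "2013-05-02"))
bp <- zoo(c(128, 135, 124), order.by = visits)
plot(bp)     ## base plot method; the x axis uses the actual dates
autoplot(bp) ## ggplot2 version via the zoo autoplot method
```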
The `zoo` package provides a method for the `ggplot2` function `autoplot`
that produces an appropriate plot for an object of class `zoo`:
```{r, message = FALSE}
library(zoo)
p <- autoplot(as.zoo(avts))
p
```
We can then add a layer for the original `co2` data using `geom_line`.
Creating a new data frame for the `co2` data makes this easier:
```{r}
zco2 <- data.frame(time = time(co2), average = co2)
p + geom_line(aes(x = time, y = average), data = zco2, color = "red")
```
## Some Alternative Visualizations
- Bar charts are sometimes used when comparison to a base line is important;
  for example for [illustrating climate change](img/CaG_GlobalTempAnom_1.jpg).
- Calendar plots, or calendar heatmaps, are sometimes useful for
  displaying daily data where day of the week/weekend effects might be
  important.
- A
  [blog post](https://www.r-bloggers.com/ggplot2-time-series-heatmaps/)
  shows how to build a calendar plot with `ggplot`.
## GDP Growth Data
Bar charts are sometimes used for economic time series, in particular
growth rate data.
Some GDP growth rate
[data](http://www.multpl.com/us-real-gdp-growth-rate/table/by-quarter):
Read the data from the web page:
```{r, eval = FALSE}
library(rvest)
gdpurl <- "http://www.multpl.com/us-real-gdp-growth-rate/table/by-quarter"
gdppage <- read_html(gdpurl)
gdptbl <- html_table(gdppage)[[1]]
head(gdptbl)
```
Clean the data:
```{r, eval = FALSE}
library(dplyr)
library(lubridate)
names(gdptbl) <- c("Date", "Value")
gdptbl <- mutate(gdptbl,
                 Value = as.numeric(sub("%", "", Value)),
                 Date = mdy(Date))
head(gdptbl)
```
Data prior to 1947 is annual, so drop earlier data:
```{r, eval = FALSE}
gdptbl <- filter(gdptbl, Date >= make_date(1948, 1, 1))
```
Rearrange to put `Date` in increasing order:
```{r, eval = FALSE}
gdptbl <- arrange(gdptbl, Date)
```
Create and plot a time series:
```{r, eval = FALSE}
gdpts <- ts(gdptbl$Value, start = c(1948, 1), frequency = 4)
plot(gdpts)
abline(h = 0, lty = 2)
```
A bar chart with color used to distinguish positive and negative
growth can be created from the original table `gdptbl`:
```{r, eval = FALSE}
library(ggplot2)
ggplot(mutate(gdptbl, growth = ifelse(Value >= 0, "pos", "neg")),
       aes(x = Date, y = Value, fill = growth)) +
    geom_col() + scale_fill_manual(values = c(pos = "blue", neg = "red"))
```
The bar chart can also be created from the time series object:
```{r, eval = FALSE}
gdpd <- data.frame(rate = as.numeric(gdpts),
                   time = as.numeric(time(gdpts)))
gdpd <- mutate(gdpd, growth = ifelse(rate >= 0, "pos", "neg"))
ggplot(gdpd, aes(time, rate, fill = growth)) +
    geom_col() + scale_fill_manual(values = c(pos = "blue", neg = "red"))
```
## Airline Flights From NYC
The `nycflights13` data set contains data on all flights leaving New
York City in 2013. Similar data is available for other cities and
years.
We can look at the number of departures for each day:
```{r}
library(nycflights13)
library(ggplot2)
nf <- count(flights, year, month, day)
```
* It will be useful to have a combined `date` field in our data frame.
* The `lubridate` function `make_date` can be used for this:
```{r}
library(lubridate)
nf <- mutate(nf, date = make_date(year, month, day))
```
Some plots:
```{r}
library(ggplot2)
p <- ggplot(nf, aes(date, n)) + geom_line()
p
p + facet_wrap(~ wday(date, TRUE)) +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
```
A simple calendar plot:
```{r}
monthweek <- function(d, w) ceiling((d - w) / 7) + 1
nf <- mutate(nf, wd = wday(date, label = TRUE))
nf <- mutate(nf, wd = factor(wd, levels = rev(levels(wd))))
nf <- mutate(nf, mw = monthweek(day, wday(date)))
ggplot(nf, aes(x = as.character(mw), y = wd, fill = n)) +
    geom_tile(color = "white") +
    scale_fill_gradient(low = "white") +
    facet_wrap(~ month(date, TRUE)) +
    ylab("") + xlab("Week of Month") +
    theme(panel.grid.major = element_blank())
```
A slightly better version:
```{r}
nf2 <- mutate(nf, wd = wday(date, label = TRUE))
nf2 <- mutate(nf2, wd = factor(wd))
nf2 <- mutate(nf2, mw = factor(monthweek(day, wday(date))))
nf2 <- mutate(nf2, mw = factor(mw, rev(levels(mw))))
ggplot(nf2, aes(x = wd, y = mw, fill = n)) +
    geom_tile(color = "white") +
    scale_fill_gradient(low = "white") +
    facet_wrap(~ month(date, TRUE)) +
    ylab("") + xlab("Week of Month") +
    theme(panel.grid.major = element_blank(),
          axis.text.x = element_text(angle = 45, hjust = 1))
```
## Multiple Time Series
Some options for plotting multiple series:

- separate panels in a trellis display;
- multiple series in a single plot -- this will require standardizing
  if the scales vary substantially;
- a multivariate plot with time represented by connecting line
  segments or animation.

Another option for two time series:

- use a separate `y` axis for each.

This is usually a bad idea.
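An alternative to a second axis is to standardize the series and overlay them in one panel; `scale` centers each series and scales it to unit variance (a minimal sketch, overlaying the `co2` and `river` series purely to illustrate the mechanics):

```{r, eval = FALSE}
## Standardize to mean zero, unit variance so both fit one scale.
z1 <- as.numeric(scale(co2))
z2 <- as.numeric(scale(river))
plot(z1, type = "l")
lines(seq_along(z2), z2, col = "red")
```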
## Crimean War Casualties
The `HistData` package contains the data used by Florence Nightingale
to call attention to terrible sanitary conditions for the British army
in Crimea.
```{r}
library(lattice)
data(Nightingale, package = "HistData")
head(Nightingale)
```
Lattice allows multiple series to be plotted with
```{r}
xyplot(Disease + Wounds + Other ~ Date, data = Nightingale,
       type = "l", auto.key = TRUE)
```
Using the rate variables produces a very similar pattern:
```{r}
xyplot(Disease.rate + Wounds.rate + Other.rate ~ Date, data = Nightingale,
       type = "l", auto.key = TRUE)
```
The army size does not vary enough over the time where the casualties
are high to affect the pattern much:
```{r}
xyplot(Army ~ Date, data = Nightingale, type = "l")
```
To use `ggplot` first create a long format data frame:
```{r}
library(tidyr)
nd <- gather(Nightingale, series, value, 4, 8 : 10)
head(nd)
```
The series can then be graphed as
```{r}
library(dplyr)
library(ggplot2)
ggplot(filter(nd, series != "Army"), aes(Date, value)) +
    geom_line(aes(color = series))
```
* The long format also supports creating faceted plots.
* Usually for time series this is done with
    * `y` scales free;
    * a single column and `x` scales aligned.
```{r}
xyplot(value ~ Date | series, data = nd, type = "l", auto.key = TRUE,
       scales = list(y = "free"), layout = c(1, 4))
ggplot(nd, aes(Date, value)) + geom_line() +
    facet_wrap(~ series, scales = "free_y", ncol = 1)
```
Converting to a time series or zoo object is another option:
```{r}
nts <- ts(select(Nightingale, Army, ends_with("rate")),
          start = c(1854, 4), frequency = 12)
library(zoo)
ntz <- as.zoo(nts)
```
```{r}
plot(nts)
plot(nts[, -1], plot.type = "single")
xyplot(nts)
xyplot(nts[, -1], superpose = TRUE)
```
```{r}
autoplot(ntz) + facet_free()
autoplot(ntz[, -1], facets = NULL)
```
In a faceted plot without free scales, variables need to be measured
on comparable scales:
```{r}
autoplot(ntz)
autoplot(ntz[, -1])
```
A free `y` scale can be requested using faceting functions, e.g.
```{r}
autoplot(ntz) + facet_free()
```
## Many Time Series
From Unwin's book: Annual flow for dams in Italy:
```{r}
library(nsRFA)
data(hydroSIMN)
annualflows <- mutate(annualflows, cod = factor(cod))
ggplot(annualflows, aes(anno, dato, group = cod, color = cod)) +
    geom_line()
ggplot(annualflows, aes(anno, dato, group = cod)) +
    geom_line() + facet_wrap(~ cod)
```
Unemployment rate by state:
```{r}
library(ggplot2)
library(geofacet)
ggplot(state_unemp, aes(year, rate)) +
    geom_line() +
    facet_geo(~ state, grid = "us_state_grid2", label = "name") +
    scale_x_continuous(labels = function(x) paste0("'", substr(x, 3, 4))) +
    labs(title = "Seasonally Adjusted US Unemployment Rate 2000-2016",
         caption = "Data Source: bls.gov",
         x = "Year",
         y = "Unemployment Rate (%)") +
    theme(strip.text.x = element_text(size = 6))
```
## The Fall and Rise of Income Inequality
A [2015 post](
http://www.npr.org/sections/money/2015/02/11/384988128/the-fall-and-rise-of-u-s-inequality-in-2-graphs) on the NPR Planet Money site uses two graphs to show transitions in income inequality.
The page includes a link to the
[data source](http://topincomes.parisschoolofeconomics.eu/). This has
been unavailable for some time.
I tried to obtain the same data but may not have done it quite right.
The closest source I could find is .
```{r}
income <- read.table("https://www.stat.uiowa.edu/~luke/data/nprincome.dat")
library(lattice)
library(ggplot2)
library(tidyr)
```
Plots over time:
```{r, eval = FALSE}
xyplot(U01 + L90 ~ year, data = income, type = "l", auto.key = TRUE)
xyplot(U01 / U01[1] + L90 / L90[1] ~ year, data = income, type = "l",
       auto.key = TRUE)
its <- ts(select(income, -year), start = 1913)
xyplot(its)
```
Comparative plot:
```{r}
xyplot(L90 ~ U01, data = income)
xyplot(L90 ~ U01, data = income, type = c("l", "p"))
```
## Stack Loss Data
Operational data obtained from 21 days of operation of a plant for the
oxidation of ammonia (NH3) to nitric acid (HNO3).

The dependent variable, `stack.loss`, is 10 times the percentage of the
ingoing ammonia to the plant that escapes from the absorption column
unabsorbed.
Raw data:
```{r, eval = TRUE, fig.height = 9}
library(lattice)
splom(stackloss)
```
The data are collected over time:
```{r, eval = TRUE, fig.height = 9}
library(dplyr)
sl <- mutate(stackloss, index = 1:nrow(stackloss))
splom(stackloss, type = "l")
```
Randomly reordering the indices can help calibrate whether there is a
time effect:
```{r, eval = TRUE, fig.height = 9}
splom(stackloss[sample(nrow(stackloss)),], type = "l")
```
This is sometimes called _graphical inference_.
> Hadley Wickham, Dianne Cook, Heike Hofmann, and Andreas Buja,
> "Graphical Inference for Infovis," IEEE Transactions on Visualization
> and Computer Graphics, Vol. 16, No. 6, November/December 2010.
Picking one plot and embedding the true order among 19 random orderings:
```{r, eval = TRUE}
N <- 20
k <- sample(N, 1)
ss <- function(i) {
    if (i == k)
        sl <- stackloss
    else
        sl <- stackloss[sample(nrow(stackloss)), ]
    mutate(sl, sample = i)
}
d <- lapply(1 : N, ss)
d <- do.call(rbind, d)
xyplot(stack.loss ~ Acid.Conc. | sample, type = "l", data = d)
library(ggplot2)
ggplot(d, aes(x = Acid.Conc., y = stack.loss)) +
    geom_path() + facet_wrap(~ sample)
k
```
## Lynx-Hare Data
Trapping data on Canadian lynx and snowshoe hare:
```{r}
lynxhare <- read.table("https://www.stat.uiowa.edu/~luke/data/LynxHare.txt")
names(lynxhare) <- c("year", "hare", "lynx")
lhts <- ts(lynxhare[, -1], start = 1845)
lhtz <- as.zoo(lhts)
```
Cycle patterns are similar:
```{r}
autoplot(lhtz) + facet_free()
```
Lynx cycles often lag behind:
```{r}
autoplot(lhtz, facets = NULL)
```
## Things to Look For
- Changes in data definition.
    - Administrative definitions, such as employment/unemployment, can
      change over time.
    - Administrative entities, such as countries and states, can
      change over time.
    - Instruments can change (satellite imagery replacing surface
      measurements).
- Outliers need not be globally outlying: departures from local
  patterns may also be important.
- Level shifts and changes in volatility.
- Shocks affecting level or volatility.
[Australian airline data](https://www.otexts.org/fpp/2/1):
```{r, message = FALSE}
library(fpp)
xyplot(melsyd)
```
Some features:

- Strike in 1989.
- Increase in passenger load starting in the second half of 1991.
- Dip in economy, rise in business in 1992 corresponding to an experiment
  with fewer economy/more business seats.
Weekly closing returns of the S&P 500 from 2003 to September, 2012:
```{r, message = FALSE}
library(xts)
library(astsa)
autoplot(sp500w)
```
Some features:

- Increased volatility at the start of the financial crisis.
- Volatility seems to be decreasing, but not very fast.
## Forecasting
- Studying the past may help predict the future.
- Forecasting a time series usually involves choosing a model and
running the model forward.
- Accuracy of a forecast decreases rapidly the farther ahead the
  forecast is made.
An example from the `forecast` package: `WWWusage` is a time series of
the number of users connected to the Internet.
```{r}
library(forecast)
fit <- ets(WWWusage)
fc <- forecast(WWWusage, model=fit)
plot(fc)
```
## Other Approaches
A [recent
talk](http://www.visualisingdata.com/2019/04/talk-slides-design-of-time/)
describes a wide range of options for visualizing temporal data.