A lot of data is collected over time:

The objectives can be

Even if time is not of primary interest, data are often collected with a time stamp.

Examining the behavior over time can reveal interesting features that may hint at confounding variables:

Time series data can be collected at regularly spaced intervals:

Data might be missing

Time series can also be recorder at irregular times:

Variables recorded over time can be numerical or categorical.

Basic Plots for a Single Numeric Time Series

Numeric time series are usually plotted as a line chart.

river <- scan("https://www.stat.uiowa.edu/~luke/data/river.dat")
plot(river, type = "l")

Using symbols in addition to lines is sometimes useful:

plot(river, type = "b")

Using ggplot:

library(ggplot2)
p <- ggplot(NULL, aes(y = river, x = seq_along(river)))
p + geom_line() + geom_point() 

Using lattice:

library(lattice)
xyplot(river ~ seq_along(river), type = c("l", "p"))

Aspect Ratio

Aspect ratio has a significant effect on how well we perceive slope differences.

Controling the aspect ratio:

plot(river, asp = 5, type = "l")

xyplot(river ~ seq_along(river), aspect = 0.2, type = "l")

p + geom_line() + coord_fixed(ratio = 5)

Multiple panels in lattice with overlap:

xyplot(river ~ seq_along(river) | equal.count(seq_along(river), 3, overlap=0.1),
       type="l", aspect="xy", strip=FALSE,
       scales=list(relation = "sliced"),
       xlab="months")

Multiple panels in ggplot:

nbr = 3
dr <- data.frame(flow = river,
                 time = seq_along(river),
                 period = cut(seq_along(river), nbr))
ggplot(dr, aes(time, flow)) +
    geom_line() + ## coord_fixed(ratio = 5) +
    facet_wrap(~ period, nrow = nbr, scales = "free_x")

Adding zoom and pan support using plotly:

library(plotly)
pp <- p + geom_line() + coord_fixed(ratio = 5)
ggplotly(pp)

There is not always a single best aspect ratio.

The co2 data set in the datasets package contains monthly concentrations of CO2 for the years 1959 to 1997 recorded at the Mauna Loa observatory.

The co2 data is stored as an object of class ts:

str(co2)
##  Time-Series [1:468] from 1959 to 1998: 315 316 316 318 318 ...
attributes(co2)
## $tsp
## [1] 1959.000 1997.917   12.000
## 
## $class
## [1] "ts"

plot and xyplot have methods for ts objects that simplify time series plotting.

Two aspect ratios:

xyplot(co2)

xyplot(co2, aspect = "xy")

Is there a slowdown in the increase? A graph with more recent data is available. The full data is available as well.

Updated Mauna Loa CO2 Data

Downloading and reading in the data:

library(lubridate)
url <- "ftp://aftp.cmdl.noaa.gov/products/trends/co2/co2_mm_mlo.txt"
if (! file.exists("co2new.dat") ||
    now() > file.mtime("co2new.dat") + weeks(4))
    download.file(url, "co2new.dat")
co2new <- read.table("co2new.dat")

An initial look:

head(co2new)
##     V1 V2       V3     V4     V5     V6 V7
## 1 1958  3 1958.208 315.71 315.71 314.62 -1
## 2 1958  4 1958.292 317.45 317.45 315.29 -1
## 3 1958  5 1958.375 317.50 317.50 314.71 -1
## 4 1958  6 1958.458 -99.99 317.10 314.85 -1
## 5 1958  7 1958.542 315.86 315.86 314.98 -1
## 6 1958  8 1958.625 314.93 314.93 315.94 -1
names(co2new) <- c("year", "month", "decimal_data", "average",
                   "interpolated", "trend", "ndays")

plot(co2new$average)

head(co2new)
##   year month decimal_data average interpolated  trend ndays
## 1 1958     3     1958.208  315.71       315.71 314.62    -1
## 2 1958     4     1958.292  317.45       317.45 315.29    -1
## 3 1958     5     1958.375  317.50       317.50 314.71    -1
## 4 1958     6     1958.458  -99.99       317.10 314.85    -1
## 5 1958     7     1958.542  315.86       315.86 314.98    -1
## 6 1958     8     1958.625  314.93       314.93 315.94    -1

Missing values are encoded as -99.99. These need to be replaced with R’s missing value representation:

filter(co2new, average < -90)
##   year month decimal_data average interpolated  trend ndays
## 1 1958     6     1958.458  -99.99       317.10 314.85    -1
## 2 1958    10     1958.792  -99.99       312.66 315.61    -1
## 3 1964     2     1964.125  -99.99       320.07 319.61    -1
## 4 1964     3     1964.208  -99.99       320.73 319.55    -1
## 5 1964     4     1964.292  -99.99       321.77 319.48    -1
## 6 1975    12     1975.958  -99.99       330.58 331.60     0
## 7 1984     4     1984.292  -99.99       346.84 344.27     2
co2new <- mutate(co2new, average = ifelse(average < -90, NA, average))
plot(co2new$average, type = "l")

The plot shows gaps for the missing values. The interpolated series fills the gaps:

plot(co2new$average, type = "l")
lines(co2new$interpolated, col = "red")

Converting the new data to a ts object makes adding the original data a little easier:

avts <- ts(co2new$average, start = c(1958, 3), frequency = 12)
plot(avts)
lines(co2, col = "red")

avits <- ts(co2new$interpolated, start = c(1958, 3), frequency = 12)
co2stl <- stl(avits, s.window = 21)
head(co2stl$time.series)
##        seasonal    trend  remainder
## [1,]  1.1201682 314.9204 -0.3305750
## [2,]  2.2928419 314.9962  0.1609439
## [3,]  2.7604674 315.0720 -0.3324890
## [4,]  2.2105461 315.1478 -0.2583752
## [5,]  0.8756507 315.2170 -0.2326649
## [6,] -1.0881115 315.2862  0.7319123
plot(co2stl)

Time Series Objects

There are a number of specialized object classes for dealing with time series. Some of the most useful:

Class Package Features
ts, mts base package regular series
zoo zoo package irregular and regular series; autoplot support

The zoo package provides a method for the ggplot2 function autoplot that produces an appropriate plot for an object of class zoo:

library(zoo)
p <- autoplot(as.zoo(avts))
p

We can then add a layer for the original co2 data using geom_line. Creating a new data frame for the co2 data makes this easier:

zco2 = data.frame(time = time(co2), average = co2)

p + geom_line(aes(x = time, y = average), data = zco2, color = "red")
## Don't know how to automatically pick scale for object of type yearmon. Defaulting to continuous.

Some Alternative Visualizations

GDP Growth Data

Bar charts are sometimes used for economic time series, in particular growth rate data.

Some GDP growth rate data:

Read the data from the web page:

library(rvest)
gdpurl <- "http://www.multpl.com/us-real-gdp-growth-rate/table/by-quarter"
gdppage <- read_html(gdpurl)
gdptbl <- html_table(gdppage)[[1]]
head(gdptbl)

Clean the data:

library(dplyr)
library(lubridate)
names(gdptbl) <- c("Date", "Value")
gdptbl <- mutate(gdptbl,
                 Value = as.numeric(sub("%", "", Value)),
                 Date = mdy(Date))
head(gdptbl)

Data prior to 1947 is annual, so drop earlier data:

gdptbl <- filter(gdptbl, Date >= make_date(1948, 1, 1))

Rearrange to put Date in increasing order:

gdptbl <- arrange(gdptbl, Date)

Create and plot a time series:

gdpts <- ts(gdptbl$Value, start = 1947, frequency = 4)
plot(gdpts)
abline(h = 0, lty = 2)

A bar chart with color used to distinguish positive and negative growth can be created from the original table gdptbl:

library(ggplot2)
ggplot(mutate(gdptbl, growth = ifelse(Value >= 0, "pos", "neg")),
       aes(x = Date, y = Value, fill = growth)) +
    geom_col() + scale_fill_manual(values = c(pos = "blue", neg = "red"))

The bar chart can also be created from the time series object:

gdpd <- data.frame(rate = as.numeric(gdpts),
                   time = as.numeric(time(gdpts)))
gdpd <- mutate(gdpd, growth = ifelse(rate >= 0, "pos", "neg"))
ggplot(gdpd, aes(time, rate, fill = growth)) +
    geom_col() + scale_fill_manual(values = c(pos = "blue", neg = "red"))

Airline Flights From NYC

The nycflights13 data set contains data on all flight leaving New York City in 2013. Similar data is available for other cities and years.

We can look at the number of departures for each day:

library(nycflights13)
library(ggplot2)
nf <- count(flights, year, month, day)
library(lubridate)
nf <- mutate(nf, date = make_date(year, month, day))

Some plots:

library(ggplot2)
p <- ggplot(nf, aes(date, n)) + geom_line()
p


p + facet_wrap(~ wday(date, TRUE)) +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))

A simple calendar plot:

monthweek <- function(d, w) ceiling((d - w) / 7) + 1

nf <- mutate(nf, wd = wday(date, label = TRUE))
nf <- mutate(nf, wd = factor(wd, levels = rev(levels(wd))))
nf <- mutate(nf, mw = monthweek(day, wday(date)))

ggplot(nf, aes(x = as.character(mw), y = wd, fill = n)) +
    geom_tile(color = "white") +
    scale_fill_gradient(low = "white") +
    facet_wrap(~ month(date, TRUE)) +
    ylab("") + xlab("Week of Month") +
    theme(panel.grid.major = element_blank())

A slightly better version:

nf2 <- mutate(nf, wd = wday(date, label = TRUE))
nf2 <- mutate(nf2, wd = factor(wd))
nf2 <- mutate(nf2, mw = factor(monthweek(day, wday(date))))
nf2 <- mutate(nf2, mw = factor(mw, rev(levels(mw))))
            
ggplot(nf2, aes(x = wd, y = mw, fill = n)) +
    geom_tile(color = "white") +
    scale_fill_gradient(low = "white") +
    facet_wrap(~ month(date, TRUE)) +
    ylab("") + xlab("Week of Month") +
    theme(panel.grid.major = element_blank(),
          axis.text.x = element_text(angle = 45, hjust = 1))

Multiple Time Series

Some options for plotting multiple series:

Another option for two time series:

This is usually a bad idea.

Crimean War Casualties

The HistData package contains the data used by Florence Nightingale to call attention to terrible sanitary conditions for the Britich army in Crimea.

library(lattice)
data(Nightingale, package = "HistData")
head(Nightingale)
##         Date Month Year  Army Disease Wounds Other Disease.rate
## 1 1854-04-01   Apr 1854  8571       1      0     5          1.4
## 2 1854-05-01   May 1854 23333      12      0     9          6.2
## 3 1854-06-01   Jun 1854 28333      11      0     6          4.7
## 4 1854-07-01   Jul 1854 28722     359      0    23        150.0
## 5 1854-08-01   Aug 1854 30246     828      1    30        328.5
## 6 1854-09-01   Sep 1854 30290     788     81    70        312.2
##   Wounds.rate Other.rate
## 1         0.0        7.0
## 2         0.0        4.6
## 3         0.0        2.5
## 4         0.0        9.6
## 5         0.4       11.9
## 6        32.1       27.7

Lattice allows multiple series to be plotted with

xyplot(Disease + Wounds + Other ~ Date, data = Nightingale,
       type = "l", auto.key = TRUE)

Using the rate variables produces a very similar pattern:

xyplot(Disease.rate + Wounds.rate + Other.rate ~ Date, data = Nightingale,
       type = "l", auto.key = TRUE)

The army size does not vary enough over the time where the casualties are high to affect the pattern much:

xyplot(Army ~ Date, data = Nightingale, type = "l")

To use ggplot first create a long format data frame:

library(tidyr)
nd <- gather(Nightingale, series, value, 4, 8 : 10)
head(nd)
##         Date Month Year Disease Wounds Other series value
## 1 1854-04-01   Apr 1854       1      0     5   Army  8571
## 2 1854-05-01   May 1854      12      0     9   Army 23333
## 3 1854-06-01   Jun 1854      11      0     6   Army 28333
## 4 1854-07-01   Jul 1854     359      0    23   Army 28722
## 5 1854-08-01   Aug 1854     828      1    30   Army 30246
## 6 1854-09-01   Sep 1854     788     81    70   Army 30290

The series can then be graphed as

library(dplyr)
library(ggplot2)
ggplot(filter(nd, series != "Army"), aes(Date, value)) +
    geom_line(aes(color = series))

xyplot(value ~ Date | series, data = nd, type = "l", auto.key = TRUE,
       scales = c(y = "free"), layout = c(1, 4))


ggplot(nd, aes(Date, value)) + geom_line() +
    facet_wrap(~ series, scales = "free_y", ncol = 1)

Converting to a time series or zoo object is another option:

nts <- ts(select(Nightingale, Army, ends_with("rate")),
          start = c(1854, 4), frequency = 12)
library(zoo)
ntz <- as.zoo(nts)
plot(nts)

plot(nts[, -1], plot.type = "single")


xyplot(nts)

xyplot(nts[, -1], superpose = TRUE)

autoplot(ntz) + facet_free()
## Don't know how to automatically pick scale for object of type yearmon. Defaulting to continuous.

autoplot(ntz[, -1], facets = NULL)
## Don't know how to automatically pick scale for object of type yearmon. Defaulting to continuous.

In a faceted plot without free scales, variables need to be measured on comparable scales:

autoplot(ntz)
## Don't know how to automatically pick scale for object of type yearmon. Defaulting to continuous.

autoplot(ntz[, -1])
## Don't know how to automatically pick scale for object of type yearmon. Defaulting to continuous.

A free y scale can be requested using faceting functions, e.g.

autoplot(ntz) + facet_free()
## Don't know how to automatically pick scale for object of type yearmon. Defaulting to continuous.

Many Time Series

From Unwin’s book: Annual flow for dams in Italy:

library(nsRFA)
data(hydroSIMN)
annualflows <- mutate(annualflows, cod = factor(cod))

ggplot(annualflows, aes(anno, dato, group = cod, color = factor(cod))) +
    geom_line()


ggplot(annualflows, aes(anno, dato, group = cod)) +
    geom_line() + facet_wrap(~ cod)

Unemployment rate by state:

library(ggplot2)
library(geofacet)
ggplot(state_unemp, aes(year, rate)) +
  geom_line() +
  facet_geo(~ state, grid = "us_state_grid2", label = "name") +
  scale_x_continuous(labels = function(x) paste0("'", substr(x, 3, 4))) +
  labs(title = "Seasonally Adjusted US Unemployment Rate 2000-2016",
    caption = "Data Source: bls.gov",
    x = "Year",
    y = "Unemployment Rate (%)") +
    theme(strip.text.x = element_text(size = 6))

The Fall and Rise of Income Inequality

A 2015 post on the NPR Planet Money site uses two graphs to show transitions in income inequality.

The page includes a link to the data source. This has been unavailable for some time.

I tried to obtain the same data but may not have done it quite right. The closest source I could find is https://wid.world/data/.

income <- read.table("https://www.stat.uiowa.edu/~luke/data/nprincome.dat")

library(lattice)
library(ggplot2)
library(tidyr)

Plots over time:

xyplot(U01 + L90 ~ year, data = income, type = "l", auto.key = TRUE)


xyplot(U01 / U01[1] + L90 / L90[01] ~ year, data = income, type = "l",
       auto.key = TRUE)


its <- ts(select(income, -year), start = 1913)
xyplot(its)

Comparative plot:

xyplot(L90 ~ U01, data = income)

xyplot(L90 ~ U01, data = income, type = c("l", "p"))

Stack Loss Data

Operational data obtained From 21 days of operation of a plant for the oxidation of ammonia (NH3) to nitric acid (HNO3).

The dependent variable, stack.loss is 10 times the percentage of the ingoing ammonia to the plant that escapes from the absorption column unabsorbed.

Raw data:

library(lattice)
splom(stackloss)

The data are collected over time:

library(dplyr)
sl <- mutate(stackloss, intex = 1:nrow(stackloss))
splom(stackloss, type = "l")

Randomly reordering the indices can help calibrate whether there is a time effect:

splom(stackloss[sample(nrow(stackloss)),], type = "l")

This is sometimes called graphical inference.

Hadley Wickham, Dianne Cook, Heike Hofmann, and Andreas Buja, “Graphical Inference for Infovis,” IEEE Transactions on Visualization and Computer Graphics, Vol. 16, No. 6, November/December 2010.

Picking one plot and embedding the true order among 19 random orderings:

N <- 20
k <- sample(N, 1)

ss <- function(i) {
    if (i == k)
        sl <- stackloss
    else
        sl <- stackloss[sample(nrow(stackloss)),]
    mutate(sl, sample = i)
}

d <- lapply(1 : N, ss)
d <- do.call(rbind, d)

xyplot(stack.loss ~ Acid.Conc. | sample, type = "l", data = d)


library(ggplot2)
ggplot(d, aes(x = Acid.Conc., y = stack.loss)) +
    geom_path() + facet_wrap(~sample)


k
## [1] 7

Lynx-Hare Data

Trapping data on Canadian lynx and snowshoe hare:

lynxhare <- read.table("https://www.stat.uiowa.edu/~luke/data/LynxHare.txt")
names(lynxhare) <- c("year", "hare", "lynx")
lhts <- ts(lynxhare[, -1], start = 1845)
lhtz <- as.zoo(lhts)

Cycle patterns are similar:

autoplot(lhtz) + facet_free()

Lynx cycles often lag behind:

autoplot(lhtz, facet = NULL)

Things to Look For

Australian airline data:

library(fpp)
xyplot(melsyd)

Some features:

Weekly closing returns of the SP 500 from 2003 to September, 2012:

library(xts)
library(astsa)
autoplot(sp500w)

Some features:

Forecasting

An example from the forecast package: WWWusage is a time series of the numbers of users connected to the Internet.

library(forecast)
fit <- ets(WWWusage)
fc <- forecast(WWWusage, model=fit)
## Model is being refit with current smoothing parameters but initial states are being re-estimated.
## Set 'use.initial.values=TRUE' if you want to re-use existing initial values.
plot(fc)

Other Approaches

A recent talk describes a wide range of options for visualizing temporal data.