Background
Data are often associated with a point in time, a particular
year;
month;
day;
hour, minute, second, …
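Some issues with points in time:
Leap years and leap seconds.
Daylight saving time.
Local time or Coordinated Universal Time (UTC).
For historical data: changes in calendars.
R has data types to represent
a particular date (Date);
a particular second (POSIXct, date-time).
R stores dates as days since January 1, 1970, and date-times as the number of seconds since midnight on that day in Coordinated Universal Time (UTC).
Date objects are less complicated than date-time objects, so if you only need dates you should stick with date objects.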
Base R provides many facilities for dealing with dates and date-times.
The lubridate
package provides a useful interface.
library(lubridate)
The Dates and Times chapter of R for Data Science provides more details.
Creating Dates and Times
Today and Now
The lubridate
function today()
returns today’s date as a Date
object:
today()
## [1] "2023-05-05"
class(today())
## [1] "Date"
The lubridate
function now()
returns the current date-time as a POSIXct
object:
now()
## [1] "2023-05-05 19:31:08 CDT"
class(now())
## [1] "POSIXct" "POSIXt"
The printed representation follows the international standard for the representation of dates and times (ISO 8601).
Date and date-time objects can be used with addition and subtraction:
now() + 3600 ## one hour from now
## [1] "2023-05-05 20:31:08 CDT"
today() - 7 ## one week ago
## [1] "2023-04-28"
Parsing Dates and Times From Strings
Some common date formats:
d1 <- "2023-04-16"
d2 <- "April 16, 2023"
d3 <- "16 April 2023"
d4 <- "16 April 23"
These can be decoded by the functions ymd()
, mdy()
, and dmy()
:
ymd(d1)
## [1] "2023-04-16"
mdy(d2)
## [1] "2023-04-16"
dmy(d3)
## [1] "2023-04-16"
dmy(d4)
## [1] "2023-04-16"
By default, these functions use the current locale settings for interpreting month names or abbreviations.
Sys.getlocale("LC_TIME")
## [1] "en_US.UTF-8"
If you need to parse a French date you might use
dmy("16 Avril, 2023", locale = "fr_FR.UTF-8")
## [1] "2023-04-16"
Date-times can be decoded with functions like mdy_hm
:
mdy_hm("April 16, 2023, 6:15 PM")
## [1] "2023-04-16 18:15:00 UTC"
or
mdy_hms("April 16, 2023, 6:15:08 PM")
## [1] "2023-04-16 18:15:08 UTC"
By default these assume the time is specified in the UTC time zone.
Creating Dates and Times from Components
Dates can be created from year, month, and day by make_date()
:
make_date(2023, 4, 16)
## [1] "2023-04-16"
Creating a date
variable from the year
, month
, and day
variables in the New York City flights
table:
library(nycflights13)
fl <- mutate(flights,
date = make_date(year, month, day))
ggplot
and other graphics systems know how to make useful axis labels for dates:
ggplot(count(fl, date)) +
geom_line(aes(x = date, y = n))
Weekday/weekend differences are clearly visible.
Date-times can be created from year, month, day, hour, minute, and second using make_datetime()
:
make_datetime(2023, 4, 16, 18, 15)
## [1] "2023-04-16 18:15:00 UTC"
An attempt to recreate the time_hour
variable in the flights table:
fl <- mutate(fl,
th = make_datetime(year, month, day,
hour))
This does not quite re-create the time_hour
variable:
identical(fl$th, fl$time_hour)
## [1] FALSE
fl$th[1]
## [1] "2013-01-01 05:00:00 UTC"
fl$time_hour[1]
## [1] "2013-01-01 05:00:00 EST"
By default, make_datetime()
assumes the time points it is given are in UTC.
The time_hour
variable is using local (eastern US) time.
We will look at time zones more below.
Date and Time Components
Components of dates and date-times can be extracted with:
year()
, month()
, day()
, hour()
, minute()
, second()
yday()
– day of the year
mday()
– day of the month, same as day
wday()
– day of the week
By default, wday()
returns an integer:
wday(today())
## [1] 6
But it can also return a label:
wday(today(), label = TRUE)
## [1] Fri
## Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
wday(today(), label = TRUE, abbr = FALSE)
## [1] Friday
## 7 Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < ... < Saturday
Weekday names and abbreviations are obviously locale-specific, and you can specify an alternative to the default current locale:
wday(today(), label = TRUE, abbr = FALSE, locale = "de_DE.UTF-8")
## [1] Freitag
## 7 Levels: Sonntag < Montag < Dienstag < Mittwoch < Donnerstag < ... < Samstag
wday(today(), label = TRUE, locale = "de_DE.UTF-8")
## [1] Fr
## Levels: So < Mo < Di < Mi < Do < Fr < Sa
Even the integer value can be tricky:
In the US, Canada, and Japan the first day of the week is Sunday.
In Germany, France, and the ISO 8601 standard the first day of the week is Monday.
wday()
can be asked to use a different first day through its week_start argument, and a global default can be set with the lubridate.week.start option.
Using wday()
and the date
variable we can look at the distribution of the number of flights by day of the week:
ggplot(fl, aes(x = wday(date, label = TRUE))) +
geom_bar(fill = "deepskyblue3")
There were substantially fewer flights on Saturdays but only slightly fewer flights on Sundays.
Rounding
floor_date()
, round_date()
, and ceiling_date()
can be used to round to a particular unit; the most useful are week
and quarter
.
Flights by week:
The first and last weeks were incomplete:
as.character(wday(ymd("2013-01-01"),
label = TRUE, abbr = FALSE))
## [1] "Tuesday"
as.character(wday(ymd("2013-12-31"),
label = TRUE, abbr = FALSE))
## [1] "Tuesday"
Time Spans
Subtracting dates or date-times produces difftime
objects:
now() - as_datetime(today())
## Time difference of 1.021668 days
today() - ymd("2023-01-01")
## Time difference of 124 days
Working with different units can be awkward; lubridate
provides durations , which always work in seconds:
as.duration(now() - as_datetime(today()))
## [1] "88272.1419148445s (~1.02 days)"
as.duration(today() - ymd("2023-01-01"))
## [1] "10713600s (~17.71 weeks)"
Durations can be created with dyears()
, ddays()
, dweeks()
, etc.:
dyears(1)
## [1] "31557600s (~1 years)"
ddays(1)
## [1] "86400s (~1 days)"
Durations can be added to a date or date-time object and can be multiplied by a number:
today()
## [1] "2023-05-05"
today() + ddays(2)
## [1] "2023-05-07"
today() + 2 * ddays(1)
## [1] "2023-05-07"
(n1 <- now())
## [1] "2023-05-05 19:31:12 CDT"
n1 + dminutes(3)
## [1] "2023-05-05 19:34:12 CDT"
Durations represent an exact number of seconds, which can lead to surprises when DST is involved.
In 2023 the switch to DST happened in the US on March 12:
ymd_hm("2023-03-11 23:02", tz = "America/Chicago") + ddays(1)
## [1] "2023-03-13 00:02:00 CDT"
Periods are an alternative that may work more intuitively.
Periods are constructed with years()
, months()
, days()
, etc.:
ymd_hm("2023-03-11 23:02", tz = "America/Chicago") + days(1)
## [1] "2023-03-12 23:02:00 CDT"
Time Zones
Date-time objects specify a point in time relative to second zero, minute zero, hour zero, on January 1, 1970 in Coordinated Universal Time (UTC) .
Date-time objects can have a time zone associated with them that affects how they are printed.
now()
returns a date-time object with the time zone set as the local time zone of the computer.
now()
## [1] "2023-05-05 19:31:12 CDT"
Time zones are complex: they can change on a regular basis (DST) or as a result of politics.
When a date-time object is created from components, by default it is given the UTC time zone.
To create a point in time based on local time information, such as 10 AM on April 16, 2023, in Iowa City, a time zone for interpreting the local time needs to be specified.
The short notations like CDT are not adequate for this: both the US and Australia use the abbreviation EST, but for zones fifteen hours apart (UTC-5 and UTC+10).
R uses the Internet Assigned Numbers Authority (IANA) naming convention and database.
The local time zone is:
Sys.timezone()
## [1] "America/Chicago"
The time point 10:00:00 AM on April 16, 2023 in Iowa City can be specified as
(tm <- make_datetime(2023, 4, 16, 10, tz = "America/Chicago"))
## [1] "2023-04-16 10:00:00 CDT"
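Time zones of date-time objects can be changed in two ways:
with_tz keeps the instant in time and changes the time zone used for display.
force_tz keeps the clock time and changes the instant in time; use this if the time zone is incorrectly specified but the clock time is correct.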
The available time zone specifications are contained in OlsonNames
:
head(OlsonNames())
## [1] "Africa/Abidjan" "Africa/Accra" "Africa/Addis_Ababa"
## [4] "Africa/Algiers" "Africa/Asmara" "Africa/Asmera"
The instant tm
in some other time zones:
with_tz(tm, tz = "UTC")
## [1] "2023-04-16 15:00:00 UTC"
with_tz(tm, tz = "America/New_York")
## [1] "2023-04-16 11:00:00 EDT"
with_tz(tm, tz = "Asia/Shanghai")
## [1] "2023-04-16 23:00:00 CST"
with_tz(tm, tz = "Pacific/Auckland")
## [1] "2023-04-17 03:00:00 NZST"
with_tz(tm, tz = "Asia/Kolkata")
## [1] "2023-04-16 20:30:00 IST"
with_tz(tm, tz = "Canada/Newfoundland")
## [1] "2023-04-16 12:30:00 NDT"
with_tz(tm, tz = "Asia/Katmandu")
## [1] "2023-04-16 20:45:00 +0545"
Some more examples:
## All offsets that are not a full hour:
get_offset <- function(z)
abs(minute(with_tz(tm, tz = z)) - minute(tm))
offsets <- data.frame(zone = OlsonNames()) %>%
mutate(offset = sapply(zone, get_offset)) %>%
arrange(offset)
filter(offsets, offset != 0)
## Offsets for Australia:
filter(offsets, grepl("Australia", zone))
If we create the th
variable for the flights data as
fl <- mutate(flights, th = make_datetime(year, month, day, hour,
tz = "America/New_York"))
then the result matches the time_hour
variable:
identical(fl$th, fl$time_hour)
## [1] TRUE
The time_hour
variable in the weather
table reflects actual points in time and, together with origin
, can serve as a primary key:
filter(count(weather, origin, time_hour), n > 1)
## # A tibble: 0 × 3
## # ℹ 3 variables: origin <chr>, time_hour <dttm>, n <int>
The month
, day
, and hour
variables record local clock time, so they are ambiguous around the daylight saving time changes.
In November there is a repeat:
count(weather, origin, month, day, hour) %>%
filter(n > 1)
## # A tibble: 3 × 5
## origin month day hour n
## <chr> <int> <int> <int> <int>
## 1 EWR 11 3 1 2
## 2 JFK 11 3 1 2
## 3 LGA 11 3 1 2
and there is a missing hour in March:
select(weather, origin, month, day, hour) %>%
filter(origin == "EWR", month == 3,
day == 10, hour <= 3)
## # A tibble: 3 × 4
## origin month day hour
## <chr> <int> <int> <int>
## 1 EWR 3 10 0
## 2 EWR 3 10 1
## 3 EWR 3 10 3
Things to Look Out For
For dates:
Language used for months and weekdays, and their abbreviations.
Ambiguous numerical conventions like 4/11/2023: is this April 11 or November 4?
Day 2 of the week: is this Monday or Tuesday?
For historical data, what calendar is being used? (The October Revolution happened on November 6/7, 1917 by the current Gregorian calendar; October 24/25 by the Julian calendar Russia was still using.)
For date-times:
All of the above.
Daylight saving time.
Time zones.
Exercises
Using the NYC flights data, how many flights were there on Saturdays from Newark (EWR) to Chicago O’Hare (ORD) in 2013?
413
522
601
733
What day of the week will July 4, 2030, fall on?
Monday
Wednesday
Thursday
Saturday
---
title: "Dates and Times"
output:
  html_document:
    toc: yes
    code_folding: show
    code_download: true
---

<link rel="stylesheet" href="stat4580.css" type="text/css" />
<style type="text/css"> .remark-code { font-size: 85%; } </style>
```{r setup, include = FALSE, message = FALSE}
source(here::here("setup.R"))
knitr::opts_chunk$set(collapse = TRUE, message = FALSE,
                      fig.height = 5, fig.width = 6, fig.align = "center")

set.seed(12345)
library(dplyr)
library(ggplot2)
library(lattice)
library(gridExtra)
library(patchwork)
source(here::here("datasets.R"))
theme_set(theme_minimal() +
          theme(text = element_text(size = 16),
                panel.border = element_rect(color = "grey30", fill = NA)))
```


## Background

Data are often associated with a point in time, a particular

* year;

* month;

* day;

* hour, minute, second, ...

Some issues with points in time:

* Leap years and leap seconds.

* Daylight saving time.

* Local time or
  [_Coordinated Universal Time
      (UTC)_](https://www.timeanddate.com/time/aboututc.html).

* For historical data: changes in calendars.

R has data types to represent

* a particular date (`Date`);

* a particular second (`POSIXct`, date-time).

R stores dates as days since January 1, 1970, and date-times as the
number of seconds since midnight on that day in
[_Coordinated Universal Time_
    (UTC)](https://www.timeanddate.com/time/aboututc.html).
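
As a minimal sketch of this storage scheme, using only base R, the underlying
numbers can be inspected with `as.numeric()`. The variable names
`epoch_date` and `epoch_time` are made up for illustration.

```{r}
## A Date is stored as the number of whole days since 1970-01-01.
epoch_date <- as.Date("1970-01-10")
as.numeric(epoch_date)

## A POSIXct date-time is stored as the number of seconds since
## 1970-01-01 00:00:00 UTC; one day later is 86400 seconds.
epoch_time <- as.POSIXct("1970-01-02 00:00:00", tz = "UTC")
as.numeric(epoch_time)
```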

Date objects are less complicated than date-time objects, so if you
only need dates you should stick with date objects.

Base R provides many facilities for dealing with dates and date-times.

The `lubridate` package provides a useful interface.

```{r, message = FALSE}
library(lubridate)
```

The
[_Dates and Times_ chapter](https://r4ds.had.co.nz/dates-and-times.html)
of [_R for Data Science_](https://r4ds.had.co.nz/) provides more
details.


## Creating Dates and Times


### Today and Now

The `lubridate` function `today()` returns today's date as a `Date` object:

```{r}
today()
```

```{r}
class(today())
```

The `lubridate` function `now()` returns the current date-time as a
`POSIXct` object:

```{r}
now()
```

```{r}
class(now())
```

The printed representation follows the international standard for the
representation of dates and times
([ISO 8601](https://en.wikipedia.org/wiki/ISO_8601)).

Date and date-time objects can be used with addition and subtraction:

```{r}
now() + 3600  ## one hour from now
```

```{r}
today() - 7   ## one week ago
```


### Parsing Dates and Times From Strings

Some common date formats:

```{r}
d1 <- "2023-04-16"
d2 <- "April 16, 2023"
d3 <- "16 April 2023"
d4 <- "16 April 23"
```

These can be decoded by the functions `ymd()`, `mdy()`, and `dmy()`:

```{r}
ymd(d1)
```

```{r}
mdy(d2)
```

```{r}
dmy(d3)
```

```{r}
dmy(d4)
```

By default, these functions use the current _locale_ settings for
interpreting month names or abbreviations.

```{r}
Sys.getlocale("LC_TIME")
```

If you need to parse a French date you might use

```{r}
dmy("16 Avril, 2023", locale = "fr_FR.UTF-8")
```

Date-times can be decoded with functions like `mdy_hm`:

```{r}
mdy_hm("April 16, 2023, 6:15 PM")
```

or

```{r}
mdy_hms("April 16, 2023, 6:15:08 PM")
```

By default these assume the time is specified in the UTC time zone.
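
When the string records a local clock time, the parsing functions accept a
`tz` argument. The sketch below assumes the time was recorded in Iowa City
and uses a purely numeric format so that it does not depend on the locale;
`local_time` is an illustrative name.

```{r}
library(lubridate)

## Interpret the clock time as US Central time instead of UTC.
local_time <- ymd_hm("2023-04-16 18:15", tz = "America/Chicago")
local_time
tz(local_time)

## The same instant displayed in UTC is five hours later on the clock.
with_tz(local_time, tz = "UTC")
```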


### Creating Dates and Times from Components

Dates can be created from year, month, and day by `make_date()`:

```{r}
make_date(2023, 4, 16)
```

Creating a `date` variable from the `year`, `month`, and `day`
variables in the New York City `flights` table:

```{r}
library(nycflights13)
fl <- mutate(flights,
             date = make_date(year, month, day))
```

`ggplot` and other graphics systems know how to make useful axis
labels for dates:

```{r, class.source = "fold-hide"}
ggplot(count(fl, date)) +
    geom_line(aes(x = date, y = n))
```

Weekday/weekend differences are clearly visible.

Date-times can be created from year, month, day, hour, minute, and second
using `make_datetime()`:

```{r}
make_datetime(2023, 4, 16, 18, 15)
```

An attempt to recreate the `time_hour` variable in the flights table:

```{r}
fl <- mutate(fl,
             th = make_datetime(year, month, day,
                                hour))
```

This does not quite re-create the `time_hour` variable:
```{r}
identical(fl$th, fl$time_hour)
```

```{r}
fl$th[1]
fl$time_hour[1]
```

By default, `make_datetime()` assumes the time points it is given are in UTC.

The `time_hour` variable is using local (eastern US) time.

We will look at time zones more [below](#time-zones).


## Date and Time Components

Components of dates and date-times can be extracted with:

* `year()`, `month()`, `day()`, `hour()`, `minute()`, `second()`
* `yday()` -- day of the year
* `mday()` -- day of the month, same as `day()`
* `wday()` -- day of the week
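
A small illustration of these accessors on a fixed date-time; the object
name `stamp` is chosen here only for the example.

```{r}
library(lubridate)

stamp <- ymd_hms("2023-04-16 18:15:08")

## Calendar components
c(year(stamp), month(stamp), day(stamp))

## Clock components
c(hour(stamp), minute(stamp), second(stamp))

## Day of the year and day of the month
yday(stamp)
mday(stamp)
```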

By default, `wday()` returns an integer:

```{r}
wday(today())
```

But it can also return a label:

```{r}
wday(today(), label = TRUE)
```

```{r}
wday(today(), label = TRUE, abbr = FALSE)
```

Weekday names and abbreviations are obviously locale-specific, and you
can specify an alternative to the default current locale:

```{r}
wday(today(), label = TRUE, abbr = FALSE, locale = "de_DE.UTF-8")
```

```{r}
wday(today(), label = TRUE, locale = "de_DE.UTF-8")
```

Even the integer value can be tricky:

* In the US, Canada, and Japan the first day of the week is Sunday.

* In Germany, France, and the ISO 8601 standard the first day of the
  week is Monday.

`wday()` can be asked to use a different first day through its
`week_start` argument, and a global default can be set with the
`lubridate.week.start` option.
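
A short sketch of how the first day of the week changes the integer value,
using the `week_start` argument of `wday()` (1 means Monday, 7 means
Sunday). The date is fixed so the result does not depend on when the code
is run; `fri` is an illustrative name.

```{r}
library(lubridate)

fri <- ymd("2023-05-05")     # a Friday

## Weeks starting on Sunday: Friday is day 6.
wday(fri, week_start = 7)

## Weeks starting on Monday (ISO 8601): Friday is day 5.
wday(fri, week_start = 1)
```

Passing `week_start` explicitly keeps the result independent of any global
`lubridate.week.start` option that may have been set elsewhere.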

Using `wday()` and the `date` variable we can look at the distribution
of the number of flights by day of the week:

```{r flights-wday, eval = FALSE}
ggplot(fl, aes(x = wday(date, label = TRUE))) +
    geom_bar(fill = "deepskyblue3")
```

```{r flights-wday, echo = FALSE}
```

There were substantially fewer flights on Saturdays but only slightly
fewer flights on Sundays.


## Rounding

`floor_date()`, `round_date()`, and `ceiling_date()` can be used to
round to a particular unit; the most useful are `week` and `quarter`.

Flights by week:

```{r, echo = FALSE, dpi = 300, fig.height = 4}
ggplot(fl, aes(x = round_date(date, "week"))) +
    geom_bar(fill = "deepskyblue3")
```

The first and last weeks were incomplete:

```{r}
as.character(wday(ymd("2013-01-01"),
                  label = TRUE, abbr = FALSE))
```
```{r}
as.character(wday(ymd("2013-12-31"),
                  label = TRUE, abbr = FALSE))
```
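
To see what the three rounding functions do, they can be compared on a
single date. This is a sketch that assumes weeks start on Sunday, which is
made explicit with `week_start = 7`; `jan1` is an illustrative name.

```{r}
library(lubridate)

jan1 <- ymd("2013-01-01")                     # a Tuesday

floor_date(jan1, "week", week_start = 7)      # the preceding Sunday
round_date(jan1, "week", week_start = 7)      # the nearest Sunday
ceiling_date(jan1, "week", week_start = 7)    # the following Sunday

## Rounding down to the start of the quarter
floor_date(ymd("2013-05-15"), "quarter")
```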


## Time Spans

Subtracting dates or date-times produces `difftime` objects:

```{r}
now() - as_datetime(today())
```

```{r}
today() - ymd("2023-01-01")
```

Working with different units can be awkward; `lubridate` provides
_durations_, which always work in seconds:

```{r}
as.duration(now() - as_datetime(today()))
```

```{r}
as.duration(today() - ymd("2023-01-01"))
```
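
The unit issue can be made concrete with fixed dates. In this sketch `span`
is an illustrative name; a `difftime` carries its own unit, while the
corresponding duration is always a number of seconds.

```{r}
library(lubridate)

span <- ymd("2023-05-05") - ymd("2023-01-01")

## The unit R chose for this difftime
units(span)

## The same span expressed in weeks
as.numeric(span, units = "weeks")

## As a duration the value is in seconds: 124 days of 86400 seconds each
as.numeric(as.duration(span))
```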

Durations can be created with `dyears()`, `ddays()`, `dweeks()`, etc.:

```{r}
dyears(1)
```

```{r}
ddays(1)
```

Durations can be added to a date or date-time object and can be
multiplied by a number:

```{r}
today()
```

```{r}
today() + ddays(2)
```

```{r}
today() + 2 * ddays(1)
```

```{r}
(n1 <- now())
```

```{r}
n1 + dminutes(3)
```

Durations represent an exact number of seconds, which can lead to
surprises when DST is involved.

In 2023 the switch to DST happened in the US on March 12:

```{r}
ymd_hm("2023-03-11 23:02", tz = "America/Chicago") + ddays(1)
```

_Periods_ are an alternative that may work more intuitively.

Periods are constructed with `years()`, `months()`, `days()`, etc.:

```{r}
ymd_hm("2023-03-11 23:02", tz = "America/Chicago") + days(1)
```
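
The difference between the two results can be checked by measuring the
elapsed time in each case. This is a small sketch; `before_dst` is an
illustrative name.

```{r}
library(lubridate)

before_dst <- ymd_hm("2023-03-11 23:02", tz = "America/Chicago")

## A duration of one day is always 24 hours of elapsed time.
difftime(before_dst + ddays(1), before_dst, units = "hours")

## A period of one day keeps the clock time, so only 23 hours elapse:
## the night of March 12 is one hour short.
difftime(before_dst + days(1), before_dst, units = "hours")
```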


## Time Zones

Date-time objects specify a point in time relative to second zero,
minute zero, hour zero, on January 1, 1970 in
[Coordinated Universal Time (UTC)](https://www.timeanddate.com/time/aboututc.html).

Date-time objects can have a time zone associated with them that
affects how they are printed.

`now()` returns a date-time object with the time zone set as the
local time zone of the computer.

```{r}
now()
```

Time zones are complex: they can change on a regular basis (DST) or as
a result of politics.

When a date-time object is created from components, by default it is
given the UTC time zone.

To create a point in time based on local time information, such as 10
AM on April 16, 2023, in Iowa City, a time zone for interpreting the
local time needs to be specified.

The short notations like CDT are not adequate for this: both the US
and Australia use the abbreviation EST, but for zones fifteen hours
apart (UTC-5 and UTC+10).

R uses the
[_Internet Assigned Numbers Authority_ (IANA)](https://www.iana.org/time-zones)
naming convention and database.

The local time zone is:

```{r}
Sys.timezone()
```

The time point 10:00:00 AM on April 16, 2023 in Iowa City can be
specified as

```{r}
(tm <- make_datetime(2023, 4, 16, 10, tz = "America/Chicago"))
```

Time zones of date-time objects can be changed in two ways:

* `with_tz` keeps the instant in time and changes the time zone used
  for display.

* `force_tz` keeps the clock time and changes the instant in time; use
  this if the time zone is incorrectly specified but the clock time is
  correct.
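
A minimal sketch contrasting the two functions on one date-time; the names
`t_local`, `t_shown`, and `t_moved` are made up for this example.

```{r}
library(lubridate)

t_local <- make_datetime(2023, 4, 16, 10, tz = "America/Chicago")

## with_tz: the same instant, displayed on a UTC clock
t_shown <- with_tz(t_local, tz = "UTC")
t_shown
t_shown == t_local

## force_tz: the same clock reading, now interpreted as UTC,
## which is a different instant
t_moved <- force_tz(t_local, tz = "UTC")
t_moved
difftime(t_local, t_moved, units = "hours")
```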

The available time zone specifications are returned by `OlsonNames()`:

```{r}
head(OlsonNames())
```

The instant `tm` in some other time zones:

```{r}
with_tz(tm, tz = "UTC")
```

```{r}
with_tz(tm, tz = "America/New_York")
```

```{r}
with_tz(tm, tz = "Asia/Shanghai")
```

```{r}
with_tz(tm, tz = "Pacific/Auckland")
```

```{r}
with_tz(tm, tz = "Asia/Kolkata")
```

```{r}
with_tz(tm, tz = "Canada/Newfoundland")
```

```{r}
with_tz(tm, tz = "Asia/Katmandu")
```

Some more examples:

```{r, eval = FALSE}
## All offsets that are not a full hour:
get_offset <- function(z)
    abs(minute(with_tz(tm, tz = z)) - minute(tm))
offsets <- data.frame(zone = OlsonNames()) %>%
    mutate(offset = sapply(zone, get_offset)) %>%
    arrange(offset)
filter(offsets, offset != 0)

## Offsets for Australia:
filter(offsets, grepl("Australia", zone))
```

If we create the `th` variable for the flights data as

```{r}
fl <- mutate(flights, th = make_datetime(year, month, day, hour,
                                         tz = "America/New_York"))
```

then the result matches the `time_hour` variable:

```{r}
identical(fl$th, fl$time_hour)
```

The `time_hour` variable in the `weather` table reflects actual points
in time and, together with `origin`, can serve as a primary key:

```{r}
filter(count(weather, origin, time_hour), n > 1)
```

The `month`, `day`, and `hour` variables record local clock time, so
they are ambiguous around the daylight saving time changes.

In November there is a repeat:

```{r}
count(weather, origin, month, day, hour) %>%
    filter(n > 1)
```

and there is a missing hour in March:

```{r}
select(weather, origin, month, day, hour) %>%
    filter(origin == "EWR", month == 3,
           day == 10, hour <= 3)
```


## Things to Look Out For

For dates:

* Language used for months and weekdays, and their abbreviations.

* Ambiguous numerical conventions like 4/11/2023: is this April 11 or
  November 4?

* Day 2 of the week: is this Monday or Tuesday?

* For historical data, what calendar is being used? (The _October
  Revolution_ happened on November 6/7, 1917 by the current Gregorian
  calendar; October 24/25 by the Julian calendar Russia was still
  using.)
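
The numerical ambiguity is easy to reproduce: the same string parses to two
different dates depending on which convention is assumed. The name
`ambiguous` is illustrative.

```{r}
library(lubridate)

ambiguous <- "4/11/2023"

mdy(ambiguous)   # month first: April 11, 2023
dmy(ambiguous)   # day first: November 4, 2023
```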

For date-times:

* All of the above.

* Daylight saving time.

* Time zones.

<!--
Locale stuff may fail on some systems (on Ubuntu may need to use
locale-gen
-->
<!--
library(lubridate)
flights <- mutate(flights, th = make_datetime(year, month, day, hour,
                                              tz = "America/New_York"))
weather <- mutate(weather, th = make_datetime(year, month, day, hour,
                                              tz = "UTC"))

fl0 <- select(filter(flights, origin == "LGA"),
              dep_time, time_hour, th)
w0 <- select(filter(weather, origin == "LGA")[-(1 : 9),],
             time_hour, th, temp)

fl1 <- left_join(fl0, select(w0, -th), "time_hour")
fl2 <- left_join(fl0, select(w0, -time_hour), "th")
-->


## Reading

Chapter [_Dates and Times_](https://r4ds.had.co.nz/dates-and-times.html)
in [_R for Data Science_](https://r4ds.had.co.nz/).


## Exercises

1. Using the NYC flights data, how many flights were there on
   Saturdays from Newark (EWR) to Chicago O'Hare (ORD) in 2013?
   
    a. 413
    b. 522
    c. 601
    d. 733
   

2. What day of the week will July 4, 2030, fall on?

    a. Monday
    b. Wednesday
    c. Thursday
    d. Saturday
