---
title: "Dates and Times"
output:
  html_document:
    toc: yes
    code_folding: show
    code_download: true
---
```{r setup, include = FALSE, message = FALSE}
source(here::here("setup.R"))
knitr::opts_chunk$set(collapse = TRUE, message = FALSE,
                      fig.height = 5, fig.width = 6, fig.align = "center")
set.seed(12345)
library(dplyr)
library(ggplot2)
library(lattice)
library(gridExtra)
library(patchwork)
source(here::here("datasets.R"))
theme_set(theme_minimal() +
          theme(text = element_text(size = 16),
                panel.border = element_rect(color = "grey30", fill = NA)))
```
## Background
Data are often associated with a point in time: a particular
* year;
* month;
* day;
* hour, minute, second, ...
Some issues with points in time:
* Leap years and leap seconds.
* Daylight saving time.
* Local time or
[_Coordinated Universal Time
(UTC)_](https://www.timeanddate.com/time/aboututc.html).
* For historical data: changes in calendars.
R has data types to represent
* a particular date (`Date`);
* a particular second (`POSIXct`, date-time).
R stores dates as days since January 1, 1970, and date-times as the
number of seconds since midnight on that day in
[_Coordinated Universal Time_
(UTC)](https://www.timeanddate.com/time/aboututc.html).
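The stored numbers can be seen by dropping the class with `as.numeric()`. This small sketch uses base R only; the particular dates are chosen arbitrarily for illustration:

```{r}
## A Date is stored as the number of days since 1970-01-01
as.numeric(as.Date("1970-01-02"))
as.numeric(as.Date("2023-04-16"))
## A POSIXct date-time is stored as seconds since 1970-01-01 00:00:00 UTC
as.numeric(as.POSIXct("1970-01-01 00:01:00", tz = "UTC"))
```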
Date objects are less complicated than date-time objects, so if you
only need dates you should stick with date objects.
Base R provides many facilities for dealing with dates and date-times.
The `lubridate` package provides a useful interface.
```{r, message = FALSE}
library(lubridate)
```
The
[_Dates and Times_ chapter](https://r4ds.had.co.nz/dates-and-times.html)
of [_R for Data Science_](https://r4ds.had.co.nz/) provides more
details.
## Creating Dates and Times
### Today and Now
The `lubridate` function `today()` returns today's date as a `Date` object:
```{r}
today()
```
```{r}
class(today())
```
The `lubridate` function `now()` returns the current date-time as a
`POSIXct` object:
```{r}
now()
```
```{r}
class(now())
```
The printed representation follows the international standard for the
representation of dates and times
([ISO8601](https://en.wikipedia.org/wiki/ISO_8601)).
Date and date-time objects support addition and subtraction; a number
is interpreted as seconds for a date-time and as days for a date:
```{r}
now() + 3600 ## one hour from now
```
```{r}
today() - 7 ## one week ago
```
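With fixed inputs instead of `now()` and `today()` the unit difference is easier to see. A minimal sketch; the date 2023-04-16 is an arbitrary choice:

```{r}
library(lubridate)
## For a Date, the number is a count of days
ymd("2023-04-16") + 7
## For a date-time (POSIXct), the number is a count of seconds
ymd_hms("2023-04-16 00:00:00") + 3600
```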
### Parsing Dates and Times From Strings
Some common date formats:
```{r}
d1 <- "2023-04-16"
d2 <- "April 16, 2023"
d3 <- "16 April 2023"
d4 <- "16 April 23"
```
These can be decoded by the functions `ymd()`, `mdy()`, and `dmy()`:
```{r}
ymd(d1)
```
```{r}
mdy(d2)
```
```{r}
dmy(d3)
```
```{r}
dmy(d4)
```
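When a single vector mixes several of these formats, `lubridate`'s `parse_date_time()` accepts a vector of candidate `orders`. A sketch, assuming English month names are understood by the current locale; the result is a `POSIXct` date-time in UTC, which `as.Date()` converts back to a date:

```{r}
library(lubridate)
## Two of the formats above in one vector
mixed <- c("2023-04-16", "April 16, 2023")
## Each element is matched against the listed orders
parsed <- parse_date_time(mixed, orders = c("ymd", "mdy"))
parsed
as.Date(parsed)
```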
By default, these functions use the current _locale_ settings for
interpreting month names or abbreviations.
```{r}
Sys.getlocale("LC_TIME")
```
If you need to parse a French date you might use
```{r}
dmy("16 Avril, 2023", locale = "fr_FR.UTF-8")
```
Date-times can be decoded with functions like `mdy_hm`:
```{r}
mdy_hm("April 16, 2023, 6:15 PM")
```
or
```{r}
mdy_hms("April 16, 2023, 6:15:08 PM")
```
By default these assume the time is specified in the UTC time zone; a
different time zone can be supplied with the `tz` argument.
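The same clock reading can be parsed once with the UTC default and once with an explicit zone; the two results are different instants. A sketch using `America/Chicago` as the example zone:

```{r}
library(lubridate)
t_utc <- mdy_hm("April 16, 2023, 6:15 PM")                          ## UTC (default)
t_chi <- mdy_hm("April 16, 2023, 6:15 PM", tz = "America/Chicago")  ## US Central
t_chi
## In April, Central Daylight Time is UTC-5, so 6:15 PM in Chicago
## is 5 hours later than 6:15 PM UTC
t_chi - t_utc
```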
### Creating Dates and Times from Components
Dates can be created from year, month, and day by `make_date()`:
```{r}
make_date(2023, 4, 16)
```
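`make_date()` is vectorized, which is what makes it usable inside `mutate()` on a whole table. A sketch with a small made-up data frame, `toy`, that borrows the column names of `flights`:

```{r}
library(lubridate)
make_date(2013, 1, 1:3)
## Hypothetical stand-in for the year/month/day columns of a larger table
toy <- data.frame(year = 2013, month = c(1, 6, 12), day = c(1, 15, 31))
toy$date <- with(toy, make_date(year, month, day))
toy
```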
Creating a `date` variable from the `year`, `month`, and `day`
variables in the New York City `flights` table:
```{r}
library(nycflights13)
fl <- mutate(flights,
             date = make_date(year, month, day))
```
`ggplot` and other graphics systems know how to make useful axis
labels for dates:
```{r, class.source = "fold-hide"}
ggplot(count(fl, date)) +
    geom_line(aes(x = date, y = n))
```
Weekday/weekend differences are clearly visible.
Date-times can be created from year, month, day, hour, minute, and second
using `make_datetime()`:
```{r}
make_datetime(2023, 4, 16, 18, 15)
```
An attempt to recreate the `time_hour` variable in the flights table:
```{r}
fl <- mutate(fl,
             th = make_datetime(year, month, day, hour))
```
This does not quite re-create the `time_hour` variable:
```{r}
identical(fl$th, fl$time_hour)
```
```{r}
fl$th[1]
fl$time_hour[1]
```
By default, `make_datetime()` assumes the time points it is given are in UTC.
The `time_hour` variable is using local (eastern US) time.
We will look at time zones more [below](#time-zones).
## Date and Time Components
Components of dates and date-times can be extracted with:
* `year()`, `month()`, `day()`, `hour()`, `minute()`, `second()`
* `yday()` -- day of the year
* `mday()` -- day of the month, same as `day`
* `wday()` -- day of the week
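Applied to one fixed date-time, the extractors return plain numbers. A sketch with an arbitrary example value; `week_start = 7` is passed explicitly so that Sunday counts as day 1 regardless of any global setting:

```{r}
library(lubridate)
x <- ymd_hms("2023-04-16 18:15:08")
c(year = year(x), month = month(x), day = day(x),
  hour = hour(x), minute = minute(x), second = second(x))
yday(x)                  ## 31 + 28 + 31 + 16 = 106
wday(x, week_start = 7)  ## April 16, 2023 was a Sunday
```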
By default, `wday()` returns an integer:
```{r}
wday(today())
```
But it can also return a label:
```{r}
wday(today(), label = TRUE)
```
```{r}
wday(today(), label = TRUE, abbr = FALSE)
```
Weekday names and abbreviations are obviously locale-specific, and you
can specify an alternative to the default current locale:
```{r}
wday(today(), label = TRUE, abbr = FALSE, locale = "de_DE.UTF-8")
```
```{r}
wday(today(), label = TRUE, locale = "de_DE.UTF-8")
```
Even the integer value can be tricky:
* In the US, Canada, and Japan the first day of the week is Sunday.
* In Germany, France, and the ISO8601 standard the first day of the
week is Monday.
`wday()` takes a `week_start` argument (7 for Sunday, 1 for Monday),
and a global default can be set with the `lubridate.week.start` option.
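The same Sunday gets a different integer under the two conventions. A sketch; the option is set only temporarily and then restored:

```{r}
library(lubridate)
sun <- ymd("2023-04-16")   ## a Sunday
wday(sun, week_start = 7)  ## Sunday is day 1 (US convention)
wday(sun, week_start = 1)  ## Monday is day 1 (ISO 8601), so Sunday is day 7
## Change the session-wide default, then restore the previous setting
old <- options(lubridate.week.start = 1)
wday(sun)
options(old)
```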
Using `wday()` and the `date` variable we can look at the distribution
of the number of flights by day of the week:
```{r flights-wday, eval = FALSE}
ggplot(fl, aes(x = wday(date, label = TRUE))) +
    geom_bar(fill = "deepskyblue3")
```
```{r flights-wday, echo = FALSE}
```
There were substantially fewer flights on Saturdays but only slightly
fewer flights on Sundays.
## Rounding
`floor_date()`, `round_date()`, and `ceiling_date()` round down, to the
nearest, or up to a particular unit; `week` and `quarter` are
particularly useful.
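For a single Wednesday the rounding directions are easy to check by hand. A sketch with an arbitrary date; `week_start = 7` makes weeks start on Sunday explicitly:

```{r}
library(lubridate)
x <- ymd("2023-04-19")                  ## a Wednesday
floor_date(x, "week", week_start = 7)   ## previous Sunday
ceiling_date(x, "week", week_start = 7) ## next Sunday
floor_date(x, "quarter")                ## start of the second quarter
ceiling_date(x, "month")                ## first day of the next month
```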
Flights by week:
```{r, echo = FALSE, dpi = 300, fig.height = 4}
ggplot(fl, aes(x = round_date(date, "week"))) +
    geom_bar(fill = "deepskyblue3")
```
The first and last weeks were incomplete:
```{r}
as.character(wday(ymd("2013-01-01"),
                  label = TRUE, abbr = FALSE))
```
```{r}
as.character(wday(ymd("2013-12-31"),
                  label = TRUE, abbr = FALSE))
```
## Time Spans
Subtracting dates or date-times produces `difftime` objects:
```{r}
now() - as_datetime(today())
```
```{r}
today() - ymd("2023-01-01")
```
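The units of a `difftime` are chosen automatically when it prints. Base R's `difftime()` lets you ask for specific units instead, and `as.numeric()` can convert between them. A sketch with fixed dates so the result does not depend on today's date:

```{r}
d <- difftime(as.Date("2023-04-16"), as.Date("2023-01-01"), units = "weeks")
d
## The same span expressed in days
as.numeric(d, units = "days")
```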
Working with different units can be awkward; `lubridate` provides
_durations_, which always work in seconds:
```{r}
as.duration(now() - as_datetime(today()))
```
```{r}
as.duration(today() - ymd("2023-01-01"))
```
Durations can be created with `dyears()`, `ddays()`, `dweeks()`, etc.:
```{r}
dyears(1)
```
```{r}
ddays(1)
```
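Whatever constructor is used, the underlying value is a number of seconds, which `as.numeric()` exposes. A minimal sketch:

```{r}
library(lubridate)
as.numeric(dhours(1))  ## 60 * 60
as.numeric(ddays(1))   ## 24 * 60 * 60
as.numeric(dweeks(2))  ## 2 * 7 * 24 * 60 * 60
```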
Durations can be added to a date or date-time object and can be
multiplied by a number:
```{r}
today()
```
```{r}
today() + ddays(2)
```
```{r}
today() + 2 * ddays(1)
```
```{r}
(n1 <- now())
```
```{r}
n1 + dminutes(3)
```
Durations represent an exact number of seconds, which can lead to
surprises when DST is involved.
In 2023 the switch to DST happened in the US on March 12:
```{r}
ymd_hm("2023-03-11 23:02", tz = "America/Chicago") + ddays(1)
```
_Periods_ are an alternative that may work more intuitively.
Periods are constructed with `years()`, `months()`, `days()`, etc:
```{r}
ymd_hm("2023-03-11 23:02", tz = "America/Chicago") + days(1)
```
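Putting the two side by side makes the difference explicit: the duration adds exactly 24 hours of elapsed time, while the period keeps the clock time and therefore spans only 23 elapsed hours across the skipped hour. A sketch reusing the instant above:

```{r}
library(lubridate)
x <- ymd_hm("2023-03-11 23:02", tz = "America/Chicago")
x + ddays(1)  ## 86400 seconds later: 00:02 on March 13
x + days(1)   ## same clock time on the next day: 23:02 on March 12
difftime(x + days(1), x, units = "hours")
```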
## Time Zones
Date-time objects specify a point in time relative to second zero,
minute zero, hour zero, on January 1, 1970 in
[Coordinated Universal Time (UTC)](https://www.timeanddate.com/time/aboututc.html).
Date-time objects can have a time zone associated with them that
affects how they are printed.
`now()` returns a date-time object with the time zone set as the
local time zone of the computer.
```{r}
now()
```
Time zones are complex: they can change on a regular basis (DST) or as
a result of politics.
When a date-time object is created from components, by default it is
given the UTC time zone.
To create a point in time based on local time information, such as 10
AM on April 16, 2023, in Iowa City, a time zone for interpreting the
local time needs to be specified.
Short abbreviations like CDT are not adequate for this: both the US
and Australia use EST, and the two mean quite different things.
R uses the
[_Internet Assigned Numbers Authority_ (IANA)](https://www.iana.org/time-zones)
naming convention and database.
The local time zone is:
```{r}
Sys.timezone()
```
The time point 10:00:00 AM on April 16, 2023 in Iowa City can be
specified as
```{r}
(tm <- make_datetime(2023, 4, 16, 10, tz = "America/Chicago"))
```
Time zones of date-time objects can be changed in two ways:
* `with_tz()` keeps the instant in time and changes the time zone used
for display.
* `force_tz()` changes the instant in time; use this if the time zone is
incorrectly specified but the clock time is correct.
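The contrast shows up clearly on a date-time that was parsed as UTC. A sketch with an arbitrary clock time:

```{r}
library(lubridate)
x <- ymd_hm("2023-04-16 10:00")  ## parsed as UTC
## Same instant, displayed in Chicago time (CDT is UTC-5): 05:00
with_tz(x, tzone = "America/Chicago")
## Same clock time, reinterpreted as Chicago time: a different instant
force_tz(x, tzone = "America/Chicago")
```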
The available time zone specifications are returned by `OlsonNames()`:
```{r}
head(OlsonNames())
```
The instant `tm` in some other time zones:
```{r}
with_tz(tm, tz = "UTC")
```
```{r}
with_tz(tm, tz = "America/New_York")
```
```{r}
with_tz(tm, tz = "Asia/Shanghai")
```
```{r}
with_tz(tm, tz = "Pacific/Auckland")
```
```{r}
with_tz(tm, tz = "Asia/Kolkata")
```
```{r}
with_tz(tm, tz = "Canada/Newfoundland")
```
```{r}
with_tz(tm, tz = "Asia/Katmandu")
```
Some more examples:
```{r, eval = FALSE}
## All offsets that are not a full hour:
get_offset <- function(z)
    abs(minute(with_tz(tm, tz = z)) - minute(tm))
offsets <- data.frame(zone = OlsonNames()) %>%
    mutate(offset = sapply(zone, get_offset)) %>%
    arrange(offset)
filter(offsets, offset != 0)
## Offsets for Australia:
filter(offsets, grepl("Australia", zone))
```
If we create the `th` variable for the flights data as
```{r}
fl <- mutate(flights,
             th = make_datetime(year, month, day, hour,
                                tz = "America/New_York"))
```
then the result matches the `time_hour` variable:
```{r}
identical(fl$th, fl$time_hour)
```
The `time_hour` variable in the `weather` table reflects actual points
in time and, together with `origin`, can serve as a primary key:
```{r}
filter(count(weather, origin, time_hour), n > 1)
```
The `month`, `day`, and `hour` variables are ambiguous around the
daylight saving time changes.
In November there is a repeated hour:
```{r}
count(weather, origin, month, day, hour) %>%
    filter(n > 1)
```
and there is a missing hour in March:
```{r}
select(weather, origin, month, day, hour) %>%
    filter(origin == "EWR", month == 3,
           day == 10, hour <= 3)
```
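Both effects can be reproduced by starting from UTC instants one hour apart and displaying them in New York time. A sketch; the 2013 transition dates (November 3 and March 10) match the weather data above:

```{r}
library(lubridate)
## "Fall back": two different instants, both shown as 01:30 (EDT, then EST)
fall <- ymd_hms("2013-11-03 05:30:00") + dhours(0:1)
with_tz(fall, tzone = "America/New_York")
## "Spring forward": local 02:xx never occurs (01:30 EST, then 03:30 EDT)
spring <- ymd_hms("2013-03-10 06:30:00") + dhours(0:1)
with_tz(spring, tzone = "America/New_York")
```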
## Things to Look Out For
For dates:
* Language used for months and weekdays, and their abbreviations.
* Ambiguous numerical conventions like 4/11/2023: is this April 11 or
November 4?
* Day 2 of the week: is this Monday or Tuesday?
* For historical data, what calendar is being used? (The _October
Revolution_ happened on November 6/7, 1917 by the current Gregorian
calendar; October 24/25 by the Julian calendar Russia was still
using.)
For date-times:
* All of the above.
* Daylight saving time.
* Time zones.
## Reading
Chapter [_Dates and Times_](https://r4ds.had.co.nz/dates-and-times.html)
in [_R for Data Science_](https://r4ds.had.co.nz/).
## Exercises
1. Using the NYC flights data, how many flights were there on
Saturdays from Newark (EWR) to Chicago O'Hare (ORD) in 2013?
a. 413
b. 522
c. 601
d. 733
2. What day of the week will July 4, 2030, fall on?
a. Monday
b. Wednesday
c. Thursday
d. Saturday