Background

Data are often associated with a point in time, such as a particular date or a particular time on a given date.

Some issues with points in time:

R has data types to represent dates (class Date) and date-times (classes POSIXct and POSIXlt).

R stores dates as the number of days since January 1, 1970, and date-times as the number of seconds since midnight on that day in Coordinated Universal Time (UTC).
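A quick way to see these underlying numbers is to drop the class with unclass() or as.numeric(). This base R sketch uses fixed, made-up date-times so the values are reproducible; the Chicago example assumes the standard IANA rules, under which Chicago is on CST (UTC-6) in January 1970.

```r
## A Date is stored as a count of days since 1970-01-01
d <- as.Date("1970-01-11")
unclass(d)          ## 10

## A POSIXct date-time is stored as seconds since 1970-01-01 00:00:00 UTC
dt <- as.POSIXct("1970-01-01 01:00:00", tz = "UTC")
as.numeric(dt)      ## 3600

## The same clock time in Chicago (UTC-6 in January) is a later instant
dt_chi <- as.POSIXct("1970-01-01 01:00:00", tz = "America/Chicago")
as.numeric(dt_chi)  ## 25200
```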

Base R provides many facilities for dealing with dates and date-times.

The lubridate package provides a useful interface.

library(lubridate)

The Dates and Times chapter of R for Data Science provides more details.

Creating Dates and Times

Today and Now

The lubridate function today returns today’s date as a Date object:

today()
## [1] "2019-04-07"
class(today())
## [1] "Date"

The lubridate function now returns the current date-time as a POSIXct object:

now()
## [1] "2019-04-07 13:42:37 CDT"
class(now())
## [1] "POSIXct" "POSIXt"

Numbers can be added to and subtracted from date and date-time objects; the unit is days for a Date and seconds for a date-time:

now() + 3600  ## one hour from now
## [1] "2019-04-07 14:42:37 CDT"
today() - 7   ## one week ago
## [1] "2019-03-31"
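Because the unit depends on the class, the same number means very different spans. A small sketch with fixed values in place of today() and now(), so the results do not depend on when it is run:

```r
library(lubridate)

## Adding a number to a Date moves it by that many days
d <- ymd("2019-04-07")
d - 7                   ## "2019-03-31"

## Adding a number to a POSIXct date-time moves it by that many seconds
dt <- ymd_hms("2019-04-07 13:42:37", tz = "America/Chicago")
dt + 3600               ## "2019-04-07 14:42:37 CDT"

## A common slip: this is 7 seconds earlier, not 7 days
dt - 7                  ## "2019-04-07 13:42:30 CDT"
```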

Parsing Dates and Times From Strings

Some common date formats:

d1 <- "2019-04-03"
d2 <- "April 3, 2019"
d3 <- "3 April 2019"
d4 <- "3 April 19"

These can be decoded by the functions ymd, mdy, and dmy:

ymd(d1)
## [1] "2019-04-03"
mdy(d2)
## [1] "2019-04-03"
dmy(d3)
## [1] "2019-04-03"
dmy(d4)
## [1] "2019-04-03"
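These functions only need the order of the components; the separators can vary. A string that does not fit the order produces NA with a warning rather than an error, which is worth checking for after parsing. A short sketch with made-up strings:

```r
library(lubridate)

## ymd() accepts different separators, or none at all
x <- c("2019-04-03", "2019/04/03", "2019 04 03", "20190403")
ymd(x)                 ## all four parse to "2019-04-03"

## A string in the wrong order gives NA (the warning is suppressed here)
bad <- suppressWarnings(ymd("April 3, 2019"))
is.na(bad)             ## TRUE
```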

Date-times can be decoded with functions like mdy_hm:

mdy_hm("April 3, 2019, 6:15 PM")
## [1] "2019-04-03 18:15:00 UTC"
mdy_hms("April 3, 2019, 6:15:08 PM")
## [1] "2019-04-03 18:15:08 UTC"
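The parsed times above are labeled UTC because that is the default. The parsing functions also accept a tz argument that says which time zone the clock time should be read in; time zones are covered in more detail below. A sketch comparing the two readings:

```r
library(lubridate)

## Without tz, the clock time is read as UTC
utc <- mdy_hm("April 3, 2019, 6:15 PM")
## With tz, the same clock time is read as Chicago local time
chi <- mdy_hm("April 3, 2019, 6:15 PM", tz = "America/Chicago")
chi                                   ## "2019-04-03 18:15:00 CDT"
difftime(chi, utc, units = "hours")   ## Time difference of 5 hours
```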

By default, these functions interpret month names using the current locale's LC_TIME setting:

Sys.getlocale("LC_TIME")
## [1] "en_US.UTF-8"

If you need to parse a French date you might use a French locale (which must be installed on your system):

dmy("3 Avril, 2019", locale = "fr_FR.UTF-8")
## [1] "2019-04-03"

Creating Dates and Times from Components

Dates can be created from year, month, and day with make_date:

library(nycflights13)
library(dplyr)     ## for mutate, group_by, summarize, n
library(ggplot2)
fl <- mutate(flights, date = make_date(year, month, day))
ggplot(summarize(group_by(fl, date), n = n())) + geom_line(aes(x = date, y = n))

  • ggplot and other graphics systems know how to make useful axis labels for dates.
  • Weekday/weekend differences are clearly visible.
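make_date is vectorized, so it works on whole columns at once, as in the flights example. A minimal sketch on a small, made-up data frame that does not need nycflights13:

```r
library(lubridate)

## Hypothetical table with separate year, month, and day columns
df <- data.frame(year = 2013, month = c(1, 1, 2), day = c(1, 2, 1))

## Combine the three columns into one Date column
df$date <- make_date(df$year, df$month, df$day)
df$date      ## "2013-01-01" "2013-01-02" "2013-02-01"
class(df$date)
```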

Date-times can be created from year, month, day, hour, minute, and second using make_datetime:

fl <- mutate(fl, th = make_datetime(year, month, day, hour))

This does not quite re-create the time_hour variable:

identical(fl$th, fl$time_hour)
## [1] FALSE
head(fl$th)
## [1] "2013-01-01 05:00:00 UTC" "2013-01-01 05:00:00 UTC"
## [3] "2013-01-01 05:00:00 UTC" "2013-01-01 05:00:00 UTC"
## [5] "2013-01-01 06:00:00 UTC" "2013-01-01 05:00:00 UTC"
head(fl$time_hour)
## [1] "2013-01-01 05:00:00 EST" "2013-01-01 05:00:00 EST"
## [3] "2013-01-01 05:00:00 EST" "2013-01-01 05:00:00 EST"
## [5] "2013-01-01 06:00:00 EST" "2013-01-01 05:00:00 EST"

  • By default, make_datetime assumes the time points it is given are in UTC.
  • The time_hour variable uses local time in New York (US Eastern time).

We will look at time zones more below.

Date and Time Components

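Components of dates and date-times can be extracted with functions such as year, month, mday (day of the month), yday (day of the year), wday (day of the week), hour, minute, and second.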
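A sketch applying these accessors to one fixed, made-up date-time. The wday value assumes the default week start of Sunday. The accessors also have replacement forms, such as month<-, for changing a single component:

```r
library(lubridate)

x <- ymd_hms("2019-04-03 18:15:08")   ## parsed as UTC

year(x)     ## 2019
month(x)    ## 4
mday(x)     ## 3
yday(x)     ## 93  (31 + 28 + 31 + 3)
wday(x)     ## 4   (Wednesday, counting Sunday as 1)
hour(x)     ## 18
minute(x)   ## 15
second(x)   ## 8

## Change one component of a copy, leaving x unchanged
x2 <- x
month(x2) <- 5
x2          ## "2019-05-03 18:15:08 UTC"
```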

By default, wday returns an integer, but it can also return a label:

wday(today())
## [1] 1
wday(today(), label = TRUE)
## [1] Sun
## Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
wday(today(), label = TRUE, abbr = FALSE)
## [1] Sunday
## 7 Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < ... < Saturday
wday(today(), label = TRUE, abbr = FALSE, locale = "de_DE.UTF-8")
## [1] Sonntag
## 7 Levels: Sonntag < Montag < Dienstag < Mittwoch < ... < Samstag
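The numbering depends on which day starts the week. wday has a week_start argument (7 for Sunday, 1 for Monday); its default comes from the lubridate.week.start option, assumed unset here so that weeks start on Sunday:

```r
library(lubridate)

sun <- ymd("2019-04-07")          ## a Sunday

wday(sun)                         ## 1: with the default, the week starts on Sunday
wday(sun, week_start = 1)         ## 7: with Monday as day 1, Sunday is day 7
```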

Distribution of the number of flights by day of the week:

ggplot(fl, aes(x = wday(date, label = TRUE))) + geom_bar()

Rounding

floor_date, round_date, and ceiling_date round a date or date-time down, to the nearest boundary, or up to a particular unit; for daily data the most useful units are often week and quarter.

Flights by week:

ggplot(fl, aes(x = round_date(date, "week"))) + geom_bar()
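To see what the rounding functions return, here they are applied to a single fixed date. floor_date labels each date with the start of its unit, which is a natural choice for binning; the week result assumes the default week start of Sunday:

```r
library(lubridate)

d <- ymd("2019-04-03")          ## a Wednesday

floor_date(d, "week")           ## "2019-03-31", the preceding Sunday
floor_date(d, "month")          ## "2019-04-01"
floor_date(d, "quarter")        ## "2019-04-01", the start of Q2
ceiling_date(d, "month")        ## "2019-05-01"
```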

Time Spans

Subtracting dates or date-times produces difftime objects:

now() - as_datetime(today())
## Time difference of 18.71129 hours
today() - ymd("2019-01-01")
## Time difference of 96 days

difftime objects choose their units (seconds, minutes, hours, days, or weeks) automatically, which can be awkward to work with; lubridate provides durations, which always measure time spans in seconds:

as.duration(now() - as_datetime(today()))
## [1] "67360.6379976273s (~18.71 hours)"
as.duration(today() - ymd("2019-01-01"))
## [1] "8294400s (~13.71 weeks)"
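A sketch of the difference, using fixed dates in place of today(). A difftime carries its own units, which can be converted explicitly; a duration is always seconds, and dividing it by another duration gives a plain number in any unit:

```r
library(lubridate)

span <- ymd("2019-04-07") - ymd("2019-01-01")
span                                ## Time difference of 96 days
units(span)                         ## "days"
as.numeric(span, units = "weeks")   ## 13.71429

dur <- as.duration(span)
as.numeric(dur)                     ## 8294400 (seconds)
dur / dhours(1)                     ## 2304 (hours)
```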

Durations can be created with dyears, ddays, dweeks, etc.:

dyears(1)
## [1] "31536000s (~52.14 weeks)"
ddays()
## [1] "86400s (~1 days)"

Durations can be added to a date or date-time object and can be multiplied by a number:

today()
## [1] "2019-04-07"
today() + ddays(2)
## [1] "2019-04-09"
today() + 2 * ddays(1)
## [1] "2019-04-09"
n1 <- now()
n1
## [1] "2019-04-07 13:42:40 CDT"
n1 + dminutes(3)
## [1] "2019-04-07 13:45:40 CDT"

Durations represent an exact number of seconds, which can lead to surprises when daylight saving time (DST) is involved:

ymd_hm("2019-03-09 23:02", tz = "America/Chicago") + ddays(1)
## [1] "2019-03-11 00:02:00 CDT"

Periods, which follow clock time rather than elapsed seconds, are an alternative that often works more intuitively.

Periods are constructed with years, months, days, etc.:

ymd_hm("2019-03-09 23:02", tz = "America/Chicago") + days(1)
## [1] "2019-03-10 23:02:00 CDT"
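The two examples above differ by one hour because the DST change makes March 10, 2019 only 23 hours long in Chicago. Periods have their own pitfall: adding a month can land on a date that does not exist. lubridate's %m+% operator rolls such results back to the last day of the month instead. A sketch:

```r
library(lubridate)

x <- ymd_hm("2019-03-09 23:02", tz = "America/Chicago")

(x + days(1)) - x                  ## Time difference of 23 hours
(x + ddays(1)) - x                 ## Time difference of 1 days

## February 31 does not exist, so the result is NA ...
ymd("2019-01-31") + months(1)      ## NA
## ... while %m+% rolls back to the last day of February
ymd("2019-01-31") %m+% months(1)   ## "2019-02-28"
```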

Time Zones

Date-time objects specify a point in time as an offset from midnight (00:00:00) on January 1, 1970, in Coordinated Universal Time (UTC).

Date-time objects can have a time zone associated with them that affects how they are printed.

now() returns a date-time object with the time zone set as the local time zone of the computer.

now()
## [1] "2019-04-07 13:42:40 CDT"

Time zones are complex: a zone's offset from UTC can change on a regular basis (DST) or as a result of political decisions.

When a date-time object is created from components, by default it is given the UTC time zone.

To create a point in time based on local time information, such as 10 AM on April 3, 2019, in Iowa City, a time zone for interpreting the local time needs to be specified.

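Short abbreviations like CDT are not adequate for this: the abbreviation itself changes with DST (CST in winter, CDT in summer), and the same abbreviation can refer to different zones in different parts of the world.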

R uses the naming convention and rules of the time zone database maintained by the Internet Assigned Numbers Authority (IANA), with names like America/Chicago.

The local time zone is:

Sys.timezone()
## [1] "America/Chicago"

The time point 10:00:00 AM on April 3, 2019 in Iowa City can be specified as

(tm <- make_datetime(2019, 4, 3, 10, tz = "America/Chicago"))
## [1] "2019-04-03 10:00:00 CDT"

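Time zones can be changed in two ways: with_tz keeps the same instant and changes only the time zone used to display it, while force_tz keeps the same clock time and changes the instant it refers to.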

The available time zone names are returned by OlsonNames():

head(OlsonNames())
## [1] "Africa/Abidjan"     "Africa/Accra"       "Africa/Addis_Ababa"
## [4] "Africa/Algiers"     "Africa/Asmara"      "Africa/Asmera"

The instant tm in some other time zones:

with_tz(tm, tz = "UTC")
## [1] "2019-04-03 15:00:00 UTC"
with_tz(tm, tz = "America/New_York")
## [1] "2019-04-03 11:00:00 EDT"
with_tz(tm, tz = "Canada/Newfoundland")
## [1] "2019-04-03 12:30:00 NDT"
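with_tz only changes how tm is displayed. force_tz instead keeps the clock reading of 10:00 and attaches a different zone to it, which produces a different instant. A sketch contrasting the two:

```r
library(lubridate)

tm <- make_datetime(2019, 4, 3, 10, tz = "America/Chicago")

## with_tz: same instant, different display
tm_utc <- with_tz(tm, tz = "UTC")
tm_utc == tm                               ## TRUE

## force_tz: same clock time (10:00), different instant
tm_forced <- force_tz(tm, tz = "UTC")
tm_forced                                  ## "2019-04-03 10:00:00 UTC"
difftime(tm, tm_forced, units = "hours")   ## Time difference of 5 hours
```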

If we create the th variable for the flights data as

fl <- mutate(flights, th = make_datetime(year, month, day, hour,
                                         tz = "America/New_York"))

then the result matches the time_hour variable:

identical(fl$th, fl$time_hour)
## [1] TRUE

The time_hour variable in the weather table reflects actual points in time and, together with origin, can serve as a primary key:

filter(count(weather, origin,  time_hour), n > 1)
## # A tibble: 0 x 3
## # … with 3 variables: origin <chr>, time_hour <dttm>, n <int>

The month, day, and hour variables record local clock time, so they are ambiguous around the DST changes. In November one hour is repeated:

filter(count(weather, origin,  month, day, hour), n > 1)
## # A tibble: 3 x 5
##   origin month   day  hour     n
##   <chr>  <dbl> <int> <int> <int>
## 1 EWR       11     3     1     2
## 2 JFK       11     3     1     2
## 3 LGA       11     3     1     2

and there would be a missing hour in March.
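The repeated and missing hours can be seen directly by generating consecutive hours in UTC and displaying them in New York time. In 2013, DST ended on November 3 and began on March 10. A sketch:

```r
library(lubridate)

## Four consecutive hours around the November change
nov <- with_tz(seq(ymd_hms("2013-11-03 04:00:00"), by = "hour", length.out = 4),
               tz = "America/New_York")
hour(nov)    ## 0 1 1 2: the 1 AM hour occurs twice

## Four consecutive hours around the March change
mar <- with_tz(seq(ymd_hms("2013-03-10 05:00:00"), by = "hour", length.out = 4),
               tz = "America/New_York")
hour(mar)    ## 0 1 3 4: there is no 2 AM hour
```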