Background
Data are often associated with a point in time, such as a particular
year;
month;
day;
hour, minute, second, …
Some issues with points in time:
R has data types to represent dates and date-times.
R stores dates as the number of days since January 1, 1970, and date-times as the number of seconds since midnight on that day in Coordinated Universal Time (UTC).
Date objects are simpler than date-time objects, so if you only need dates you should stick with Date objects.
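These storage conventions can be checked directly with base R. The dates below are arbitrary, chosen only so that the underlying numbers are easy to read. The storage unit also explains why adding 1 to a Date moves it by one day, while adding 1 to a date-time moves it by one second.

```r
# Dates are stored as the number of days since 1970-01-01
d <- as.Date("1970-01-11")
as.numeric(d)      # 10

# Date-times are stored as seconds since 1970-01-01 00:00:00 UTC
dt <- as.POSIXct("1970-01-01 01:00:00", tz = "UTC")
as.numeric(dt)     # 3600

# Arithmetic works in the stored unit
d + 1              # one day later: "1970-01-12"
dt + 1             # one second later: "1970-01-01 01:00:01 UTC"
```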
Base R provides many facilities for dealing with dates and date-times.
The lubridate package provides a useful interface.
library(lubridate)
The Dates and Times chapter of R for Data Science provides more details.
Creating Dates and Times
Today and Now
The lubridate function today() returns today’s date as a Date object:
today()
## [1] "2023-05-05"
class(today())
## [1] "Date"
The lubridate function now() returns the current date-time as a POSIXct object:
now()
## [1] "2023-05-05 19:31:08 CDT"
class(now())
## [1] "POSIXct" "POSIXt"
The printed representation follows the international standard for the representation of dates and times (ISO 8601).
Date and date-time objects can be used with addition and subtraction:
now() + 3600 ## one hour from now
## [1] "2023-05-05 20:31:08 CDT"
today() - 7 ## one week ago
## [1] "2023-04-28"
Parsing Dates and Times From Strings
Some common date formats:
d1 <- "2023-04-16"
d2 <- "April 16, 2023"
d3 <- "16 April 2023"
d4 <- "16 April 23"
These can be decoded by the functions ymd(), mdy(), and dmy():
ymd(d1)
## [1] "2023-04-16"
mdy(d2)
## [1] "2023-04-16"
dmy(d3)
## [1] "2023-04-16"
dmy(d4)
## [1] "2023-04-16"
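These parsers are fairly forgiving about separators, and they work on whole vectors at once. A small sketch, assuming lubridate is loaded; a string that cannot be a valid date becomes NA, and lubridate issues a warning, which is suppressed here only to keep the output short.

```r
library(lubridate)

# Different separators (or none) all parse to the same Date
x <- ymd(c("2023-04-16", "2023/04/16", "20230416"))
x

# An impossible date fails to parse and becomes NA
y <- suppressWarnings(ymd("2023-02-30"))
is.na(y)   # TRUE
```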
By default, these functions use the current locale settings for interpreting month names or abbreviations.
Sys.getlocale("LC_TIME")
## [1] "en_US.UTF-8"
If you need to parse a French date you might use
dmy("16 Avril, 2023", locale = "fr_FR.UTF-8")
## [1] "2023-04-16"
Date-times can be decoded with functions like mdy_hm():
mdy_hm("April 16, 2023, 6:15 PM")
## [1] "2023-04-16 18:15:00 UTC"
or
mdy_hms("April 16, 2023, 6:15:08 PM")
## [1] "2023-04-16 18:15:08 UTC"
By default these assume the time is specified in the UTC time zone.
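The tz argument of these parsing functions changes how the clock time is interpreted. In this sketch the same string is read once as UTC (the default) and once as local time in Chicago, where CDT is five hours behind UTC in April; the two results are different instants.

```r
library(lubridate)

# Without tz, the clock time is read as UTC
utc <- mdy_hm("April 16, 2023, 6:15 PM")

# With tz, the same clock time is read as Chicago local time
chi <- mdy_hm("April 16, 2023, 6:15 PM", tz = "America/Chicago")
chi

# The two instants are five hours apart
chi - utc
hour(with_tz(chi, "UTC"))   # 23
```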
Creating Dates and Times from Components
Dates can be created from year, month, and day by make_date():
make_date(2023, 4, 16)
## [1] "2023-04-16"
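make_date() is vectorized, which is what lets it work on whole columns inside mutate(), and its month and day arguments default to 1. A short sketch:

```r
library(lubridate)

# The first day of each month of 2023; year and day are recycled
firsts <- make_date(2023, 1:12, 1)
head(firsts, 3)   # "2023-01-01" "2023-02-01" "2023-03-01"

# month and day default to 1
make_date(2023)   # "2023-01-01"
```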
Creating a date variable from the year, month, and day variables in the New York City flights table:
library(nycflights13)
library(dplyr)     # mutate(), count(), filter(), %>%
library(ggplot2)   # ggplot()
fl <- mutate(flights,
             date = make_date(year, month, day))
ggplot and other graphics systems know how to make useful axis labels for dates:
ggplot(count(fl, date)) +
geom_line(aes(x = date, y = n))
Weekday/weekend differences are clearly visible.
Date-times can be created from year, month, day, hour, minute, and second using make_datetime():
make_datetime(2023, 4, 16, 18, 15)
## [1] "2023-04-16 18:15:00 UTC"
An attempt to recreate the time_hour variable in the flights table:
fl <- mutate(fl,
th = make_datetime(year, month, day,
hour))
This does not quite re-create the time_hour variable:
identical(fl$th, fl$time_hour)
## [1] FALSE
fl$th[1]
## [1] "2013-01-01 05:00:00 UTC"
fl$time_hour[1]
## [1] "2013-01-01 05:00:00 EST"
By default, make_datetime() assumes the time points it is given are in UTC.
The time_hour variable uses local (eastern US) time.
Time zones are discussed in more detail below.
Date and Time Components
Components of dates and date-times can be extracted with:
year(), month(), day(), hour(), minute(), second()
yday() – day of the year
mday() – day of the month, same as day()
wday() – day of the week
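A short sketch applying these extractors to one arbitrary date-time. lubridate also provides replacement versions such as `month<-`, which set a component in place.

```r
library(lubridate)

x <- ymd_hms("2023-04-16 18:15:08")

c(year(x), month(x), day(x), hour(x), minute(x), second(x))
yday(x)        # 106, i.e. 31 + 28 + 31 + 16
wday(x)        # 1, since April 16, 2023 was a Sunday

# The same functions can also be used to change a component
month(x) <- 5
x              # "2023-05-16 18:15:08 UTC"
```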
By default, wday() returns an integer:
wday(today())
## [1] 6
But it can also return a label:
wday(today(), label = TRUE)
## [1] Fri
## Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
wday(today(), label = TRUE, abbr = FALSE)
## [1] Friday
## 7 Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < ... < Saturday
Weekday names and abbreviations are obviously locale-specific, and you can specify an alternative to the default current locale:
wday(today(), label = TRUE, abbr = FALSE, locale = "de_DE.UTF-8")
## [1] Freitag
## 7 Levels: Sonntag < Montag < Dienstag < Mittwoch < Donnerstag < ... < Samstag
wday(today(), label = TRUE, locale = "de_DE.UTF-8")
## [1] Fr
## Levels: So < Mo < Di < Mi < Do < Fr < Sa
Even the integer value can be tricky:
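In the US, Canada, and Japan the first day of the week is Sunday.
In Germany, France, and the ISO 8601 standard the first day of the week is Monday.
wday() can be asked to use a different first day through its week_start argument (7 for Sunday, the default, or 1 for Monday), and a global default can be set with the lubridate.week.start option.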
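A sketch of both mechanisms, using May 5, 2023, a Friday. The option is restored afterwards so that later code keeps the default behavior.

```r
library(lubridate)

d <- ymd("2023-05-05")            # a Friday

wday(d, week_start = 7)           # 6: counting from Sunday (the default)
wday(d, week_start = 1)           # 5: counting from Monday, as in ISO 8601

# A session-wide default via the lubridate.week.start option
old <- options(lubridate.week.start = 1)
wday(d)                           # 5
options(old)                      # restore the previous setting
```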
Using wday() and the date variable we can look at the distribution of the number of flights by day of the week:
ggplot(fl, aes(x = wday(date, label = TRUE))) +
geom_bar(fill = "deepskyblue3")
There were substantially fewer flights on Saturdays but only slightly fewer flights on Sundays.
Rounding
floor_date(), round_date(), and ceiling_date() can be used to round to a particular unit; the most useful units are week and quarter.
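A sketch of the rounding functions applied to the first day of 2013, a Tuesday, and to an arbitrary date in May. floor_date() treats Sunday as the first day of the week by default and accepts the same week_start argument as wday().

```r
library(lubridate)

d <- ymd("2013-01-01")                        # a Tuesday

floor_date(d, "week")                         # "2012-12-30", the preceding Sunday
floor_date(d, "week", week_start = 1)         # "2012-12-31", the preceding Monday

floor_date(ymd("2013-05-17"), "quarter")      # "2013-04-01"
ceiling_date(ymd("2013-05-17"), "quarter")    # "2013-07-01"
```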
Flights by week:
The first and last weeks were incomplete:
as.character(wday(ymd("2013-01-01"),
label = TRUE, abbr = FALSE))
## [1] "Tuesday"
as.character(wday(ymd("2013-12-31"),
label = TRUE, abbr = FALSE))
## [1] "Tuesday"
Time Spans
Subtracting dates or date-times produces difftime objects:
now() - as_datetime(today())
## Time difference of 1.021668 days
today() - ymd("2023-01-01")
## Time difference of 124 days
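In base R a difftime can be converted to any of its supported units with as.numeric(). A sketch using the same interval as above, with the fixed end date 2023-05-05 standing in for today():

```r
# 2023-01-01 to 2023-05-05 is 124 days
d <- difftime(as.Date("2023-05-05"), as.Date("2023-01-01"), units = "days")

as.numeric(d, units = "days")    # 124
as.numeric(d, units = "weeks")   # 17.71429, i.e. 124 / 7
as.numeric(d, units = "hours")   # 2976
```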
Working with different units can be awkward; lubridate provides durations, which always work in seconds:
as.duration(now() - as_datetime(today()))
## [1] "88272.1419148445s (~1.02 days)"
as.duration(today() - ymd("2023-01-01"))
## [1] "10713600s (~17.71 weeks)"
Durations can be created with dyears(), ddays(), dweeks(), etc.:
dyears(1)
## [1] "31557600s (~1 years)"
ddays(1)
## [1] "86400s (~1 days)"
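A few checks on how durations are measured, assuming lubridate is loaded. dyears() uses an average year of 365.25 days, which is why dyears(1) is 31557600 seconds.

```r
library(lubridate)

# A duration "year" is 365.25 days of 86400 seconds each
as.numeric(dyears(1)) / 86400                   # 365.25

# Durations of different units combine in seconds
as.numeric(ddays(1) + dhours(12))               # 129600, i.e. 1.5 days
as.numeric(dweeks(1)) == as.numeric(ddays(7))   # TRUE
```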
Durations can be added to a date or date-time object and can be multiplied by a number:
today()
## [1] "2023-05-05"
today() + ddays(2)
## [1] "2023-05-07"
today() + 2 * ddays(1)
## [1] "2023-05-07"
(n1 <- now())
## [1] "2023-05-05 19:31:12 CDT"
n1 + dminutes(3)
## [1] "2023-05-05 19:34:12 CDT"
Durations represent an exact number of seconds, which can lead to surprises when daylight saving time (DST) is involved.
In 2023 the switch to DST happened in the US on March 12:
ymd_hm("2023-03-11 23:02", tz = "America/Chicago") + ddays(1)
## [1] "2023-03-13 00:02:00 CDT"
Periods are an alternative that may work more intuitively.
Periods are constructed with years(), months(), days(), etc.:
ymd_hm("2023-03-11 23:02", tz = "America/Chicago") + days(1)
## [1] "2023-03-12 23:02:00 CDT"
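The difference between the two results can be measured directly. A sketch using the same starting time as above: the period keeps the clock time, so across the DST switch it spans only 23 hours, while the duration is always exactly 24 hours. Month periods have a related pitfall: adding one month to January 31 gives an impossible date, which becomes NA; lubridate's %m+% operator rolls back to the last day of the month instead.

```r
library(lubridate)

t0 <- ymd_hm("2023-03-11 23:02", tz = "America/Chicago")

as.numeric((t0 + days(1)) - t0, units = "hours")    # 23
as.numeric((t0 + ddays(1)) - t0, units = "hours")   # 24

ymd("2023-01-31") + months(1)       # NA
ymd("2023-01-31") %m+% months(1)    # "2023-02-28"
```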
Time Zones
Date-time objects specify a point in time relative to midnight (hour zero, minute zero, second zero) on January 1, 1970, in Coordinated Universal Time (UTC).
Date-time objects can have a time zone associated with them that affects how they are printed.
now() returns a date-time object with the time zone set to the local time zone of the computer.
now()
## [1] "2023-05-05 19:31:12 CDT"
Time zones are complex: their offsets change on a regular basis (DST) and can also change as a result of politics.
When a date-time object is created from components, by default it is given the UTC time zone.
To create a point in time based on local time information, such as 10 AM on April 16, 2023, in Iowa City, a time zone for interpreting the local time needs to be specified.
Short abbreviations like CDT are not adequate for this: both the US and Australia use EST, and the two mean very different things.
R uses the naming convention and database of the Internet Assigned Numbers Authority (IANA).
The local time zone is:
Sys.timezone()
## [1] "America/Chicago"
The time point 10:00:00 AM on April 16, 2023 in Iowa City can be specified as
(tm <- make_datetime(2023, 4, 16, 10, tz = "America/Chicago"))
## [1] "2023-04-16 10:00:00 CDT"
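Time zones of date-time objects can be changed in two ways: with_tz() keeps the instant in time and changes how it is displayed, while force_tz() keeps the displayed clock time and changes the instant it refers to.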
The available time zone specifications are listed by OlsonNames():
head(OlsonNames())
## [1] "Africa/Abidjan" "Africa/Accra" "Africa/Addis_Ababa"
## [4] "Africa/Algiers" "Africa/Asmara" "Africa/Asmera"
The instant tm in some other time zones:
with_tz(tm, tz = "UTC")
## [1] "2023-04-16 15:00:00 UTC"
with_tz(tm, tz = "America/New_York")
## [1] "2023-04-16 11:00:00 EDT"
with_tz(tm, tz = "Asia/Shanghai")
## [1] "2023-04-16 23:00:00 CST"
with_tz(tm, tz = "Pacific/Auckland")
## [1] "2023-04-17 03:00:00 NZST"
with_tz(tm, tz = "Asia/Kolkata")
## [1] "2023-04-16 20:30:00 IST"
with_tz(tm, tz = "Canada/Newfoundland")
## [1] "2023-04-16 12:30:00 NDT"
with_tz(tm, tz = "Asia/Katmandu")
## [1] "2023-04-16 20:45:00 +0545"
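with_tz() only changes how an instant is displayed. lubridate's force_tz() instead keeps the clock reading and reinterprets it in the new zone, which yields a different instant. A sketch contrasting the two, rebuilding the tm instant from above so the block runs on its own:

```r
library(lubridate)

tm <- make_datetime(2023, 4, 16, 10, tz = "America/Chicago")

# Same instant, different clock reading: "2023-04-16 11:00:00 EDT"
a <- with_tz(tm, "America/New_York")
# Same clock reading, different instant: "2023-04-16 10:00:00 EDT"
b <- force_tz(tm, "America/New_York")

a == tm    # TRUE
b - tm     # Time difference of -1 hours
```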
Some more examples:
## All offsets that are not a full hour:
get_offset <- function(z)
abs(minute(with_tz(tm, tz = z)) - minute(tm))
offsets <- data.frame(zone = OlsonNames()) %>%
mutate(offset = sapply(zone, get_offset)) %>%
arrange(offset)
filter(offsets, offset != 0)
## Offsets for Australia:
filter(offsets, grepl("Australia", zone))
If we create the th variable for the flights data as
fl <- mutate(flights, th = make_datetime(year, month, day, hour,
tz = "America/New_York"))
then the result matches the time_hour variable:
identical(fl$th, fl$time_hour)
## [1] TRUE
The time_hour variable in the weather table reflects actual points in time and, together with origin, can serve as a primary key:
filter(count(weather, origin, time_hour), n > 1)
## # A tibble: 0 × 3
## # ℹ 3 variables: origin <chr>, time_hour <dttm>, n <int>
The month, day, and hour variables are confused by the time changes.
In November there is a repeat:
count(weather, origin, month, day, hour) %>%
filter(n > 1)
## # A tibble: 3 × 5
## origin month day hour n
## <chr> <int> <int> <int> <int>
## 1 EWR 11 3 1 2
## 2 JFK 11 3 1 2
## 3 LGA 11 3 1 2
and there is a missing hour in March:
select(weather, origin, month, day, hour) %>%
filter(origin == "EWR", month == 3,
day == 10, hour <= 3)
## # A tibble: 3 × 4
## origin month day hour
## <chr> <int> <int> <int>
## 1 EWR 3 10 0
## 2 EWR 3 10 1
## 3 EWR 3 10 3
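The repeated and missing hours come from the two 2013 DST transitions. A sketch measuring the length of those two days, assuming the America/New_York zone applies to all three airports:

```r
library(lubridate)

ny <- "America/New_York"

# Midnight to midnight across each 2013 transition
spring <- ymd("2013-03-11", tz = ny) - ymd("2013-03-10", tz = ny)
fall   <- ymd("2013-11-04", tz = ny) - ymd("2013-11-03", tz = ny)

as.numeric(spring, units = "hours")   # 23: one hour skipped
as.numeric(fall, units = "hours")     # 25: one hour repeated
```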
Things to Look Out For
For dates:
Language used for months and weekdays, and their abbreviations.
Ambiguous numerical conventions like 4/11/2023: is this April 11 or November 4?
Day 2 of the week: is this Monday or Tuesday?
For historical data, what calendar is being used? (The October Revolution happened on November 6/7, 1917 by the current Gregorian calendar; October 24/25 by the Julian calendar Russia was still using.)
For date-times:
All of the above.
Daylight saving time.
Time zones.
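The ambiguity of numeric dates is easy to demonstrate: the same string parses to two different dates depending on which parser is used. A sketch with the example string from the list above:

```r
library(lubridate)

x <- "4/11/2023"
mdy(x)   # "2023-04-11", month first as in the US
dmy(x)   # "2023-11-04", day first as in much of Europe
```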
Exercises
Using the NYC flights data, how many flights were there on Saturdays from Newark (EWR) to Chicago O’Hare (ORD) in 2013?
413
522
601
733
What day of the week will July 4, 2030, fall on?
Monday
Wednesday
Thursday
Saturday
