---
title: "Dates and Times"
output:
  html_document:
    toc: yes
    code_folding: show
    code_download: true
---

```{r setup, include = FALSE, message = FALSE}
source(here::here("setup.R"))
knitr::opts_chunk$set(collapse = TRUE, message = FALSE,
                      fig.height = 5, fig.width = 6, fig.align = "center")

set.seed(12345)
library(dplyr)
library(ggplot2)
library(lattice)
library(gridExtra)
library(patchwork)
source(here::here("datasets.R"))
theme_set(theme_minimal() +
          theme(text = element_text(size = 16),
                panel.border = element_rect(color = "grey30", fill = NA)))
```

## Background

Data are often associated with a point in time, a particular

* year;
* month;
* day;
* hour, minute, second, ...

Some issues with points in time:

* Leap years and leap seconds.
* Daylight saving time.
* Local time or [_Coordinated Universal Time (UTC)_](https://www.timeanddate.com/time/aboututc.html).
* For historical data: changes in calendars.

R has data types to represent

* a particular date (`Date`);
* a particular second (`POSIXct`, date-time).

R stores dates as days since January 1, 1970, and date-times as the
number of seconds since midnight on that day in
[_Coordinated Universal Time_ (UTC)](https://www.timeanddate.com/time/aboututc.html).

Date objects are less complicated than date-time objects, so if you
only need dates you should stick with date objects.

Base R provides many facilities for dealing with dates and date-times.

The `lubridate` package provides a useful interface.

```{r, message = FALSE}
library(lubridate)
```

The [_Dates and Times_ chapter](https://r4ds.had.co.nz/dates-and-times.html)
of [_R for Data Science_](https://r4ds.had.co.nz/) provides more details.
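
The storage convention described in this section can be inspected directly with base R. The following is a minimal sketch; the specific dates are chosen only for illustration.

```{r}
## Dates are stored as the number of days since 1970-01-01
as.numeric(as.Date("1970-01-01"))   ## 0
as.numeric(as.Date("1970-01-10"))   ## 9

## Date-times are stored as the number of seconds since
## 1970-01-01 00:00:00 UTC
as.numeric(as.POSIXct("1970-01-01 01:00:00", tz = "UTC"))   ## 3600
```
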
## Creating Dates and Times

### Today and Now

The `lubridate` function `today()` returns today's date as a `Date` object:

```{r}
today()
```

```{r}
class(today())
```

The `lubridate` function `now()` returns the current date-time as a
`POSIXct` object:

```{r}
now()
```

```{r}
class(now())
```

The printed representation follows the international standard for the
representation of dates and times
([ISO8601](https://en.wikipedia.org/wiki/ISO_8601)).

Date and date-time objects can be used with addition and subtraction:

```{r}
now() + 3600 ## one hour from now
```

```{r}
today() - 7 ## one week ago
```

### Parsing Dates and Times From Strings

Some common date formats:

```{r}
d1 <- "2023-04-16"
d2 <- "April 16, 2023"
d3 <- "16 April 2023"
d4 <- "16 April 23"
```

These can be decoded by the functions `ymd()`, `mdy()`, and `dmy()`:

```{r}
ymd(d1)
```

```{r}
mdy(d2)
```

```{r}
dmy(d3)
```

```{r}
dmy(d4)
```

By default, these functions use the current _locale_ settings for
interpreting month names or abbreviations.

```{r}
Sys.getlocale("LC_TIME")
```

If you need to parse a French date you might use

```{r}
dmy("16 Avril, 2023", locale = "fr_FR.UTF-8")
```

Date-times can be decoded with functions like `mdy_hm()`:

```{r}
mdy_hm("April 16, 2023, 6:15 PM")
```

or

```{r}
mdy_hms("April 16, 2023, 6:15:08 PM")
```

By default, these functions assume the time is specified in the UTC time zone.

### Creating Dates and Times from Components

Dates can be created from year, month, and day by `make_date()`:

```{r}
make_date(2023, 4, 16)
```

Creating a `date` variable from the `year`, `month`, and `day`
variables in the New York City `flights` table:

```{r}
library(nycflights13)
fl <- mutate(flights, date = make_date(year, month, day))
```

`ggplot` and other graphics systems know how to make useful axis
labels for dates:

```{r, class.source = "fold-hide"}
ggplot(count(fl, date)) +
    geom_line(aes(x = date, y = n))
```

Weekday/weekend differences are clearly visible.
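
`make_date()` is vectorized, which is what allows it to be used on whole columns inside `mutate()`. The following is a small sketch on a made-up data frame; `toy` and its values are hypothetical and are not part of the flights data.

```{r}
library(dplyr)
library(lubridate)

## Hypothetical data with separate year, month, and day columns
toy <- data.frame(year = 2023, month = c(4, 4, 5), day = c(15, 16, 1))
toy <- mutate(toy, date = make_date(year, month, day))

class(toy$date)   ## "Date"
diff(toy$date)    ## differences of 1 and 15 days
```
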
Date-times can be created from year, month, day, hour, minute, and
second using `make_datetime()`:

```{r}
make_datetime(2023, 4, 16, 18, 15)
```

An attempt to recreate the `time_hour` variable in the flights table:

```{r}
fl <- mutate(fl, th = make_datetime(year, month, day, hour))
```

This does not quite re-create the `time_hour` variable:

```{r}
identical(fl$th, fl$time_hour)
```

```{r}
fl$th[1]
fl$time_hour[1]
```

By default, `make_datetime()` assumes the time points it is given are in UTC.

The `time_hour` variable is using local (eastern US) time.

We will look at time zones more [below](#time-zones).

## Date and Time Components

Components of dates and date-times can be extracted with:

* `year()`, `month()`, `day()`, `hour()`, `minute()`, `second()`
* `yday()` -- day of the year
* `mday()` -- day of the month, same as `day`
* `wday()` -- day of the week

By default, `wday()` returns an integer:

```{r}
wday(today())
```

But it can also return a label:

```{r}
wday(today(), label = TRUE)
```

```{r}
wday(today(), label = TRUE, abbr = FALSE)
```

Weekday names and abbreviations are obviously locale-specific, and you
can specify an alternative to the default current locale:

```{r}
wday(today(), label = TRUE, abbr = FALSE, locale = "de_DE.UTF-8")
```

```{r}
wday(today(), label = TRUE, locale = "de_DE.UTF-8")
```

Even the integer value can be tricky:

* In the US, Canada, Japan the first day of the week is Sunday.
* In Germany, France, and the ISO8601 standard the first day of the
  week is Monday.

`wday()` can be asked to use a different first day, and a global
default can be set.

Using `wday()` and the `date` variable we can look at the distribution
of the number of flights by day of the week:

```{r flights-wday, eval = FALSE}
ggplot(fl, aes(x = wday(date, label = TRUE))) +
    geom_bar(fill = "deepskyblue3")
```

```{r flights-wday, echo = FALSE}
```

There were substantially fewer flights on Saturdays but only slightly
fewer flights on Sundays.
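
The effect of the first-day-of-week convention on the integer returned by `wday()` can be checked on a known Sunday. This is a minimal sketch using the `week_start` argument and the `lubridate.week.start` option; the date is chosen only for illustration.

```{r}
library(lubridate)

d <- ymd("2023-04-16")        ## a Sunday

wday(d, week_start = 7)       ## weeks start on Sunday: 1
wday(d, week_start = 1)       ## weeks start on Monday: 7

## A global default can be set with an option
old <- options(lubridate.week.start = 1)
wday(d)                       ## now 7
options(old)                  ## restore the previous setting
```

Setting `week_start` explicitly in each call is probably the safer choice in shared code, since a global option silently changes the meaning of every integer `wday()` returns.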
## Rounding

`floor_date()`, `round_date()`, and `ceiling_date()` can be used to
round to a particular unit; the most useful are `week` and `quarter`.

Flights by week:

```{r, echo = FALSE, dpi = 300, fig.height = 4}
ggplot(fl, aes(x = round_date(date, "week"))) +
    geom_bar(fill = "deepskyblue3")
```

The first and last weeks were incomplete:

```{r}
as.character(wday(ymd("2013-01-01"), label = TRUE, abbr = FALSE))
```

```{r}
as.character(wday(ymd("2013-12-31"), label = TRUE, abbr = FALSE))
```

## Time Spans

Subtracting dates or date-times produces `difftime` objects:

```{r}
now() - as_datetime(today())
```

```{r}
today() - ymd("2023-01-01")
```

Working with different units can be awkward; `lubridate` provides
_durations_, which always work in seconds:

```{r}
as.duration(now() - as_datetime(today()))
```

```{r}
as.duration(today() - ymd("2023-01-01"))
```

Durations can be created with `dyears()`, `ddays()`, `dweeks()`, etc.:

```{r}
dyears(1)
```

```{r}
ddays(1)
```

Durations can be added to a date or date-time object and can be
multiplied by a number:

```{r}
today()
```

```{r}
today() + ddays(2)
```

```{r}
today() + 2 * ddays(1)
```

```{r}
(n1 <- now())
```

```{r}
n1 + dminutes(3)
```

Durations represent an exact number of seconds, which can lead to
surprises when DST is involved.

In 2023 the switch to DST happened in the US on March 12:

```{r}
ymd_hm("2023-03-11 23:02", tz = "America/Chicago") + ddays(1)
```

_Periods_ are an alternative that may work more intuitively.

Periods are constructed with `years()`, `months()`, `days()`, etc.:

```{r}
ymd_hm("2023-03-11 23:02", tz = "America/Chicago") + days(1)
```

## Time Zones

Date-time objects specify a point in time relative to second zero,
minute zero, hour zero, on January 1, 1970 in
[Coordinated Universal Time (UTC)](https://www.timeanddate.com/time/aboututc.html).

Date-time objects can have a time zone associated with them that
affects how they are printed.
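
The separation between the stored instant and the time zone used for printing can be seen with base R tools. This is a small sketch; the date-time is chosen only for illustration, and the explicit format string is used so the output does not depend on R's default formatting.

```{r}
library(lubridate)

x <- ymd_hms("2023-04-16 15:00:00", tz = "UTC")

## The time zone is stored as an attribute of the object
attr(x, "tzone")   ## "UTC"

## Formatting the same instant for another zone changes only the
## text that is displayed; x itself is unchanged
format(x, "%Y-%m-%d %H:%M:%S", tz = "America/Chicago", usetz = TRUE)
## "2023-04-16 10:00:00 CDT"
```
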
`now()` returns a date-time object with the time zone set as the local
time zone of the computer.

```{r}
now()
```

Time zones are complex: they can change on a regular basis (DST) or as
a result of politics.

When a date-time object is created from components, by default it is
given the UTC time zone.

To create a point in time based on local time information, such as 10
AM on April 16, 2023, in Iowa City, a time zone for interpreting the
local time needs to be specified.

Short notations like CDT are not adequate for this: both the US and
Australia have an EST, and the two are quite different.

R uses the
[_Internet Assigned Numbers Authority_ (IANA)](https://www.iana.org/time-zones)
naming convention and database.

The local time zone is:

```{r}
Sys.timezone()
```

The time point 10:00:00 AM on April 16, 2023 in Iowa City can be
specified as

```{r}
(tm <- make_datetime(2023, 4, 16, 10, tz = "America/Chicago"))
```

Time zones of date-time objects can be changed in two ways:

* `with_tz()` keeps the instant in time and changes the time zone used
  for display.
* `force_tz()` changes the instant in time; use this if the time zone is
  incorrectly specified but the clock time is correct.
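
The difference between the two functions can be sketched by applying both to the same date-time. The variable names `tm_utc` and `tm_forced` are introduced here only for illustration.

```{r}
library(lubridate)

tm <- make_datetime(2023, 4, 16, 10, tz = "America/Chicago")

## with_tz(): same instant, different clock time
tm_utc <- with_tz(tm, tz = "UTC")       ## 2023-04-16 15:00:00 UTC
tm_utc == tm                            ## TRUE

## force_tz(): same clock time, different instant
tm_forced <- force_tz(tm, tz = "UTC")   ## 2023-04-16 10:00:00 UTC
tm_forced == tm                         ## FALSE
difftime(tm, tm_forced, units = "hours")   ## 5 hours
```
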
The available time zone specifications are contained in `OlsonNames`:

```{r}
head(OlsonNames())
```

The instant `tm` in some other time zones:

```{r}
with_tz(tm, tz = "UTC")
```

```{r}
with_tz(tm, tz = "America/New_York")
```

```{r}
with_tz(tm, tz = "Asia/Shanghai")
```

```{r}
with_tz(tm, tz = "Pacific/Auckland")
```

```{r}
with_tz(tm, tz = "Asia/Kolkata")
```

```{r}
with_tz(tm, tz = "Canada/Newfoundland")
```

```{r}
with_tz(tm, tz = "Asia/Katmandu")
```

Some more examples:

```{r, eval = FALSE}
## All offsets that are not a full hour:
get_offset <- function(z)
    abs(minute(with_tz(tm, tz = z)) - minute(tm))
offsets <-
    data.frame(zone = OlsonNames()) %>%
    mutate(offset = sapply(zone, get_offset)) %>%
    arrange(offset)
filter(offsets, offset != 0)

## Offsets for Australia:
filter(offsets, grepl("Australia", zone))
```

If we create the `th` variable for the flights data as

```{r}
fl <- mutate(flights,
             th = make_datetime(year, month, day, hour,
                                tz = "America/New_York"))
```

then the result matches the `time_hour` variable:

```{r}
identical(fl$th, fl$time_hour)
```

The `time_hour` variable in the `weather` table reflects actual points
in time and, together with `origin`, can serve as a primary key:

```{r}
filter(count(weather, origin, time_hour), n > 1)
```

The `month`, `day`, `hour` variables are confused by the time change.

In November there is a repeat:

```{r}
count(weather, origin, month, day, hour) %>%
    filter(n > 1)
```

and there is a missing hour in March:

```{r}
select(weather, origin, month, day, hour) %>%
    filter(origin == "EWR", month == 3, day == 10, hour <= 3)
```

## Things to Look Out For

For dates:

* Language used for months and weekdays, and their abbreviations.
* Ambiguous numerical conventions like 4/11/2023: is this April 11 or
  November 4?
* Day 2 of the week: is this Monday or Tuesday?
* For historical data, what calendar is being used?
  (The _October Revolution_ happened on November 6/7, 1917 by the
  current Gregorian calendar; October 24/25 by the Julian calendar
  Russia was still using.)

For date-times:

* All of the above.
* Daylight saving time.
* Time zones.

## Reading

Chapter [_Dates and Times_](https://r4ds.had.co.nz/dates-and-times.html)
in [_R for Data Science_](https://r4ds.had.co.nz/).

## Exercises

1. Using the NYC flights data, how many flights were there on
   Saturdays from Newark (EWR) to Chicago O'Hare (ORD) in 2013?

    a. 413
    b. 522
    c. 601
    d. 733

2. What day of the week will July 4, 2030, fall on?

    a. Monday
    b. Wednesday
    c. Thursday
    d. Saturday