class: center, middle, title-slide

# Dates and Times
### Luke Tierney
### University of Iowa
### 2022-04-07

---
layout: true

<link rel="stylesheet" href="stat4580.css" type="text/css" />
<style type="text/css">
.remark-code { font-size: 85%; }
</style>

## Background

---

Data are often associated with a point in time, a particular

--

* year;

--

* month;

--

* day;

--

* hour, minute, second, ...

--

Some issues with points in time:

--

* Leap years and leap seconds.

--

* Daylight saving time.

--

* Local time or [_Coordinated Universal Time (UTC)_](https://www.timeanddate.com/time/aboututc.html).

--

* For historical data: changes in calendars.

---

R has data types to represent

--

* a particular date (`Date`);

--

* a particular second (`POSIXct`, date-time).

--

R stores dates as days since January 1, 1970, and date-times as the number of seconds since midnight on that day in [_Coordinated Universal Time_ (UTC)](https://www.timeanddate.com/time/aboututc.html).

--

Date objects are less complicated than date-time objects, so if you only need dates you should stick with date objects.

--

Base R provides many facilities for dealing with dates and date-times.

--

The `lubridate` package provides a useful interface.

```r
library(lubridate)
```

--

The [_Dates and Times_ chapter](http://r4ds.had.co.nz/dates-and-times.html) of [_R for Data Science_](http://r4ds.had.co.nz/) provides more details.
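---

As a quick check of these storage conventions, `as.numeric()` exposes the underlying counts. This is a minimal base-R sketch; the particular dates are arbitrary choices for illustration:

```r
## Dates: number of days since 1970-01-01
as.numeric(as.Date("1970-01-02"))
## [1] 1
as.numeric(as.Date("2022-04-07"))
## [1] 19089

## A day count can be converted back to a Date
as.Date(19089, origin = "1970-01-01")
## [1] "2022-04-07"

## Date-times: number of seconds since 1970-01-01 00:00:00 UTC
as.numeric(as.POSIXct("1970-01-01 00:01:00", tz = "UTC"))
## [1] 60
```

Supplying `origin` explicitly keeps the conversion back to a `Date` working in older versions of R, which required it.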
---
layout: true

## Creating Dates and Times

---

### Today and Now

.pull-left.code-80[
The `lubridate` function `today()` returns today's date as a `Date` object:

```r
today()
## [1] "2022-04-07"
```

{{content}}
]

--

```r
class(today())
## [1] "Date"
```

{{content}}

--

The `lubridate` function `now()` returns the current date-time as a `POSIXct` object:

```r
now()
## [1] "2022-04-07 14:16:54 CDT"
```

{{content}}

--

```r
class(now())
## [1] "POSIXct" "POSIXt"
```

--

.pull-right.code-80[
The printed representation follows the international standard for the representation of dates and times ([ISO8601](https://en.wikipedia.org/wiki/ISO_8601)).

{{content}}
]

--

Numbers can be added to or subtracted from date and date-time objects; the unit is days for dates and seconds for date-times:

{{content}}

--

```r
now() + 3600 ## one hour from now
## [1] "2022-04-07 15:16:54 CDT"
```

{{content}}

--

```r
today() - 7 ## one week ago
## [1] "2022-03-31"
```

---

### Parsing Dates and Times From Strings

.pull-left.code-80[
Some common date formats:

```r
d1 <- "2022-04-16"
d2 <- "April 16, 2022"
d3 <- "16 April 2022"
d4 <- "16 April 22"
```
]

--

.pull-right.code-80[
These can be decoded by the functions `ymd()`, `mdy()`, and `dmy()`:

{{content}}
]

--

```r
ymd(d1)
## [1] "2022-04-16"
```

{{content}}

--

```r
mdy(d2)
## [1] "2022-04-16"
```

{{content}}

--

```r
dmy(d3)
## [1] "2022-04-16"
```

{{content}}

--

```r
dmy(d4)
## [1] "2022-04-16"
```

---

.pull-left.code-80[
By default, these functions use the current _locale_ settings for interpreting month names or abbreviations.
{{content}}
]

--

```r
Sys.getlocale("LC_TIME")
## [1] "en_US.UTF-8"
```

{{content}}

--

If you need to parse a French date, you might use

```r
dmy("16 Avril, 2022", locale = "fr_FR.UTF-8")
## [1] "2022-04-16"
```

--

.pull-right.code-80[
Date-times can be decoded with functions like `mdy_hm()`:

```r
mdy_hm("April 16, 2022, 6:15 PM")
## [1] "2022-04-16 18:15:00 UTC"
```

{{content}}
]

--

or

```r
mdy_hms("April 16, 2022, 6:15:08 PM")
## [1] "2022-04-16 18:15:08 UTC"
```

{{content}}

--

By default, these functions assume the time is specified in the UTC time zone.

---

.pull-left.code-80[
### Creating Dates and Times from Components

Dates can be created from year, month, and day by `make_date()`:

{{content}}
]

--

```r
make_date(2022, 4, 16)
## [1] "2022-04-16"
```

{{content}}

--

Creating a `date` variable from the `year`, `month`, and `day` variables in the New York City `flights` table:

```r
library(dplyr)
library(nycflights13)
fl <- mutate(flights, date = make_date(year, month, day))
```

{{content}}

--

`ggplot` and other graphics systems know how to make useful axis labels for dates:

--

.pull-right[
.hide-code[
```r
ggplot(count(fl, date)) +
    geom_line(aes(x = date, y = n))
```

<img src="datetime_files/figure-html/unnamed-chunk-19-1.png" style="display: block; margin: auto;" />
]

{{content}}
]

--

Weekday/weekend differences are clearly visible.
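---

The parsing functions `mdy_hm()` and `mdy_hms()` interpret clock times as UTC unless told otherwise. A sketch, assuming the clock time was actually recorded in Iowa City: passing an IANA zone name through the `tz` argument produces a different instant than the UTC default.

```r
library(lubridate)

## Default: the clock time is taken to be UTC
x_utc <- mdy_hm("April 16, 2022, 6:15 PM")

## Same clock time, interpreted as US Central time
x_chi <- mdy_hm("April 16, 2022, 6:15 PM", tz = "America/Chicago")
x_chi
## [1] "2022-04-16 18:15:00 CDT"

## 18:15 CDT is 23:15 UTC, so the two instants are 5 hours apart
x_chi - x_utc
## Time difference of 5 hours
```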
---

.pull-left.code-80[
Date-times can be created from year, month, day, hour, minute, and second using `make_datetime()`:

{{content}}
]

--

```r
make_datetime(2022, 4, 16, 18, 15)
## [1] "2022-04-16 18:15:00 UTC"
```

{{content}}

--

An attempt to recreate the `time_hour` variable in the flights table:

```r
fl <- mutate(fl, th = make_datetime(year, month, day, hour))
```

{{content}}

--

This does not quite re-create the `time_hour` variable:

```r
identical(fl$th, fl$time_hour)
## [1] FALSE
```

--

.pull-right.code-80[
```r
fl$th[1]
## [1] "2013-01-01 05:00:00 UTC"
fl$time_hour[1]
## [1] "2013-01-01 05:00:00 EST"
```

{{content}}
]

--

By default, `make_datetime()` assumes the time points it is given are in UTC.

{{content}}

--

The `time_hour` variable uses local (US Eastern) time.

{{content}}

--

We will look at time zones more [below](#time-zones).

---
layout: true

## Date and Time Components

---

Components of dates and date-times can be extracted with:

* `year()`, `month()`, `day()`, `hour()`, `minute()`, `second()`
* `yday()` -- day of the year
* `mday()` -- day of the month, same as `day()`
* `wday()` -- day of the week

--

By default, `wday()` returns an integer:

.code-80[
```r
wday(today())
## [1] 5
```
]

--

But it can also return a label:

.code-80[
```r
wday(today(), label = TRUE)
## [1] Thu
## Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
```
]

.code-80[
```r
wday(today(), label = TRUE, abbr = FALSE)
## [1] Thursday
## 7 Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < ... < Saturday
```
]

---

Weekday names and abbreviations are locale-specific, and you can specify an alternative to the default current locale:

--

```r
wday(today(), label = TRUE, abbr = FALSE, locale = "de_DE.UTF-8")
## [1] Donnerstag
## 7 Levels: Sonntag < Montag < Dienstag < Mittwoch < Donnerstag < ... < Samstag
```

--

```r
wday(today(), label = TRUE, locale = "de_DE.UTF-8")
## [1] Do
## Levels: So < Mo < Di < Mi < Do < Fr < Sa
```

--

Even the integer value can be tricky:

--

* In the US, Canada, and Japan, the first day of the week is Sunday.

--

* In Germany, France, and the ISO8601 standard, the first day of the week is Monday.

--

`wday()` can be asked to use a different first day through its `week_start` argument, and a global default can be set with the `lubridate.week.start` option.

---

.pull-left.code-80[
Using `wday()` and the `date` variable we can look at the distribution of the number of flights by day of the week:

{{content}}
]

--

```r
ggplot(fl, aes(x = wday(date, label = TRUE))) +
    geom_bar(fill = "deepskyblue3")
```

--

.pull-right[
<img src="datetime_files/figure-html/flights-wday-1.png" style="display: block; margin: auto;" />

{{content}}
]

--

There were substantially fewer flights on Saturdays but only slightly fewer flights on Sundays.

---
layout: false

## Rounding

.pull-left[
`floor_date()`, `round_date()`, and `ceiling_date()` can be used to round to a particular unit; the most useful units here are `week` and `quarter`.
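
A sketch using January 1, 2013, a Tuesday. It assumes the `lubridate.week.start` option has not been changed, so weeks start on Sunday:

```r
library(lubridate)
d <- ymd("2013-01-01")

floor_date(d, "week")     # back to the previous Sunday
## [1] "2012-12-30"
ceiling_date(d, "week")   # forward to the next Sunday
## [1] "2013-01-06"
floor_date(d, "quarter")  # start of the quarter
## [1] "2013-01-01"
```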
{{content}}
]

--

Flights by week:

<img src="datetime_files/figure-html/unnamed-chunk-29-1.png" style="display: block; margin: auto;" />

--

.pull-right[
The first and last weeks were incomplete:

```r
as.character(wday(ymd("2013-01-01"), label = TRUE, abbr = FALSE))
## [1] "Tuesday"
```

```r
as.character(wday(ymd("2013-12-31"), label = TRUE, abbr = FALSE))
## [1] "Tuesday"
```
]

---
layout: true

## Time Spans

---

Subtracting dates or date-times produces `difftime` objects:

--

```r
now() - as_datetime(today())
## Time difference of 19.28264 hours
```

--

```r
today() - ymd("2022-01-01")
## Time difference of 96 days
```

--

The units of a `difftime` are chosen automatically (hours in the first example, days in the second), which can make results awkward to work with; `lubridate` provides _durations_, which always work in seconds:

--

```r
as.duration(now() - as_datetime(today()))
## [1] "69417.5150971413s (~19.28 hours)"
```

--

```r
as.duration(today() - ymd("2022-01-01"))
## [1] "8294400s (~13.71 weeks)"
```

---

Durations can be created with `dyears()`, `ddays()`, `dweeks()`, etc.:

--

```r
dyears(1)
## [1] "31557600s (~1 years)"
```

--

```r
ddays(1)
## [1] "86400s (~1 days)"
```

--

Durations can be added to a date or date-time object and can be multiplied by a number:

--

.pull-left[
```r
today()
## [1] "2022-04-07"
```

{{content}}
]

--

```r
today() + ddays(2)
## [1] "2022-04-09"
```

{{content}}

--

```r
today() + 2 * ddays(1)
## [1] "2022-04-09"
```

--

.pull-right[
```r
(n1 <- now())
## [1] "2022-04-07 14:16:57 CDT"
```

{{content}}
]

--

```r
n1 + dminutes(3)
## [1] "2022-04-07 14:19:57 CDT"
```

---

Durations represent an exact number of seconds, which can lead to surprises when DST is involved.

--

In 2022 the switch to DST happened in the US on March 13:

--

```r
ymd_hm("2022-03-12 23:02", tz = "America/Chicago") + ddays(1)
## [1] "2022-03-14 00:02:00 CDT"
```

--

_Periods_ are an alternative that may work more intuitively.
--

Periods are constructed with `years()`, `months()`, `days()`, etc.:

--

```r
ymd_hm("2022-03-12 23:02", tz = "America/Chicago") + days(1)
## [1] "2022-03-13 23:02:00 CDT"
```

---
layout: true

## Time Zones

---
name: time-zones

Date-time objects specify a point in time relative to midnight (00:00:00) on January 1, 1970 in [Coordinated Universal Time (UTC)](https://www.timeanddate.com/time/aboututc.html).

--

Date-time objects can have a time zone associated with them that affects how they are printed.

--

`now()` returns a date-time object with the time zone set to the local time zone of the computer.

```r
now()
## [1] "2022-04-07 14:16:57 CDT"
```

--

Time zones are complex: they change on a regular basis (DST) or as a result of political decisions.

--

When a date-time object is created from components, by default it is given the UTC time zone.

---

To create a point in time based on local time information, such as 10 AM on April 16, 2022, in Iowa City, a time zone for interpreting the local time needs to be specified.

--

Short abbreviations like CDT are not adequate for this: both the US and Australia use EST, and the two mean quite different offsets.

--

R uses the [_Internet Assigned Numbers Authority_ (IANA)](https://www.iana.org/time-zones) naming convention and database.

--

The local time zone is:

```r
Sys.timezone()
## [1] "America/Chicago"
```

--

The time point 10:00:00 AM on April 16, 2022 in Iowa City can be specified as

```r
(tm <- make_datetime(2022, 4, 16, 10, tz = "America/Chicago"))
## [1] "2022-04-16 10:00:00 CDT"
```

---

Time zones of date-time objects can be changed in two ways:

--

* `with_tz()` keeps the instant in time and changes the time zone used for display.

--

* `force_tz()` keeps the clock time and changes the instant in time; use this if the time zone is incorrectly specified but the clock time is correct.
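--

A small sketch contrasting the two on a hypothetical time stamp `x` that was parsed as UTC: `with_tz()` only changes how the instant is displayed, while `force_tz()` is the fix if 10:00 was really a Chicago clock time.

```r
library(lubridate)
x <- ymd_hm("2022-04-16 10:00")  # parsed as UTC by default

## with_tz(): same instant, shown on Chicago clocks
with_tz(x, tz = "America/Chicago")
## [1] "2022-04-16 05:00:00 CDT"

## force_tz(): same clock time, now a different instant
force_tz(x, tz = "America/Chicago")
## [1] "2022-04-16 10:00:00 CDT"
```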
--

The available time zone specifications are returned by `OlsonNames()`:

```r
head(OlsonNames())
## [1] "Africa/Abidjan"     "Africa/Accra"       "Africa/Addis_Ababa"
## [4] "Africa/Algiers"     "Africa/Asmara"      "Africa/Asmera"
```

---

The instant `tm` in some other time zones:

--

.pull-left[
```r
with_tz(tm, tz = "UTC")
## [1] "2022-04-16 15:00:00 UTC"
```

{{content}}
]

--

```r
with_tz(tm, tz = "America/New_York")
## [1] "2022-04-16 11:00:00 EDT"
```

{{content}}

--

```r
with_tz(tm, tz = "Asia/Shanghai")
## [1] "2022-04-16 23:00:00 CST"
```

{{content}}

--

```r
with_tz(tm, tz = "Pacific/Auckland")
## [1] "2022-04-17 03:00:00 NZST"
```

--

.pull-right[
```r
with_tz(tm, tz = "Asia/Kolkata")
## [1] "2022-04-16 20:30:00 IST"
```

{{content}}
]

--

```r
with_tz(tm, tz = "Canada/Newfoundland")
## [1] "2022-04-16 12:30:00 NDT"
```

{{content}}

--

```r
with_tz(tm, tz = "Asia/Katmandu")
## [1] "2022-04-16 20:45:00 +0545"
```

---

Some more examples:

```r
## All offsets that are not a full hour:
get_offset <- function(z) abs(minute(with_tz(tm, tz = z)) - minute(tm))
offsets <- data.frame(zone = OlsonNames()) %>%
    mutate(offset = sapply(zone, get_offset)) %>%
    arrange(offset)
filter(offsets, offset != 0)

## Offsets for Australia:
filter(offsets, grepl("Australia", zone))
```

---

If we create the `th` variable for the flights data as

```r
fl <- mutate(flights,
             th = make_datetime(year, month, day, hour,
                                tz = "America/New_York"))
```

--

then the result matches the `time_hour` variable:

```r
identical(fl$th, fl$time_hour)
## [1] TRUE
```

---

The `time_hour` variable in the `weather` table reflects actual points in time and, together with `origin`, can serve as a primary key:

```r
filter(count(weather, origin, time_hour), n > 1)
## # A tibble: 0 × 3
## # … with 3 variables: origin <chr>, time_hour <dttm>, n <int>
```

--

The `month`, `day`, and `hour` variables record local clock time, so they are confused by the DST changes.
--

.pull-left.code-80[
In November there is a repeated hour:

```r
count(weather, origin, month, day, hour) %>%
    filter(n > 1)
## # A tibble: 3 × 5
##   origin month   day  hour     n
##   <chr>  <int> <int> <int> <int>
## 1 EWR       11     3     1     2
## 2 JFK       11     3     1     2
## 3 LGA       11     3     1     2
```
]

--

.pull-right.code-80[
and there is a missing hour in March:

```r
select(weather, origin, month, day, hour) %>%
    filter(origin == "EWR", month == 3, day == 10, hour <= 3)
## # A tibble: 3 × 4
##   origin month   day  hour
##   <chr>  <int> <int> <int>
## 1 EWR        3    10     0
## 2 EWR        3    10     1
## 3 EWR        3    10     3
```
]

---
layout: false

## Things to Look Out For

For dates:

--

* Language used for months and weekdays, and their abbreviations.

--

* Ambiguous numerical conventions like 4/11/2022: is this April 11 or November 4?

--

* Day 2 of the week: is this Monday or Tuesday?

--

* For historical data, what calendar is being used? (The _October Revolution_ happened on November 6/7, 1917 by the current Gregorian calendar; October 24/25 by the Julian calendar Russia was still using.)

--

For date-times:

--

* All of the above.

--

* Daylight saving time.

--

* Time zones.

<!-- Locale stuff may fail on some systems (on Ubuntu may need to use locale-gen -->

<!--
library(lubridate)
flights <- mutate(flights, th = make_datetime(year, month, day, hour, tz = "America/New_York"))
weather <- mutate(weather, th = make_datetime(year, month, day, hour, tz = "UTC"))
fl0 <- select(filter(flights, origin == "LGA"), dep_time, time_hour, th)
w0 <- select(filter(weather, origin == "LGA")[-(1 : 9),], time_hour, th, temp)
fl1 <- left_join(fl0, select(w0, -th), "time_hour")
fl2 <- left_join(fl0, select(w0, -time_hour), "th")
-->

---
layout: false

## Reading

Chapter [_Dates and Times_](http://r4ds.had.co.nz/dates-and-times.html) in [_R for Data Science_](http://r4ds.had.co.nz/).

---
layout: true

## Exercises

---

1) Using the NYC flights data, how many flights were there on Saturdays from Newark (EWR) to Chicago O'Hare (ORD) in 2013?

* a. 413
* b. 522
* c. 601
* d. 733

---

2) What day of the week will July 4, 2030, fall on?

* a. Monday
* b. Wednesday
* c. Saturday
* d. Thursday