-
Notifications
You must be signed in to change notification settings - Fork 26
fix: minor improvements to time zone parsing #400
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -154,35 +154,21 @@ coerce_datetime <- function(x, to, ...) { | |
# - "2020-01-01T00:02:03-01:00" | ||
# nolint end | ||
parse_connect_rfc3339 <- function(x) { | ||
# Convert any timestamps with offsets to a format recognized by `strptime`. | ||
# Convert timestamps with offsets to a format recognized by `strptime`. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not required, but might be nice to have an example of that the timestamps look like coming in and what they look like after these There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good suggestion. |
||
x <- gsub("([+-]\\d\\d):(\\d\\d)$", "\\1\\2", x) | ||
x <- gsub("Z$", "+0000", x) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't understand: these two regexes are not going to both be operating on a timestamp because either it ends in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Connect uses Go's built-in RFC3339 formatter (https://pkg.go.dev/time). In RFC3339, UTC can be denoted either by This set of regexes allows us to handle both. (Before this PR we handled by in the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure, the spec may allow that, but does Connect actually do it differently in different places? This change is defensive against it, so it's fine, I would just be concerned if Connect were internally inconsistent, that seems worth solving so any consumer of the API doesn't have to worry with this fussiness. |
||
|
||
# `purrr::map2_vec()` converts to POSIXct automatically, but we need | ||
# `as.POSIXct()` in there to account vectors of length 1, which it seems are | ||
# not converted. | ||
# | ||
# Parse with an inner call to `strptime()`; convert the resulting `POSIXlt` | ||
# object to `POSIXct`. | ||
# Parse with an inner call to `strptime()`, which returns a POSIXlt object, | ||
# and convert that to `POSIXct`. | ||
# | ||
# We must specify `tz` in the inner call to correctly compute date math. | ||
# Specifying `tz` when parsing just changes the time zone without doing any | ||
# date math! | ||
# Specifying `tz` when in the outer call just changes the time zone without | ||
# doing any date math! | ||
# | ||
# > xlt | ||
# [1] "2024-08-29 16:36:33 EDT" | ||
# > tzone(xlt) | ||
# [1] "America/New_York" | ||
# > as.POSIXct(xlt, tz = "UTC") | ||
# [1] "2024-08-29 16:36:33 UTC" | ||
purrr::map_vec(x, function(.x) { | ||
# Times with and without offsets require different formats. | ||
format_string <- ifelse( | ||
grepl("Z$", .x), | ||
"%Y-%m-%dT%H:%M:%OSZ", | ||
"%Y-%m-%dT%H:%M:%OS%z" | ||
) | ||
as.POSIXct(strptime(.x, format = format_string, tz = "UTC")) | ||
}) | ||
# > xlt [1] "2024-08-29 16:36:33 EDT" tzone(xlt) [1] "America/New_York" | ||
# as.POSIXct(xlt, tz = "UTC") [1] "2024-08-29 16:36:33 UTC" | ||
format_string <- "%Y-%m-%dT%H:%M:%OS%z" | ||
as.POSIXct(x, format = format_string, tz = Sys.timezone()) | ||
} | ||
|
||
vec_cast.POSIXct.double <- # nolint: object_name_linter | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a gut check, it is desirable to see things in local timestamps and not UTC?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, definitely. If I'm in NYC and I say to Connect, "Give me all the records from the Metrics firehose from 10 AM today to 5 PM today", and
connectapi
gives me back a data frame with its time zone column in UTC, it's going to be (1) additional mental effort for me to read the times that are returned and (2) if I wanted to group results by day, it would group by the UTC date, which… isn't what I would wanna do. Local time is a distinct improvement IMHO.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not arguing that this is what we should do (and especially not in this PR as it is), but there's another option that folks might believe these times are: the server's timezone. And in your example there, that is frequently most relevant (e.g. for when a report was rendered).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you're right, that could actually be the most intuitive option. This PR has merged, but — would be open to making that change.