-
Notifications
You must be signed in to change notification settings - Fork 26
fix: minor improvements to time zone parsing #400
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code simplification is nice, just a few questions on there for you
x <- gsub("([+-]\\d\\d):(\\d\\d)$", "\\1\\2", x) | ||
x <- gsub("Z$", "+0000", x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand: these two regexes are not going to both be operating on a timestamp because either it ends in +00:00
or it ends in Z
but it can't end in both. Is Connect inconsistent in how it emits timestamps such that we would need both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Connect uses Go's built-in RFC3339 formatter (https://pkg.go.dev/time).
In RFC3339, UTC can be denoted either by +00:00
or Z
at the end (https://datatracker.ietf.org/doc/html/rfc3339#section-4.2).
This set of regexes allows us to handle both. (Before this PR we handled by in the purrr::map_vec()
starting on old line 177.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, the spec may allow that, but does Connect actually do it differently in different places?
This change is defensive against it, so it's fine, I would just be concerned if Connect were internally inconsistent, that seems worth solving so any consumer of the API doesn't have to worry with this fussiness.
- `get_groups()` now paginates through all results when a `prefix` is provided, if the Connect server API version supports pagination. (#328) | ||
- `get_groups()` now paginates through all results when a `prefix` is provided, | ||
if the Connect server API version supports pagination. (#328) | ||
- Timestamps from the Connect server are now displayed in your local time zone, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a gut check, it is desirable to see things in local timestamps and not UTC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, definitely. If I'm in NYC and I say to Connect, "Give me all the records from the Metrics firehose from 10 AM today to 5 PM today", and connectapi
gives me back a data frame with its time zone column in UTC, it's going to be (1) additional mental effort for me to read the times that are returned and (2) if I wanted to group results by day, it would group by the UTC date, which… isn't what I would wanna do. Local time is a distinct improvement IMHO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not arguing that this is what we should do (and especially not in this PR as it is), but there's another option that folks might believe these times are: the server's timezone. And in your example there, that is frequently most relevant (e.g. for when a report was rendered).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you're right, that could actually be the most intuitive option. This PR has merged, but — would be open to making that change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this. I'm still curious if there's an underlying quirk in Connect that needs fixing, but we don't have to block this PR for that.
Ok, thanks! My take is that the quirk is in the RFC3339 standard — Connect uses Go's default implementation of that standard for serializing dates. That said, maybe there's a better approach — I'm not totally discounting that possibility. |
@@ -154,35 +154,21 @@ coerce_datetime <- function(x, to, ...) { | |||
# - "2020-01-01T00:02:03-01:00" | |||
# nolint end | |||
parse_connect_rfc3339 <- function(x) { | |||
# Convert any timestamps with offsets to a format recognized by `strptime`. | |||
# Convert timestamps with offsets to a format recognized by `strptime`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not required, but might be nice to have an example of that the timestamps look like coming in and what they look like after these gsubs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion.
Intent
#291 fixed a bug in time zone parsing. This simplifies the approach in that PR somewhat, with the end-user benefit of displaying time zones in the local time rather than UTC. No accompanying issue.
Approach
Instead of creating a character vector of format strings based on the format of the time zone, we do an additional regex pass on the incoming timestamps. This lets us use a single format string to parse them.
The new substitution means that UTC time stamps are now explicitly given
+0000
offsets, which means that thetz
used inas.POSIXct
can be the user's local time zone. Previously, they were parsed treating the trailingZ
as a literal, which meant that to R, they did not have a time zone, so all time stamps had to be parsed as UTC. (This is hard to explain!)The comprehensive tests we added last time we touched this code were modified slightly, and do a great job at validating the new behavior. :)
Checklist
NEWS.md
(referencing the connected issue if necessary)?devtools::document()
?