Skip to content

fix: minor improvements to time zone parsing #400

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 30, 2025
Merged

Conversation

toph-allen
Copy link
Collaborator

@toph-allen toph-allen commented Apr 30, 2025

Intent

#291 fixed a bug in time zone parsing. This simplifies the approach in that PR somewhat, with the end-user benefit of displaying time zones in the local time rather than UTC. No accompanying issue.

Approach

Instead of creating a character vector of format strings based on the format of the time zone, we do an additional regex pass on the incoming timestamps. This lets us use a single format string to parse them.

The new substitution means that UTC time stamps are now explicitly given +0000 offsets, which means that the tz used in as.POSIXct can be the user's local time zone. Previously, they were parsed treating the trailing Z as a literal, which meant that to R, they did not have a time zone, so all time stamps had to be parsed as UTC. (This is hard to explain!)

The comprehensive tests we added last time we touched this code were modified slightly, and do a great job at validating the new behavior. :)

Checklist

  • Does this change update NEWS.md (referencing the connected issue if necessary)?
  • Does this change need documentation? Have you run devtools::document()?

Copy link
Collaborator

@nealrichardson nealrichardson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code simplification is nice, just a few questions on there for you

x <- gsub("([+-]\\d\\d):(\\d\\d)$", "\\1\\2", x)
x <- gsub("Z$", "+0000", x)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand: these two regexes are not going to both be operating on a timestamp because either it ends in +00:00 or it ends in Z but it can't end in both. Is Connect inconsistent in how it emits timestamps such that we would need both?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Connect uses Go's built-in RFC3339 formatter (https://pkg.go.dev/time).

In RFC3339, UTC can be denoted either by +00:00 or Z at the end (https://datatracker.ietf.org/doc/html/rfc3339#section-4.2).

This set of regexes allows us to handle both. (Before this PR we handled by in the purrr::map_vec() starting on old line 177.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, the spec may allow that, but does Connect actually do it differently in different places?

This change is defensive against it, so it's fine, I would just be concerned if Connect were internally inconsistent, that seems worth solving so any consumer of the API doesn't have to worry with this fussiness.

- `get_groups()` now paginates through all results when a `prefix` is provided, if the Connect server API version supports pagination. (#328)
- `get_groups()` now paginates through all results when a `prefix` is provided,
if the Connect server API version supports pagination. (#328)
- Timestamps from the Connect server are now displayed in your local time zone,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a gut check, it is desirable to see things in local timestamps and not UTC?

Copy link
Collaborator Author

@toph-allen toph-allen Apr 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, definitely. If I'm in NYC and I say to Connect, "Give me all the records from the Metrics firehose from 10 AM today to 5 PM today", and connectapi gives me back a data frame with its time zone column in UTC, it's going to be (1) additional mental effort for me to read the times that are returned and (2) if I wanted to group results by day, it would group by the UTC date, which… isn't what I would wanna do. Local time is a distinct improvement IMHO.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not arguing that this is what we should do (and especially not in this PR as it is), but there's another option that folks might believe these times are: the server's timezone. And in your example there, that is frequently most relevant (e.g. for when a report was rendered).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you're right, that could actually be the most intuitive option. This PR has merged, but — would be open to making that change.

Copy link
Collaborator

@nealrichardson nealrichardson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this. I'm still curious if there's an underlying quirk in Connect that needs fixing, but we don't have to block this PR for that.

@toph-allen
Copy link
Collaborator Author

toph-allen commented Apr 30, 2025

Ok, thanks! My take is that the quirk is in the RFC3339 standard — Connect uses Go's default implementation of that standard for serializing dates.

That said, maybe there's a better approach — I'm not totally discounting that possibility.

@toph-allen toph-allen merged commit 2cea9c0 into main Apr 30, 2025
19 checks passed
@toph-allen toph-allen deleted the toph/better-timezones branch April 30, 2025 21:13
@@ -154,35 +154,21 @@ coerce_datetime <- function(x, to, ...) {
# - "2020-01-01T00:02:03-01:00"
# nolint end
parse_connect_rfc3339 <- function(x) {
# Convert any timestamps with offsets to a format recognized by `strptime`.
# Convert timestamps with offsets to a format recognized by `strptime`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not required, but might be nice to have an example of that the timestamps look like coming in and what they look like after these gsubs

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants