fix: minor improvements to time zone parsing #400

Merged
merged 3 commits into from
Apr 30, 2025
5 changes: 4 additions & 1 deletion NEWS.md
@@ -2,7 +2,10 @@

## Enhancements and fixes

- `get_groups()` now paginates through all results when a `prefix` is provided, if the Connect server API version supports pagination. (#328)
- `get_groups()` now paginates through all results when a `prefix` is provided,
if the Connect server API version supports pagination. (#328)
- Timestamps from the Connect server are now displayed in your local time zone,
Collaborator

Just a gut check: is it desirable to see things in local time and not UTC?

Collaborator Author

@toph-allen, Apr 30, 2025

Yeah, definitely. If I'm in NYC and I say to Connect, "Give me all the records from the Metrics firehose from 10 AM today to 5 PM today", and connectapi gives me back a data frame with its timestamp column in UTC, then (1) it takes additional mental effort for me to read the times that are returned, and (2) if I wanted to group results by day, it would group by the UTC date, which… isn't what I would wanna do. Local time is a distinct improvement IMHO.

Contributor

I'm not arguing that this is what we should do (and especially not in this PR as it is), but there's another time zone that folks might believe these times are in: the server's. And in your example there, that is frequently the most relevant one (e.g. for when a report was rendered).

Collaborator Author

Yeah, you're right, that could actually be the most intuitive option. This PR has merged, but I would be open to making that change.
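
To make the day-grouping point in this thread concrete, here is a minimal base-R sketch. The two timestamps are hypothetical values chosen for illustration, not data from Connect, and the sketch assumes the `America/New_York` zone is available in the system's time zone database.

```r
# Hypothetical event times, stored as UTC instants (not taken from the PR).
utc_times <- as.POSIXct(
  c("2025-04-30 01:30:00", "2025-04-30 15:00:00"),
  tz = "UTC"
)

# Calendar day of each instant, as seen in UTC and in New York.
day_utc <- format(utc_times, "%Y-%m-%d", tz = "UTC")
day_nyc <- format(utc_times, "%Y-%m-%d", tz = "America/New_York")

# Side-by-side view: the first event falls on different days.
data.frame(day_utc, day_nyc)
```

The first instant is late evening on April 29 in New York but already April 30 in UTC, so grouping by day gives different buckets depending on the time zone used for display.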

rather than in UTC. (#400)

# connectapi 0.7.0

34 changes: 10 additions & 24 deletions R/parse.R
@@ -154,35 +154,21 @@ coerce_datetime <- function(x, to, ...) {
# - "2020-01-01T00:02:03-01:00"
# nolint end
parse_connect_rfc3339 <- function(x) {
# Convert any timestamps with offsets to a format recognized by `strptime`.
# Convert timestamps with offsets to a format recognized by `strptime`.
Contributor

Not required, but it might be nice to have an example of what the timestamps look like coming in and what they look like after these gsubs.

Collaborator Author

Good suggestion.

x <- gsub("([+-]\\d\\d):(\\d\\d)$", "\\1\\2", x)
x <- gsub("Z$", "+0000", x)
Collaborator

I don't understand: these two regexes will never both operate on the same timestamp, because it either ends in +00:00 or ends in Z, but it can't end in both. Is Connect inconsistent in how it emits timestamps, such that we would need both?

Collaborator Author

Connect uses Go's built-in RFC3339 formatter (https://pkg.go.dev/time).

In RFC3339, UTC can be denoted either by +00:00 or Z at the end (https://datatracker.ietf.org/doc/html/rfc3339#section-4.2).

This set of regexes allows us to handle both. (Before this PR we handled both in the purrr::map_vec() call starting on old line 177.)

Collaborator

Sure, the spec may allow that, but does Connect actually do it differently in different places?

This change is defensive against it, so it's fine. I would just be concerned if Connect were internally inconsistent; that seems worth solving so that no consumer of the API has to worry about this fussiness.
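
As a rough illustration of what the two `gsub()` calls discussed in this thread do, the sketch below runs them on a few sample inputs. The inputs are illustrative strings in the formats mentioned in the diff, not captured Connect output, and the final parse uses a fixed `tz = "UTC"` only to keep the example reproducible (the PR itself passes `Sys.timezone()`).

```r
# Sample RFC 3339 timestamps: "Z" suffix, positive offset, negative offset.
x <- c(
  "2023-08-22T14:13:14Z",
  "2023-08-22T15:13:14+01:00",
  "2020-01-01T00:02:03-01:00"
)

# Drop the colon from a trailing numeric offset: "+01:00" becomes "+0100".
normalized <- gsub("([+-]\\d\\d):(\\d\\d)$", "\\1\\2", x)
# Rewrite a trailing "Z" as an explicit zero offset.
normalized <- gsub("Z$", "+0000", normalized)

normalized

# With every offset in +hhmm form, one format string covers all inputs.
parsed <- as.POSIXct(normalized, format = "%Y-%m-%dT%H:%M:%OS%z", tz = "UTC")
parsed
```

After normalization the strings end in `+0000`, `+0100`, and `-0100`, so only one of the two substitutions ever changes a given timestamp, and the first two inputs parse to the same instant.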


# `purrr::map2_vec()` converts to POSIXct automatically, but we need
# `as.POSIXct()` in there to account vectors of length 1, which it seems are
# not converted.
#
# Parse with an inner call to `strptime()`; convert the resulting `POSIXlt`
# object to `POSIXct`.
# Parse with an inner call to `strptime()`, which returns a POSIXlt object,
# and convert that to `POSIXct`.
#
# We must specify `tz` in the inner call to correctly compute date math.
# Specifying `tz` when parsing just changes the time zone without doing any
# date math!
# Specifying `tz` when in the outer call just changes the time zone without
# doing any date math!
#
# > xlt
# [1] "2024-08-29 16:36:33 EDT"
# > tzone(xlt)
# [1] "America/New_York"
# > as.POSIXct(xlt, tz = "UTC")
# [1] "2024-08-29 16:36:33 UTC"
purrr::map_vec(x, function(.x) {
# Times with and without offsets require different formats.
format_string <- ifelse(
grepl("Z$", .x),
"%Y-%m-%dT%H:%M:%OSZ",
"%Y-%m-%dT%H:%M:%OS%z"
)
as.POSIXct(strptime(.x, format = format_string, tz = "UTC"))
})
# > xlt [1] "2024-08-29 16:36:33 EDT" tzone(xlt) [1] "America/New_York"
# as.POSIXct(xlt, tz = "UTC") [1] "2024-08-29 16:36:33 UTC"
format_string <- "%Y-%m-%dT%H:%M:%OS%z"
as.POSIXct(x, format = format_string, tz = Sys.timezone())
}

vec_cast.POSIXct.double <- # nolint: object_name_linter
37 changes: 25 additions & 12 deletions tests/testthat/test-parse.R
@@ -47,8 +47,6 @@ test_that("coerce_datetime fills the void", {
})

test_that("parse_connect_rfc3339() parses timestamps with offsets as expected", {
withr::defer(Sys.setenv(TZ = Sys.getenv("TZ")))

x_mixed <- c(
"2023-08-22T14:13:14Z",
"2020-01-01T01:02:03Z",
@@ -75,32 +73,47 @@ test_that("parse_connect_rfc3339() parses timestamps with offsets as expected",

single_offset <- "2023-08-22T15:13:14+01:00"

withr::local_envvar(TZ = "America/New_York")
expected <- as.POSIXct(strptime(
c(
"2023-08-22T14:13:14+0000",
"2020-01-01T01:02:03+0000"
),
format = "%Y-%m-%dT%H:%M:%S%z",
tz = "UTC"
tz = Sys.timezone()
))

Sys.setenv(TZ = "America/New_York")
expect_identical(parse_connect_rfc3339(x_mixed), rep(expected, 2))
expect_identical(parse_connect_rfc3339(x_zero_offset), expected)
expect_identical(parse_connect_rfc3339(x_plus_one), expected)
expect_identical(parse_connect_rfc3339(x_minus_one), expected)
expect_identical(parse_connect_rfc3339(single_zero_offset), expected[1])
expect_identical(parse_connect_rfc3339(single_offset), expected[1])

Sys.setenv(TZ = "UTC")
withr::local_envvar(TZ = "UTC")
expected <- as.POSIXct(strptime(
c(
"2023-08-22T14:13:14+0000",
"2020-01-01T01:02:03+0000"
),
format = "%Y-%m-%dT%H:%M:%S%z",
tz = Sys.timezone()
))
expect_identical(parse_connect_rfc3339(x_mixed), rep(expected, 2))
expect_identical(parse_connect_rfc3339(x_zero_offset), expected)
expect_identical(parse_connect_rfc3339(x_plus_one), expected)
expect_identical(parse_connect_rfc3339(x_minus_one), expected)
expect_identical(parse_connect_rfc3339(single_zero_offset), expected[1])
expect_identical(parse_connect_rfc3339(single_offset), expected[1])

Sys.setenv(TZ = "Asia/Tokyo")
withr::local_envvar(TZ = "Asia/Tokyo")
expected <- as.POSIXct(strptime(
c(
"2023-08-22T14:13:14+0000",
"2020-01-01T01:02:03+0000"
),
format = "%Y-%m-%dT%H:%M:%S%z",
tz = Sys.timezone()
))
expect_identical(parse_connect_rfc3339(x_mixed), rep(expected, 2))
expect_identical(parse_connect_rfc3339(x_zero_offset), expected)
expect_identical(parse_connect_rfc3339(x_plus_one), expected)
@@ -109,7 +122,9 @@ test_that("parse_connect_rfc3339() parses timestamps with offsets as expected",
expect_identical(parse_connect_rfc3339(single_offset), expected[1])
})


test_that("parse_connect_rfc3339() handles fractional seconds", {
withr::local_envvar(TZ = "UTC")
expected <- as.POSIXct(strptime(
c(
"2024-12-06T19:09:29.948016766+0000",
@@ -125,8 +140,6 @@ test_that("parse_connect_rfc3339() handles fractional seconds", {
})

test_that("make_timestamp produces expected output", {
withr::defer(Sys.setenv(TZ = Sys.getenv("TZ")))

x_mixed <- c(
"2023-08-22T14:13:14Z",
"2020-01-01T01:02:03Z",
@@ -158,7 +171,7 @@ test_that("make_timestamp produces expected output", {
"2020-01-01T01:02:03Z"
)

Sys.setenv(TZ = "America/New_York")
withr::local_envvar(TZ = "America/New_York")
expect_equal(
make_timestamp(coerce_datetime(x_mixed, NA_datetime_)),
rep(outcome, 2)
@@ -185,7 +198,7 @@ test_that("make_timestamp produces expected output", {
)
expect_equal(make_timestamp(outcome), outcome)

Sys.setenv(TZ = "UTC")
withr::local_envvar(TZ = "UTC")
expect_equal(
make_timestamp(coerce_datetime(x_mixed, NA_datetime_)),
rep(outcome, 2)
@@ -212,7 +225,7 @@ test_that("make_timestamp produces expected output", {
)
expect_equal(make_timestamp(outcome), outcome)

Sys.setenv(TZ = "Asia/Tokyo")
withr::local_envvar(TZ = "Asia/Tokyo")
expect_equal(
make_timestamp(coerce_datetime(x_mixed, NA_datetime_)),
rep(outcome, 2)