Skip to content

fix: simplify get_packages() #378

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Mar 13, 2025
Merged

Conversation

toph-allen
Copy link
Collaborator

@toph-allen toph-allen commented Mar 11, 2025

Intent

Simplify get_packages(), removing pagination-related parameters.

Fixes #

Approach

  • Internally, use a response limit of 100000 items for the endpoint.
  • Remove page_size and limit from the user-facing function get_packages().
  • To fix tests that broke, alter the page_offset() function to cease iteration when the total number of aggregated items reaches the total number of items that the server will return. (Previously, it would continue to request a new page until the number of new items was 0, but the responses always contain the total number of items to be returned!)

Checklist

  • Does this change update NEWS.md (referencing the connected issue if necessary)?
  • Does this change need documentation? Have you run devtools::document()?

@toph-allen toph-allen changed the title fix: simplify packages pagination fix: simplify get_packages() Mar 11, 2025
@toph-allen toph-allen requested review from jonkeane and tdstein March 11, 2025 15:32
Copy link
Contributor

@jonkeane jonkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this

R/connect.R Outdated
url <- v1_url("packages")
query_params <- list(
name = name,
page_number = page_number,
page_size = page_size
page_size = "1000000"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this really be a string?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, this should be handled at a different layer.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Holding myself back from trying to do this for all query params in all requests for now

R/connect.R Outdated
@@ -830,12 +830,12 @@ Connect <- R6::R6Class(
#' @param name The package name to filter by.
#' @param page_number The page number.
#' @param page_size Page size, max 500.
packages = function(name = NULL, page_number = 1, page_size = 500) {
packages = function(name = NULL, page_number = 1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not opposed to having this as an argument if it defaults to something sensible. In case someone really does know what they're doing and wants to fiddle with it that's ok. My complain in prior PRs was that we force folks to deal with this in cases where they don't want to.

But if you would rather hide it entirely, that's fine.

Copy link
Collaborator Author

@toph-allen toph-allen Mar 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really got the impression you wanted to hide it entirely from our previous discussion.

Either way — this is the back-end method which is called by the user-facing function get_packages() which has been drastically simplified. I think given that, it's reasonable to leave limit here, so that if people really want to mess around with it, they can. The main entrypoint for users will be much simpler.

get_packages(client, page_size = 2, limit = 2),
get_packages(client),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was limit = 2 here before? I don't see that being removed, but maybe it was just eaten in ... somewhere?

Regardless, this is one place where having the ability to specify makes testing easier (in a way that encourages good factoring of our own code...). This particular parameter is just passed through so probably NBD to have a test that confirms that, but if there was any sort of processing or whatever it certainly would make something like that easier.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where limit was removed from.


Side note, slightly related: connectapi used to paginate until the API returned empty pages, which required explicitly setting the limit at 2 in the tests. The pagination quirk has now been fixed: paginating until the response is empty is not required because the responses provide the total number of objects to be returned.

@toph-allen toph-allen requested review from jonkeane and tdstein March 12, 2025 19:58

agg_response <- res
agg_length <- length(agg_response)
while (length(res) > 0 && agg_length < limit) {
while (length(res) > 0 && agg_length < limit && agg_length < total_items) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is limit every greater than total_items? Just curious what sparked this change.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

limit is often greater than total_items — it's set to Inf by default.

The result of this was that the gating factor that causes connectapi to stop requesting new pages from the API was the API returning zero items — when total_items is provided in each response.

This wasn't actually that bad in effect, but it made mocks more complex (both in code and on disk) because each API request had to have an extra empty page. We could previously work around this by providing limit explicitly in functions in tests, but removing that from user-facing functions is a good prompt to actually address the underlying itch.

Copy link
Collaborator

@tdstein tdstein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with this. For future reference, it's worth updating the pull request description to reflect changes made during the review process.

@toph-allen
Copy link
Collaborator Author

@tdstein 🫡 Ah yeah, thanks for pointing that out

@toph-allen toph-allen merged commit 67a52e2 into main Mar 13, 2025
19 checks passed
@toph-allen toph-allen deleted the toph/get-packages-pagination-limit branch March 13, 2025 19:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants