Conversation

jamesbraza
Collaborator

See PR title

@jamesbraza jamesbraza self-assigned this Jul 14, 2025
@jamesbraza jamesbraza added the bug Something isn't working label Jul 14, 2025
@dosubot dosubot bot added the size:S This PR changes 10-29 lines, ignoring generated files. label Jul 14, 2025
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR ensures that DocDetails.pages no longer retains leading or trailing whitespace by updating the pages-cleaning logic and adding corresponding test coverage.

  • Added .strip() to the pages-cleaning step in merge_bibtex_entries.
  • Extended test_clients.py with a datetime import and a test case reflecting pages without surrounding whitespace.
  • Introduced a new VCR cassette recording API responses where the pages field contains whitespace.
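As a rough illustration of the change summarized above, the following is a minimal sketch of a pages-cleaning step that ends with `.strip()`. The function name `clean_pages` and the dash-normalizing regex are assumptions made for this example; they are not taken from paper-qa's source, which may clean the field differently.

```python
import re
from typing import Any


def clean_pages(data: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical stand-in for the pages-cleaning step (not paper-qa's code)."""
    pages = data.get("pages")
    if isinstance(pages, str):
        # Assumed normalization: collapse any run of dashes, plus the
        # whitespace around it, into a single hyphen.
        normalized = re.sub(r"\s*-+\s*", "-", pages)
        # The fix described in this PR: drop leading/trailing whitespace,
        # such as a newline left over from an API response.
        data["pages"] = normalized.strip()
    return data
```

With this sketch, `clean_pages({"pages": "\n 12 -- 15 \n"})` returns `{"pages": "12-15"}`, and a missing or non-string `pages` value is left untouched.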

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| tests/test_clients.py | Imported datetime and added a test entry to verify cleaned pages. |
| tests/cassettes/test_doi_search[paper_attributes3].yaml | New cassette with responses where pages includes newline whitespace. |
| paperqa/types.py | Appended .strip() to the pages-cleaning pipeline. |
Comments suppressed due to low confidence (1)

paperqa/types.py:618

  • Add unit tests specifically for misc_string_cleaning to cover cases such as leading/trailing newlines, tabs, and multiple internal spaces in the pages field. This will ensure the new .strip() logic behaves correctly across varied whitespace patterns.
    def misc_string_cleaning(data: dict[str, Any]) -> dict[str, Any]:
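
The suggested unit tests could look something like the table-driven sketch below. It exercises a hypothetical stand-in, `strip_pages`, rather than the real `misc_string_cleaning`, whose full behavior is not shown in this thread; the sample page ranges are invented for illustration.

```python
import unittest
from typing import Any


def strip_pages(data: dict[str, Any]) -> dict[str, Any]:
    # Hypothetical stand-in for the function under test: it only strips
    # surrounding whitespace from a string-valued "pages" field.
    if isinstance(data.get("pages"), str):
        data["pages"] = data["pages"].strip()
    return data


# (raw input, expected output) pairs; values are made up for this example.
WHITESPACE_CASES = [
    ("\n733-740", "733-740"),    # leading newline
    ("733-740\n", "733-740"),    # trailing newline
    ("\t733-740\t", "733-740"),  # surrounding tabs
    ("  733-740  ", "733-740"),  # surrounding spaces
    ("733-740", "733-740"),      # already clean
]


class TestPagesWhitespace(unittest.TestCase):
    def test_surrounding_whitespace_is_removed(self) -> None:
        for raw, expected in WHITESPACE_CASES:
            with self.subTest(raw=raw):
                self.assertEqual(strip_pages({"pages": raw})["pages"], expected)
```

Note that `str.strip()` only affects the ends of the string, so cases with multiple internal spaces would need their own expectations based on whatever the real cleaning step does with them.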

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Jul 14, 2025
Collaborator

@mskarlin mskarlin left a comment

thanks @jamesbraza

@jamesbraza jamesbraza merged commit 85a4981 into main Jul 15, 2025
5 checks passed
@jamesbraza jamesbraza deleted the fixing-pages branch July 15, 2025 00:42
Labels
bug Something isn't working lgtm This PR has been approved by a maintainer size:S This PR changes 10-29 lines, ignoring generated files.
3 participants