@maykcaldas maykcaldas commented Apr 1, 2025

The LLM sometimes broke the expected pattern by citing sources in styles other than the citation key, which caused problems downstream on the platform. This PR adds more explicit instructions on how citations should be included in the answer.

A few examples of wrong citations:

1 - Task: What is the molecule known to have the greatest solubility in water?
Answer: (...) Similarly, studies on aqueous solubility in drug discovery (e.g., Ishikawa and Hashimoto, 2011) discuss extremely low solubility values for particular organic compounds, but these systems are not directly comparable to the gas-phase solubility measurements in Hildebrand’s work. (...)
Issue: "Ishikawa and Hashimoto, 2011" is not a citation key.

2 - Task: What is the molecule known to have the greatest solubility in water?
Answer: (...) Therefore, based solely on the provided context, ammonia (NH₃) is the molecule known to have the greatest solubility in water (hildebrand1916solubility pages 14–17).
Issue: The citation refers to Hildebrand’s 1916 paper (pages 14–17), and hildebrand1916solubility should be the key for this paper. It should have used "hildebrand1916solubility. pages 17-19". The difference is the period after the key.

We need the LLM to use the citation key exactly as given and prevent it from changing it.
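To make the second failure mode concrete, here is a minimal sketch of how an altered key could be detected. It is not code from this PR: `find_altered_keys`, `_normalize`, and `PAREN_RE` are hypothetical names, and the sketch assumes citations appear as parentheticals and that the set of valid keys is available when checking.

```python
from __future__ import annotations

import re

# Parenthetical spans, e.g. "(foo2020bar. pages 1-2)" or "(NH₃)".
PAREN_RE = re.compile(r"\(([^()]*)\)")


def _normalize(text: str) -> str:
    """Lowercase and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def find_altered_keys(answer: str, valid_keys: set[str]) -> list[str]:
    """Return parentheticals that resemble a valid key but are not verbatim copies.

    A parenthetical is flagged when it differs from every valid key as written,
    yet equals one of them once punctuation, dashes, and spacing are ignored.
    """
    normalized_keys = {_normalize(key) for key in valid_keys}
    altered = []
    for inner in PAREN_RE.findall(answer):
        inner = inner.strip()
        if inner not in valid_keys and _normalize(inner) in normalized_keys:
            altered.append(inner)
    return altered


# Illustrative data only: the key keeps its period, the answer dropped it.
keys = {"hildebrand1916solubility. pages 17-19"}
answer = "Ammonia (NH₃) is highly soluble (hildebrand1916solubility pages 17–19)."
print(find_altered_keys(answer, keys))
# prints ['hildebrand1916solubility pages 17–19']
```

A check like this only catches near misses of known keys; free-form styles such as the name-and-year citation in the first example need a separate pattern.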

@Copilot Copilot AI review requested due to automatic review settings April 1, 2025 00:55

@Copilot Copilot AI left a comment

Pull Request Overview

This PR clarifies the required citation format by preventing the use of non-citation-key formats in answers.

  • Updates the test suite to reject answers containing a citation with a name and year
  • Enhances the prompt instructions to specify allowed citation formats and provide examples

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| tests/test_agents.py | Adds a test assertion to ensure answers contain valid citation keys |
| paperqa/prompts.py | Updates prompt instructions to enforce strict citation key formats |

@dosubot dosubot bot added the size:S (this PR changes 10-29 lines, ignoring generated files) and size:M (this PR changes 30-99 lines, ignoring generated files) labels and removed the size:S label Apr 1, 2025
@maykcaldas maykcaldas marked this pull request as draft April 1, 2025 01:24
@maykcaldas maykcaldas self-assigned this Apr 1, 2025

@jamesbraza jamesbraza left a comment

In the PR description, can you provide the output that made you discover this issue, for posterity?


@jamesbraza jamesbraza left a comment

🙏

@dosubot dosubot bot added the lgtm (this PR has been approved by a maintainer) label Apr 1, 2025
@@ -973,6 +973,30 @@ async def test_clinical_tool_usage(agent_test_settings) -> None:
), "No clinical trials were put into contexts"


@pytest.mark.asyncio
async def test_citation_formatting(agent_test_settings):

What do you think of collapsing this test into a subtest of test_agent_types? It would actually:

  • Decrease CI runtime (since we're already running the agent in those tests)
  • Increase the coverage of the assertions, since more agent types are used there

Comment on lines 986 to 988
assert not re.search(
    name_year_regex, response.session.answer
), "Answer contains citation with name and year instead of citation key"

This assertion fails on current main branch, nice work here!! 🚀
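The definition of name_year_regex is not shown in this thread. As a rough illustration of what such a pattern might look like, here is a sketch; the regex below is an assumption rather than the one merged in this PR, and the sample answers are made up for the demonstration.

```python
import re

# Hypothetical pattern, not the one from the PR: a capitalized surname,
# optionally followed by "and Surname" or "et al.", an optional comma,
# and a four-digit year starting with 19 or 20.
name_year_regex = (
    r"\b[A-Z][a-z]+"
    r"(?:\s+(?:and|&)\s+[A-Z][a-z]+|\s+et\s+al\.?)?"
    r",?\s+(?:19|20)\d{2}\b"
)

bad_answer = "studies on aqueous solubility (e.g., Ishikawa and Hashimoto, 2011) discuss"
good_answer = "ammonia is highly soluble in water (hildebrand1916solubility. pages 17-19)."

# A name-and-year citation is detected; a citation key is not.
assert re.search(name_year_regex, bad_answer)
assert not re.search(name_year_regex, good_answer)
```

A pattern this loose would also flag ordinary phrases such as "April 2025", so a real test would likely need a tighter expression or a check restricted to parenthetical spans.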

@maykcaldas maykcaldas marked this pull request as ready for review April 1, 2025 19:19
@maykcaldas maykcaldas enabled auto-merge (squash) April 1, 2025 21:10
@maykcaldas maykcaldas merged commit f9f1e36 into main Apr 1, 2025
5 checks passed
@maykcaldas maykcaldas deleted the fix-citation branch April 1, 2025 21:36
Labels: lgtm, size:M