
ColmBhandal

@ColmBhandal ColmBhandal commented Jan 24, 2022

Generated this with cffinit and entered whatever data I could find on GitHub for the authors.
Some data may need to be modified or augmented.

  • Closes "what is a citation format file?" #1322
  • I am familiar with the contributing guidelines
  • (Maintainer please advise) Adds description and name entries in the appropriate "what's new" file in docs/sphinx/source/whatsnew for all changes. Includes link to the GitHub Issue with :issue:`num` or this Pull Request with :pull:`num`. Includes contributor name and/or GitHub username (link with :ghuser:`user`).
  • Pull request is nearly complete and ready for detailed review.
  • (Maintainer please advise) Update the "Citing" section of README.md to point potential citers to the new "Cite this repository" button
  • Maintainer: Appropriate GitHub Labels and Milestone are assigned to the Pull Request and linked Issue.

As far as I'm aware, a CFF file is a file that you add to your GitHub repo which allows others to cite your repo correctly and tells them how you'd like them to do that. The CFF file is both human- and machine-readable. For humans, think of it as the analogue of the metadata on a research paper that allows others to cite that paper; many code repositories don't have such metadata, and a CFF aims to fix that.

Because it's machine-readable, the CFF can be parsed by downstream tooling. For example, GitHub seems to parse it and render the citation info on the repo page, and it also gives others a BibTeX entry they can use to cite the repo in regular research papers.

For more info: https://citation-file-format.github.io/#:~:text=cff%20files%20are%20plain%20text,to%20correctly%20cite%20their%20software.&text=cff%20files%20is%20the%20Citation%20File%20Format%20(CFF)..
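For illustration, here is a minimal sketch of what such a file contains, written as a small Python generator. It assumes the CFF 1.2.0 schema, whose required keys are `cff-version`, `message`, `title`, and `authors`; the helper name `make_citation_cff` and the sample values are hypothetical, not the contents of the file proposed in this PR.

```python
def make_citation_cff(title, authors,
                      message="If you use this software, please cite it as below."):
    """Return the text of a minimal CITATION.cff (CFF 1.2.0, required keys only).

    `authors` is a list of (family_names, given_names) tuples. Values are wrapped
    in double quotes, so this sketch assumes they contain no double quotes.
    """
    lines = [
        "cff-version: 1.2.0",
        f'message: "{message}"',
        f'title: "{title}"',
        "authors:",
    ]
    # Each author becomes one YAML list item with family and given names.
    for family, given in authors:
        lines.append(f'  - family-names: "{family}"')
        lines.append(f'    given-names: "{given}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative values only.
    print(make_citation_cff("pvlib python", [("Hansen", "Cliff"), ("Holmgren", "Will")]))
```

A real file would usually add optional keys such as `version`, `doi`, or `license`, which is where the maintenance questions later in this thread come from.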

@ColmBhandal
Author

I don't think this task is relevant to this PR, but leaving it in to be on the safe side: "Adds description and name entries in the appropriate "what's new" file"

@ColmBhandal
Author

ColmBhandal commented Jan 24, 2022

I just confirmed that GitHub does indeed parse this CFF file and provides a new "Cite this repository" button.

Bearing this in mind, I think the README.md will need to change, specifically the "citation" section. I'll add a task for this so it's not forgotten. Note: I won't start working on this yet until there is some discussion/guidance from maintainers.

@cwhanse
Member

cwhanse commented Jan 24, 2022

I'm in favor of adding this citation file. If we agree we should open an issue to edit the README, as @ColmBhandal notes.

@mikofski
Member

Is APA the only citation style? We use IEEE. It looks more like this:

W. F. Holmgren, C. W. Hansen, and M. A. Mikofski, “pvlib python: a python package for modeling solar energy systems,” Journal of Open Source Software, vol. 3, no. 29, p. 884, 2018. [Online]. Available: http://doi.org/10.21105/joss.00884

Anyway, not a blocker for me, but unfortunate if that's a limitation.

@mikofski
Member

See: citation-file-format/ruby-cff#64

Sadly, it's a limitation of GitHub: presently they only support APA.

@ColmBhandal
Author

Not sure if this counts as a "style", but there is a "BibTeX" button next to the APA button that produces this:

@software{Hansen_PVLib,author = {Hansen, Cliff and Anderson, Kevin and Mikofski, Mark and Holmgren, Will},title = {{PVLib}}}
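The entry above maps almost one-to-one onto the CFF fields. The sketch below reproduces that shape from CFF-style metadata. The function `cff_to_bibtex` and its citation-key rule (first author's family name, an underscore, then the title with spaces removed) are guesses inferred from this single output, not GitHub's implementation, which appears to rely on the ruby-cff library linked above.

```python
def cff_to_bibtex(title, authors, year=None, version=None, doi=None):
    """Build a BibTeX @software entry from CFF-style metadata.

    `authors` is a list of (family_names, given_names) tuples in citation order.
    The citation-key rule below is a guess based on the single example above.
    """
    key = f"{authors[0][0]}_{title.replace(' ', '')}"
    fields = [
        "author = {" + " and ".join(f"{fam}, {giv}" for fam, giv in authors) + "}",
        # Double braces stop BibTeX styles from changing the title's capitalization.
        "title = {{" + title + "}}",
    ]
    # Optional fields; the entry GitHub generated above contains none of them.
    if year is not None:
        fields.append(f"year = {{{year}}}")
    if version is not None:
        fields.append(f"version = {{{version}}}")
    if doi is not None:
        fields.append(f"doi = {{{doi}}}")
    return "@software{" + key + ",\n  " + ",\n  ".join(fields) + "\n}"


if __name__ == "__main__":
    print(cff_to_bibtex("PVLib", [("Hansen", "Cliff"), ("Anderson", "Kevin"),
                                  ("Mikofski", "Mark"), ("Holmgren", "Will")]))
```

Passing `year`, `version`, or `doi` adds those fields. The generated entry above has none of them, so a reader citing it would not know which release of pvlib was used.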

@wholmgren
Member

wholmgren commented Jan 24, 2022

I'm not so sure this is a good idea. pvlib python is the product of many more people than the 4 listed in this cff. I think the best practice is to cite the DOI generated by zenodo for the version of pvlib used in a work. We already have the JOSS paper for a too-narrow citation (I'm now a bit ambivalent about the JOSS paper since it's only myself, Mark, and Cliff).

Bigger picture question: what are the reasons to cite pvlib python? Give credit to contributors? Give credit to maintainers? Adds credibility to a user's work? Quantifies use of the software which is important for e.g. DOE funding?

I feel like the currently proposed cff only manages to advance one of those goals, albeit an important one: give @kanderso-nrel credit.

Seems like other maintainers feel differently, and if they still feel that way then I'm fine with moving forward. (Just caught up on the new discussion in the issue - maybe better to respond there?)

In any case, thanks @ColmBhandal for the proposal!

@cwhanse
Member

cwhanse commented Jan 24, 2022

what are the reasons to cite pvlib python?

IMO, to point readers of a paper to more information about pvlib. It hadn't occurred to me to also provide a CFF for each release. I was only thinking how to encourage authors to cite JOSS instead of one of the conference papers.

@mikofski
Member

I think the cff is used by zenodo to create a citation based on the release of the software - so I believe the goal is not to refer to the JOSS/research paper, but to the version of pvlib used in the work that cites it. Ideally that zenodo citation would have all of the contributing authors listed. Perhaps it's not time yet to switch to the cff if the maintenance burden is too great.

I looked at matplotlib, scipy, and numpy, and none are using a cff:

  • matplotlib: they do not use zenodo or at least do not link to it, and have a citation section in their readme that points users to CITATION.bib that credits John Hunter only - I think this is a research paper, not a software entry
  • scipy has a DOI badge and a citation section in their readme that points to their nature article that lists the major contributors and then an interesting link that shows the scipy-1.0 contributors
  • numpy just has a DOI badge that points to their nature article that lists just a handful of their contributors

I kept searching:

  • scikit-learn: DOI badge to zenodo only, no cff
  • pandas: ditto, DOI badge to zenodo, no cff
  • statsmodels: no citation, nothing
  • seaborn: DOI badge and citation section in readme to JOSS paper, lists only lead author/maintainer

So I wonder if we want to be the first to break ground on this. I apologize to everyone for what seems like a red herring.

@ColmBhandal
Author

@mikofski No research is wasted! Like the original issue said, "what is a citation format file?" I think the various discussions and explorations we've had answer that question. I tend to agree with you that it's probably better to sit back and monitor CFF usage over the next few years, and if it does take off, cross that bridge when we get to it. Hopefully by then there will be better support for things like citation styles and automatically keeping the author list up to date.

So I'm totally happy to close this PR without merging (and maybe the issue could be closed as WON'T-DO also, so that future new contributors don't get sucked into it?)

@ColmBhandal
Author

Hey folks, @mikofski @cwhanse @wholmgren @kanderso-nrel - I will close this PR by EOD unless anyone has a specific objection to me closing it.

I am assuming that the general consensus for the CFF, based on everyone's comments, is "could be good in future - but let's leave it for now". Bearing that in mind, I'd rather close the PR so that it doesn't become one of those stale PRs that hangs around for months and then dies a slow death.

Thanks!

@mikofski
Member

I asked about CFF files on NumFOCUS and got a response that linked to a Twitter post from Ryan Abernathey (Pangeo), but it wasn't much use.

However, I did discover something interesting:

  1. Xarray is using a cff file.

  2. If you add a BibTeX file called "CITATION.bib" to your repo, GitHub will use it for the "Cite this repository" feature; this is what matplotlib does.

NumPy and SciPy are also using a CITATION.bib file instead of a CFF.

  3. I was looking in the wrong place for the "Cite this repository" GitHub feature: it's a hidden pull-down menu near the top right of the repo page that only opens when you click it.

I'm still fine with closing this issue and waiting on the CFF. We might consider adding a CITATION.bib file, but I'm not sure what it should link to, and again it seems like a maintenance burden if we need to update it with every release. I'm also happy to just leave everything as it is.

I think the long term goals are to:

  1. make sure folks are correctly citing pvlib using the zenodo DOI and JOSS paper
  2. give credit to all the contributors, not just the JOSS & older paper authors
  3. provide a way for users to get back to the repository and the documentation from citations
  4. automate any part of the zenodo citation that is manual (if there are any other than updating the badge)
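If the project went the static CITATION.bib route mentioned above, the file could be as small as the sketch below. The bibliographic details come from the IEEE citation of the JOSS paper quoted earlier in this thread; the entry key `pvlib_joss_2018` and the helper `write_citation_bib` are hypothetical.

```python
from pathlib import Path

# Static BibTeX entry for the JOSS paper; details taken from the IEEE citation
# quoted earlier. The entry key is a made-up example.
JOSS_ENTRY = """\
@article{pvlib_joss_2018,
  author  = {Holmgren, W. F. and Hansen, C. W. and Mikofski, M. A.},
  title   = {pvlib python: a python package for modeling solar energy systems},
  journal = {Journal of Open Source Software},
  volume  = {3},
  number  = {29},
  pages   = {884},
  year    = {2018},
  doi     = {10.21105/joss.00884},
}
"""


def write_citation_bib(repo_root="."):
    """Write CITATION.bib at the repository root and return its path."""
    path = Path(repo_root) / "CITATION.bib"
    path.write_text(JOSS_ENTRY, encoding="utf-8")
    return path
```

Because the entry never changes, it avoids the per-release maintenance concern, but it still credits only the three JOSS authors, which is the too-narrow citation raised earlier.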

@ColmBhandal ColmBhandal deleted the add-cff-file branch January 25, 2022 17:32
@wholmgren
Member

The twitter screenshot confuses me because Zenodo is still issuing new DOIs for every pvlib release. The zenodo DOI in the readme is https://doi.org/10.5281/zenodo.3762635, which resolves to v0.7.2. Zenodo also says

Cite all versions? You can cite all versions by using the DOI 10.5281/zenodo.593284. This DOI represents all versions, and will always resolve to the latest one.

So maybe a short term solution is to update the readme to point to that catch-all DOI. I feel like we discussed this 2-3 years ago too, so maybe I'm forgetting something.
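To make the distinction concrete, here is a sketch using the two DOIs quoted above: the versioned DOI pins one release, while the concept DOI stays fixed and resolves to the newest release. The constant and function names are hypothetical.

```python
# The two Zenodo DOIs quoted above.
VERSIONED_DOI = "10.5281/zenodo.3762635"  # a single release (v0.7.2, per the comment above)
CONCEPT_DOI = "10.5281/zenodo.593284"     # "all versions"; resolves to the latest release


def doi_url(doi):
    """Return the doi.org resolver URL for a bare DOI."""
    return f"https://doi.org/{doi}"


def readme_doi_url(cite_all_versions=True):
    """Pick which DOI a README badge or citation note would link to."""
    return doi_url(CONCEPT_DOI if cite_all_versions else VERSIONED_DOI)
```

A paper that used a specific pvlib release would still cite that release's own DOI; the concept DOI is the one that can stay in the README without per-release edits.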

@mikofski
Member

@wholmgren TBH it went a little over my head because I don't know how Zenodo is set up to create new DOIs for each release, but my general takeaway was that this chicken-and-egg problem isn't unique to CFFs, and that a common solution is to make "latest" point to the versionless DOI, so that "stable" always points to the current DOI on Zenodo, or something like that.

Anyway, what do you think about adding a CITATION.bib file instead of a CFF? And if so, would it point to JOSS or Zenodo? If JOSS, that seems to miss the point of "cite this repository" (it should be Zenodo); but if Zenodo, how do we generate a complete list of authors and keep the DOI current? Then it seems like CFF would be the right approach after all.

@mikofski
Member

I learned some more interesting info about this lately.

  • Zenodo first tries to parse .zenodo.json if it exists in the repo, and will ignore a CITATION.cff file if one is also present. If .zenodo.json doesn't exist, it tries to parse the contents of CITATION.cff if that exists. Finally, after parsing either file, Zenodo uses the GitHub API to make its best guess about any missing metadata such as author names, version, and license; see the Zenodo Help FAQ "How does a CITATION.cff file affect the metadata of my GitHub release?" and the Zenodo developer docs for GitHub integration.
  • Because the CITATION.cff file contains the version, it's possible that the Zenodo webhook will fail if the version in CITATION.cff is not updated before the release triggers the webhook. Workarounds are (1) update the cff file before triggering the release, or (2) omit the version from the cff file so that Zenodo falls back on the GitHub API for the version; this is what pandas does.
  • The chicken-and-egg problem refers to the DOI, which isn't known until after the release is triggered. One workaround (as Will suggested) is to use the catch-all DOI that always points to the latest release on Zenodo. Another is to omit the DOI altogether, but then it won't show up in the auto-generated APA or BibTeX citation; this is what pandas does.
  • To keep Zenodo from parsing a cff file, another option is to omit the cff file, which forces Zenodo to use the GitHub API, and provide a CITATION.bib or another citation file that Zenodo ignores. To be honest, though, this is redundant if the file merely points to the Zenodo archive, because that's what the Zenodo DOI badge already does. I think this is why SciPy, NumPy, MPL, and others point their bib files at their static JOSS, Nature, IEEE, etc. articles.
  • Ditto one could also omit the cff file, provide a bib or other citation file, and use .zenodo.json together with GitHub API to get metadata
  • Another problem with either a cff file or .zenodo.json is listing authors, which requires manual editing to keep up to date. The SciPy team has developed some scripts that use the GitHub API to scrape authors, but that seems circular if Zenodo falls back on the GitHub API anyway, so why bother? A workaround pandas uses is to list a single author, "The Pandas development team".
  • Ditto this problem for a bib or other citation file if pointing it at Zenodo and not at a static publication. SciPy, NumPy, MPL, and others avoid this by using a static publication for their CITATION.bib file.
  • If Zenodo has access to either a cff or .zenodo.json file, it will use it to fill in metadata such as license, full author names, author ORCID numbers, repository URL, and other information that it can't parse from the GitHub API. In particular, the GitHub API may only have user names rather than full names, which are less informative, and Zenodo can't seem to pull the license or license URL from the GitHub API either. I can't think of any workaround other than manually editing author metadata in .zenodo.json or a cff file, but since we already update author names in what's new, we could ask authors who want acknowledgement in the citation to update .zenodo.json or the cff file.
  • I still cannot find many scientific python packages using a cff file (or .zenodo.json) except pandas now does. Most are using a bib file to link to their static journal article.
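The precedence described in the first bullet can be summarized as a small function. This is a hypothetical sketch, not Zenodo's code: parsed files are represented as plain dicts with simplified, shared key names (real .zenodo.json and CFF files use different schemas), and the GitHub API result is just another dict.

```python
def resolve_release_metadata(zenodo_json=None, citation_cff=None, github_api=None):
    """Merge release metadata in the order described above.

    Each argument is a dict of already-parsed metadata, or None if that
    source is absent. Key names are simplified placeholders.
    """
    if zenodo_json is not None:
        # .zenodo.json wins; CITATION.cff is ignored entirely.
        metadata = dict(zenodo_json)
    elif citation_cff is not None:
        metadata = dict(citation_cff)
    else:
        metadata = {}
    # The GitHub API only fills gaps; it never overrides file metadata.
    for key, value in (github_api or {}).items():
        metadata.setdefault(key, value)
    return metadata
```

Called with CFF data that omits `version`, the version comes from the GitHub release, which mirrors the pandas workaround; called with a .zenodo.json dict as well, the CFF data is dropped entirely.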

more info:

@AdamRJensen
Member

I looked at matplotlib, scipy, and numpy and none are using a cff

  • matplotlib: they do not use zenodo or at least do not link to it, and have a citation section in their readme that points users to CITATION.bib that credits John Hunter only - I think this is a research paper, not a software entry

Matplotlib now uses a CITATION.cff file FYI.
