Research IT

Introducing Software Citation

In August, GitHub announced software citation support in GitHub repositories. Discover how Rob Haines, Head of Research IT, played an instrumental role in making this happen.


As software has become more critical to research, and as people have started to build careers around writing software in academia – not least here at the University in our Research Software Engineering team – it has become more important for people to be able to get credit for the software they have written. In academic circles, credit is often given by citing another's work. Traditionally the works being cited have been papers, but as software has risen in prominence, maybe it's time to start citing software directly. Citing research software serves several purposes: most importantly attribution and credit, but also providing impact metrics for funding proposals, the Research Excellence Framework (REF), job interviews and so on.

So, when the Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE5.1) was held here in Manchester in September 2017, a group got together to talk about how to make citing software easier and, in particular, to define a standard for providing the essential data needed to do so. This group, convened by Stephan Druskat and including Research IT's Andrew Rowley, came up with the Citation File Format (CFF), which can easily be included in a software repository alongside other standard files such as a ReadMe or Licence file.
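A CFF file is plain YAML, conventionally named CITATION.cff. As a rough illustration of what one contains, here is a minimal sketch in Python (not Ruby, and not ruby-cff's API) that generates such a file. It assumes the four keys CFF 1.2.0 requires (cff-version, message, title and authors) plus the optional version and date-released keys. The helper names `build_cff` and `quote` and the example project and author are hypothetical.

```python
import json
from datetime import date
from typing import Dict, List, Optional


def quote(value: str) -> str:
    # A JSON string literal is also a valid YAML double-quoted scalar,
    # so json.dumps gives safe quoting and escaping.
    return json.dumps(value)


def build_cff(title: str,
              authors: List[Dict[str, str]],
              version: Optional[str] = None,
              date_released: Optional[date] = None) -> str:
    """Return the text of a minimal CITATION.cff file (hypothetical helper)."""
    lines = [
        "cff-version: 1.2.0",
        "message: " + quote("If you use this software, please cite it as below."),
        "title: " + quote(title),
        "authors:",
    ]
    for author in authors:
        # Each author is a YAML list item with family and given names.
        lines.append("  - family-names: " + quote(author["family-names"]))
        lines.append("    given-names: " + quote(author["given-names"]))
    if version is not None:
        lines.append("version: " + quote(version))
    if date_released is not None:
        # ISO 8601 date, written unquoted as in common CFF examples.
        lines.append("date-released: " + date_released.isoformat())
    return "\n".join(lines) + "\n"


# Hypothetical example project and author.
example_text = build_cff(
    "example-tool",
    [{"family-names": "Doe", "given-names": "Jane"}],
    version="1.0.0",
    date_released=date(2021, 8, 1),
)
print(example_text, end="")
```

Committing the printed text as CITATION.cff at the root of a repository is all that the standard asks of a software author. A dedicated library such as ruby-cff adds validation and editing on top of this.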

At subsequent events, such as the Software Sustainability Institute's Collaborations Workshops, more CFF discussions took place, and people started to build tooling around the standard to help others use it. I decided to write a Ruby library for programmatically creating and editing CFF files, ruby-cff, as an interesting personal project that I thought might be useful someday. I promptly forgot all about it until around Easter 2021, when GitHub got in touch. Arfon Smith, with whom I had previously worked on papers about software citation in academia, wanted to add an easy way for people to grab usable citation text directly from a GitHub repository. He had discovered CFF and then ruby-cff and, because GitHub is a Ruby on Rails application, ruby-cff was a perfect fit.

At this point the library didn't have any output capability, so one of the GitHub developers, Patrick Dinger, added functionality to output a CFF file as either an APA-like citation string or a BibTeX entry. With this in place, GitHub added a user interface: now, if you have a CITATION.cff file in your repository, a widget on the right-hand side of the GitHub interface gives you the citation text to copy into your paper.
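To illustrate what that output step involves, here is a minimal Python sketch that turns a parsed CFF file, represented as a plain dictionary, into an APA-like string and a BibTeX entry. It is not ruby-cff's or GitHub's implementation, and its formatting rules are simplified assumptions. The function names, the citation key scheme and the example data are hypothetical. The `@software` entry type is accepted by biblatex, whose standard styles treat it like `@misc`. Classic BibTeX styles may need `@misc` instead.

```python
from typing import Any, Dict, List


def apa_author_list(authors: List[Dict[str, str]]) -> str:
    # APA-style "Family, G." names; the last two are joined with "&".
    names = [f'{a["family-names"]}, {a["given-names"][0]}.' for a in authors]
    if len(names) == 1:
        return names[0]
    return ", ".join(names[:-1]) + ", & " + names[-1]


def release_year(cff: Dict[str, Any]) -> str:
    # date-released may be a string or a date object, so normalise via str().
    return str(cff.get("date-released", ""))[:4]


def to_apa_like(cff: Dict[str, Any]) -> str:
    year = release_year(cff) or "n.d."
    text = f'{apa_author_list(cff["authors"])} ({year}). {cff["title"]}'
    if "version" in cff:
        text += f' (Version {cff["version"]})'
    return text + " [Computer software]."


def to_bibtex(cff: Dict[str, Any]) -> str:
    year = release_year(cff)
    # Hypothetical key scheme: first author's family name plus year.
    key = f'{cff["authors"][0]["family-names"]}_{year or "nd"}'
    authors = " and ".join(
        f'{a["family-names"]}, {a["given-names"]}' for a in cff["authors"]
    )
    fields = [("author", authors), ("title", cff["title"])]
    if "version" in cff:
        fields.append(("version", cff["version"]))
    if year:
        fields.append(("year", year))
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields)
    return f"@software{{{key},\n{body}\n}}"


# Hypothetical parsed CITATION.cff contents.
example = {
    "cff-version": "1.2.0",
    "title": "example-tool",
    "version": "1.0.0",
    "date-released": "2021-08-01",
    "authors": [
        {"family-names": "Doe", "given-names": "Jane"},
        {"family-names": "Roe", "given-names": "Richard"},
    ],
}
print(to_apa_like(example))
print(to_bibtex(example))
```

Both formats are derived from the same metadata, which mirrors the idea behind CFF: the author maintains the citation information once, and tools generate whichever format a reader needs.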


GitHub's software citation support is an important advance for the adoption of software citation, as software authors can easily provide the information needed to cite their software directly in a git repository. It also provides a basis for other tools and workflows to build on this information: within hours of the initial announcement by GitHub CEO Nat Friedman, the scholarly repository Zenodo and the reference manager Zotero had added support for this new workflow. There have also been discussions about adding this feature to GitLab.

This is only one aspect of software citation, though: authors are still strongly encouraged to deposit their source code in a long-term archive. This ensures that the precise version of code used for a particular output is preserved, which is important for repeating and replicating research. Options include the Software Heritage universal software archive and/or Zenodo, via the Making Your Code Citable workflow. Versioning poses a challenge for citation systems, because they must support several use cases, such as citing a specific version and aggregating the citations of all versions in a single place. Future work aims to address this, so that GitHub releases can be linked to the version information provided by this new GitHub feature, and so that generic citations without a specific version are also supported.

For anyone who writes research software – and this applies particularly to RSEs – this new feature, and the subsequent tools and workflows built on top of it, should make it easier to accrue credit for this software, and build evidence for career development and progression, without having to rely on writing a paper and getting it published somewhere.

The GitHub docs have the details on how to use this feature, and if you would like to know more about CFF itself, it is documented in its GitHub repository.

If you have any questions about version control or the use of GitHub, why not come along to one of our virtual drop-in sessions and speak to a member of the Research IT team?