Research IT

Turning Analysis Scripts into Packages

Find out how Peter Crowther, a Research Software Engineer in Research IT, helped Christopher Daniel from the LightForm group in the Department of Materials to improve his code and prepare it for publication.

Data analysis code often starts off as small scripts, but its complexity can build up quickly. As analysis scripts grow, they become harder to maintain. Errors creep in easily, and even small changes to the code can require increasingly complex workarounds. By the time the code leads to a publication, it is often difficult to verify or reproduce the results. Moreover, a lot of work usually goes into developing the scripts, and that effort may be lost if they are not in a state that can easily be shared with others.

At a certain point, it can be worth taking a step back and cleaning up the code. Peter Crowther, a Research Software Engineer in Research IT, has recently been working with Christopher Daniel from the LightForm group in the Department of Materials to improve a collection of Python scripts used to analyse synchrotron X-ray diffraction (SXRD) patterns.

Synchrotron X-ray diffraction is a technique that can be used to study the structure and behaviour of materials. The LightForm group is interested in how the crystal structure of metal alloys changes during manufacturing processes; these alloys are used in a diverse range of applications, from aeroplanes to nuclear reactors.

Chris had explored some existing data analysis tools, but found that none of them quite suited his needs, so he developed his own analysis code in Python. He reached the point where the analysis was prototyped, but he needed some help bringing the code up to a publishable standard.

Using the scripts as a starting point, Peter worked with Chris to clean up and build on the code, making it more adaptable and easier to use and improving its performance. As the work progressed, the value of the analysis to the wider field became apparent, and Peter and Chris turned the scripts into a Python package named “xrdfit”. Packaging Python code makes it much easier to share with other researchers.

As xrdfit matured, it was decided that the work should be submitted to the Journal of Open Source Software (JOSS). JOSS is a journal focussed on publishing good-quality open source software and making it accessible to the wider research community. It assesses submissions against criteria such as being easy to install, being well documented and providing tests that verify correctness and repeatability. Just as in traditional academic publishing, the code is improved in response to reviewers' feedback, and once the review is complete, a short accompanying paper describing the importance of the work is published online. In August 2020, xrdfit was accepted for publication in JOSS, and the paper is already attracting interest from the research community.

This example shows that with a little extra work, analysis scripts can be turned into a research output of their own. As well as the improvements in performance and usability, the package can then be used by other researchers, helping them to analyse their data.

This work was carried out with funding from the EPSRC LightForm grant EP/R001715/1.

If you are interested in receiving help with your code or any other Research IT related issues, please get in touch!