Research IT

Developing New Ways of Comparing Ancient Documents

Nicolas Gruel, a research software engineer from Research IT, has been working with Prof Peter Pormann (School of Arts, Languages and Cultures) to make it easier to compare interpretations and translations of a collection of around 60 early Ancient Greek medical works associated with the physician Hippocrates.


Known as the "Hippocratic Aphorisms", the collection covers a wide range of medical topics, including case histories, theories of illness and a variety of ailments and conditions. The project aimed to examine rare Arabic translations of the aphorisms, along with several running commentaries by scholars who have worked with them over the years.

The project obtained scanned copies of manuscripts containing Arabic translations of the Hippocratic Aphorisms that had, over the years, been annotated with commentaries from various sources. These manuscripts were converted from plain images into text files containing the aphorisms and their associated commentaries, aggregated from multiple sources. The different sources do not necessarily agree on the correct translation! The original goal of the project was to process these text files further into XML files in a standard format suitable for upload to digital libraries, so that the multiple interpretations and translations could be compared more easily.

The Text Encoding Initiative (TEI) provides a standard XML format suitable for many different types of humanities, social sciences and linguistics data, which such digital libraries can ingest and display in many different ways. In close collaboration with Prof Pormann's team, Research IT wrote bespoke software named eXegis that converts their files into TEI XML. While performing this conversion, eXegis can detect and repair errors in the original files, ensuring that the resulting XML is correct and providing feedback to the researchers. As a result, the tool now has a second life as a teaching aid for humanities researchers new to transcribing texts of this nature. eXegis is open source (BSD Licence) and available on GitHub.
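To give a flavour of what such a conversion involves, here is a minimal sketch in Python. It is not the eXegis code and does not reproduce its input format or output schema: the function names, the tuple-based input, the `type` and `n` attribute values and the placeholder header text are all assumptions made for illustration. Only the TEI namespace and the element names (`TEI`, `teiHeader`, `text`, `body`, `div`, `p` and so on) come from the TEI standard itself.

```python
import xml.etree.ElementTree as ET

# The TEI namespace is defined by the standard; everything else here is illustrative.
TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialise TEI as the default namespace


def tei(tag):
    """Return a tag name qualified with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def build_tei(title, aphorisms):
    """Build a small TEI tree.

    `aphorisms` is a list of (number, text, [commentary, ...]) tuples.
    This input shape is hypothetical, not the format used by the project.
    """
    root = ET.Element(tei("TEI"))

    # Minimal header: title, publication and source statements (placeholder text).
    header = ET.SubElement(root, tei("teiHeader"))
    file_desc = ET.SubElement(header, tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = title
    publication = ET.SubElement(file_desc, tei("publicationStmt"))
    ET.SubElement(publication, tei("p")).text = "Illustrative sketch only"
    source = ET.SubElement(file_desc, tei("sourceDesc"))
    ET.SubElement(source, tei("p")).text = "Transcribed from a manuscript scan"

    body = ET.SubElement(ET.SubElement(root, tei("text")), tei("body"))
    for number, text, commentaries in aphorisms:
        # A very simple error check, reported back to the transcriber.
        if not text.strip():
            raise ValueError(f"aphorism {number} has no text")
        div = ET.SubElement(
            body, tei("div"), {"type": "aphorism", "n": str(number)}
        )
        ET.SubElement(div, tei("p")).text = text
        # Each commentary becomes a nested division under its aphorism.
        for index, commentary in enumerate(commentaries, start=1):
            sub = ET.SubElement(
                div, tei("div"), {"type": "commentary", "n": str(index)}
            )
            ET.SubElement(sub, tei("p")).text = commentary
    return root


def to_xml(root):
    """Serialise the tree to a Unicode string."""
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = [(1, "Life is short, the art long.", ["Placeholder commentary A",
                                                   "Placeholder commentary B"])]
    print(to_xml(build_tei("Sample aphorisms", sample)))
```

Keeping each commentary as a numbered division nested under its aphorism is one simple way to keep competing interpretations side by side; a real encoding would carry far richer metadata about manuscripts, scribes and sources.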

Through this work, the project has created public, searchable transcriptions of important texts that were, until now, only available as fragile, handwritten documents, allowing scholars to better understand the history of the translation of these texts.

If you think that your research project could benefit from the input of a skilled research software engineer, get in touch for a chat.