Research IT

WhatsApp Scraping

When researchers Drs Gary Motteram, Susan Dawson and Amanda Banks Gatenby from the School of Environment, Education and Development wanted to run analysis on WhatsApp messages, they came to Research IT for help.


They wanted to carry out qualitative analysis of social media data, specifically the WhatsApp messages they exchange with the school leaders they work with in Côte d'Ivoire. The researchers wanted to see whether patterns or other interesting features emerged from the messages.

Joshua Woodcock, Research Software Engineer (RSE), wrote a script that picks apart the exported message text and creates a streamlined file that keeps data points such as the sender, the timestamp and the message itself. Unnecessary information, such as notices that somebody has joined or left the chat or changed the group icon, is discarded.
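
As a rough illustration of this kind of parsing (a minimal sketch, not Joshua's actual code), the Python below keeps the sender, timestamp and message from each line and drops everything else. The line layout "DD/MM/YYYY, HH:MM - Sender: message" and the names LINE_RE and parse_export are assumptions made for this example; real exports vary by device and app version.

```python
import re

# Assumed layout (illustrative only): "DD/MM/YYYY, HH:MM - Sender: message"
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{1,2}/\d{1,2}/\d{2,4}, \d{1,2}:\d{2}) - "
    r"(?P<sender>[^:]+): (?P<message>.*)$"
)
# Any line that opens with a date is the start of a new entry.
DATE_START_RE = re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}, ")


def parse_export(lines):
    """Return a list of dicts with timestamp, sender and message."""
    records = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = LINE_RE.match(line)
        if match:
            # A normal chat message: keep the three fields of interest.
            records.append(match.groupdict())
        elif records and not DATE_START_RE.match(line):
            # No date at the start: treat as a continuation of the
            # previous (multi-line) message.
            records[-1]["message"] += "\n" + line
        # Otherwise it is a dated line with no "Sender: " part, such as
        # a join or leave notice, and it is discarded.
    return records


sample = [
    "16/10/2019, 09:15 - Alice: Hello everyone",
    "16/10/2019, 09:16 - Bob joined using this group's invite link",
    "16/10/2019, 09:17 - Bob: Hi Alice",
    "How are you?",
]
records = parse_export(sample)
```

Here the join notice is dropped and "How are you?" is attached to Bob's message. A system notice that happens to contain a colon would slip through this simple pattern, so a real script needs more careful checks.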

During the project, WhatsApp changed how its messages were exported, meaning that the original script would only work on newly exported messages, not older ones. Thankfully, Joshua was able to implement a fix, so the script will now work on any exported WhatsApp message (for now!).
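
One common way to cope with a format change like this is to try several patterns in turn and accept the first that matches. The sketch below shows the idea only; it is not taken from the actual fix. The two layouts (labelled "dash" and "bracket") and the names FORMATS and parse_line are hypothetical and chosen for the example.

```python
import re

# Two hypothetical export layouts. Supporting another layout later
# only requires adding one more entry to this dictionary.
FORMATS = {
    # e.g. "16/10/2019, 09:15 - Alice: Hello"
    "dash": re.compile(
        r"^(?P<timestamp>\d{1,2}/\d{1,2}/\d{2,4}, \d{1,2}:\d{2}) - "
        r"(?P<sender>[^:]+): (?P<message>.*)$"
    ),
    # e.g. "[16/10/2019, 09:15:30] Alice: Hello"
    "bracket": re.compile(
        r"^\[(?P<timestamp>\d{1,2}/\d{1,2}/\d{2,4}, \d{1,2}:\d{2}:\d{2})\] "
        r"(?P<sender>[^:]+): (?P<message>.*)$"
    ),
}


def parse_line(line):
    """Try each known layout in turn; return None if none matches."""
    for name, pattern in FORMATS.items():
        match = pattern.match(line)
        if match:
            return {"format": name, **match.groupdict()}
    return None


newer = parse_line("16/10/2019, 09:15 - Alice: Hello")
older = parse_line("[16/10/2019, 09:15:30] Alice: Hello")
unknown = parse_line("Alice left")
```

Keeping the patterns in one place means that the next change to the export layout becomes a small addition rather than a rewrite.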

Even though the researchers have only just got access to the script, it has already shown that it can process over 2000 messages in a matter of seconds, paving the way for the analysis of many more.

This small project was an ideal demonstration of how technology changes, even over the course of a short-term project. It also shows how important it is to write robust code that can be used in the future and easily adapted when, not if, the technology changes.

Joshua’s code has been made open source so anyone can use it. The code, along with instructions, can be found in the Research IT GitHub repository.

If you are interested in running the script or performing a similar analysis, we offer a one-day introductory course in Python, the programming language needed to run the scripts. The Python course dates are 16 Oct 2019, 15 Jan 2020, 15 Apr 2020, 17 Jun 2020. All our training courses are free for UoM staff and PGRs.

If you have a research project or idea that we may be able to help with, get in touch!