Research IT

Analysing Tweets from Twitter

Twitter has just announced a new programme to allow researchers to interact with Twitter to extract data for research purposes. In this blog Peter Crowther, a Research Software Engineer in Research IT, gives an introduction to interacting with Twitter programmatically and what these latest changes mean for researchers.

Twitter is one of the world's most popular social media platforms, and its format lends itself to sharing thoughts and of-the-moment ideas. This makes Twitter a valuable resource for researchers who want to measure people's sentiments around particular themes.

A search of the University of Manchester Research Explorer shows over 500 published works for the search term “Twitter”, on a wide range of topics from predicting traffic flow to predicting people's personalities from their tweets. Here in Research IT, Ian Hinder worked on a Twitter analysis project with researcher Dr. Vitaly Kazakov which aimed to analyse the sentiment around the “Information War” between Russia and the West, focussing on large-scale public events like the Sochi Winter Olympics in 2014.

So how can researchers go about gathering data from tweets? The very simplest way is to manually search Twitter using a web browser, but this is unlikely to scale well to the amount of information needed for a typical research study. A more automated way would be to write a program that interacts directly with the web pages of Twitter to extract information. This technique, commonly called “web scraping”, would be more efficient than a manual search but can be tricky to program. The structure of a web page is not designed to be easily read programmatically, and web scraping programs are often fragile - they are liable to break if the design of the web page changes.
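
To illustrate why scrapers are fragile, here is a minimal Python sketch that uses only the standard library. The HTML snippets and the class names `tweet-text` and `post-body` are invented for illustration and are not Twitter's real markup; the point is only that a scraper tied to one page layout silently returns nothing once that layout changes.

```python
from html.parser import HTMLParser


class TweetTextScraper(HTMLParser):
    """Collect the text of every <p> element whose class equals target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.inside_target = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        # Start collecting only when we enter a <p> with the expected class.
        if tag == "p" and dict(attrs).get("class") == self.target_class:
            self.inside_target = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.inside_target = False

    def handle_data(self, data):
        if self.inside_target:
            self.texts[-1] += data


def scrape(html, target_class="tweet-text"):
    """Return the stripped text of all matching paragraphs in html."""
    parser = TweetTextScraper(target_class)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.texts]


# Hypothetical page markup, before and after a site redesign.
OLD_PAGE = '<div><p class="tweet-text">Hello world</p><p class="meta">2 likes</p></div>'
NEW_PAGE = '<div><p class="post-body">Hello world</p><p class="meta">2 likes</p></div>'

old_result = scrape(OLD_PAGE)  # finds the tweet text
new_result = scrape(NEW_PAGE)  # finds nothing: the class name has changed
```

The scraper raises no error on the redesigned page; it simply stops finding data, which is why this kind of breakage can go unnoticed in a long-running study.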

The best way to access data from a web source programmatically is through an Application Programming Interface (API), if the website offers one. Many web services offer APIs, from Dropbox to Facebook to Google Finance to AirBnB. An API provides a documented and stable interface against which you can write code to interact with that service.

Frequently these APIs are free to access, since the companies want people to interact with their services. There is often a free tier of service that allows access to basic API functionality, but this free tier may be limited in the number of requests per minute that can be made or in the amount of data that can be downloaded. For the most part this should be sufficient for researchers, but if not, the premium tier that offers higher-volume access to the API can be expensive. For example, in 2018 there was considerable controversy when Google sharply reduced the access limits of the free tier of its Maps API, incurring substantial costs for many who embedded Google Maps in their applications and websites.
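
A common way to stay within a requests-per-minute limit is to pause between calls. The sketch below is one simple approach, not something prescribed by any particular API: the `Throttle` class, the `fetch_all` helper and the limit of 30 requests per minute are all hypothetical names and values chosen for illustration.

```python
import time


class Throttle:
    """Space calls out so that at most max_per_minute happen each minute."""

    def __init__(self, max_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / max_per_minute
        self.clock = clock  # injectable so the class can be tested without waiting
        self.sleep = sleep
        self.last_call = None

    def wait(self):
        """Block until min_interval seconds have passed since the previous call."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now


def fetch_all(queries, fetch, throttle):
    """Call fetch(query) for each query, pausing as the throttle requires."""
    results = []
    for query in queries:
        throttle.wait()
        results.append(fetch(query))
    return results
```

In practice `fetch` would be the function that sends one API request, and you would create the throttle with the limit published for your tier, for example `Throttle(max_per_minute=30)`.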

So what are the latest changes at Twitter? Historically Twitter has had a limited basic API and a premium paid API. They have just introduced an Academic Tier, which has some of the enhanced features of the premium API and is free for non-commercial use by academics after registration. They have also added a range of resources for learning to use the API. Interacting with the Twitter API is possible through almost any programming language, and there is a range of libraries available to make programming easier.
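
As a rough idea of what an API request looks like in code, the following Python sketch builds a search request using only the standard library. The endpoint URL, the `query` and `max_results` parameters and the bearer-token header are assumptions based on Twitter's v2 API documentation rather than anything stated in this post, and they may have changed since; check the current documentation and your own access tier before relying on them. The function names and the example query are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# ASSUMPTION: recent-search endpoint of Twitter API v2; verify against current docs.
SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"


def build_search_request(query, bearer_token, max_results=10):
    """Build (but do not send) an authenticated search request."""
    params = urllib.parse.urlencode({"query": query, "max_results": max_results})
    return urllib.request.Request(
        f"{SEARCH_URL}?{params}",
        # ASSUMPTION: the API expects the token issued after registration here.
        headers={"Authorization": f"Bearer {bearer_token}"},
    )


def fetch_tweets(request):
    """Send the request and decode the JSON reply.

    Needs network access and a valid token, so it is not called in this sketch.
    """
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example: a request for English-language tweets on a hashtag (placeholder token).
request = build_search_request("#Sochi2014 lang:en", "YOUR-BEARER-TOKEN")
```

Keeping request construction separate from sending makes it easy to inspect `request.full_url` before spending any of your request quota. The libraries mentioned above wrap these details so that you rarely need to build requests by hand.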

If you have questions about Twitter scraping or interacting with web APIs in general, come and chat to one of our Research Software Engineers at our next Research IT drop-in session. You can also reach us through the “Research Support Requests” link on our webpage.
