Ann Gledson, Research Software Engineer, recently presented the ‘Mine-the-Gaps’ visualisation tool at the Alan Turing Institute’s Tools, Practices and Systems Programme. The tool was developed as part of the Turing project ‘Understanding the Relationship between Human-Health and the Environment’, led by Prof Caroline Jay from the Department of Computer Science, and was used to examine how hay fever symptoms collected via the Britain Breathing app were affected by changes in air quality and weather.
As part of this project, in January 2021 Douglas Lowe and Ann Gledson explained the problems they had found with online air quality, pollen and weather sensor datasets, and presented a new cleaned and pre-processed dataset along with tools that let other researchers easily access and use such data. One such shortfall was regional coverage: sensors can’t be installed everywhere, so many areas are sparsely covered, and some measurements (e.g. pollen and SO2) are even sparser than others. So how, for example, do we approximate the pollen level for somebody reporting severe allergy symptoms in Manchester if the nearest sensor is in Chester? We implemented two basic, distance-based estimation algorithms to address these regional gaps. Although we knew that better methods could be developed, we decided it would be more useful to give researchers a way to visualise, evaluate and compare new regional estimation techniques.
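The post doesn't say which two distance-based algorithms were implemented, so the following is only a minimal Python sketch of the general idea, not the project's code. It shows two common distance-based estimators: taking the reading of the nearest sensor, and inverse distance weighting (IDW), where closer sensors contribute more to the estimate. The function names, the `power` parameter and the sensor records are hypothetical; the coordinates are approximate city centres and the readings are illustrative values, not real measurements.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_sensor_estimate(target, sensors):
    """Estimate at target=(lat, lon) using the reading of the closest sensor."""
    nearest = min(sensors, key=lambda s: haversine_km(*target, s["lat"], s["lon"]))
    return nearest["value"]


def idw_estimate(target, sensors, power=2.0):
    """Inverse-distance-weighted mean of sensor readings at target=(lat, lon)."""
    if not sensors:
        raise ValueError("at least one sensor is required")
    num = den = 0.0
    for s in sensors:
        d = haversine_km(*target, s["lat"], s["lon"])
        if d == 0.0:
            return s["value"]  # target coincides with a sensor: use its reading
        weight = 1.0 / d ** power  # closer sensors get larger weights
        num += weight * s["value"]
        den += weight
    return num / den


if __name__ == "__main__":
    # Hypothetical sensors with illustrative (not real) readings.
    sensors = [
        {"name": "chester", "lat": 53.19, "lon": -2.89, "value": 40.0},
        {"name": "leeds", "lat": 53.80, "lon": -1.55, "value": 20.0},
    ]
    manchester = (53.48, -2.24)
    print("nearest:", nearest_sensor_estimate(manchester, sensors))
    print("IDW:", round(idw_estimate(manchester, sensors), 2))
```

For the Manchester point, the nearest-sensor estimate is simply the Chester reading, while IDW blends both readings and leans towards Chester because it is the closer of the two.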
Mine-the-Gaps lets researchers visually check sensor data, sensor locations, regional coverage and the likely accuracy of regional approximation methods by displaying this data on a map. Estimated data can be loaded into the application from a CSV file, or new algorithms can be added to our region-estimators Python package, which Mine-the-Gaps uses to calculate estimates on the fly. All uploaded data is displayed on a map, and time-series charts can be shown that compare each sensor's actual readings with the estimates that would have been made for that location had the sensor been missing. Further evaluation methods are planned for upcoming work.
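Comparing a sensor's real readings with the estimates that would have been made without it is essentially leave-one-out validation. Here is a minimal Python sketch of that idea under our own assumptions: each sensor is a dict with `lat`, `lon` and a time-aligned `series` of readings, any estimator function can be plugged in, and `mean_of_others` is a deliberately naive placeholder. None of these names come from Mine-the-Gaps or region-estimators, whose own evaluation code may work differently.

```python
from statistics import mean


def leave_one_out(sensors, estimator):
    """For each sensor, estimate its readings using only the *other* sensors.

    sensors: dict of sensor id -> {"lat": ..., "lon": ..., "series": [v0, v1, ...]}
    estimator: callable(lat, lon, others, t) -> estimated value at time index t
    Returns a dict of sensor id -> list of (actual, estimated) pairs.
    """
    results = {}
    for sid, s in sensors.items():
        others = {k: v for k, v in sensors.items() if k != sid}  # hide this sensor
        results[sid] = [
            (actual, estimator(s["lat"], s["lon"], others, t))
            for t, actual in enumerate(s["series"])
        ]
    return results


def mean_absolute_error(pairs):
    """Mean absolute difference between actual and estimated values."""
    return mean(abs(actual - est) for actual, est in pairs)


def mean_of_others(lat, lon, others, t):
    """Placeholder estimator: unweighted mean of the other sensors at time t.

    It ignores location entirely; a distance-based estimator would use lat/lon.
    """
    return mean(o["series"][t] for o in others.values())


if __name__ == "__main__":
    # Hypothetical sensors with illustrative readings at two time steps.
    sensors = {
        "a": {"lat": 53.0, "lon": -2.0, "series": [10, 20]},
        "b": {"lat": 53.5, "lon": -2.5, "series": [20, 30]},
        "c": {"lat": 54.0, "lon": -3.0, "series": [30, 40]},
    }
    results = leave_one_out(sensors, mean_of_others)
    for sid, pairs in results.items():
        print(sid, pairs, "MAE:", mean_absolute_error(pairs))
```

Swapping in a different estimator and comparing the per-sensor errors is one simple way to rank methods; plotting each sensor's (actual, estimated) pairs over time gives a chart similar in spirit to the comparison described above.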
The live version of Mine-the-Gaps currently displays five years of cleaned and pre-processed UK sensor data, originally from DEFRA (air quality) and Met Office (pollen and weather) monitoring stations, alongside our basic region estimations. Sites can be filtered by any metadata field included in the input CSV files, and all the data can be downloaded via an API. Users can also upload their own time-series geospatial data locally and compare it with the pre-loaded data.
If you have your own sensor data and/or estimation method to visualise, you can clone the app from our code repository and install it. Just two lines of script are needed to get it running locally and accessible from any browser. You can then upload time-series sensor data from any location and make estimations for any regions (the required region boundaries are uploaded in a CSV file). If you need additional functionality, Mine-the-Gaps is open source, and issues and pull requests are welcome.
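Estimating values for a region usually starts by working out which sensors fall inside its boundary. The boundary CSV layout that Mine-the-Gaps expects isn't described here, so this Python sketch assumes a hypothetical layout with one row per polygon vertex (`region_id,lon,lat`, vertices listed in order) and uses the standard ray-casting point-in-polygon test. It handles simple polygons only, treats longitude and latitude as planar coordinates, and may classify points lying exactly on an edge either way.

```python
import csv
import io
from collections import defaultdict


def load_regions(csv_text):
    """Parse rows of the hypothetical form region_id,lon,lat into vertex lists."""
    regions = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        regions[row["region_id"]].append((float(row["lon"]), float(row["lat"])))
    return dict(regions)


def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside the polygon.

    polygon: list of (lon, lat) vertices; the first vertex need not be repeated.
    Counts how many edges a ray cast towards +lon crosses; odd means inside.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def sensors_in_region(sensors, polygon):
    """Return ids of sensors (id -> (lon, lat)) located inside the polygon."""
    return [sid for sid, (lon, lat) in sensors.items() if point_in_polygon(lon, lat, polygon)]


if __name__ == "__main__":
    boundaries = "\n".join([
        "region_id,lon,lat",
        "square,0,0",
        "square,1,0",
        "square,1,1",
        "square,0,1",
    ])
    regions = load_regions(boundaries)
    sensors = {"inside": (0.5, 0.5), "outside": (1.5, 0.5)}
    print(sensors_in_region(sensors, regions["square"]))
```

For real boundary files, a geometry library such as Shapely would handle projections, multi-part regions and edge cases more robustly than this hand-rolled test.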
In summary, Mine-the-Gaps can be used in three ways: as-is, with the pre-loaded UK environmental data; by cloning the repository and loading in new data; or by extending the web app and the region-estimators package via our GitHub repositories.
Potential use cases include evaluating current and new regional estimation methods for sensor data; evaluating how variation in region shape and size affects estimations; helping to determine where a new sensor would best be positioned to complement the existing networks; visualising irregularly spaced datasets; and visualising sensor data over time. Plans are also underway to use Mine-the-Gaps for undergraduate student projects.
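For the sensor-placement use case, one simple heuristic (our own illustration, not a method the post attributes to Mine-the-Gaps) is to choose, from a list of candidate sites, the one whose nearest existing sensor is farthest away: the site that fills the largest coverage gap. This Python sketch assumes sites and sensors are hypothetical (lat, lon) tuples and uses great-circle distance; a real placement decision would also weigh estimation error and practical constraints.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def coverage_gap_km(site, sensors):
    """Distance from a site to its nearest existing sensor."""
    return min(haversine_km(site[0], site[1], s[0], s[1]) for s in sensors)


def best_new_site(candidates, sensors):
    """Candidate whose nearest existing sensor is farthest away (max-min distance)."""
    return max(candidates, key=lambda c: coverage_gap_km(c, sensors))


if __name__ == "__main__":
    existing = [(53.0, -2.0), (53.5, -2.5)]      # hypothetical sensor sites
    candidates = [(53.1, -2.1), (54.5, -2.0)]    # hypothetical candidate sites
    print(best_new_site(candidates, existing))
```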
A recent Turing Fellow seminar by Prof Jay on "Human Health and the Environment" is available online.