Research IT

Restricted Data Research in the Cloud

If your research involves restricted data, you may feel that your computational choices are limited. Gary Leeming, Project Design Lead for Research Lifecycle Programme Project S "Develop a service to manage restricted data", discusses what the University is doing to open up more options.

With an increasing focus on the security and privacy of data, especially sensitive data such as personal data, commercial data or criminal records, researchers need support in keeping their data safe. The Project S development team is responsible for designing a Highly Restricted Data Service (HRDS). The team is also researching how people currently plan and run projects that use high-risk data, so that we can understand how to build better tools and services for researchers in the future. This research will inform a series of co-design sessions later in the year, in which all interested researchers can participate.

The traditional approach to security is often at odds with how researchers like to work: it requires them to negotiate opaque governance processes, forms and services that are often locked down and limited. Through the ongoing development of HRDS at the University of Manchester, we are exploring how to simplify the processes for secure research, and whether cloud services, with their promise of easily extended capabilities and tools, combined with a software-defined architecture, can help create a more effective research platform.

Based on ongoing interactions with researchers, Project S has drawn up a plan to explore three main areas of functionality: data safe havens, anonymisation of data, and the collection of data.

Data Safe Havens

Data safe havens (DSHs) are one of the most common approaches to secure data research. A DSH is an environment, with appropriate security and governance controls, that contains both the data and the software tools a researcher needs to carry out their research. The approach has already been implemented many times, for example in the University of Manchester's Trustworthy Research Environment and the Welsh SAIL project, and the Turing Institute has also been developing a cloud-based system. Building on the lessons from these projects, we have been developing a solution, with support from Amazon Web Services (AWS), that allows the secure ingress and egress of data and provides a Linux-based environment for researchers. Some of the lessons from our work include:

  • A two-factor authentication (2FA) approach means that researchers can access their environment quickly while it remains secure
  • We have chosen a per-user environment to ensure that every user is entirely responsible for their own data and work. Code may be shared within a project, but virtual workstations may not. This is enforced through 2FA and an SSH key, as well as a password
  • Because the environment is defined in code, it is easy to rebuild it and to manage future environments in a more automated way. This should make supporting researchers easier
  • It is easy to control the software and versions deployed on the virtual workstations using machine images
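To illustrate how the three access factors in the list above could combine, here is a minimal Python sketch. It is not the HRDS implementation. The `UserRecord` type, the `may_open_workstation` function and the one-workstation-per-user record layout are hypothetical; the one-time password follows the standard RFC 6238 TOTP scheme, assumed here as a common choice for 2FA; and the SSH check is simplified to comparing a registered key fingerprint, assuming the SSH server has already verified the key's signature.

```python
import base64
import hashlib
import hmac
import struct
import time
from dataclasses import dataclass
from typing import Optional


def totp(secret: bytes, for_time: float, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238) using HMAC-SHA1 and RFC 4226 truncation."""
    counter = int(for_time // step)
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def hash_password(password: str, salt: bytes) -> bytes:
    # Salted, deliberately slow hash so passwords are never stored in clear text
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def ssh_fingerprint(public_key_blob: bytes) -> str:
    # OpenSSH-style SHA256 fingerprint: unpadded base64 of the key blob's digest
    digest = hashlib.sha256(public_key_blob).digest()
    return "SHA256:" + base64.b64encode(digest).decode().rstrip("=")


@dataclass(frozen=True)
class UserRecord:
    """Hypothetical credential record: each user owns exactly one workstation."""
    username: str
    workstation_id: str
    password_salt: bytes
    password_hash: bytes
    totp_secret: bytes
    ssh_key_fingerprint: str


def may_open_workstation(user: UserRecord, workstation_id: str, password: str,
                         otp: str, public_key_blob: bytes,
                         now: Optional[float] = None) -> bool:
    """Allow access only to the user's own workstation, and only if all three factors are valid."""
    now = time.time() if now is None else now
    if workstation_id != user.workstation_id:
        return False  # workstations are never shared, even within a project
    password_ok = hmac.compare_digest(
        hash_password(password, user.password_salt), user.password_hash)
    key_ok = hmac.compare_digest(
        ssh_fingerprint(public_key_blob), user.ssh_key_fingerprint)
    # Accept the current 30-second window and the previous one to tolerate clock drift
    otp_ok = any(hmac.compare_digest(totp(user.totp_secret, now - drift), otp)
                 for drift in (0, 30))
    return password_ok and key_ok and otp_ok


if __name__ == "__main__":
    salt = b"demo-salt"
    alice = UserRecord("alice", "ws-alice", salt, hash_password("correct horse", salt),
                       b"12345678901234567890", ssh_fingerprint(b"alice-key-blob"))
    t = 1_000_000.0
    print(may_open_workstation(alice, "ws-alice", "correct horse",
                               totp(alice.totp_secret, t), b"alice-key-blob", now=t))  # True
```

Tying each credential record to a single workstation ID is what makes workstation sharing impossible in this sketch, even between members of the same project.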

Anonymisation of Data

The security and management environment for the cloud-based DSH is also the foundation for the other areas of functionality. We have started to explore potential anonymisation processes. The purpose of this work is to look at how we can validate that data being extracted from the DSH is safe to export, and how we can modify data so that researchers can use it without needing an expensive DSH environment.
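One common way to check whether an extract is safe to export is a k-anonymity (minimum group size) rule: every combination of quasi-identifiers, such as age band and postcode district, must be shared by at least k records. The Python sketch below illustrates that idea together with simple generalisation of exact values. It is not the Project S method; the column names, the threshold k = 10 and all helper names are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Hashable, List, Mapping, Sequence, Tuple

Row = Mapping[str, Hashable]


def age_band(age: int, width: int = 10) -> str:
    """Generalise an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def postcode_district(postcode: str) -> str:
    """Generalise a full UK postcode to its outward code, e.g. 'M13 9PL' -> 'M13'."""
    return postcode.strip().upper().split()[0]


def _key(row: Row, quasi_identifiers: Sequence[str]) -> Tuple[Hashable, ...]:
    return tuple(row[col] for col in quasi_identifiers)


def small_groups(rows: Sequence[Row], quasi_identifiers: Sequence[str],
                 k: int = 10) -> Dict[Tuple[Hashable, ...], int]:
    """Quasi-identifier combinations shared by fewer than k rows (re-identification risk)."""
    counts = Counter(_key(row, quasi_identifiers) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}


def is_safe_to_export(rows: Sequence[Row], quasi_identifiers: Sequence[str],
                      k: int = 10) -> bool:
    return not small_groups(rows, quasi_identifiers, k)


def suppress_small_groups(rows: Sequence[Row], quasi_identifiers: Sequence[str],
                          k: int = 10) -> List[Row]:
    """Drop every row in a group smaller than k, so the remaining extract passes the check."""
    risky = small_groups(rows, quasi_identifiers, k)
    return [row for row in rows if _key(row, quasi_identifiers) not in risky]


if __name__ == "__main__":
    raw = ([{"age": 34, "postcode": "M13 9PL", "outcome": "A"}] * 12
           + [{"age": 87, "postcode": "M20 2LR", "outcome": "B"}])
    extract = [{"age_band": age_band(r["age"]),
                "district": postcode_district(r["postcode"]),
                "outcome": r["outcome"]} for r in raw]
    qi = ["age_band", "district"]
    print(is_safe_to_export(extract, qi))           # False: one person is unique
    print(len(suppress_small_groups(extract, qi)))  # 12
```

A minimum group size alone does not guarantee safety: if every record in a group shares the same outcome, that outcome is still disclosed. In practice a check like this would be one of several applied before egress.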

Collection of Data

The final area of work will look at how we can bring data into the environment securely. One use case is data from wearables, which may contain sensitive geographic location information linked to medical records.

The HRDS project is looking for further examples and project ideas that need a secure research environment, and we are currently carrying out further analysis and research. We need a variety of voices and experiences involved so that we can jointly develop solutions that work better for both researchers and governance. Whether you are new to or experienced in using high-risk data, or you avoid it because it is too difficult, we want to hear from you. We would also like to hear from you if you support researchers who use high-risk data, for example in research support, Information Governance or research governance.