Over the past few months our Research Platform team have been busy behind the scenes adding resources to the University’s main computational platform, the CSF3. With minimal disruption to the many researchers and PGRs who currently use it, the platform now has the following specs:
- approximately 8,000 HTC CPU cores
- 128 Nvidia GPUs (V100s and A100s), once the latest nodes already procured are in production
- over 4,000 cores in the HPC Pool
- more very high memory nodes coming via the current procurement, so that in a few months we will have 18 nodes with at least 1.5 TB RAM
There has been continual investment in compute nodes since the CSF3 was originally commissioned around six years ago, from research grant funds and also through Research Lifecycle Programme (RLP) funding. However, the critical infrastructure, such as the login nodes, management nodes, software application storage and scratch file systems, had not been updated for some time, increasing the risk of service disruption.
Last year the Research Platform team began a programme of work to address these issues. In the summer of 2021 we replaced the infrastructure underlying the scratch filesystem; this summer we replaced the remaining critical infrastructure; and soon we will be investing to expand the scratch filesystem.
So what does this mean for researchers at the University? It provides greater capability for our Machine Learning and AI researchers, complementing the access to the N8 CIR Tier 2 GPU machine, and gives access to high RAM nodes.
Dr Sophie Nixon from Manchester Institute of Biotechnology has been making good use of our new RAM nodes: "Crucially, the provision of more and bigger high memory nodes will unlock the potential for scientists at Manchester to do bioinformatics analyses of genomic data from microbial communities. Manchester has a large research presence in microbiome science, spanning all three faculties, but bioinformatics has remained the critical bottleneck to being a world leader in this area. The greater capacity to run bioinformatics jobs on the CSF3 will help solve this."
More generally, this work also ensures the longevity of the CSF3: it will now remain in production for another five years, so that the benefits of investments in compute nodes from research grants and via RLP are fully realised.