What Is SuperComputing
The Supercomputing Conference, also known as SC, is an annual event established in 1988 by the Association for Computing Machinery (ACM) and the IEEE Computer Society. The importance of SC lies in its role as a meeting point for scientific discovery and technical innovation. It provides a venue for scientists, engineers, and technologists to present their latest computational achievements and advancements in High Performance Computing (HPC).
The conference covers a wide range of topics, including software, algorithms, applications, CPUs and GPUs, quantum technologies, cooling, memory, interconnects, and standards. As expected, a very prominent theme of AI wound its way through everything this year, and a new machine claimed the number one spot in the TOP500 supercomputer list: El Capitan, hosted at the Lawrence Livermore National Laboratory, USA. It is built by HPE/Cray with an incredible 1,051,392 AMD CPU cores and 9,988,224 AMD GPU cores and uses 40 MW of power at peak performance!
Over the years, SC has grown significantly, and SC24 in November – hosted in Atlanta, Georgia, USA – broke all records with over 18,000 in-person attendees and over 500 exhibitors. In short, SC plays a crucial role in advancing the field of HPC and fostering collaboration among experts from industry, academia, and government.
A Learning Experience
There is a wide range of workshops and tutorials that run either side of the main conference programme. Workshops provide a venue for focussed “mini-conference-within-the-conference” events which have their own organising committees and peer-reviewed presentations. Tutorials are dedicated teaching sessions where attendees can learn directly from leading experts in the field.
Rob attended two full-day workshops:
Research Software Engineers in HPC
This is a workshop for which I’ve been on the reviewing committee for a few years but had never had the chance to attend. It was an interesting day with a typically thought-provoking keynote talk about future hardware developments and predictions of the end of NVIDIA’s near-monopoly on GPU/AI capability in the not-so-distant future. An important element of the day was discussing how RSEs can adapt and continue to be at the cutting edge of research while so many technical advancements are being made so quickly. Ultimately, being the technical enabler in team-based research will be the key.
Women in HPC: Diversity and Inclusion for All
This workshop, which has run for a few years now, is focussed on improving diversity and inclusion for all underrepresented groups in the HPC workforce. Its aim is to build a deeper understanding of what diversity, equity and inclusion mean for different groups, as well as looking at strategies for recruitment, retention and success. An important part of the day was the opportunity for early career folks to network and engage with more senior attendees in an informal setting. My favourite part of the day was a session on Troika Consulting, which is a novel method of small group coaching. It surprised me how immediately effective it was, and I’d love to try it again at some point.
Meanwhile George attended:
Using Containers to Accelerate HPC
A full-day, hands-on tutorial run by people from various international HPC sites and NVIDIA. I attended this due to the increasing popularity of containers on the University’s HPC platform – the CSF. We currently support one of the main container technologies, but it was useful to learn what other sites are doing with some of the competing methods. One of the goals of increasing the use of containers is to better help users to support themselves on the CSF.
Both George and Rob attended:
Fourth Combined Workshop on Interactive and Urgent HPC
The main interest from our perspective was to see what other sites were doing to support users who are not necessarily familiar, or comfortable, with the traditional batch compute approach to high performance computing. In particular, there was an impressive presentation from Lund University on accessing their supercomputing resources in an interactive manner – ideas which we can bring to our platforms.
Vendor Meetings
An important part of attending SC for us is meeting with various vendors, both to maintain relationships with those we already work with and to meet with those we’re looking to explore working with in the future. For many companies this is the prime opportunity to meet with a wider range of consultants than we would be able to for the rest of the year in the UK. The larger outfits take over entire hotels near the conference site, so there was quite a lot of running around downtown Atlanta to get to all of our scheduled meetings.
There were three sessions with Dell, from whom we buy the hardware for our Computational Shared Facility. We covered subjects such as networking for AI applications, data centres and cooling, and an in-depth look at their upcoming server range. The session on data centres was particularly eye-opening: we were shown graphs of the expected power usage of new CPUs and GPUs, and told that 480 kW server racks will be available in the next couple of years.
A meeting with NVIDIA saw us discuss their GPU roadmaps and see their reference server designs. Their flagship machine houses 36 Grace CPUs and 72 Blackwell GPUs in a rack, which are connected to form a single massive GPU for super-heavy workloads. We also discussed the deepening partnership between NVIDIA and the University and what the next steps might be.
Two sessions with AWS covered Cloud HPC, which is their new service for running batch compute jobs on their infrastructure. This is something that may complement our on-premises HPC services for certain workloads and usage patterns. We also met with Ronin, who provide a layer over AWS to simplify management of resources and improve budgetary control. We’re hoping this can form the basis of a more self-service provision of Cloud to our research community.
Finally, we also met HPE. We don’t have formal relationships with them at the moment, but it was good to hear about how they could potentially work with us if we’re looking to invest in an HPC or AI capability in the future.
Highlights from the Show-floor
It’s hard to describe just how big the show-floor is at SC. Looking at the layout, a relatively conservative estimate would suggest it’s about 1,400 x 300 feet (this is in the US, don’t forget) in size, which is about 9.6 acres! No wonder one of the most useful bits of advice when attending SC is to make sure you have comfortable shoes.
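For anyone who wants to check the sums, here is a minimal sketch of the area conversion in Python. The 1,400 x 300 feet dimensions are our own rough estimate from the floor plan, the function name floor_area is purely illustrative, and the metric figures are added only for readers who don't think in acres.

```python
# Rough show-floor area check. The dimensions are an estimate taken from
# the floor plan, not an official figure.
SQ_FT_PER_ACRE = 43_560        # 1 acre = 43,560 square feet
SQ_M_PER_SQ_FT = 0.3048 ** 2   # 1 foot = 0.3048 metres


def floor_area(length_ft: float, width_ft: float) -> dict[str, float]:
    """Return the area of a rectangular floor in several units."""
    sq_ft = length_ft * width_ft
    sq_m = sq_ft * SQ_M_PER_SQ_FT
    return {
        "sq_ft": sq_ft,
        "acres": sq_ft / SQ_FT_PER_ACRE,
        "sq_m": sq_m,
        "hectares": sq_m / 10_000,
    }


area = floor_area(1400, 300)
for unit, value in area.items():
    print(f"{unit}: {value:,.2f}")
```

On these assumed dimensions the floor comes out at 420,000 square feet, a little under 4 hectares in metric terms.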
Exhibitors cover the full range of organisations in the HPC space. As well as all the large server and storage vendors that you would expect to see, there are many software houses and niche players there too. We were surprised by how many companies were there just selling cooling systems – pumps, pipes and liquids for removing heat from overworked hardware. There was also a strong contingent of universities present, although mostly from the US and Japan, due to the way large HPC systems are funded in those countries.
To stand out from the crowd and ensure plenty of footfall, exhibitors employ many different strategies to attract people to their stands. A fairly standard, and very useful, one is to host a range of talks about their products or research. There’s pretty much a conference within the conference going on, albeit with a definite sales focus. Other tactics rely on gimmicks, such as handing out raffle tickets to win fancy Lego sets, general swag such as t-shirts, socks or hats, or even free popcorn or beer. Dell probably “won” this year with a full-service bar serving bubbly and cocktails.
Reflections on the Week
Overall, it was a very useful experience and between us we covered enormous amounts of ground (figuratively and literally) over the time we were at SC. It’s always valuable to meet people where they are, and it was certainly nice to meet some folks in person that we’ve only ever seen on the end of a video call in a different time zone. As the University makes plans around, in particular, AI and what sort of capabilities our research community need in this space, it was the perfect opportunity to immerse ourselves in everything that is going on out there, and very timely.
