I've been running an HPC system for a science group for a while now and have built a couple of different systems based on common HPC frameworks (ROCKS or OpenHPC). These were built on top of the RHEL rebuild distros (mostly CentOS), but I don't really need the level of stability these provide and would actually like the sort of updates you get from something like CentOS Stream, so this seems like a good time to try it.
The problem is that I haven't found an HPC framework that natively supports this, so I'm potentially going to have to roll my own. I don't need anything fancy, just some way to automatically deploy nodes and set up Slurm to get jobs queued.
Any pointers to suitable frameworks or tools which would help with this and which aren't tied to older distros?
I would not use CentOS for your use case, as it is a rolling release and as such is not considered stable for production environments. In recent times Ubuntu Server has taken over where CentOS was once used.
As for an HPC framework, I would look at grid computing and at one of the scientific workflow management solutions that is compatible with your requirements and a Linux environment.
The lack of stability is actually quite attractive to me. In a scientific environment we're normally running fairly new, often unstable code, and we often hit problems caused by older versions of libraries / packages / compilers, so something which stays a bit more current would be good, and we can deal with breakage if it happens. The trouble is that the management systems around HPC assume you're working on enterprise systems, which isn't really true in our case.
I've looked at things like OpenHPC, but they're still on RHEL 8 (RHEL 9 is in testing but not released yet), and even lower-level tools like Warewulf still only support RHEL 8 at the moment, which is getting too old for me to want to build a new system on.
I've looked at more generic tools like Ansible and Chef / Puppet, but before I go down that rabbit hole I'd like a sanity check that there isn't something better suited that I'm missing.
It's a misconception that CentOS Stream is a rolling release. It comes in versioned releases that track ahead of RHEL by a few months and have 5-year support cycles.
It literally states on the CentOS site that it is a, and I quote:
"Continuously delivered distro that tracks just ahead of Red Hat Enterprise Linux (RHEL) development..."
Yeah, conceptually I like it. A while back I used to run my systems on Fedora, which was great in that I always had the latest of everything, but doing upgrades every 6 months got tedious. Stream seems like a good compromise on the way to that.
I mean, if you know what software you need to make it work on RHEL, it might take a bit of work on your part, but I can't imagine getting it installed on CentOS Stream will be that onerous a task.