First of all, I would like to thank the team and the 2 admins of other servers for their help! We did some thorough troubleshooting to get this working!
The upgrade
The upgrade itself isn't too hard. Create a backup, and then change the image names in the docker-compose.yml and restart.
But, like the first 2 tries, after a few minutes the site started getting slow until it stopped responding. Then the troubleshooting started.
The solutions
What I had noticed previously is that the lemmy container could reach around 1500% CPU usage, and above that the site got slow. That is weird, because the server has 64 threads, so 6400% should be the max.
So we tried what had been suggested before: we created extra lemmy containers (and extra lemmy-ui containers) to spread the load, and used nginx to load balance between them.
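For anyone wanting to try the same thing, here is a rough sketch of what the nginx side of such a setup could look like. It is not our actual config: the container names, the number of containers, and the server name are placeholders, the routing is simplified compared to the stock Lemmy nginx config, and 8536 and 1234 are just the default lemmy and lemmy-ui ports.

```nginx
# Goes inside the http { } context. All names below are hypothetical.

# Pool of lemmy backend containers; nginx uses round-robin by default.
upstream lemmy-backend {
    server lemmy-1:8536;
    server lemmy-2:8536;
    server lemmy-3:8536;
}

# Pool of lemmy-ui containers.
upstream lemmy-ui {
    server lemmy-ui-1:1234;
    server lemmy-ui-2:1234;
}

server {
    listen 80;
    server_name example.org;  # placeholder

    # Simplified routing: API calls go to the backend pool.
    location /api/ {
        proxy_pass http://lemmy-backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }

    # Everything else goes to the UI pool.
    location / {
        proxy_pass http://lemmy-ui;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

Any container that should not serve user traffic is simply left out of the upstream pools.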
Et voilà. That seems to work.
Also, as he suggested, we start the load-balanced lemmy containers with the scheduler disabled, and run 1 extra lemmy container with the scheduler enabled that isn't used for anything else.
There will be room for improvement, and probably new bugs, but we're very happy the site is now at 0.18.1-rc. This fixes a lot of bugs.
Admin communicating information about the site to users like genuine human beings, instead of the corporate-sanitized pablum reddit admins speak in? Yeah that's refreshing.
It's cool we all want the new community to succeed and get live updates on things being fixed or worked on. There's some shared misery in growing pains/bugs as we all stress test the system, but I think that only somehow brings more communal joy when the problems are fixed.