So I started lemmy.world on a 2CPU/4GB VPS, keeping an eye on the performance.
Soon I decided to double that. And after the first few thousand users joined, I doubled it again to 8CPU/16GB. That was also the max I could get for that VPS type.
But I already saw some donations come in, without really asking. That reminded me of the willingness to donate on Mastodon, which allowed me to easily pay for a very powerful server for mastodon.world, one of the reasons it grew so fast. Other (large) servers crashed and closed registrations; I (mainly) didn't.
So I decided to buy the same large server (32cpu/64threads with 128GB RAM) as for masto (though the masto one has double the RAM). In the post announcing that, I also mentioned the donation possibilities. That brought in a lot of donations immediately, already funding this server for at least 2 months. (To the anonymous person donating $100: wow!)
Next up: solving the issue with post slowness. That's probably a database issue.
And again: the migration took 4 minutes of downtime, and that could have been less if I hadn't been eating pizza at the same time. So if any server wants to migrate: please do! If you have the userbase, you'll get the donations for it. Contact me if you have questions.
I'm not an admin, but have followed the sizing discussions around the lemmyverse as closely as I can from my position of lacking first-hand knowledge:
lemmy.ml is the biggest instance by user count, but runs on incredibly modest 8-cpu hardware. Their cloud provider doesn't offer any easy scale-up options, so they can't trivially restart on a bigger VM with their db and disk in place. I suspect this means that instance is going to suffer for a bit as they figure out what to do next.
lemmy.world on the other hand was running on a box at least twice as big as lemmy.ml at last count, and I believe they can go quite a bit bigger if they need to.
The lemmy.world admins also run mastodon.world and lived through the twitterpocalypse, seeing peak user registration rates of 4k per hour. So this is not their first rodeo in terms of explosive growth, and I'm sure that experience gives them some tricks up their sleeve.
The admin team is pretty clearly technically strong. If I recall correctly, ruud is a professional database admin. One of the spooky parts of Lemmy performance-wise is the db. If ruud or others on the admin team custom-tuned their pg setup based on their own analysis of how/why it's slow, they may be getting more performance per CPU cycle than other instances running more stock configs or that are cargo-culting tweaks that aren't optimal for their setup without understanding what makes them work.
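To make the "stock config vs. tuned" point a bit more concrete, here's a minimal sketch of the kind of rule-of-thumb starting values people derive from RAM size. This is not what ruud or any other admin actually runs. The function name `suggest_pg_settings` is made up for illustration, the 25% figure for `shared_buffers` is the starting point the PostgreSQL docs suggest for a dedicated database server, and the 75% figure for `effective_cache_size` is only a common community rule of thumb that I'm assuming here. A Lemmy box also runs the app itself, so real values would likely need to be lower and checked against actual query behaviour.

```python
def suggest_pg_settings(ram_gb: int) -> dict[str, str]:
    """Return rough starting values for two PostgreSQL memory settings.

    Hypothetical helper for illustration only; the fractions are
    rule-of-thumb assumptions, not measured or recommended values.
    """
    if ram_gb < 1:
        raise ValueError("ram_gb must be at least 1")
    ram_mb = ram_gb * 1024
    # Starting point from the PostgreSQL docs for a dedicated DB server: 25% of RAM.
    shared_buffers_mb = ram_mb // 4
    # Assumption: planner hint set to roughly 75% of RAM.
    effective_cache_mb = ram_mb * 3 // 4
    return {
        "shared_buffers": f"{shared_buffers_mb}MB",
        "effective_cache_size": f"{effective_cache_mb}MB",
    }


# RAM sizes mentioned in this thread: the first VPS, the maxed-out VPS, the new server.
for ram in (4, 16, 128):
    print(ram, suggest_pg_settings(ram))
```

The point is only that these numbers scale with the box, so a config copied from someone else's 16GB instance is unlikely to be right on a 128GB one (or the other way around) without understanding what each setting does.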
I'm surprised that sh.itjust.works isn't growing faster. They also have a hefty hardware setup and seemingly the technical admins to handle big user counts. I wonder if it's a branding problem, where lemmy.world sounds inviting and plausibly serious while sh.itjust.works sounds like clowntown even though it's run by a capable and serious team.
It's known in the industry as the throw-hardware-at-it optimization. It's often effective and what's needed to buy time for software optimization to come in.
Talk about dumb luck! I chose this server (apparently 2 days after launch) because documentation suggested choosing a less populated server to spread the load. Now I'm on one of the biggest and most stable. Me so happy!
He has been posting updates along the way. It's a combo of upgrading the server as it hits its limits and tuning of his web proxy and docker container to handle the increased load and federation requirements.
Been doing an amazing job of it too. I just randomly chose this instance and I'm glad.
Likely experience and knowledge improving the quality of deployment. Most instances are likely underspecced, are on hosts that aren't easy to scale up with, or are maxed out in their current offering tier (lemmy.ml comes to mind there).