Sorry, but you are just making assumptions without even having looked at the facts.
It's not cheap, but basically a single top-tier gaming desktop with an additional graphics card (or two) is literally all you need.
I know multiple people with normal IT jobs who have already started setting up their own. They plan on running them for their whole family, with many users at a time on the same machine.
And this is before even considering how fast open source moves. I am expecting quantized models that run twice as fast with negligible quality impact any second now.
Building my entire data model around the Tiananmen Square copypasta. I can run this thing on a Raspberry Pi plugged into a particularly starchy potato and it reliably returns the only answer I've thought to ask it.