Reminds me of the time when AI image generation was new, and someone generated a bunch of screwed up road intersections with stuff like circular crosswalks and whatnot. Everyone was like "humans can't fix the traffic, but don't worry, computers can't fix it either." ...I think about it a lot.
You absolutely can regex (some) HTML if you sanitize and maybe convert it beforehand.
Btw, why are parsers always built to support the whole format, and then either throw an error on unsupported shenanigans or just consume them anyway? That's how you get security vulnerabilities in picture formats. Why not just pick the things you support and ignore the rest?
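To make the "pick what you support, ignore the rest" idea concrete, here's a rough sketch in Python. The chunk layout (4-byte tag, 4-byte big-endian length, payload), the tag names, and both function names are made up for illustration and don't correspond to any real image format.

```python
import struct

# Hypothetical layout, assumed for this sketch only:
# 4-byte ASCII tag, 4-byte big-endian payload length, then the payload.
SUPPORTED_TAGS = {b"SIZE", b"PIXL"}  # the allowlist; everything else is skipped


def read_supported_chunks(data: bytes) -> dict[bytes, bytes]:
    """Return payloads of allowlisted chunks; never interpret unknown ones."""
    chunks = {}
    offset = 0
    # Stop when fewer than 8 bytes remain (not enough for a chunk header).
    while offset + 8 <= len(data):
        tag = data[offset:offset + 4]
        (length,) = struct.unpack(">I", data[offset + 4:offset + 8])
        start = offset + 8
        end = start + length
        # Bounds check applies to every chunk, supported or not.
        if end > len(data):
            raise ValueError("chunk length runs past end of input")
        if tag in SUPPORTED_TAGS:
            chunks[tag] = data[start:end]
        # Unknown tags fall through: the payload is stepped over, not parsed.
        offset = end
    return chunks


def make_chunk(tag: bytes, payload: bytes) -> bytes:
    """Helper for building sample input in the hypothetical layout."""
    return tag + struct.pack(">I", len(payload)) + payload


sample = (
    make_chunk(b"SIZE", b"\x02\x02")
    + make_chunk(b"EVIL", b"x" * 10)  # unsupported chunk, gets skipped
    + make_chunk(b"PIXL", b"abcd")
)
result = read_supported_chunks(sample)
```

Note that even this version still has to trust the length field enough to step over the unknown chunk, which is why the bounds check runs before the allowlist check.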
I'm not sure I understand your reply, but if your second sentence is saying that this image is AI-generated, then you might like to know that this building is in Belgium and there are other photos online.