Bad news everyone. It is with immense regret that I write to inform you we have suffered a total loss of data for firefish.lgbt, musician.social, and outdoors.lgbt.
How did we get here?
During a routine #GitOps repository cleanup, a subdirectory containing YAML manifests that create our namespaces ...
The Kubernetes ecosystem is full of tools and addons to help solve particular problems (often utilizing the dynamic nature of K8s), but each of these brings additional complexity, which adds up over time until it's very hard to reason intuitively about the consequences of a change.
I personally prefer my IaC with a manual review & approval step. The more you automate, the more the testing complexity & cost (including the need for additional dev/test environments) and, of course, the risk increase.
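For what it's worth, here is a rough sketch of the kind of gate I have in mind: before a cleanup change is applied, look at the manifests it deletes and force a human approval if any of them defines a Namespace. Everything here is hypothetical (the function names, the `example-app` namespace, the file paths), and the YAML handling is a deliberately naive regex scan rather than a real parser, so treat it as an illustration, not as something tied to any particular GitOps tool.

```python
import re

# Naive patterns: good enough for simple manifests, not a real YAML parser.
KIND_RE = re.compile(r"^kind:[ \t]*Namespace[ \t]*$", re.MULTILINE)
NAME_RE = re.compile(
    r"^metadata:[ \t]*\n(?:[ \t]+.*\n)*?[ \t]+name:[ \t]*(\S+)", re.MULTILINE
)


def namespaces_removed(deleted_files):
    """Return the names of Namespaces defined in files a change would delete.

    deleted_files: dict mapping a file path to the content being deleted.
    """
    removed = []
    for path, content in deleted_files.items():
        # A single file may hold several documents separated by '---'.
        for doc in re.split(r"^---[ \t]*$", content, flags=re.MULTILINE):
            if KIND_RE.search(doc):
                match = NAME_RE.search(doc)
                removed.append(match.group(1) if match else f"<unknown in {path}>")
    return sorted(removed)


def requires_manual_approval(deleted_files):
    """True if the change deletes at least one Namespace manifest."""
    return bool(namespaces_removed(deleted_files))


# Hypothetical example input: one Namespace manifest and one ConfigMap.
deleted = {
    "namespaces/example-app.yaml": (
        "apiVersion: v1\nkind: Namespace\nmetadata:\n  name: example-app\n"
    ),
    "apps/settings.yaml": (
        "apiVersion: v1\nkind: ConfigMap\nmetadata:\n  name: settings\n"
    ),
}

if requires_manual_approval(deleted):
    print("Manual approval required; namespaces removed:", namespaces_removed(deleted))
```

The point is only that deleting a Namespace manifest is treated as a special case that a person has to sign off on, since removing a namespace takes everything inside it along.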
It's a shame that the backup/restore testing didn't work in this case, though. These kinds of TIFUs are better with a happy-ish ending.