I'm looking for experiences and opinions on kubernetes storage.
I want to create a highly available homelab that spans 3 locations, where pods have a preferred location but can move if necessary.
I've looked at LINSTOR, and at SeaweedFS or Garage with JuiceFS, but I'm not sure how those options perform across the internet or how well they hold up in long-term operation.
Is anyone else hosting k3s across the internet in their homelab?
I've been using Backblaze B2 (via an s3fs-fuse container plus bidirectional mount propagation to a host path) and a little bit of Google Drive (via an rclone mount and the same mounting business) within Kubernetes. I only use this for TubeArchivist, which I consider disposable; no way I'm using these "devices" for anything I really care about. I haven't tried gauging the performance of either, but I can say, anecdotally, that both are fine for TubeArchivist to write to in a reasonable amount of time (the bottleneck is yt-dlp ingesting from YouTube), and playback seems on par with local storage in both the embedded TubeArchivist player and Jellyfin. I've had no issues with this in about a year of use, and overall I feel it's a decent solution if you need a lot of cheap-ish storage that you're okay with not trusting.
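In case it helps, here's roughly what the mounting business looks like. This is a sketch, not my actual config: the image, bucket, endpoint, and secret names are all placeholders.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: b2-s3fs
spec:
  selector:
    matchLabels:
      app: b2-s3fs
  template:
    metadata:
      labels:
        app: b2-s3fs
    spec:
      containers:
        - name: s3fs
          image: my-registry/s3fs:latest     # hypothetical image with s3fs-fuse installed
          securityContext:
            privileged: true                 # FUSE + Bidirectional propagation need this
          # -f keeps s3fs in the foreground so the container stays alive
          command: ["s3fs", "my-bucket", "/mnt/b2", "-f",
                    "-o", "url=https://s3.us-west-004.backblazeb2.com",
                    "-o", "passwd_file=/etc/s3fs/passwd"]
          volumeMounts:
            - name: shared
              mountPath: /mnt/b2
              mountPropagation: Bidirectional  # pushes the FUSE mount back out to the host path
            - name: creds
              mountPath: /etc/s3fs
              readOnly: true
      volumes:
        - name: shared
          hostPath:
            path: /mnt/b2
            type: DirectoryOrCreate
        - name: creds
          secret:
            secretName: b2-credentials       # s3fs passwd format: key_id:application_key
```

Other pods (TubeArchivist, Jellyfin) then mount the same host path with `mountPropagation: HostToContainer` so they see the bucket.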
You don't want to try to span locations at the container/hypervisor level. The problem is that there is likely too much latency between the sites, which will screw things up. Instead, set up replicated data stores where necessary.
The problem is that I want failover to work if a site goes offline. That happens quite a bit with residential ISPs where I live, and instead of waiting for the connection to be restored, my idea was that Kubernetes would see the failed node and reschedule its pods elsewhere.
Most data will be transferred locally (with node affinity), and only on failure would the pods spread out.
The problem that remained was storage, which is why I'm here looking for options.
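For context, the affinity side is the easy part. A minimal sketch of what I mean, assuming sites are labelled with topology.kubernetes.io/zone (the label values are made up):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      affinity:
        nodeAffinity:
          # "preferred" = scheduler tries the home site first, but will
          # still place the pod at another site if no node there fits
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: topology.kubernetes.io/zone
                    operator: In
                    values: ["site-a"]   # the pod's home location
      # evict faster than the default 5 minutes when a site drops off the net
      tolerations:
        - key: node.kubernetes.io/unreachable
          operator: Exists
          effect: NoExecute
          tolerationSeconds: 60
      containers:
        - name: app
          image: nginx:1.27
```

Storage is the part this doesn't solve: the replacement pod needs its volume to exist at the new site too.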
I guess the network will be a bottleneck with Garage too. If you want high performance you might need a hybrid solution: cluster your stateful apps on local storage and take periodic full backups to a distributed storage.
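The backup half of that hybrid can be as simple as a CronJob pushing restic snapshots off-site. A rough sketch, where the PVC, endpoint, and secret names are placeholders:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-backup
spec:
  schedule: "0 3 * * *"            # every night at 03:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: restic
              image: restic/restic:latest   # pin a specific tag in practice
              args: ["backup", "/data"]
              env:
                - name: RESTIC_REPOSITORY
                  value: s3:https://s3.example.com/homelab-backups
              envFrom:
                - secretRef:
                    name: restic-credentials  # RESTIC_PASSWORD + S3 access keys
              volumeMounts:
                - name: data
                  mountPath: /data
                  readOnly: true
          volumes:
            - name: data
              persistentVolumeClaim:
                claimName: app-data
```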
Rook-Ceph for sure. And echoing another comment, come join the Home Operations Discord; we have a heap of info and people experienced with Kubernetes homelabbing.
https://discord.gg/home-operations
One thing I recently found out is that Ceph wants whole drives; I could not get it to work with partitions. I did get things working with Longhorn, though I'm still setting everything up.
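For reference, this is the part of Rook's CephCluster spec where devices get selected; only raw, unpartitioned disks worked for me here (node and device names are examples):

```yaml
# excerpt from a CephCluster resource, everything else omitted
spec:
  storage:
    useAllNodes: false
    useAllDevices: false
    nodes:
      - name: node-a             # must match the Kubernetes node name
        devices:
          - name: "sdb"          # whole disk: no partition table, no filesystem
          - name: "nvme0n1"
```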
I tried Longhorn and concluded that it would not work reliably with Volsync. Volsync (for automatic volume restore on cluster rebuild) is a must for me.
I plan on installing Rook-Ceph. I'm also on a 1 Gb/s network, so it won't be fast, but many fellow k8s home-opsers are confident it will work.
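For the curious, this is roughly what Volsync looks like per PVC with its restic mover (the PVC and secret names are placeholders), which is why it pairs so well with a rebuildable cluster:

```yaml
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: app-data-backup
spec:
  sourcePVC: app-data
  trigger:
    schedule: "0 */6 * * *"             # snapshot every 6 hours
  restic:
    repository: app-data-restic-secret  # Secret holding repo URL + password
    copyMethod: Snapshot                # snapshot the PVC, back up the snapshot
    pruneIntervalDays: 14
    retain:
      daily: 7
      weekly: 4
```

On rebuild, a matching ReplicationDestination restores the volume before the app starts.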
Rook-Ceph does need SSDs with power-loss protection (PLP), or it will get extremely slow (latency). Bandwidth is not as much of an issue.
Find some used Samsung PM or SM models; they aren't expensive.
Longhorn isn't fussy about consumer SSDs and has its own built-in backup system. It's not good at ReadWriteMany volumes, but it sounds like you won't need ReadWriteMany. I suggest you don't bother with Rook-Ceph yet, as it's very complex.
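For scale: wiring up Longhorn's built-in backups is basically two Helm values; a sketch with made-up bucket and secret names:

```yaml
# Longhorn Helm values excerpt; the secret holds the S3 credentials
defaultSettings:
  backupTarget: s3://homelab-backups@us-east-1/   # format: s3://<bucket>@<region>/<path>
  backupTargetCredentialSecret: longhorn-backup-secret
```

From there you get scheduled backups and restores per volume in the UI or via RecurringJob resources.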
Also, join the Home Operations community if you have a Discord account, it's full of k8s homelabbers.
I'll try Rook-Ceph; Ceph has been recommended quite a lot now, but my NVMe drives sadly don't have PLP. Afaict that should still work, because not all nodes will face power loss at the same time.
I'd rather start with the hardware I have and upgrade as necessary. Backups are always running for emergencies, and I can't afford to replace all my drives at once.
I'll join Home Operations and see what info I can find.
The problem with non-PLP drives is that Rook-Ceph insists its writes get done in a way that is safe with respect to power loss.
For regular consumer drives, that means it has to wait for the cache to be flushed, which takes aaaages (milliseconds!!) and can cause all kinds of issues.
PLP drives have a cache that survives power loss, so Rook-Ceph is happy to write to the cache and consider the operation done.
Again, a 1 Gb network is not a big deal, but not using PLP drives could cause issues.
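If you want to gauge your drives before committing, a single-threaded 4k sync-write fio run roughly approximates what Ceph's WAL does; consumer drives often collapse to a few hundred IOPS here while PLP drives stay high. (Pointed at a raw device this test is destructive, hence the test file.)

```bash
fio --name=sync-write --filename=/mnt/test/fio-test --size=1G \
    --direct=1 --sync=1 --rw=write --bs=4k \
    --numjobs=1 --iodepth=1 --runtime=60 --time_based
```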
If you don't need Volsync and don't need ReadWriteMany, just use Longhorn with its built-in backup system and call it a day.
I want failover to work in case of an internet or power outage, not local cluster node failure. Wouldn't multiple clusters make configuration and failover across locations difficult, or am I wrong?
I guess I shouldn't have answered; I do have experience with multiple storage classes, but none of the ones you mention (so I don't really know anything about them). I envisioned you dealing with pod-level storage issues, which I figured most programs would have a lot of difficulty with, whereas a more service-oriented approach would expect remote failures (hence the recommendation).
None of the things you mentioned seem to have provisioners, so maybe you mean your individual nodes would have these remote filesystems mounted. At that point I don't think the kubelet cares: you just mount them on the machines and tell the kubelet about them via a host mount.
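Something like this, assuming the remote filesystem is already mounted at /mnt/remote on the node (the path and size are made up):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: remote-fs-pv
spec:
  capacity:
    storage: 500Gi               # hostPath PVs aren't really size-enforced
  accessModes: ["ReadWriteOnce"]
  storageClassName: manual       # a PVC requesting this class binds to it
  hostPath:
    path: /mnt/remote
```

Or skip the PV entirely and put the hostPath straight into the pod spec.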