I got 4 20TB drives from Amazon around Black Friday that I want to get setup for network storage. I’ve got 3 descent Ryzen 5000 series desktops that I was thinking about setting up so that I could build my own mini-Kubernetes cluster, but I don’t know if I have enough motivation. I’m pretty OCD so small projects often turn into big projects.
I don’t have an ECC motherboard though, so I want to get some input if BTRFS, ZFS, TrueNAS, or some other solution should be relatively safe without it? I guess it is a risk-factor but I haven’t had any issues yet (fingers crossed). I’ve been out of the CNCF space for a while but Rook used to be the way to go for Ceph on Kubernetes. Has there been any new projects worth checking out or should I just do RAID and get it over with? Does Ceph offer the same level of redundancy or performance? The boards have a single M.2 slot so I could add in some SSD caching.
If I go with RAID, should I do RAID 5 or 6? I’m also a bit worried because the drives are all the same so if there is an issue it could hit multiple drives at once, but I plan to try to have an online backup somewhere and if I order more drives I’ll balance it out with a different manufacturer.
Based on the hardware you have I would go with ZFS (using TrueNAS would probably be easiest). Generally with such large disks I would suggest using at least 2 parity disks, but seeing as you only have 4 that means you would lose half your storage to be able to survive 2 disk failures. Reason for the (at least) 2 parity disks is (especially with identical disks) the risk of failure during a rebuild after one failure is pretty high since there is so much data to rebuild and write to your new disk (like, it will probably take more than a day).
Can’t talk much about backup as I just have very little data that I care enough about to backup, and just throw that into cloud object storage as well as onto my local high-reliability storage.
I have tried many different solutions, so will give you a quick overview of my experiences, thoughts, and things I have heard/seen:
Single Machine
Do this unless you have to scale beyond one machine
ZFS (on TrueNAS)
FreeNASTrueNAS GUI.MDADM
BTRFS
UnRaid
Raid card
JBOD
Multi-machine
Ceph
SeaweedFS
Tahoe LAFS
MooseFS/LizardFS
Gluster
Kubernetes-native
Consider these if you’re using k8s. You can use any of the single-machine options (or most of the multi-machine options) and find a way to use them in k8s (natively for gluster and some others, or via NFS). I had a lot of luck just using NFS from my TrueNAS storage server in my k8s cluster.
Rook
Longhorn
OpenEBS
Thanks so much for the very detailed reply. I think at this point I’m conflicted between using TrueNAS or going all in and trying SDS. I’m leaning towards SDS primarily because I want to build experience, but heck maybe I’ll end up doing both and testing it out first and see what clicks.
I’ve setup Gluster before for OpenStack and had a pretty good experience, but at the time the performance was substantially worse than Ceph (however it may have gotten much better). Ceph was really challenging and required a lot of planning when I used it last in a previous role, but it seems like Rook might solve most of that. I don’t really care about rebuild times… I’m fine if it takes a day or two to recover the data as long as I don’t lose any.
As long as I make sure to have an offsite backup/replica somewhere then I guess I can’t go too wrong. Thanks for explaining the various configurations of Gluster. That will be extremely helpful if I decide to go that route, and if performance can be tuned to match Ceph then I probably will.