Btrfs is still markedly less reliable than ZFS, even after _decades_ of development, and that's unacceptable IMHO. I've lost so much data to btrfs corruption issues that I've (almost) stopped using it completely nowadays. It's better to fight to keep the damned OpenZFS modules up to date and get an actually _reliable_ system than to accept that risk again.
> I've lost so much data due to btrfs corruption issues that I've (almost) stopped to use it completely nowadays.
Just out of curiosity: is there a specific reason you're not using plain-vanilla filesystems which _are_ stable?
Personal anecdote: i've only ever had serious corruption twice, 20-ish years ago, once with XFS and once with ReiserFS, and have primarily used the extN family of filesystems for most of the past 30 years. A filesystem only has to go corrupt on me once before i stop using it.
Edit to add a caveat: though i find the ideas behind ZFS, btrfs, etc., fascinating, i have no personal need for them so have never used them on personal systems (but did use ZFS on corporate Solaris systems many years ago). ext4 has always served me well, and comes with none of the caveats i regularly read about for any of the more advanced filesystems. Similarly, i've never needed an LVM or any such complexity. As the age-old wisdom goes, "complexity is your enemy," and keeping to simple filesystem setups has always served my personal systems/LAN well. i've also never once seen someone recover from filesystem corruption in a RAID environment by simply swapping out a disk (there's always been much more work involved), so i've never bought into the "RAID is the solution" camp.
- ZStandard compression is a performance boost on crappy spinning rust
- Snapshots are amazing, and I love being able to quickly send and store them with `zfs send` and `zfs receive` (sketched below)
- I like not having to partition the disk at all while still having multiple datasets that share the same underlying storage. LVM2 has too many downsides for me to still consider it, e.g. thin provisioning was quite problematic (ext4 and the like have no idea they're thin provisioned, ...)
- I like not having to bother with fstab anymore. I have all of my (complex) datasets under multiple boot roots, and I can import a pool from a live environment with an altroot and immediately get all directories properly mounted
- AFAIK only ZFS and Btrfs support checksums out of the box. I hate the fact that most filesystems can silently bitrot and corrupt files. With ZFS and Btrfs you can't necessarily repair the data, but at least you'll know it got corrupted and can restore it from a backup
- I like ZVOLs; I appreciate being able to use them as sparse disks for VMs that can be easily mounted without loopback devices (you get all partitions under /dev/zvol/pool/zvol-partN)
- If you have a lot of RAM, the ZFS ARC can speed things up a lot. ZFS is somewhat slower than "simpler" filesystems most of the time, but with 10+ GB available to the ARC it's been faster in my experience than any other FS
I do use "classic" filesystems for other applications, like random USB disks and stuff. I just prefer ZFS because the feature set is so good and it's been nothing but stable in day to day use. I've literally had ZERO issues with it in 8+ years - even when using the -git version it's way more stable than Btrfs ever was.
> Just out of curiosity: is there a specific reason you're not using plain-vanilla filesystems which _are_ stable?
I'd guess that it is the classic case of figuring out if something works without using it being a lot harder than giving it a go and seeing what happens. I've accidentally taken out my own home folder in the past with ill-advised setups and it is an educational experience. I wouldn't recommend it professionally, but I can see the joy in using something unusual on a personal system. Keep backups of anything you really can't afford to lose.
And one bad experience isn't enough to get a feel for how reliable something is. It is better to stick with it even if it fails once or twice.
> And one bad experience isn't enough to get a feel for how reliable something is.
For non-critical subsystems, sure, but certain critical infrastructure has to get it right every time or it's an abject failure (barring interference from random cosmic rays and similar levels of problems). Filesystems have been around for the better part of a century, so should fall into the category of "solved problem" by now. i don't doubt that advanced filesystems are stupendously complex, but i do doubt the _need_ for such complexity beyond the sheer joy of programming one.
> It is better to stick with it even if it fails once or twice.
Like a pacemaker or dialysis machine, one proverbial strike is all i can give a filesystem before i switch implementations.
If the file isn't in source control, a backup, or auto-synced cloud storage, it can't be _that_ important. If it was in any of those, it could be recovered easily without replacing one's filesystem with one which needs hand-holding to keep it running. Shrug.
ZFS is the mechanism by which I implement local (via snapshots) and remote (via zfs send) backups on my user-facing machines.
- It can do 4x 15-minute snapshots, 24x hourly snapshots, 7x daily snapshots, 4x weekly snapshots, and 12x monthly snapshots, without making 51 copies of my files.
- Taking a snapshot has imperceptible performance impact.
- Snapshots are taken atomically.
- Snapshots can be booted from, if it's a system that's screwed up and not just one file.
- Snapshots can be accessed without disturbing the FS.
In my experience it hasn't required more hand-holding than ext4 past the initial install. Then again, the OSes most of my devices run either officially support ZFS or don't use package managers that will blindly upgrade the kernel past what my out-of-tree modules support, which I think avoids the most common issue people have with ZFS.
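For illustration, the rotation described above can be scripted with nothing more than cron and the stock `zfs` commands; this is a bare-bones sketch (the dataset name, label, and retention count are made up, and tools like sanoid or zfs-auto-snapshot do this more robustly):

```sh
#!/bin/sh
# Take a timestamped snapshot and prune everything beyond the newest $KEEP.
DATASET=tank/home   # hypothetical dataset
LABEL=hourly
KEEP=24

zfs snapshot "${DATASET}@${LABEL}-$(date +%Y%m%d-%H%M)"

# List this label's snapshots oldest-first and destroy all but the last $KEEP
zfs list -H -t snapshot -o name -s creation -d 1 "$DATASET" \
  | grep "@${LABEL}-" \
  | head -n -"$KEEP" \
  | xargs -r -n 1 zfs destroy
```

Run one copy per interval (15-minute, hourly, daily, weekly, monthly) from cron, each with its own LABEL and KEEP.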
Funny, because I have the opposite experience. The main issue with btrfs is a lack of tooling that would let a layperson fix issues without btrfs-developer-level knowledge.
I've personally had drive failures, FS corruption due to power loss (which isn't supposed to happen on a CoW filesystem), FS and file corruption due to RAM bitflips, etc. Every time, btrfs handled the situation perfectly, with the caveat that I needed help from the btrfs developers. And they were very helpful!
So yeah, btrfs has a bad rep, but it's not as bad as the common sentiment makes it seem.
(note that I still run btrfs RAID 1, as I haven't found much real-world feedback regarding RAID 5 or 6)
Someone correct me if I'm wrong, but to my understanding FB uses Btrfs only in RAID 0, 1, or 10, and not any of the parity options.
RAID56 under Btrfs has some caveats, but I'm not aware of any anecdata (or perhaps I'm just not searching hard enough) from the past few weeks or months about data loss when those caveats are taken into consideration.
> RAID56 under Btrfs has some caveats, but I'm not aware of any anecdata (or perhaps I'm just not searching hard enough) from the past few weeks or months about data loss when those caveats are taken into consideration.
Yeah this is something that makes me consider trying raid56 on it. Though I don't have enough drives to dump my current data while re-making the array :D (perhaps this can be changed on the fly?)
Out of curiosity, how much total storage do you get with that drive configuration? I've never tried "bundle of disks" mode with any file system because it's difficult to reason about how much disk space you end up with and what guarantees you have (although raid 1 should be straightforward, I suppose).
I get half of the raw capacity, so 7.5TB. Well a bit less due to metadata, 7.3TB as reported by df (6.9TiB).
For btrfs specifically there is an online calculator [1] that shows you the effective capacity for any arbitrary configuration. I use it whenever I add a drive to check whether it’s actually useful.
Just want to follow up with a correction: the command to convert data to RAID 6 and metadata to RAID 1c3 in Btrfs is `btrfs balance start -dconvert=raid6 -mconvert=raid1c3 /`, instead of what I originally posted
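For what it's worth, a conversion like that runs online, so it can be checked while it works and verified afterwards (the path here is just the same root mount as above):

```sh
# Show progress of a running balance/conversion
btrfs balance status /

# Confirm the data and metadata profiles once it finishes
btrfs filesystem usage /
```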
FS corruption due to power loss happens on ext4 because the default settings journal only metadata, for performance.
I guess this is fine if everything is on batteries all the time, but it's intolerable on systems without a battery.
The FS should not be corrupted, only the contents of files that were written around the time of the power loss. Risking only file contents and not the FS itself is a tradeoff between performance and safety where you only get half of each. You can set it to full performance or full safety mode if you prefer.
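For reference, these are the ext4 `data=` journaling modes that tradeoff refers to (the device and mountpoint below are placeholders):

```sh
# data=ordered (the ext4 default): journal metadata only, but flush file data
# before committing the related metadata - the middle ground
mount -o data=ordered /dev/sdXn /mnt

# data=writeback: metadata-only journaling with no data ordering - fastest,
# but recently written files may contain stale data after a crash
mount -o data=writeback /dev/sdXn /mnt

# data=journal: journal file data as well as metadata - safest, slowest
mount -o data=journal /dev/sdXn /mnt
```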
This. I may still give up on running ZFS on Linux because of the frequent (and seemingly intentional, on the Linux side) breakage, but switching my existing systems over to the CachyOS repos has been a blessed relief.