Hacker News | past | comments | ask | show | jobs | submit

> A hard power cycle on a 3 device pool (data single, metadata DUP, DM-SMR disks) left the extent tree and free space tree in a state that no native repair path could resolve.

As a ZFS wrangler by day:

People in this thread seem happy to shit on btrfs here, but this is very much not a sane, resilient configuration no matter the FS. Just something to keep in mind.



Might be true, but I don't see any aspect of that which is relevant to this event:

* Data single obviously means losing a single drive will cause data loss, but no drive was actually lost, right?

* Metadata DUP (not sure if it's across 2 disks or all 3) should be expected to be robust, I'd expect?

* I certainly eye DM-SMR disks with suspicion in general, but it doesn't sound like they were responsible for the damage: "Both DUP copies of several metadata blocks were written with inconsistent parent and child generations."


> Metadata DUP (not sure if it's across 2 disks or all 3) should be expected to be robust, I'd expect?

No. DUP will happily put both copies on the same disk. You would need to use RAID1 (or RAID1c3 for a copy on all disks) if you wanted a guarantee of the metadata being on multiple disks.
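For anyone wanting out of that state: the profile can be changed on a mounted filesystem with a balance using the convert filters. A rough sketch (the mount point is an assumption; run as root, and note this rewrites all metadata, so it can take a while):

```shell
# Inspect the current data/metadata profiles (mount point is illustrative)
btrfs filesystem df /mnt/pool

# Convert metadata from DUP to RAID1, so the two copies are
# guaranteed to live on two different devices
btrfs balance start -mconvert=raid1 /mnt/pool

# Or, on a 3-device pool, keep a metadata copy on every device
btrfs balance start -mconvert=raid1c3 /mnt/pool
```
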


Wow, yuck. (The "Why do we even have that lever?!" line comes to mind.)

...even so, without a disk failure, that probably wasn't the cause of this event.


The DUP profile is meant for use with a single disk. The RAID* profiles are meant for use with multiple disks. Both are necessary to cover the full gamut of BTRFS use cases, but it would probably be good if mkfs.btrfs spat out a big warning if you use DUP on a multi-disk filesystem, as this is /usually/ a mistake.

ZFS makes similar configurations possible (e.g. the copies property).
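For comparison, the ZFS equivalent is per-dataset rather than per-pool, and has the same caveat: extra copies are spread across vdevs on a best-effort basis, not guaranteed to land on separate disks. A sketch (pool/dataset names are assumptions):

```shell
# Store two copies of every block written to this dataset from now on
zfs set copies=2 tank/important

# Verify the setting
zfs get copies tank/important
```
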

You can end up in this state with btrfs if you start with a single device (defaults to data=single,metadata=dup), and then add additional devices without changing the data/metadata profiles. Or you can choose this config explicitly.
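That path might look something like this, as a sketch (device names and mount point are assumptions; requires root):

```shell
# Single-device filesystem: mkfs.btrfs defaults to data=single, metadata=DUP
mkfs.btrfs /dev/sda
mount /dev/sda /mnt/pool

# Adding devices does NOT change the profiles -- you now have a
# multi-device pool still running data=single, metadata=DUP
btrfs device add /dev/sdb /dev/sdc /mnt/pool

# An explicit balance with convert filters is needed to actually
# spread redundancy across the new disks
btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/pool
```
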

I really wish btrfs-progs required a --this-config-is-bad-but-continue-anyway flag, since there are so many bad configurations possible (raid5/raid6, raid0/single/dup). The rescue tools are also bad, and are about as likely to make the problem worse as to fix it.



