#3788: RAID Reshaping Showdown: BTRFS vs ZFS vs XFS

Can you change RAID levels without nuking your data? We compare BTRFS, ZFS, and XFS for home server upgrades.

Episode Details

Episode ID: MWP-3967
Published: Jun 21
Duration: 17:43
Audio: Direct link
Pipeline: V5
TTS Engine: chatterbox-regular
Script Writing Agent: deepseek-v4-pro
Topics: data-redundancy home-lab hardware-reliability

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

The three-disk RAID5 to four-disk RAID6 reshape question hits every homelabber about three years into a build. The answer splits sharply across filesystems.

BTRFS has the most elegant architecture for this problem: its chunk allocation model lets you, in theory, convert RAID levels with a single balance command. But as of June 2026, the BTRFS wiki still lists RAID5 and RAID6 parity as "unstable" due to the write hole problem. A power failure during conversion can corrupt parity chunks with no recovery path except a backup restore. For mirror-only RAID levels like RAID1 and RAID10, BTRFS is mature and production-tested.

ZFS takes the opposite approach: it simply refuses to reshape RAID-Z vdevs. The vdev geometry is immutable after creation because stripe width and parity calculations are baked into every block. Changing parity levels requires creating a new pool, sending data via zfs send, verifying checksums, and destroying the old pool, which effectively doubles your storage budget temporarily. ZFS prioritizes data safety over convenience.

XFS plus LVM offers the most practical path for in-place reshaping. LVM uses the Linux kernel's MD driver, which has supported RAID level migration since kernel 4.12 in 2017. The lvconvert command handles live restriping while the filesystem stays mounted. The catch is that XFS cannot shrink, and chunk alignment tuning matters for performance.

For homelabbers on a budget, the choice comes down to whether you need parity RAID flexibility (XFS+LVM), mirror-only flexibility (BTRFS), or bulletproof data integrity with immutable vdevs (ZFS).

Daniel sent us this one. He wants to talk filesystems for home servers with a really specific constraint. You're building incrementally. You start with a three-disk array. Later you scrape together enough for a fourth. Can you NOT just expand the pool but actually change the RAID type, say from RAID5 to RAID6, without nuking everything and starting over? He's asking which of BTRFS, ZFS, and XFS can pull off that in-place reshape. And separately, just for pool expansion, is there a clear winner? Then the third question: what Linux distro or OS is best positioned for managing all these operations? This is the question every homelabber hits about three years into a build.

It's a brutal question because the answer tends to arrive about five minutes after you realize you made the wrong choice two years ago. What I love about how the prompt frames this is that it distinguishes two operations that most people conflate. Pool expansion is adding a disk and making the total capacity bigger while keeping the same RAID level. RAID reshaping is changing the redundancy scheme. The first one, every filesystem we're talking about can do in some form. The second one, almost none of them can do well, and the ones that claim to do it might eat your data. It's that simple.

We have a near-universal yes on one side, and a field of asterisks and warnings and dead parrots on the other.

That's the episode. Let's define the concrete scenario, because we're going to keep coming back to it. You start with three four-terabyte drives in RAID5. That gives you eight terabytes usable. Parity is striped across all three disks. Then you buy a fourth four-terabyte drive. You want to switch to RAID6. That gives you the same eight terabytes usable, but now you can survive two disk failures instead of one. And the question is can you issue one or two commands, wait a day for the reshape, and have it just work, no data migration, no backup and restore?
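
The capacity arithmetic in that scenario can be sketched in a few lines of Python. This is an illustration only, assuming a plain parity array where usable space is the number of disks minus the number of parity devices, times the disk size; the function name usable_tb is made up for this sketch and does not come from any tool.

```python
def usable_tb(disks: int, disk_tb: float, parity_disks: int) -> float:
    """Usable capacity of one parity array: (disks - parity devices) * disk size."""
    if disks <= parity_disks:
        raise ValueError("need more disks than parity devices")
    return (disks - parity_disks) * disk_tb


# Three 4 TB drives in RAID5 (one disk's worth of parity).
raid5 = usable_tb(disks=3, disk_tb=4, parity_disks=1)
# Four 4 TB drives in RAID6 (two disks' worth of parity).
raid6 = usable_tb(disks=4, disk_tb=4, parity_disks=2)

print(raid5, raid6)  # 8 8
```

Both layouts come out at eight terabytes, which is why the fourth drive buys extra failure tolerance rather than extra space.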

If the answer is no, the next question is how bad is the viable alternative.

So let's start with the filesystem that on paper seems like it should nail this, because it was designed at a fundamental level to handle data moving around inside a living pool.

The filesystem that has been the future of Linux storage for approximately fifteen consecutive years.

I deserve that. But here's the thing. BTRFS, beneath all the memes about it eating data, has a genuinely elegant architecture for exactly this problem. Its storage model is based on chunk allocation. All the drives in the pool are divided into chunks, usually one gigabyte each by default. A chunk doesn't belong to a particular disk in the sense of mirror one or parity stripe one. The B-tree metadata can map any chunk onto any disk, and your files are just references into those chunks. So if you want to rebalance, change profiles, or add a disk, the filesystem just allocates new chunks in the new configuration and, during a balance, copies live data into them while the B-tree updates. Theoretically, changing from RAID5 to RAID6 in place requires a single command: btrfs balance start with -dconvert=raid6, plus the soft filter so chunks already in the target profile get skipped, and you wait.
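
As a rough sketch of what that looks like in practice, the snippet below only builds the command lines and prints them; it does not run anything. The mount point /mnt/pool and the device /dev/sdd are hypothetical placeholders, the helper name is invented for this example, and metadata profile conversion is deliberately left out.

```python
import shlex


def btrfs_reshape_plan(mountpoint: str, new_device: str, data_profile: str) -> list[list[str]]:
    """Build (but do not run) the commands for adding a disk and converting data chunks."""
    return [
        # Add the new disk to the existing filesystem.
        ["btrfs", "device", "add", new_device, mountpoint],
        # Rewrite data chunks into the new profile; "soft" skips chunks
        # that already use the target profile.
        ["btrfs", "balance", "start", f"-dconvert={data_profile},soft", mountpoint],
    ]


# Hypothetical paths, for illustration only.
for cmd in btrfs_reshape_plan("/mnt/pool", "/dev/sdd", "raid6"):
    print(shlex.join(cmd))
```

Printing the plan first, rather than executing it, leaves room to check the target profile and paths before a balance that may run for many hours.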

The BTRFS wiki, as of June twenty twenty six, Linux kernel six point eight, still lists the RAID5 and RAID6 parity implementation as quote unstable. That is not my word, that is not a stale forum post from 2017. That's the developer documentation as it exists today. And the primary issue is the write hole. When you write data and metadata in BTRFS, the copy-on-write mechanism is supposed to prevent partial writes from corrupting existing data. With RAID0 and RAID1 and RAID10, it works because those are straightforward mirroring or striping without parity computation. RAID5 and RAID6 require the filesystem to compute parity for every stripe, and if the system loses power or crashes between writing data and updating parity, you get a mismatch. BTRFS doesn't have a stable mechanism like a dedicated parity journal or the ZFS intent log equivalent for parity RAID. So the chunks get out of sync with their parity. Then you run a scrub and you get write hole corruption. The parity says one thing, the data says another, and BTRFS has to decide which is right.
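
The write hole is easy to reproduce on paper. The toy model below uses single-parity XOR over two four-byte data blocks, which is a simplification and not how BTRFS lays out real stripes; the block contents and the helper name xor_blocks are made up for the illustration.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equally sized blocks together, as single-parity RAID does."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, value in enumerate(block):
            out[i] ^= value
    return bytes(out)


# A consistent stripe: two data blocks and their parity.
d0, d1 = b"AAAA", b"BBBB"
parity = xor_blocks(d0, d1)
assert xor_blocks(d0, parity) == d1  # rebuilding d1 works while parity is in sync

# Crash scenario: the new d0 reaches disk, the parity update does not.
d0_new = b"CCCC"
stale_parity = parity

# Later the disk holding d1 dies and we rebuild it from what is left.
rebuilt_d1 = xor_blocks(d0_new, stale_parity)
print(rebuilt_d1 == d1)  # False
```

The rebuilt block is silently wrong even though d1 itself was never written during the crash, which is what makes the failure mode so unpleasant.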

This is not an obscure corner case.

There are documented instances. You'll find bug reports going back years. A user has a three-disk BTRFS RAID5. They run btrfs balance convert equals raid6 to reshape after adding a fourth drive. Partway through the conversion, there's a power failure. When the system comes back, roughly three percent of the chunks have corrupted parity. The filesystem can't repair them because the parity it needs for correction is itself what's broken. And because BTRFS RAID5 doesn't have a separate parity journal, you can't replay and fix it. The only supported recovery path is restoring from backup. And that's the specific failure mode that keeps RAID5 and RAID6 in the unstable column year after year after year.

BTRFS can do the reshape. It just may also do it...

The mechanism is there. The reliability is not. For RAID1 and RAID10, the story is completely different. Those are mature, well tested, and used in production, notably as the default filesystem on openSUSE and Fedora and several other distributions. If you start with BTRFS RAID1 across three disks and want to convert to RAID10 when you add a fourth, that conversion is solid. You can also expand RAID1 pools by adding a disk and running btrfs balance. All those operations use straightforward mirroring. No parity to go wrong. The balance convert command reshapes by simply copying data and updating mirror targets, and the B-tree just handles it.

The BTRFS guidance is weirdly bisected. If your ambition is RAID levels involving parity, flee. If you're staying in mirror world, it might be the most flexible option we have.

That's exactly the story. And it brings us to ZFS, which has... how to phrase this... the most honest relationship with reshaping of any filesystem in the discussion.

By honest you mean it just says no.

ZFS will look you in the eye and say you cannot reshape a RAID-Z vdev. Not today, not in the OpenZFS two point three release from March twenty twenty five, not on the published roadmap as of June twenty twenty six, and if you want to change your redundancy level, you should plan for it at purchase time. I'm going to explain why, because I think it's the best example of a design tradeoff in all of storage engineering. ZFS organizes storage as a tree. At the bottom you've got individual disks, your four terabyte drives. A group of disks is combined into a virtual device, a vdev, using a redundancy scheme: mirror, RAID-Z one, RAID-Z two, RAID-Z three. That vdev is then essentially a brick. You pile those bricks into a single zpool. The pool stripes data across every vdev. So if your pool has two RAID-Z two vdevs, each is independently redundant, and data writes are striped across both vdevs for performance.

You cannot change the size or shape of a brick once it's laid.

Not without destroying it and building a new one. And here's why that's immutable. When ZFS writes data to a RAID-Z vdev, the block size, parity calculations, and stripe width are a direct function of how many disks are in the vdev and what RAID-Z level it was created with. A RAID-Z one vdev with three disks writes to two data disks and one parity disk per stripe. If you now want to go to four disks and RAID-Z two, suddenly your two data blocks plus two parity blocks don't divide cleanly along the same boundaries. The geometry of the entire pool is tied to vdev creation. Changing that geometry after the fact would mean rewriting every single block to respect the new stripe layout, while simultaneously knowing the old layout in case of a crash halfway through migration. ZFS chose not to even attempt this. Their reasoning since 2005 has been that the risk to data outweighs whatever convenience you gain. Instead they gave you zpool add, which slaps a second vdev brick on and stripes across both. The whole pool stretches, and the data stays safe.

Let's work the three-disk RAID-Z one example. I buy a fourth drive. I want RAID-Z two. I now have four drives but the wrong parity geometry and no tool to stretch the old vdev.

The ZFS solution, and this is the orthodox solution the developers would recommend in the documentation, is: buy four new drives, create a new RAID-Z two vdev in a separate pool, zfs send your entire dataset to the new pool, verify the checksums, destroy the old pool, and then either add the old drives back as another vdev or repurpose them. You effectively double the storage budget briefly, just to change your parity level. For a home server that started with three four-terabyte drives and just added a fourth, that means temporarily owning eight drives to get four. The math is not flattering.
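
The migration steps just described can be written out as a dry-run plan. The sketch below only assembles command strings and never executes them; the pool names tank and tank2, the snapshot name, and the disk names are all hypothetical, and a scrub of the new pool stands in here for the "verify the checksums" step.

```python
def zfs_migration_plan(old_pool: str, new_pool: str, new_disks: list[str],
                       snapshot: str = "migrate") -> list[str]:
    """Return the shell commands for a send/receive migration, without running them."""
    snap = f"{old_pool}@{snapshot}"
    return [
        # Build the replacement pool with the desired parity level.
        f"zpool create {new_pool} raidz2 {' '.join(new_disks)}",
        # Take a recursive snapshot so every dataset is captured at one point in time.
        f"zfs snapshot -r {snap}",
        # Replicate all datasets and their properties into the new pool.
        f"zfs send -R {snap} | zfs receive -F {new_pool}",
        # Read back and verify every block on the new pool.
        f"zpool scrub {new_pool}",
        # Only after verification: retire the old pool.
        f"zpool destroy {old_pool}",
    ]


# Hypothetical names, for illustration only.
for step in zfs_migration_plan("tank", "tank2", ["sde", "sdf", "sdg", "sdh"]):
    print(step)
```

Keeping the destroy step last, and separate from the rest, reflects the point of the whole exercise: the old pool is the backup until the new one has been verified.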

It's the Silicon Valley startup solution. Overprovision by two X and migrate.

I'll say it straight: for your classic budget-constrained homelabber saving sixty dollars a month for an extra drive, this is a disqualifying frustration. And I say that as someone who overwhelmingly recommends ZFS for data that you cannot lose. ZFS has checksums on all data, cryptographic verification of the entire storage stack, scrubs that repair, and when a disk fails your recovery path is proven and clear. But the fixed vdev philosophy means your on-ramp strategy must look like this: either buy your final target drive count on day one, or accept that you'll run with the original RAID-Z width permanently and just add new vdevs over time. You can expand capacity by adding vdevs. You cannot retcon your past parity choices.

We have BTRFS, which can theoretically do the reshape but will maybe lose the files in the margins. We have ZFS, which is bulletproof but will never try. That brings us to XFS plus LVM.

XFS is a journaling filesystem maintained largely by Red Hat. It does not natively do RAID at all, meaning purely in XFS land, we aren't talking about B-tree chunk allocation or vdevs. XFS handles what a filesystem's job used to be: manage inodes, extents, and journal metadata. Everything underneath is volume management. In Linux, the standard for volume management has been LVM for decades. And LVM actually handles the RAID reshaping question via its built-in MD raid layer, the same stack that powers Linux software RAID.

XFS is the pug. LVM is the parade float it sits on top of.

That's the most accurate description I've heard. LVM uses the Linux kernel's MD driver underneath, which has had stable support for RAID level migration, including parity RAID migration, since kernel four point twelve. That's 2017. So nearly a decade of production-grade reshaping. If you set up an eight-terabyte RAID5 logical volume across three four-terabyte physical volumes, create an XFS filesystem on top of it, and later add a fourth drive to the volume group, the command lvconvert --type raid6, followed by the path to the RAID5 logical volume, is the full, supported, live reshape operation. The kernel MD layer will restripe the array block by block, calculating the second parity across all four drives, while your filesystem is mounted and serving data. And lvextend plus xfs_growfs handles pool expansion. There is one crippling limitation on the XFS side, though: XFS cannot shrink. lvconvert can do lots of stuff, and the logical volume layer handles all that reshuffling when you migrate. What XFS will refuse to do is shrink, so going smaller would need you to back up the content externally and recreate the filesystem anyway.
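
Written out as a dry run, the two operations look roughly like this. The snippet only builds argument lists; the volume group vg0, the logical volume data, the device /dev/sdd, and the mount point /srv/data are hypothetical names. It is also an assumption that a single lvconvert call is enough: depending on the LVM version and the starting RAID5 layout, the tool may ask for an intermediate conversion step first.

```python
import shlex


def lvm_reshape_plan(vg: str, lv: str, new_pv: str) -> list[list[str]]:
    """Commands to add a disk to the volume group and convert the LV to RAID6."""
    return [
        ["vgextend", vg, new_pv],                        # make the new disk available
        ["lvconvert", "--type", "raid6", f"{vg}/{lv}"],  # change the RAID level in place
    ]


def lvm_expand_plan(vg: str, lv: str, mountpoint: str) -> list[list[str]]:
    """Commands to grow the LV into free space and then grow XFS to match."""
    return [
        ["lvextend", "-l", "+100%FREE", f"{vg}/{lv}"],
        ["xfs_growfs", mountpoint],  # XFS grows while mounted; it cannot shrink
    ]


# Hypothetical names, for illustration only.
for cmd in lvm_reshape_plan("vg0", "data", "/dev/sdd") + lvm_expand_plan("vg0", "data", "/srv/data"):
    print(shlex.join(cmd))
```

The two plans are kept separate because they answer different questions: the first changes the redundancy level, the second only adds capacity.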

XFS comes with one more wrinkle, though: chunk alignment tuning.

Like many advanced filesystem defaults, the tuning details have major consequences for the bandwidth you actually get. The warning goes: if the stripe layout changes during a RAID expansion, verify the chunk size. Miscalculations are possible but salvageable. It's worth noting mainly if scripted defaults, say in your Ansible config, override what the tools would otherwise choose.

So XFS plus LVM can actually do the reshape. Live reshape, you said. On a mounted filesystem. And the mechanism has been stable for the better part of a decade.

And here's the key architectural distinction. XFS itself doesn't know the reshape is happening. XFS sees a block device of whatever size it has been told. LVM, via the MD driver, is rewriting that block device stripe by stripe underneath the filesystem while the filesystem happily keeps reading and writing. The reshape is invisible above the volume layer. However, massive caveat. This reshape is intense. The MD driver has to read every stripe, recompute the second parity, and write the new layout. For a four-drive array with spinning rust, a full reshape can easily take eighteen hours. During that time you're hammering all four drives simultaneously. If a disk fails during the reshape, the entire process aborts and your redundancy depends on how far along you got and what RAID level you were coming from. Another caveat: stripe width alignment. That has to be set correctly at creation or performance tanks after expansion, causing allocation jitter.
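
A back-of-the-envelope estimate shows where a figure like eighteen hours can come from. The model below is a deliberate oversimplification: it assumes each disk is traversed once and that a sustained per-disk rate is the only bottleneck. The 60 MB/s rate is a hypothetical number picked for illustration, not a measurement from the episode.

```python
def reshape_hours(disk_bytes: float, mb_per_s: float) -> float:
    """Hours to traverse one disk once at a sustained rate in MB/s (decimal megabytes)."""
    seconds = disk_bytes / (mb_per_s * 1_000_000)
    return seconds / 3600


# A 4 TB drive at an assumed 60 MB/s effective reshape rate.
print(round(reshape_hours(4e12, 60), 1))  # 18.5
```

Real reshapes share the disks with live filesystem traffic, so the effective rate, and therefore the duration, can vary widely from this estimate.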

The reshape is real. But the time to finish is unpredictable: it depends on whether you're on spinning rust or flash, plus whatever the stripe alignment issues cost you along the way.

Your summary gets at the truth. Stripe alignment problems set in eventually. Let's not go through the full architecture notes today. The relevant fact is that a reshape on XFS plus LVM behaves very differently from a plain capacity extension, so work out the RAID geometry beforehand and plan for it. I benchmarked a conversion in April and it ran for days rather than hours, though I may be mixing up figures from my own benchmark samples, so verify the numbers on your own hardware. It's stable, but it may take longer than a fresh provisioning and copy. Worth scheduling downtime.

Honestly, that's refreshingly transparent. For changing layouts, XFS plus LVM is the top recommendation: LVM reshaping is more or less battle-tested, let alone the enterprise server usage behind it.

For better or worse, it also depends on what your distro supports and which RAID levels are stable there. Let's move to the practical OS comparison.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.
