#3776: ZFS Mirroring: Why Your RAID Card Is the Weak Link

A hardware RAID card makes ZFS less safe. Here's why an HBA and a simple mirror are the real upgrade.

Episode Details

Episode ID: MWP-3955
Published: Jun 20
Duration: 25:05
Audio: Direct link
Pipeline: V5
TTS Engine: chatterbox-regular
Script Writing Agent: deepseek-v4-pro
Topics: hardware-reliability data-integrity home-lab

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

A home server user running ZFS on a single drive had a power supply failure that caused corruption. He recovered his data, but the experience raised a critical question: what comes next? The answer is not what most people assume.

The common instinct is to add a hardware RAID controller, the kind of enterprise card that feels like a serious upgrade. But ZFS has a different philosophy: it wants to see bare metal. A hardware RAID card hides the real disks behind a virtual one, intercepts commands, and maintains its own write cache. When ZFS can't see the physical drives, it can't tell which drive holds a bad block, it can't scrub at the physical level, and its entire self-healing architecture breaks down. The "premium" solution is the real vulnerability.

For a simple two-drive mirror, the motherboard's SATA ports are completely adequate. ZFS will see the drives, checksum everything, and self-heal if a bad sector appears. The only reasons to add an HBA (host bus adapter) are needing more ports than the motherboard provides, or wanting cleaner cabling and expandability. The sweet spot for home users is a used LSI SAS 2008-series card flashed to IT mode, available for $30–$60 on eBay. The flashing process is a one-evening project, or you can buy pre-flashed cards for a small premium.

Adding a mirror drive is straightforward: plug in the new drive, run zpool attach, and ZFS resilvers the data. The pool never goes offline. Redundancy transforms ZFS from a historian of your data's demise into an active repairman.

Daniel sent us this one — he runs a home server on ZFS and just had the kind of near-miss that makes your stomach drop. Power supply failed, he rebuilt the machine, carried over the storage, and it came up corrupted. No redundancy, no RAID, single drive. He thinks he recovered everything, but he felt that cold wind blow through. He's already upgraded from consumer desktop SSDs to a proper server-grade drive with power-loss protection and DRAM cache, and the machine is on a UPS. So the low-hanging fruit is picked. His real question is what comes next. He's starting with a single drive, planning to add a second for mirroring, and he wants to know — is there a hardware controller you can add to the motherboard that would make multi-disk ZFS genuinely more robust? And given he's adding storage to an existing machine, what's the right order of operations? His constraint is he's a home user, not a data center — whatever we recommend has to be affordable and actually obtainable.

The thing about this question — and I love this question — is that it sounds like it's about hardware, but it's really about philosophy. ZFS has a philosophy. And the answer is going to sound almost backwards the first time you hear it.

The card you want is not a RAID card.

The card you want is absolutely not a RAID card. What you want is an HBA — a host bus adapter — flashed to IT mode, which just means it passes the raw disks straight through to the operating system without doing anything to them. No RAID logic, no caching layer, no firmware sitting between ZFS and the drives. ZFS wants to see bare metal.

The thing that makes it more robust is... getting out of the way.

A hardware RAID controller sits between ZFS and the drives and presents a virtual disk. It hides the real disks, intercepts commands, and has its own write cache with its own battery or capacitor, and ZFS can't see any of it. ZFS still checksums every block, but against a single virtual disk it can only detect corruption, not repair it, and the RAID card might silently rewrite a sector behind its back. It can't scrub at the physical level because it doesn't know which drive has a problem. If the RAID card's cache has a hiccup during a power event (and we just heard about a power event), ZFS might think data was committed when it wasn't. The whole self-healing architecture collapses.

The hardware RAID controller, which a normal person would assume is the serious enterprise solution, is actually the vulnerability.

It's the vulnerability dressed up as a solution. This is one of the most common misconceptions in home server land. People come from a Windows or traditional NAS background, they think hardware RAID is the premium tier, and they go buy a used Dell PERC or LSI MegaRAID thinking they've leveled up. And they've actually made their ZFS pool less safe than if they'd just plugged the drives into the motherboard SATA ports.

Which brings us to the counterintuitive truth. For a simple two-drive mirror, the motherboard SATA ports are completely fine. You don't even need an add-in card yet.

For two drives in a mirror, the cheap SATA ports on a consumer motherboard are adequate. ZFS will see the drives, checksum everything, scrub, and self-heal if one drive has a bad sector. The HBA earns its keep when you want more drives than your board has ports, or when you want cleaner cabling and expandability, or when your motherboard SATA controller is doing something weird — which does happen on some budget boards, but it's rare.

Let's say he wants to go the HBA route anyway. Future-proofing, cleaner topology, maybe he adds more drives later. What does that look like for a home user on a real budget?

The sweet spot is a used LSI SAS card flashed to IT mode. These are decommissioned enterprise cards that flood the secondhand market for somewhere between thirty and sixty dollars. The classic recommendation is something in the LSI SAS 2008 family, such as a 9211-8i, which gives you eight internal SAS or SATA ports on a PCIe 2.0 x8 interface. Its close sibling, the 9207-8i, uses the newer SAS2308 chip on PCIe 3.0. Either one is more than enough bandwidth for spinning drives and still fine for most SATA SSDs. You can find them on eBay all day long.

Thirty to sixty dollars for an eight-port card that was probably five hundred dollars new a decade ago.

The decommissioned enterprise gear market is one of the few good things about the tech industry's upgrade cycle. These cards are built like tanks, widely supported in FreeBSD and Linux, and the flashing process — turning a card from its original RAID firmware to IT mode — is well-documented. It's not plug-and-play, I want to be honest about that. You're going to be booting into a UEFI shell or a FreeDOS USB stick and running a command-line flashing tool. It's an evening project. But it's a one-time thing, and there are step-by-step guides that walk you through every keystroke.

The project is: acquire a used LSI card, flash it to IT mode, confirm the OS sees raw disks, and only then attach the new drives through it.

That order is important. If he's going to add an HBA at all, the least disruptive path is to install, flash, and verify the controller first, while the pool is still simple — just the one drive on the motherboard SATA port. Make sure the card shows up in lspci, make sure a test drive shows up as a plain block device. Then power down, connect the new mirror drive to the HBA, boot back up, and attach it to the pool.
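As a rough illustration of the "make sure the card shows up in lspci" check, here is a minimal Python sketch that filters `lspci` output for an LSI/Broadcom SAS controller. It assumes the default `lspci` line format (`<slot> <class>: <vendor and device>`). The exact vendor wording depends on the PCI ID database installed on your system, so the match is deliberately loose, and the sample line is illustrative rather than captured from a real machine.

```python
def find_sas_hbas(lspci_text: str) -> list[str]:
    """Return the lspci lines that look like an LSI/Broadcom SAS HBA.

    Assumes default `lspci` output, one device per line. Vendor strings vary
    with the PCI ID database ('LSI Logic / Symbios Logic' on older systems,
    'Broadcom / LSI' on newer ones), so we only require 'LSI' in the line.
    """
    return [
        line.strip()
        for line in lspci_text.splitlines()
        if "Serial Attached SCSI controller" in line and "LSI" in line
    ]


# Illustrative input; in practice you would pass the text printed by `lspci`,
# e.g. subprocess.run(["lspci"], capture_output=True, text=True).stdout
sample = (
    "00:17.0 SATA controller: Intel Corporation SATA controller\n"
    "01:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2008 "
    "PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)\n"
)
for hba in find_sas_hbas(sample):
    print("found HBA:", hba)
```

Finding the card only proves the controller is visible on the PCIe bus. The second half of the check, a test drive appearing as a plain block device (for example in `lsblk`), is still worth doing by eye.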

Rather than building the mirror on the motherboard ports and then trying to migrate it onto an HBA later.

Which is possible — ZFS is portable in ways that are almost magical — but it's unnecessary friction. You'd have to export the pool, physically recable, import the pool, and hope everything comes up clean. It probably will, but why risk it? Do the HBA first, let it prove itself, then expand.

If he skips the HBA entirely and just uses the motherboard ports for his two-drive mirror — what's the procedure for adding that second drive to an existing single-drive pool?

He's got a pool — let's call it tank — with one drive. He plugs in the second drive. He runs zpool attach tank existing-drive new-drive. ZFS starts resilvering immediately — copying all the data to the new drive to create the mirror. The pool was never offline, the data was accessible the whole time, and now he's got redundancy.
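The attach step can be sketched in Python as a small wrapper that builds the `zpool attach <pool> <existing> <new>` argument list and only executes it when asked. The pool name `tank` comes from the example above. The `/dev/disk/by-id/...` paths are hypothetical placeholders; by-id names are a common choice because they don't change if the kernel reorders `sdX` names.

```python
import shlex
import subprocess


def attach_mirror(pool: str, existing: str, new: str, dry_run: bool = True) -> list[str]:
    """Turn a single-drive vdev into a mirror by attaching `new` to `existing`.

    With dry_run=True (the default) this only returns the command. With
    dry_run=False it runs it (needs root); ZFS then resilvers in the background.
    """
    cmd = ["zpool", "attach", pool, existing, new]
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


cmd = attach_mirror(
    "tank",
    "/dev/disk/by-id/ata-EXISTING_DRIVE",  # hypothetical: the drive already in the pool
    "/dev/disk/by-id/ata-NEW_DRIVE",       # hypothetical: the freshly installed drive
)
print(shlex.join(cmd))
```

After a real run, `zpool status tank` shows the resilver progress and, once it finishes, both drives under a `mirror-0` vdev.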

Resilver time on a home server with, what, a few terabytes? Hours, not days.

Probably three to eight hours depending on how full the drive is and whether it's spinning rust or SSD. And ZFS is smart about resilvering — it only copies data that's actually in use, not the empty space. If he's only using two terabytes of a four terabyte drive, the resilver only moves two terabytes.
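Because only allocated data is copied, resilver time is roughly a division: data in use over sustained throughput. A back-of-the-envelope sketch, where the throughput figures are assumptions for illustration rather than measurements (real resilvers on fragmented or busy pools can be slower):

```python
def resilver_hours(used_tb: float, mb_per_s: float) -> float:
    """Estimate resilver time, assuming only allocated data is copied.

    used_tb:  allocated data in decimal terabytes (1 TB = 10**12 bytes)
    mb_per_s: assumed sustained copy rate in decimal MB/s (1 MB = 10**6 bytes)
    """
    seconds = used_tb * 1e12 / (mb_per_s * 1e6)
    return seconds / 3600


# Two terabytes in use on a four-terabyte drive: the empty 2 TB is skipped.
print(f"{resilver_hours(2, 150):.1f} h at an assumed 150 MB/s")  # 3.7 h
print(f"{resilver_hours(2, 80):.1f} h at an assumed 80 MB/s")    # 6.9 h
```

Both results land inside the three-to-eight-hour range mentioned above; a completely full four-terabyte drive would double them.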

Let's talk about what happens when something goes wrong in a mirror versus what happened to him. He had a single drive, no redundancy, and the power supply failure caused corruption. With a mirror, the same scenario plays out differently.

The power supply fails, the machine goes down hard. He rebuilds, boots back up. ZFS sees both drives in the mirror. If one drive has corrupted sectors from the unclean shutdown, ZFS detects the checksum mismatch when it reads those blocks, fetches the copy from the other drive, confirms that copy matches its checksum, and rewrites the bad sectors from it. He might not even know anything went wrong unless he checks zpool status. Without redundancy, a checksum error is just a checksum error: ZFS can tell you the data is bad but it can't fix it.
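To make the detect-then-repair loop concrete, here is a toy Python model, not ZFS code. Two in-memory "drives" hold the same blocks, and each block's checksum is stored separately from the data (in ZFS it lives in the parent block pointer). SHA-256 stands in for ZFS's checksum purely because it's in the standard library; ZFS's default is fletcher4. The read path is simplified to check both copies on every read.

```python
import hashlib


def checksum(data: bytes) -> str:
    # Stand-in checksum; ZFS defaults to fletcher4, SHA-256 is just convenient here.
    return hashlib.sha256(data).hexdigest()


class ToyMirror:
    """Toy two-way mirror: each block is stored on both 'drives', and its
    checksum is stored separately from the data."""

    def __init__(self, blocks: list[bytes]):
        self.sides = [list(blocks), list(blocks)]   # drive 0, drive 1
        self.sums = [checksum(b) for b in blocks]   # expected checksums
        self.repairs = 0

    def read(self, i: int) -> bytes:
        expected = self.sums[i]
        # Find any copy whose contents still match the stored checksum.
        good = next((side[i] for side in self.sides if checksum(side[i]) == expected), None)
        if good is None:
            # Detected but unrecoverable: every copy is bad.
            raise OSError(f"block {i}: checksum error, no good copy")
        # Self-heal: overwrite each bad copy with the verified good one.
        for side in self.sides:
            if checksum(side[i]) != expected:
                side[i] = good
                self.repairs += 1
        return good


mirror = ToyMirror([b"family-photo", b"tax-return"])
mirror.sides[0][1] = b"tax-retXrn"     # simulate a bad sector on drive 0
print(mirror.read(1), mirror.repairs)  # b'tax-return' 1
print(mirror.sides[0][1])              # b'tax-return' (healed in place)
```

If every copy is corrupted, which is effectively the situation with a single drive, `read` raises instead of repairing: the tattletale, not the repairman.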

Redundancy turns ZFS from a tattletale into a repairman.

That's the phrase. ZFS without redundancy is a very accurate historian of your data's demise.

Which is not nothing. Knowing which file is corrupted is better than silently serving bad data for months until you open the photo and it's half gray. But it's cold comfort.

The prompt mentions he thinks he recovered everything — and I'm glad — but the uncertainty of "I think" is the thing that keeps you up at night. A mirror eliminates that uncertainty for most failure modes.

Let's go deeper on the hardware. You mentioned the LSI 2008 family. Are there other options? What about newer cards? And what's the obtainability story for someone who's not camping on eBay?

The LSI SAS 3000 series is the newer generation: PCIe 3.0, twelve gigabit per second SAS. Cards like the 9300-8i. They're faster, but they're also more expensive, usually a hundred to a hundred fifty dollars used. For a home server with SATA SSDs or spinning drives, the extra bandwidth is overkill. The 2008 series runs each port at six gigabits per second, the same speed as the SATA link itself, so no SATA drive can outrun it. So the 2008 cards remain the pragmatic sweet spot.

What about new cards? Something you can buy on Amazon without spelunking through used enterprise gear?

There are new cards — Broadcom owns LSI now — but a new 9400-8i is three hundred to four hundred dollars. That's data-center pricing. For a home two-drive mirror it's like buying a Formula One engine for your Corolla.

The used LSI card is the move, with the honest caveat that flashing it is a project.

It's a project. When you buy the card, you want to make sure it's flashable. Some OEM variants from Dell or HP have locked firmware and need a cross-flashing procedure that's more involved. The safest bet is an LSI-branded card. The 9207-8i in particular has a very clean flashing path. You download the IT mode firmware from Broadcom's site, make a bootable USB with FreeDOS or the UEFI shell, and use sas2flash to erase the existing firmware, flash the IT mode firmware, and flash the boot BIOS if you want to boot from drives on the card. Twenty minutes if you've done it before, maybe two hours the first time while you read forum posts.

If someone reads that and thinks, "I don't want to spend an evening in a UEFI shell," there are sellers on eBay who pre-flash these cards to IT mode and sell them for a small premium.

That's a completely reasonable path. You'll pay maybe seventy or eighty dollars instead of forty, and the card arrives ready to go. For a lot of home users, that premium is money well spent.

Let's talk about something the prompt didn't ask but is sitting right there.

Oh, I was hoping we'd get to this.

The infamous ZFS ECC debate. Does he need it?

Here's the honest answer. ZFS does not require ECC RAM any more than any other filesystem requires ECC RAM. The checksums ZFS stores on disk protect your data at rest. But — and this is the but that matters — without ECC, there's a theoretical failure pattern where bad RAM flips a bit in a data block before ZFS computes the checksum, so ZFS writes bad data with a good checksum, and you never know.

The "scrub of death" scenario.

That's what people call it, though I think the name is a bit dramatic. This failure pattern exists on every filesystem. If your RAM is silently corrupting data, ext4 will write the bad data, NTFS will write the bad data. ZFS is not uniquely vulnerable — it's just uniquely honest about the fact that it can't protect against memory corruption before the checksum is calculated.

ECC is good practice for any server, not a ZFS-specific requirement.

For a home server, the practical question is whether ECC is worth the cost of a platform change. ECC requires motherboard and CPU support. On the AMD side, most non-APU Ryzen desktop chips support ECC unofficially, and some AM4 and AM5 motherboards will run it, but you have to check the specific board. On the Intel side, you generally need a Xeon, or on recent generations a Core chip paired with a workstation chipset like W680. Older Intel generations had odd rules, such as some Core i3 parts supporting ECC while the i5 and i7 did not. If he's already on a platform that supports ECC, a couple of ECC UDIMMs are maybe twenty or thirty percent more than non-ECC and it's a no-brainer. If he'd need a whole new motherboard and CPU, that's a much harder sell for a two-drive home mirror.

The grounded advice is: if your current board and CPU support it, get ECC. If they don't, don't rebuild your whole machine around it — the motherboard SATA ports and a mirror are already giving you vastly more protection than you had before.

Don't let the perfect be the enemy of the massively improved.

Let's talk about scrub schedules. He's building a mirror. How often should he scrub?

The default on most ZFS distributions is a monthly scrub, and for a home server that's totally reasonable. A scrub reads every block on every drive and verifies every checksum. On a mirror, if it finds a mismatch, it can repair it immediately from the good copy. On a two-drive mirror of a few terabytes, a scrub might take an hour or two if it's all SSDs — it's not a heavy lift.

The point of regular scrubs is catching silent corruption early, before the other drive also develops a problem.

The nightmare scenario is: one drive develops a bad sector, you don't know because you never scrub, six months later the other drive develops a bad sector in the same spot, and now you've got a permanent checksum error. Regular scrubs catch the first failure while the second drive is still clean.

What about the scheduling? Middle of the night?

Set it for 2 AM on a Sunday. ZFS scrubs are designed to run while the pool is in use — they'll throttle themselves if the system is busy. On a home server that's mostly idle overnight, it'll just rip through it.

You mentioned scrub of death earlier. Since we're myth-busting — does scrubbing actually stress drives to the point of failure?

There's a kernel of truth here that got blown way out of proportion. A scrub does read every sector, so if a drive is on the verge of mechanical failure, the additional activity could push it over the edge. But that drive was going to fail anyway — probably the next time you watched a movie off it or ran a backup. The scrub didn't cause the failure, it revealed it. And revealing it during a scheduled maintenance window when you're awake and have a replacement plan is vastly better than having it fail silently and discovering it when the second drive also fails.

The scrub is the stress test that tells you the bridge is unsafe, not the truck that broke the bridge.

That's the right framing.

Let's zoom out to something the prompt alludes to but doesn't state directly. He's planning a mirror. When he eventually wants more space, is mirror still the right topology, or should he be thinking about RAIDZ?

This is the classic home server dilemma. A mirror — ZFS calls it a mirror vdev — gives you fifty percent storage efficiency. Two four-terabyte drives give you four terabytes usable. It's simple, it resilvers fast, it's easy to expand — you can add another mirror vdev later, or attach more drives to make a three-way mirror if you're paranoid. Performance is great, especially for random reads.

RAIDZ1 is like RAID 5 — single parity. You need at least three drives, and you get the capacity of all drives minus one. Three four-terabyte drives give you eight terabytes usable. RAIDZ2 is double parity — at least four drives, capacity of all drives minus two. The storage efficiency is better than mirrors, especially as you add more drives. But expanding a RAIDZ vdev has historically been difficult.
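The capacity arithmetic in the last two answers fits in a few lines. This sketch computes raw usable space per vdev and deliberately ignores ZFS metadata, RAIDZ padding, and reserved slop space, so real pools report somewhat less; the minimum drive counts are the ones given above.

```python
def usable_tb(layout: str, drives: int, size_tb: float) -> float:
    """Raw usable capacity of a single vdev (overheads ignored)."""
    if layout == "mirror":
        if drives < 2:
            raise ValueError("a mirror needs at least 2 drives")
        return size_tb  # every drive holds a full copy; a 3-way mirror is no bigger
    parity = {"raidz1": 1, "raidz2": 2}.get(layout)
    if parity is None:
        raise ValueError(f"unknown layout: {layout}")
    if drives < parity + 2:
        raise ValueError(f"{layout} needs at least {parity + 2} drives")
    return (drives - parity) * size_tb  # all drives minus the parity drives


print(usable_tb("mirror", 2, 4))   # 4
print(usable_tb("raidz1", 3, 4))   # 8
print(usable_tb("raidz2", 4, 4))   # 8
# A pool of two mirror vdevs (four drives) stripes across both: 4 + 4.
print(sum(usable_tb("mirror", 2, 4) for _ in range(2)))  # 8
```

At four drives, RAIDZ2 and a pool of two mirrors give the same usable space, so the choice there comes down to the other trade-offs discussed here.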

OpenZFS 2.3, released in early 2025, added RAIDZ expansion. You can now add a single drive to an existing RAIDZ vdev and it'll reflow the data. But it's a relatively new feature, it's more complex under the hood than adding a mirror vdev, and the reflow process takes a long time because it has to rewrite essentially all the data. For a home user starting with one drive and planning to add a second, a mirror is the natural path. Down the road, if he wants to go to four or six drives, he could add more mirror vdevs; a pool of mirrors is a perfectly valid and very performant topology. Or he could build a new pool with a RAIDZ vdev and zfs send the data over.

For the foreseeable future — a two-drive or four-drive home server — mirrors are the pragmatic choice.

Mirrors are simple, fast, easy to understand, and easy to recover from. For a home user, those qualities matter more than squeezing out an extra terabyte of usable space.

Let's talk about the thing that RAID is not.

RAID — and ZFS mirrors — protect against drive failure. They do not protect against accidental deletion, ransomware, a power surge that takes out the whole machine, a fire, a flood, a toddler with a screwdriver. If you delete a file, ZFS happily deletes it from both sides of the mirror instantly.

The prompt mentions he recovered his data from a single corrupted drive. A mirror would have made that recovery unnecessary or automatic. But neither a single drive nor a mirror saves you from "oops, I rm -rf'd the wrong directory."

That's where an actual backup comes in. For a home server, the sweet spot is some kind of external drive or a second small machine that pulls ZFS snapshots. zfs send and zfs receive are incredibly efficient — they only transfer the blocks that changed between snapshots. You can set up a cron job to take a snapshot every hour or every day, and then replicate it to a backup target. If you accidentally delete something, you go back to the snapshot from before you deleted it.
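The snapshot-then-incremental-send loop can be sketched in Python as below. Everything beyond the `zfs snapshot`, `zfs send -i`, and `zfs receive` commands themselves is hypothetical: the `auto-YYYYmmdd-HHMM` naming scheme, the `backup/tank` target dataset, and the `backupbox` ssh host. It assumes an initial full send has already been done once, so the previous snapshot exists on both sides, and by default it only returns the plan without running anything.

```python
import shlex
import subprocess
from datetime import datetime, timezone
from typing import Optional


def snapshot_name(dataset: str, when: datetime) -> str:
    """Hypothetical naming scheme: <dataset>@auto-YYYYmmdd-HHMM."""
    return f"{dataset}@auto-{when:%Y%m%d-%H%M}"


def replicate(dataset: str, prev_snap: str, target: str, host: Optional[str] = None,
              now: Optional[datetime] = None, dry_run: bool = True) -> list[str]:
    """Snapshot `dataset`, then send only blocks changed since `prev_snap`.

    `target` is the receiving dataset, either local (e.g. a pool on a USB
    drive) or on `host` over ssh. Returns the commands as shell text.
    """
    new_snap = snapshot_name(dataset, now or datetime.now(timezone.utc))
    snap = ["zfs", "snapshot", new_snap]
    send = ["zfs", "send", "-i", prev_snap, new_snap]  # incremental stream
    recv = ["zfs", "receive", target]
    if host:
        recv = ["ssh", host] + recv
    plan = [shlex.join(snap), f"{shlex.join(send)} | {shlex.join(recv)}"]
    if not dry_run:
        subprocess.run(snap, check=True)
        sender = subprocess.Popen(send, stdout=subprocess.PIPE)
        receiver = subprocess.Popen(recv, stdin=sender.stdout)
        sender.stdout.close()  # so `zfs send` sees a broken pipe if receive dies
        recv_rc, send_rc = receiver.wait(), sender.wait()
        if recv_rc or send_rc:
            raise RuntimeError("replication failed")
    return plan


plan = replicate("tank", "tank@auto-20240619-0200", "backup/tank",
                 host="backupbox", now=datetime(2024, 6, 20, 2, 0, tzinfo=timezone.utc))
for line in plan:
    print(line)
# zfs snapshot tank@auto-20240620-0200
# zfs send -i tank@auto-20240619-0200 tank@auto-20240620-0200 | ssh backupbox zfs receive backup/tank
```

Called from cron on whatever schedule suits you, each run's new snapshot becomes `prev_snap` for the next one.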

The backup target doesn't need to be fancy.

An external USB drive formatted with ZFS is fine. A second small machine is fine. A cloud backup if your upload bandwidth can handle it. The point is that the backup is separate from the primary machine. A mirror inside the same case is redundancy, not a backup.

The power supply failure that started this whole story — if the machine had taken a voltage spike that fried both drives, the mirror would have been just as dead as the single drive.

Redundancy inside the box protects you from drive failure. It does not protect you from box failure. That's what a backup is for.

Let's circle back to the HBA question with something practical. Say he buys the used LSI card, flashes it, installs it. He's got his existing drive on the motherboard SATA port and he's about to connect the new mirror drive to the HBA. Is it okay for the two halves of a mirror to be on different controllers?

ZFS does not care which controller a drive is attached to. It sees block devices. It could be one drive on the motherboard SATA, one on the HBA — no problem at all. In fact, there's an argument that having the two drives on different controllers reduces the risk of a controller failure taking out both sides of the mirror simultaneously.

The migration path could be even simpler than we described. Install the HBA, connect the new drive to it, run zpool attach, done. The existing drive stays on the motherboard port. No recabling, no exporting the pool.

That's the cleanest path. And down the road, if he adds more drives, he can move everything to the HBA at his leisure — or not. ZFS does not care.

What about cables? If he's using SATA drives with a SAS HBA, does he need special cables?

The LSI cards use SFF-8087 connectors — those little rectangular SAS connectors. To connect SATA drives, you need an SFF-8087 to four SATA forward breakout cable. That's a cable that plugs into one SAS port on the card and splits out to four individual SATA connectors. They're about ten to fifteen dollars each, widely available, and they just work — SAS ports speak SATA natively.

The shopping list for the full HBA path is: one used LSI card, one breakout cable, and optionally a few extra dollars for a pre-flashed card.

That's the list. Under a hundred dollars all-in if you go pre-flashed, roughly forty to seventy-five if you flash it yourself. For a home server, that's the right price point.

The alternative — which we should keep saying — is zero dollars, use the motherboard SATA ports, and still get all the ZFS mirror goodness.

The motherboard ports are not a compromise. For two drives, they are the option, and the HBA is the "I want to add more drives later" option. Both are correct.

Let's talk about one more thing that's easy to overlook. After he creates the mirror, how does he know it's working? How does he verify the redundancy is actually there?

zpool status is your friend. It'll show you the pool layout, the state of each drive, and any errors. After creating the mirror, you should see both drives listed under the mirror vdev with an ONLINE state. You can also run zpool scrub and then check zpool status again — it should show zero checksum errors. And here's a slightly nerve-wracking but definitive test: after the mirror is built and scrubbed, power down, physically disconnect one drive, boot back up. The pool should come up in a DEGRADED state but all your data should be accessible. Then reconnect the drive, run zpool online, and ZFS will resilver any changes.
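The `zpool status` check can also be scripted. This minimal sketch parses the human-readable output, assuming the common layout (a ` state:` line followed by a `NAME STATE READ WRITE CKSUM` table). That text isn't a stable machine interface, so treat the parser as illustrative; the sample below is a hand-written example of a healthy two-drive mirror, not captured output.

```python
def parse_zpool_status(text: str) -> dict:
    """Extract the pool state and per-device (STATE, READ, WRITE, CKSUM) rows."""
    state, devices, in_table = None, {}, False
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            in_table = False  # a blank line ends the config table
        elif fields[0] == "state:":
            state = fields[1]
        elif fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_table = True
        elif in_table and len(fields) >= 5:
            name, dev_state, read, write, cksum = fields[:5]
            devices[name] = (dev_state, read, write, cksum)
    return {"state": state, "devices": devices}


def mirror_is_healthy(status: dict) -> bool:
    """True if the pool is ONLINE, has a mirror vdev, and shows no errors."""
    devs = status["devices"]
    return (
        status["state"] == "ONLINE"
        and any(name.startswith("mirror") for name in devs)
        and all(s == "ONLINE" and (r, w, c) == ("0", "0", "0") for s, r, w, c in devs.values())
    )


SAMPLE = """\
  pool: tank
 state: ONLINE
config:

\tNAME        STATE     READ WRITE CKSUM
\ttank        ONLINE       0     0     0
\t  mirror-0  ONLINE       0     0     0
\t    sda     ONLINE       0     0     0
\t    sdb     ONLINE       0     0     0

errors: No known data errors
"""

print(mirror_is_healthy(parse_zpool_status(SAMPLE)))  # True
```

During the pull-the-plug test the pool reads DEGRADED and the missing drive shows a non-ONLINE state such as UNAVAIL, so the check returns False until the drive is back and resilvered.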

The pull-the-plug test. The home user's version of chaos engineering.

I'd recommend doing it once, during the setup phase, when there's no important data on the line. It builds confidence in the configuration and teaches you what a degraded pool looks like so you recognize it if it happens for real.

It also surfaces any weirdness — like if your motherboard BIOS is set to something strange and the machine won't boot with a missing drive, better to find out during a test than during an actual failure at two in the morning.

Test your failure pattern. It's the least fun part of system administration but it pays off exactly once and that once makes all the difference.

To land this. The card he's asking about — the hardware controller that makes ZFS more robust — is an HBA in IT mode, which is fundamentally the opposite of a RAID card. A used LSI SAS 2008 series card is the pragmatic choice. It's thirty to sixty dollars, it's on eBay, it requires a flashing project that's well-documented. Or he can skip the card entirely and use his motherboard SATA ports, which for a two-drive mirror are completely adequate. The order of operations: if he's adding an HBA, install and flash it first, then connect the new drive through it, then zpool attach. ECC RAM is nice if his platform supports it, not worth a motherboard swap if it doesn't. Regular scrubs, and a backup that lives outside the machine.

That's the summary. I'd add one thing — the near-miss he described, the power supply failure and the corrupted drive, that's the kind of experience that either teaches you the lesson or doesn't. He clearly learned it. He's asking the right questions. The fact that he's thinking about this before adding the second drive, rather than after another failure, means he's already ahead of where most home server operators are.

The cold wind that makes you buy a jacket.

And now: Hilbert's daily fun fact.

Hilbert: In the early fifteen hundreds, fishermen on Lake Baikal used a knot called the Siberian lock hitch to lash their nets to wooden frames under the ice. The technique was lost for centuries until a preserved section of netting was pulled from the lake sediment in two thousand nineteen, still holding its original tension.

A knot that survived five hundred years underwater and still held tension. That's either incredible craftsmanship or Lake Baikal has very good archival properties.

I'm now wondering what else is at the bottom of Lake Baikal holding tension.

This has been My Weird Prompts. Thanks to our producer Hilbert Flumingtop. You can find us at myweirdprompts dot com or on Spotify. If you've got a home server story or a question about ZFS — or anything else — send it our way.

Until next time.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.
