#2475: Docker Volumes: Why They Can't Move and What To Do

Docker made apps portable but left your data stuck. Here's how to actually move volumes between hosts.

Episode Details
Episode ID
MWP-2633
Published
Duration
24:02
Pipeline
V5
TTS Engine
chatterbox-regular
Script Writing Agent
deepseek-v4-pro

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

Docker made application portability look easy: write a Dockerfile, build an image, run it anywhere. But anyone who's tried to move a running service to a new host discovers the hard truth — your data doesn't come with you.

Docker volumes are not portable by design. The default local volume driver stores everything under /var/lib/docker/volumes/, a path tied to the host's filesystem and specific storage driver (overlay2, aufs, etc.). You can't just SCP that folder to another server and expect it to work.

The Workarounds Everyone Discovers

The most common fix is the tar over SSH pipeline: spin up a temporary Alpine container, mount the volume, archive it to stdout, pipe that over SSH, and extract into a new volume on the destination. One command, no intermediate files. It works, but it's fragile — for large volumes, a dropped connection means starting over.
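As a sketch, the pipeline looks like this — `mydata`, `user`, and `newhost` are placeholder names, and the destination volume is created implicitly by the second `docker run`:

```shell
# Archive the volume's contents to stdout and extract into a new
# volume on the destination host, with no intermediate file.
# "mydata", "user", and "newhost" are placeholders.
docker run --rm -v mydata:/data alpine tar czf - -C /data . \
  | ssh user@newhost \
      "docker run --rm -i -v mydata:/data alpine tar xzf - -C /data"
```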

For incremental transfers, rsync is better. Use docker volume inspect to find the mount point, then rsync directly from the underlying data directory. Only changed blocks transfer, making it far more efficient for ongoing syncs.
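A minimal sketch of that workflow, assuming the default local driver and a volume named `mydata` (the data directory is root-owned, hence `sudo`):

```shell
# Locate the volume's data directory on the host
# (typically /var/lib/docker/volumes/<name>/_data).
MOUNT=$(docker volume inspect --format '{{ .Mountpoint }}' mydata)

# Sync it to the destination; subsequent runs transfer only changes.
# "user" and "newhost" are placeholders.
sudo rsync -avz "$MOUNT/" user@newhost:/var/lib/docker/volumes/mydata/_data/
```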

The Database Trap

The most dangerous mistake is copying a database volume while the container is running. The files on disk are in an inconsistent state — restore that backup and your database won't start because the write-ahead log is incomplete.

The rule: if it's a database, use the database's native dump tool. pg_dumpall for PostgreSQL, mysqldump for MySQL. Transfer the dump file, restore into a fresh container. Flat files can be rsynced, but ideally stop the container first.
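Sketched for the official database images — container names (`pg`, `pg-new`, `mysql`) and credentials are placeholders:

```shell
# Dump with the database's native tool rather than copying live files.
docker exec pg pg_dumpall -U postgres > backup.sql

# Restore into a fresh container on the new host:
docker exec -i pg-new psql -U postgres < backup.sql

# MySQL equivalent, reading the root password from the official
# image's environment variable:
docker exec mysql sh -c \
  'mysqldump --all-databases -uroot -p"$MYSQL_ROOT_PASSWORD"' > backup.sql
```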

The Spectrum of Solutions

There's a clear tradeoff spectrum:

  • Cloud storage drivers (AWS EBS, Azure Files, Portworx): maximum portability, maximum complexity, maximum cost and vendor lock-in.
  • Named Docker volumes with manual scripts: the middle ground — tar, rsync, and shell scripts. Works but feels like duct tape.
  • Bind mounts: mount a host directory directly into the container. Your data becomes a regular directory — rsync it, back it up with any tool, move it between hosts without Docker involved. The tradeoff: you lose Docker's volume management features like docker volume prune.
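A minimal bind-mount sketch, with placeholder paths and image name:

```shell
# Bind mount: the container sees /data, the host sees a plain
# directory that any backup tool can handle.
mkdir -p /srv/app/data
docker run -d --name app -v /srv/app/data:/data myimage

# Moving it to another host is now just a file copy:
rsync -avz /srv/app/data/ user@newhost:/srv/app/data/
```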

Backup Strategies That Actually Work

For small config volumes (a few hundred MB that barely change), a full tar copy every six months is fine. For growing database volumes, it's wildly inefficient — you're transferring everything even when 99% is unchanged.

Incremental tools like restic, Borg, and Duplicati solve this. Restic encrypts backups, deduplicates across snapshots, and supports S3, SFTP, and local filesystem backends. The catch: deduplication means snapshots share data blocks, so a single corrupted block in the repository can affect every snapshot that references it — verify the repository regularly.

The pragmatic approach: use restic for daily incrementals, but supplement with periodic full volume copies to limit blast radius.
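Sketched with placeholder repository, paths, and container name, assuming a bind-mounted data directory:

```shell
# Daily incremental snapshot of the data directory.
export RESTIC_REPOSITORY=sftp:user@backuphost:/backups/app
export RESTIC_PASSWORD_FILE=/root/.restic-pass
restic backup /srv/app/data

# Periodic full copy as the coarse safety net, container stopped
# first so the files are consistent:
docker stop app
tar czf /mnt/offsite/app-$(date +%F).tar.gz -C /srv/app/data .
docker start app
```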

The Bigger Lesson

The people happiest with Docker in production treat volumes as disposable or externalize state entirely. Use managed databases, object storage (S3), or NFS mounts. Docker becomes a stateless compute layer — which is what it's best at.

For the solo developer or small team running Docker on a home server or two hosts, the honest answer is: learn rsync, write shell scripts, and accept that Docker's brilliant abstraction of runtime environments came with a deliberate punt on data portability.


Transcript

Corn
Daniel sent us this one, and it's the kind of question that only comes from someone who's been burned by Docker at two in the morning. He's pointing out the fundamental irony — Docker solves the "works on my machine" problem beautifully for applications, but when you need to move the actual data, the volumes, between hosts, suddenly you're on your own. He wants to know why that gap exists, what tools actually work for migrating volumes between environments, and whether using full volume copies as a periodic backup strategy makes any sense. There's a lot to unpack here.
Herman
Before we dive in, quick note — today's script is being written by DeepSeek V four Pro. So if anything sounds unusually coherent, that's why.
Corn
I was going to say, if I sound smarter than usual, don't get used to it.
Herman
Alright, so this is one of those topics where the frustration is completely justified. Docker volumes are not portable by design. That's not a bug exactly, but it's definitely not a feature anyone celebrates. The default local volume driver stores everything under var lib docker volumes, and that path is tied to the host's filesystem and the specific storage driver — overlay two, aufs, whatever that machine is running. You can't just SCP that folder to another server and expect it to work.
Corn
This is where the promise and the reality collide, right? Docker says, here's your container, it runs the same everywhere. And you think, great, I'll just take the whole thing — app, data, everything — and move it. But the container is just the runtime. The data lives in this other thing that Docker treats as almost an afterthought. It's like being promised a fully furnished apartment and arriving to find the furniture is bolted to the floor of the old building.
Herman
The tar over SSH pipeline is the workaround everyone eventually discovers. You spin up a temporary Alpine container, mount the volume, archive it to standard output, pipe that over SSH, and extract it into a new volume on the destination. One command, no intermediate files written to disk. It works, but calling it elegant would be generous.
Corn
It's the kind of solution where you feel clever for about thirty seconds, and then you realize you're going to have to do this every time.
Herman
For large volumes, it's painful. If you have a two terabyte database volume, that tar pipeline is going to take hours, and if the connection drops halfway through, you start over. That's where rsync becomes the better option. You find the volume mount point with docker volume inspect, and then rsync directly from the underlying data directory to the destination. It only transfers changed blocks, so for incremental syncs it's far more efficient.
Corn
Here's the thing Daniel is really getting at — why is this the state of affairs? Docker had years to build a docker volume push command, or docker volume replicate, something. They didn't. Was that a deliberate choice or just neglect?
Herman
I think it was deliberate in the sense that Docker's philosophy has always been to defer to existing tools where they already work well. Backup and migration are OS level concerns. Docker's position was, we're not going to reinvent those wheels poorly. The problem is that by not providing even a thin wrapper, they left users with a manual, error prone process that feels like it should have been solved.
Corn
It creates this strange inversion. The application layer becomes trivially portable — docker compose up and you're running — but the state, the thing that actually matters, is pinned to a specific host. You can move the app anywhere, but it arrives amnesiac.
Herman
Unless you handle the data separately. And that's the key insight. The people who are happiest with Docker in production are the ones who treat volumes as disposable or who externalize state entirely. If your database is on a managed service, or your files are in S three, or you're using NFS mounts, then Docker volumes become an implementation detail you barely think about.
Corn
Let's talk about that. Daniel mentioned moving entire environments between computers. If you're a small team or a single developer, you might not have a managed database service. You might just have two servers and Docker on both. What does a realistic migration actually look like?
Herman
The OneUptime guide from February this year lays out a full migration script that covers the whole process. Step one, you export all your images with docker save. Step two, you stop the containers so the data is consistent. Step three, you archive all the volumes with tar. Step four, transfer everything with rsync. Step five, restore on the destination. Step six, verify with health checks and log inspection. It's thorough, but it's also a lot of steps that you have to get right.
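Condensed into a script, those six steps look roughly like this — image, volume, and host names are placeholders, error handling is omitted, and the compose files are assumed to already exist on the destination:

```shell
# 1. Export images
docker save -o images.tar myimage:latest
# 2. Stop containers so volume data is consistent
docker compose down
# 3. Archive each volume
docker run --rm -v mydata:/data -v "$PWD":/out alpine \
  tar czf /out/mydata.tar.gz -C /data .
# 4. Transfer everything
rsync -avz images.tar mydata.tar.gz user@newhost:/tmp/migrate/
# 5. Restore on the destination
ssh user@newhost '
  docker load -i /tmp/migrate/images.tar
  docker volume create mydata
  docker run --rm -v mydata:/data -v /tmp/migrate:/in alpine \
    tar xzf /in/mydata.tar.gz -C /data
  docker compose up -d
'
# 6. Verify
ssh user@newhost 'docker ps && docker compose logs --tail 50'
```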
Corn
Step two is doing a lot of heavy lifting there. Stopping the containers. If you don't stop the database container before copying its files, you're almost guaranteed to corrupt the data. The files on disk are in an inconsistent state while the database is running.
Herman
And that's one of the biggest traps people fall into. They think, it's just files, I'll copy them while the container is running, what's the worst that could happen? The worst is you restore that backup and your database won't start because the write ahead log is incomplete. For PostgreSQL, the safe approach is pg dump all while the container is running, transfer the dump file, and restore into a fresh container. Same idea with MySQL dump. Use the database's native tools.
Corn
The rule is, if it's a database, use the dump tool. If it's flat files, you can rsync, but ideally stop the container first. That's already two different workflows for what should conceptually be one operation — move my stuff.
Herman
It gets more complicated when you factor in volume drivers. The default local driver is the least portable option. If you switch to the NFS driver — which is technically the local driver with NFS options — you get medium portability within a network. Your data lives on an NFS share, so multiple hosts can access it. Move the container to another host, point it at the same NFS mount, and it just works. But now you've introduced a network dependency and a single point of failure.
Corn
If you want real portability across regions, you're looking at cloud storage drivers. AWS EBS, Azure Files, Portworx, REX-Ray. Those give you managed replication, snapshots, cross region copies. But now you're paying for managed storage, and you're locked into a specific cloud provider's driver. The simplicity of "it's just Docker on a server" evaporates.
Herman
There's a real gap between "Docker on one machine" and "full orchestration with Kubernetes persistent volume claims." For the single server user, the advice is basically, learn rsync and write shell scripts. For the Kubernetes user, you have a whole abstraction layer for storage. The middle ground is thin.
Corn
Which brings us to bind mounts. Every guide I've read that tries to solve this problem pragmatically ends up saying, just use bind mounts instead of named Docker volumes. Mount a directory from the host filesystem directly into the container, and suddenly your data is just a regular directory. You can rsync it, back it up with any tool, move it between hosts without Docker even being involved.
Herman
The trade off is you lose Docker's volume management features. Docker volume prune won't clean up bind mounts. You can't use volume drivers with bind mounts. You're essentially opting out of Docker's storage abstraction entirely. But for portability, it's the simplest answer. The blog post from November twenty twenty four by Stephen Foskett explicitly recommends using native filesystem storage instead of Docker volumes specifically for this reason.
Corn
We have this spectrum. On one end, managed volumes with cloud drivers — maximum portability, maximum complexity, maximum cost. On the other end, bind mounts — zero Docker features, maximum simplicity, maximum portability. And in the middle, named Docker volumes with manual tar and rsync scripts. None of these feel like the obvious right answer.
Herman
Because the right answer depends on what you're actually doing. If you're running a production application that needs high availability and disaster recovery, you should not be relying on Docker volumes at all. Use a managed database, use object storage, externalize your state. Docker becomes a stateless compute layer, which is what it's best at.
Corn
Daniel's question is coming from a different place. He's talking about the developer who has a few services running on a home server, or a small business with a couple of Docker hosts. They've containerized everything, they feel good about it, and then they need to move to a new machine. And they discover that the containerization that made everything feel portable didn't actually solve the hard part.
Herman
That's the "works on my machine" irony. Docker abstracted the runtime environment brilliantly. The same image runs on your laptop and your server. But it punted on data portability entirely. The result is that a containerized application is easy to move until you remember its database lives in a host specific directory that can't follow.
Corn
I want to dig into the backup angle specifically, because Daniel raised an interesting scenario. He said, what if I just copy the entire Docker volume to another machine once a month or once every six months as a second level backup? File level backups are the primary strategy, but this would be the belt and suspenders approach. Does that make sense?
Herman
It depends entirely on the size and change rate of the volume. If it's a small config volume — a few hundred megabytes that barely changes — then a full tar copy every six months is completely reasonable. It's a blunt instrument, but for small data, blunt instruments work fine. The problem is when you apply that same thinking to a database volume that's growing by gigabytes per day.
Corn
Right, because you're not just transferring the whole thing once. You're transferring the whole thing every time, even if only a tiny fraction changed. For a large volume, that's wildly inefficient. And if you're doing it over the public internet, you're saturating your bandwidth for hours to move data that's ninety nine percent unchanged.
Herman
There's the consistency problem again. If you copy the volume while the container is running, you might get a corrupted backup. If you stop the container for the duration of the copy, you have downtime. For a six monthly backup, maybe you can schedule that downtime. But if you're doing it more frequently, it becomes untenable.
Corn
The smarter approach for backups is incremental. Tools like restic, borg, Duplicati — they do change detection, deduplication, and only transfer what's actually new. You still need to handle the consistency issue, usually by snapshotting or briefly pausing the container, but the transfer itself is efficient.
Herman
Restic is particularly good for this. It encrypts the backup, deduplicates across snapshots, and supports multiple backends — S three, SFTP, local filesystem. You can run it from the host against a bind mounted directory, or you can run it inside a container that has the volume mounted. The restore process is straightforward. It's not Docker native, but it's battle tested.
Corn
The downside of incremental backups is restore complexity and shared state. With classic full-plus-incremental chains, you need every incremental in sequence, and one corrupted file breaks everything after it. With deduplicating tools like restic, snapshots restore independently, but they share data blocks under the hood, so one corrupted block can affect every snapshot that references it. That's why people still do periodic full backups even with incremental strategies — it limits the blast radius of repository corruption.
Herman
That's where Daniel's six monthly full volume copy might actually make sense as part of a larger strategy. Use restic for daily incrementals, but once every six months, take a full consistent copy of the volume and ship it to a different physical location. It's your break glass in case of emergency backup. The inefficiency is tolerable if it's rare.
Corn
The answer to "should I copy the whole volume as a backup" is yes, but only if the volume is small or the copy interval is long, and only as a complement to incremental backups, not as your primary strategy. And you still have to stop the container or use database dump tools to get a consistent snapshot.
Herman
Let's talk about Docker Compose specifically, because Daniel mentioned moving entire environments. If you're using docker compose, you have additional friction around external volumes. To use a volume that persists after docker compose down, you have to create it manually with docker volume create, specify driver options, then declare it as external true in the compose file. It's a two step process that can't be fully defined in the YAML.
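The two-step dance Herman describes, sketched with a placeholder volume name and service — the YAML is written via heredoc here just to keep the example self-contained:

```shell
# Step 1: create the volume outside Compose; driver options live
# here, not in the YAML. "appdata" is a placeholder name.
docker volume create appdata

# Step 2: declare it as external in docker-compose.yml:
cat > docker-compose.yml <<'EOF'
services:
  app:
    image: myimage
    volumes:
      - appdata:/data
volumes:
  appdata:
    external: true
EOF
```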
Corn
If you forget to create the volume before running docker compose up, it fails with an unhelpful error message. Or worse, if multiple projects use the same external volume name, you get collisions. It's one of those paper cuts that reminds you Docker Compose was designed for development, not production.
Herman
The workaround, again, is bind mounts. Instead of declaring a named volume, you mount a host path directly. That path is just a directory. It survives docker compose down, it's trivial to back up, and you can move it between hosts with rsync. The trade off is that you have to manage the directory permissions yourself, and it won't work on Docker for Mac or Docker for Windows without additional configuration because the host filesystem isn't directly accessible from the VM.
Corn
For Linux servers, bind mounts are the pragmatic choice. For cross platform development setups, named volumes with the local driver are simpler, but you pay for that simplicity when it's time to migrate.
Herman
There's one more angle I want to cover — docker context. It's not a data migration tool, but it does let you manage multiple Docker hosts from a single client. You can switch contexts and run commands against different daemons. It doesn't solve the volume portability problem, but it reduces the friction of managing multiple hosts. Combined with a shared NFS mount, you can effectively move workloads by redeploying containers on different hosts that all point to the same data.
Corn
Docker context plus NFS gets you part of the way there for stateless containers. But for stateful workloads, you're back to rsync and tar scripts.
Herman
That's the reality. Docker is a container runtime, not a data management platform. It never claimed to be. The frustration comes from the fact that it feels like it should be. The abstraction is so clean for compute that the gap on the data side is jarring.
Corn
What would a good solution even look like? If Docker had built a docker volume sync command, what would it need to handle?
Herman
It would need to handle consistency — automatically pausing the source container or using filesystem snapshots. It would need to handle incremental transfers — only sending changed blocks. It would need to handle encryption for transfers over untrusted networks. It would need to handle different storage backends transparently. That's a lot of complexity. It's basically what restic does, but integrated into the Docker CLI.
Corn
The reason Docker didn't build that is probably that the people who need it badly enough are already using Kubernetes with CSI drivers, and the people who don't need it that badly can get by with rsync. The middle ground of users who want a simple, built in solution is real, but maybe not large enough to justify the engineering effort.
Herman
Or maybe it's a failure of imagination. Docker Swarm had a moment where it looked like it might bridge that gap — overlay networks, shared volumes, service replication. But Swarm lost the orchestration war to Kubernetes, and now we're left with this bifurcated ecosystem. Kubernetes for people who need serious storage orchestration, raw Docker for people who don't, and not much in between.
Corn
If Daniel is sitting there with two servers and a bunch of Docker Compose files, what's the practical advice? What should he actually do?
Herman
First, separate your concerns. Stateless services — your web servers, your API containers, your workers — those are trivially portable. Docker Compose up on the new host and you're done. For stateful services, externalize what you can. Use a managed database if possible. Use object storage for files. Use an NFS share for things that need a filesystem interface.
Corn
If you can't externalize, use bind mounts for anything that needs to be portable between hosts. Write a simple rsync script that syncs those directories to your backup server. Test the restore process. Actually test it, don't just assume it works.
Herman
For databases specifically, use the native dump tools. Schedule a cron job that runs pg dump or mysqldump, writes the dump to a bind mounted directory, and then rsync that directory offsite. The dump file is a fraction of the size of the raw database files, it's guaranteed to be consistent, and it's portable across database versions.
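As a rough `/etc/cron.d` sketch — container name, paths, and hosts are placeholders, and `%` must be escaped in cron entries:

```shell
# Nightly dump to a bind-mounted directory, then rsync offsite.
0 3 * * *  root  docker exec pg pg_dumpall -U postgres > /srv/backups/pg-$(date +\%F).sql
30 3 * * * root  rsync -az /srv/backups/ user@backuphost:/backups/pg/
```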
Corn
For the six monthly full volume backup that Daniel mentioned, that's your safety net. Use the tar over SSH method, stop the container first, and keep those archives somewhere physically separate. It's not your primary backup, it's your "everything else failed" backup.
Herman
The tar command is worth spelling out, because it's genuinely useful. You run docker run, dash dash rm, dash v your volume name colon data, alpine, tar czf dash dash C data dot, pipe that over SSH, and on the other end, docker run dash dash rm dash i dash v your volume name colon data alpine tar xzf dash dash C data. One line, no intermediate files, works for any volume Docker can mount.
Corn
If the volume is large, swap tar for rsync. Find the mount point with docker volume inspect, format, mount point, and then rsync dash avz from that path to the destination. If you do it regularly, rsync will only transfer the changes.
Herman
One thing to watch out for with rsync — file permissions and ownership. Docker containers often run as specific user IDs, and if those IDs don't match between hosts, you'll get permission errors. You might need to use the dash dash numeric ids flag or chown the files after restoring.
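Sketched with placeholder paths; UID 999 is used purely as an example (it happens to be the official postgres image's default):

```shell
# Preserve numeric UIDs/GIDs so container users map correctly even
# when the hosts' /etc/passwd entries differ.
sudo rsync -avz --numeric-ids "$SRC/" user@newhost:"$DEST/"

# If ownership still ends up wrong, fix it to match the container's
# runtime UID on the destination:
ssh user@newhost "sudo chown -R 999:999 $DEST"
```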
Corn
That's the kind of detail that turns a thirty second operation into a two hour debugging session at three in the morning.
Herman
Which is exactly the experience Daniel is trying to avoid. And honestly, the fact that we're having this conversation, that there's enough material for a full episode on how to move Docker volumes between hosts, is itself an indictment of the state of things. It shouldn't be this hard.
Corn
It is, and knowing the landscape is better than being surprised by it. The core lesson is that Docker volumes are a convenience for local development and single host deployments. The moment you need portability across hosts, you have to think about data separately from containers. That's not a bug you can fix with a clever script — it's an architectural reality.
Herman
The tools exist to handle it. Restic, rsync, tar, NFS, cloud storage drivers. The frustration comes from the fact that none of them are integrated into the Docker workflow. You're stitching things together yourself, and that feels like it should have been solved by now.
Corn
Alright, let's take a quick detour. And now: Hilbert's daily fun fact.
Herman
The average cumulus cloud weighs approximately one point one million pounds, roughly the same as one hundred elephants.
Corn
What can listeners actually do with all this? First, audit your volumes. Figure out which ones actually need to be portable and which ones are ephemeral. If a volume can be rebuilt from source code or regenerated from other data, don't waste time backing it up. Second, for databases, switch to dump based backups today. It's one cron job and it will save you from the corrupted file copy scenario. Third, if you're planning a migration, test it on a non critical service first. The tar over SSH method works, but you want to have done it at least once before you're doing it with production data at two in the morning.
Herman
Fourth, consider bind mounts for anything that needs frequent syncing between hosts. You lose docker volume prune and driver plugins, but you gain the ability to use rsync, restic, borg, or literally any file backup tool without Docker specific ceremony. For small teams and single server setups, the simplicity is usually worth the trade off. Fifth, if you're growing to the point where manual volume migration is a recurring pain, that's the signal to look at external storage. NFS for local network, S three compatible object storage for cloud, or a managed database service. The complexity of setting those up is front loaded, but the operational simplicity afterwards is real.
Corn
For Daniel's specific question about the six monthly full volume backup — yes, it can work, but only as a secondary strategy. Your primary backups should be file level or database dump level, running daily or weekly. The full volume copy is your insurance policy. Use the tar over SSH method, stop the container first, and store the archive somewhere physically separate from your primary backups. Test the restore at least once a year.
Herman
The broader point is that Docker gave us application portability, but data portability is still a do it yourself project. The sooner you accept that and plan around it, the less painful it will be when you actually need to move.
Corn
One open question I keep coming back to — will this ever change? Containers are fifteen years old at this point. Kubernetes has persistent volume claims and CSI drivers, but those are complex abstractions for cluster environments. Will we ever get a simple, built in docker volume sync command for the small scale user? Or is the market telling us that anyone who needs that has already moved to Kubernetes or to managed services?
Herman
I suspect the ship has sailed. Docker's focus is on developer tooling, not production data management. The ecosystem has filled the gap with third party tools, and the people who need more have graduated to orchestration platforms. It's not satisfying, but it's stable. The workarounds we have today are probably the workarounds we'll have in five years.
Corn
Knowing them is half the battle. Thanks to our producer Hilbert Flumingtop for the daily fun fact and for keeping this show running. This has been My Weird Prompts. You can find every episode at myweirdprompts dot com. I'm Corn.
Herman
I'm Herman Poppleberry. Until next time.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.