The question of whether you can build your own private cloud has shifted from "can you" to "which pieces do you pick, and in what order." The software stack has caught up to the ambition: MinIO provides single-node S3-compatible object storage using Reed-Solomon erasure coding, Ceph scales to petabyte clusters with CRUSH map-based policy placement, and Garage is purpose-built for geo-distributed setups across multiple properties, tolerating latency up to 200ms. For compute, Proxmox VE 8.3 offers a Swiss Army knife approach with native software-defined networking and ZFS-over-iSCSI, while Incus 6.0 provides a lighter-weight, container-first alternative with GPU passthrough in a single command. The vgpu_unlock-rs project now enables NVIDIA vGPU on consumer cards, and AMD's ROCm 6.3 supports RX 7000 series GPUs natively, meaning a single RTX 4090 can serve a 70B-parameter Llama model at 4-bit quantization through vLLM to an entire private cloud, with some layers offloaded to system RAM. The key insight is that for one to three nodes, Kubernetes is unnecessary: Proxmox or Incus with Ansible playbooks provides simpler, more reliable orchestration.
#3218: Building Your Own Cloud in 2026
The software and hardware for a DIY private cloud have never been more feasible. Here's how to pick the right pieces.
Episode Details
- Episode ID
- MWP-3388
- Published
- Duration
- 28:50
- Pipeline
- V5
- TTS Engine
- chatterbox-regular
- Script Writing Agent
- deepseek-v4-pro
- Topics
- diy, home-lab, gpu-acceleration
AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.
Daniel sent us this one — and it's basically the question anyone who's ever stared at a cloud bill or a terms-of-service change has asked themselves. He's backing up the show to Wasabi, plus a home NAS, but wants to go further. The core question: what platforms let you build your own private cloud — real S3-compatible object storage, compute in VMs and containers, maybe GPU resources — across multiple properties? And practically, how do you monitor and recover a machine that's completely down when you're not physically there? The mechanics of creating your own cloud. Which, I'll say it, has never been more feasible than right now.
It's not just feasible — the economics have genuinely flipped in the last eighteen months. Backblaze B2 crossed three exabytes stored in Q1 of this year. AWS dropped S3 Glacier Instant Retrieval pricing by fifteen percent in January. The public cloud is cheaper than ever. But so is building your own. Ten-gigabit networking is dirt cheap, used enterprise hardware is abundant, and the software stack has matured to the point where most of the pain points that made DIY cloud a hobbyist nightmare are gone.
The cloud is cheap, and the anti-cloud is cheap. Which means the decision is no longer about money. It's about control.
And that's the tension here. You trust the cloud, but do you trust the cloud? Do you trust that your data will still be accessible in five years when the provider changes their API or gets acquired or jacks up egress fees? The prompt is asking about permanence, and permanence means you own the infrastructure.
Let's define what we actually mean by a private cloud, because it's not what most people think. When someone says "I have a private cloud," they usually mean a NAS with a web interface and maybe Docker. That's a file server. That's not a cloud.
A private cloud, in the sense we're talking about, is a system that provisions S3-compatible object storage, virtual machines, containers, and optionally GPU compute — all self-managed, all accessible through APIs. The key word is provisions. A NAS stores files. A cloud lets you say "give me a bucket, give me a VM with eight gigs of RAM, give me a container with GPU access," and it happens programmatically. That's the distinction.
The three pillars are storage, compute, and orchestration. Storage meaning MinIO, Ceph, or Garage. Compute meaning Proxmox, Incus, or OpenStack. Orchestration meaning Kubernetes, Nomad, or just good old Ansible if you're keeping it simple.
Here's why this is more feasible now than even two years ago. Incus 6.0 shipped in March with native GPU passthrough support. Proxmox VE 8.3 landed the same month with native software-defined networking and ZFS-over-iSCSI for VM storage. The vgpu_unlock-rs project, which lets you do NVIDIA vGPU on consumer cards, reached stable status late last year. And AMD's ROCm 6.3, which shipped in November, supports RX 7000 series GPUs. The software has caught up to the ambition.
The question isn't "can you do this." The question is "which pieces do you pick, and in what order." Let's get into the three layers — storage, compute, and GPU — and the specific tools that make each work.
Storage first, because that's where most people start, and it's where most people get it wrong. You have three main contenders for self-hosted S3-compatible object storage: MinIO, Ceph, and Garage. They solve different problems.
The misconception is that they're interchangeable. They're not. MinIO is the answer if you want a single-node S3-compatible store that Just Works with the AWS SDK. It uses Reed-Solomon erasure coding — you specify k data drives and m parity drives, and it can tolerate m failures. For a home server with four drives, you might do two-plus-two, meaning half your raw capacity goes to redundancy, but you can lose any two drives and not lose data.
MinIO's erasure coding is elegant. It stripes data across drives at the object level, not the block level, which means it can reconstruct missing shards without reading the entire object. For a four-drive setup with two-plus-two coding, a single drive failure means it only needs to read half the remaining data to rebuild. Compare that to traditional RAID, where you're reading the entire array. MinIO also does bit-rot detection with HighwayHash checksums on every object, which is something a basic NAS won't do.
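To make the idea of rebuilding a missing shard concrete, here is a minimal sketch. It deliberately swaps in single XOR parity for Reed-Solomon, because XOR fits in a few lines while showing the same principle: a lost shard is recomputed from the survivors. The payload and shard count are made up for illustration, and this is not how MinIO is implemented internally.

```python
def split_into_shards(payload: bytes, data_shards: int) -> list[bytes]:
    """Pad the payload with zero bytes and cut it into equal-sized data shards."""
    shard_len = -(-len(payload) // data_shards)  # ceiling division
    padded = payload.ljust(shard_len * data_shards, b"\0")
    return [padded[i * shard_len:(i + 1) * shard_len] for i in range(data_shards)]


def xor_bytes(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


# Hypothetical object, striped across three data shards plus one parity shard.
payload = b"episode-3218.mp3 contents"
shards = split_into_shards(payload, data_shards=3)
parity = xor_bytes(shards)

# Simulate losing the drive that held shard 1, then rebuild it
# from the two surviving data shards and the parity shard.
survivors = [shards[0], shards[2], parity]
rebuilt = xor_bytes(survivors)
```

XOR parity only survives one lost shard. Reed-Solomon generalizes the same trick so that a k-plus-m layout survives any m losses, which is what the two-plus-two example relies on.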
It speaks native S3. You point any S3-compatible tool at it — the AWS CLI, rclone, Duplicati — and it works. No translation layer, no weird compatibility matrix.
The catch with MinIO is multi-node. If you want redundancy across two physical machines, MinIO's erasure coding needs to span both nodes. That means you need at least two drives per node for a minimum four-drive pool. And MinIO doesn't do replication in the traditional sense — it does erasure coding across nodes. For two nodes, that works fine. For three or more, Ceph becomes more interesting.
Let's talk Ceph, because Ceph is the thing people name-drop and then quietly abandon after two weeks of trying to configure it.
Ceph is brilliant and also the DevOps equivalent of adopting a feral cat. It will do exactly what you want, but it will also demand things from you that you didn't know you had to give. Ceph uses CRUSH maps — Controlled Replication Under Scalable Hashing — to determine where data lives. Unlike MinIO's straightforward erasure coding, CRUSH lets you define rules like "store three copies, and make sure no two copies are on the same rack." It's policy-driven placement.
For a two-node setup, Ceph is overkill. You need at least three monitor nodes for quorum, otherwise a single monitor failure can take down the whole cluster. You can run monitors on the same machines as your storage nodes, but the moment you lose one node, you've lost quorum and everything goes read-only.
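The quorum arithmetic behind that warning can be sketched in a few lines. This is generic majority-quorum math, not Ceph code, and the function names are made up for illustration.

```python
def quorum_size(monitors: int) -> int:
    """Smallest strict majority of `monitors`."""
    return monitors // 2 + 1


def failures_tolerated(monitors: int) -> int:
    """How many monitors can fail while a majority is still reachable."""
    return monitors - quorum_size(monitors)


# Compare a few cluster sizes: count, majority needed, failures survivable.
for count in (1, 2, 3, 5):
    print(count, quorum_size(count), failures_tolerated(count))
```

With two monitors the majority is two, so losing either one loses quorum. Three monitors is the smallest count that survives a single failure.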
That's the Ceph trap. People see "software-defined storage" and think it scales down. It doesn't. Ceph's sweet spot is five nodes or more, where the overhead of monitors and managers is amortized, and the CRUSH map actually has meaningful topology to work with. For a home lab with two or three machines, MinIO or Garage are better fits.
Garage is the dark horse here. Built by a French collective called Deuxfleurs, it's purpose-built for geo-distributed S3. It handles latency up to two hundred milliseconds between nodes, which means you can run a Garage cluster across properties in different cities and it won't fall over. Rather than a Raft-style consensus protocol, it coordinates nodes with conflict-free replicated data types, which is a big part of why it tolerates slow links, and it's designed for exactly the scenario Daniel is describing: two or three locations, modest hardware, S3 compatibility.
Garage's replication model is refreshingly simple. You set a replication factor, say two, and it ensures every object exists on that many nodes. It doesn't do erasure coding. It does straight replication. For a three-node geo-distributed setup, replication factor two means you can lose any one node and still have all your data. The tradeoff is storage efficiency: with two-way replication, fifty percent of your raw capacity goes to redundancy, versus potentially thirty-three percent with erasure coding. But the simplicity is worth it.
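The fifty versus thirty-three percent comparison can be checked with a short calculation. This sketch measures redundancy as a fraction of raw capacity, and it assumes the thirty-three percent figure corresponds to a two-plus-one erasure layout, which the episode does not state explicitly.

```python
from fractions import Fraction


def replication_redundancy(factor: int) -> Fraction:
    """Fraction of raw capacity holding extra copies under N-way replication."""
    return Fraction(factor - 1, factor)


def erasure_redundancy(data: int, parity: int) -> Fraction:
    """Fraction of raw capacity holding parity under k+m erasure coding."""
    return Fraction(parity, data + parity)


two_way = replication_redundancy(2)      # Garage-style replication factor two
two_plus_one = erasure_redundancy(2, 1)  # assumed layout behind "thirty-three percent"
two_plus_two = erasure_redundancy(2, 2)  # the four-drive MinIO example

print(float(two_way), float(two_plus_one), float(two_plus_two))
```

Note that two-plus-two erasure coding costs the same half of raw capacity as two-way replication, but it survives two lost drives instead of one lost copy.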
The decision tree is: single location, one or two nodes, want S3 with minimal fuss? MinIO. Three or more locations, geo-distributed, okay with replication overhead? Garage. Five or more nodes in one location, need petabyte scale, have a dedicated ops person? Ceph. And if you're just backing up a podcast and some family photos, MinIO on a single node is probably the entire answer.
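The storage guidance from this part of the conversation can be written down as a small function. The thresholds are a loose reading of the discussion rather than hard rules, and the fallback to MinIO for in-between cases (such as three or four nodes in one place) is an assumption.

```python
def pick_object_store(sites: int, nodes: int, has_ops_person: bool = False) -> str:
    """Suggest a storage layer following the rough guidance in this episode."""
    if sites > 1:
        # Geo-distributed across properties: Garage is built for slow links.
        return "Garage"
    if nodes >= 5 and has_ops_person:
        # Large single-site cluster with someone to run it: Ceph's sweet spot.
        return "Ceph"
    # Everything else, including the single-node home server (assumed default).
    return "MinIO"


print(pick_object_store(sites=1, nodes=1))
print(pick_object_store(sites=2, nodes=2))
print(pick_object_store(sites=1, nodes=6, has_ops_person=True))
```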
Here's the thing — you don't run MinIO or Garage on bare metal. You run it inside a VM or a container on your compute layer. Which brings us to compute.
Proxmox VE 8.3 versus Incus 6.0. This is where the "private cloud" part actually materializes.
Proxmox is the Swiss Army knife. It manages virtual machines and containers through a single web UI, and the 8.3 release in March added native software-defined networking — you can now create virtual networks with VXLAN or VLAN segmentation without touching a config file. It also added ZFS-over-iSCSI, which means you can provision block storage for VMs from a ZFS pool and export it over the network. For a home setup, this is absurdly capable.
Incus is the lighter-weight alternative. It's a fork of LXD, now under the Linux Containers project, and version 6.0 added native GPU passthrough with a single command. Incus is container-first — no VM overhead unless you specifically need a VM. For GPU workloads, that matters.
The practical difference is resource overhead. Proxmox runs a full Debian base with its own kernel modules and management stack. Incus runs as a daemon on whatever Linux you've already installed. For a machine with a hundred twenty-eight gigs of RAM and a Ryzen 7950X, the overhead difference is negligible. For a repurposed NUC with sixteen gigs, Incus leaves more room for actual workloads.
The GPU question is where this gets interesting. The prompt asks: if you want GPU resources in your private cloud, does that change your platform recommendation?
It changes everything about how you think about the compute layer. Consumer GPUs — RTX 4090, RX 7900 XTX — were never designed for multi-tenant virtualization. NVIDIA wants you to buy their enterprise cards for that. But the vgpu_unlock-rs project, which stabilized late last year, patches the NVIDIA kernel driver to enable vGPU on consumer cards. You can slice a single RTX 4090 into multiple virtual GPUs and assign them to different VMs or containers.
That's a single RTX 4090 running, say, a seventy-billion-parameter Llama model at four-bit quantization through vLLM, serving inference requests to your whole private cloud. One card, one model, multiple users hitting it through an API.
The numbers on this are real. A seventy-billion-parameter model at four-bit quantization fits in about forty gigabytes of VRAM. An RTX 4090 has twenty-four gigs, so you'd need to offload some layers to system RAM, which slows inference. But the point stands: you can run a seventy-billion-parameter model on consumer hardware in your basement. Two years ago, that required a data center GPU.
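The memory arithmetic here is easy to reproduce. In this sketch the forty-gigabyte requirement is taken from the episode and assumed to include runtime overhead on top of the raw weights. The helper names are illustrative.

```python
def weight_gigabytes(parameters: float, bits_per_weight: int) -> float:
    """Size of the model weights alone, in decimal gigabytes."""
    return parameters * bits_per_weight / 8 / 1e9


def offload_gigabytes(required_gb: float, vram_gb: float) -> float:
    """How much must spill to system RAM when the model exceeds VRAM."""
    return max(0.0, required_gb - vram_gb)


# Seventy billion parameters at four bits each.
weights_gb = weight_gigabytes(70e9, bits_per_weight=4)

# Figure quoted in the episode; assumed to cover weights plus runtime overhead.
required_gb = 40.0

# An RTX 4090 has 24 GB of VRAM.
spill_gb = offload_gigabytes(required_gb, vram_gb=24.0)

print(weights_gb, spill_gb)
```

The weights alone come to 35 GB, so even before overhead the model cannot fit on one 24 GB card without offloading.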
For AMD, the path is simpler. ROCm 6.3 supports the RX 7000 series natively. No unlock required. AMD's been quietly building out their ROCm stack, and while it's still rougher than CUDA for some workloads, for inference it's solid. An RX 7900 XTX with twenty-four gigs of VRAM costs about nine hundred dollars and runs most models without modification.
Here's where Incus shines for GPU workloads. With Incus, GPU passthrough to a container is one command: incus launch ubuntu twenty-four-oh-four gpu-container, then incus config device add gpu-container gpu gputype physical. That's it. The container sees the GPU as if it were native. No PCIe passthrough, no IOMMU groups, no kernel parameter fiddling. Compare that to Proxmox, where GPU passthrough to a VM requires enabling IOMMU, blacklisting drivers, and assigning the GPU's PCIe address — doable, but more moving parts.
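Spelled out as actual command lines, the two spoken commands might look like the sketch below. The image alias `images:ubuntu/24.04`, the device name `gpu0`, and the `gputype=physical` key reflect the usual Incus CLI form as best I know it, so check them against `incus config device add --help` on your version before relying on them. The function only builds the argument lists and does not run anything.

```python
import shlex


def incus_gpu_container_commands(name: str,
                                 image: str = "images:ubuntu/24.04",
                                 device: str = "gpu0") -> list[list[str]]:
    """Build the argv lists for launching a container and attaching a GPU."""
    return [
        # Create and start the container from a remote image (assumed alias).
        ["incus", "launch", image, name],
        # Attach a physical GPU device to it; `device` is an arbitrary label.
        ["incus", "config", "device", "add", name, device, "gpu", "gputype=physical"],
    ]


commands = incus_gpu_container_commands("gpu-container")
for argv in commands:
    print(shlex.join(argv))
```

To execute them for real, each list could be passed to `subprocess.run(argv, check=True)` on a host where Incus is installed.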
The overhead difference is measurable. A container with GPU access has near-bare-metal performance. A VM with PCIe passthrough adds maybe two to three percent overhead from the hypervisor. For inference, that's negligible. For training, where you're saturating the PCIe bus for hours, it adds up.
The orchestration question is the third piece, and this is where people over-engineer catastrophically. The prompt asks about bundling compute, storage, and GPU resources, which sounds like it demands Kubernetes. It doesn't.
The misconception is that you need Kubernetes to have a private cloud. For one to three nodes, you absolutely don't. Proxmox or Incus with Ansible playbooks is simpler, more reliable, and easier to debug at three in the morning when something breaks.
Kubernetes — specifically k3s for lightweight deployments — makes sense when you have five or more nodes and you need automatic scheduling, self-healing, and rolling updates. At three nodes, you're spending more time managing Kubernetes than you're saving. Nomad by HashiCorp is a good middle ground — it's a single binary, it schedules containers and batch jobs, and it has native GPU support. But even Nomad is more than most home clouds need.
Ansible is the unglamorous answer that actually works. You write a playbook that says "ensure MinIO is running on node one, ensure Incus is running on node two, ensure this GPU container exists on node three." You run it once, it converges to the desired state. You run it again, nothing changes. It's not dynamic scheduling, but for a home cloud with a fixed workload, dynamic scheduling is a feature you don't need.
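The "run it once, it converges; run it again, nothing changes" property can be illustrated without Ansible at all. This toy sketch models desired state and actual state as plain dictionaries. It is not Ansible's API, and the node and service names are simply the ones from the example above.

```python
def converge(desired: dict[str, set[str]], actual: dict[str, set[str]]) -> list[str]:
    """Start whatever is missing on each node and report the changes made."""
    changes = []
    for node, services in sorted(desired.items()):
        running = actual.setdefault(node, set())
        for service in sorted(services - running):
            running.add(service)  # stand-in for "install and start it"
            changes.append(f"{node}: started {service}")
    return changes


desired = {"node1": {"minio"}, "node2": {"incus"}, "node3": {"gpu-container"}}
actual = {"node1": {"minio"}, "node2": set()}

first_run = converge(desired, actual)   # makes the two missing changes
second_run = converge(desired, actual)  # already converged, so no changes
```

The second run returning an empty list is the idempotence a playbook gives you: it is safe to re-run at three in the morning.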
Here's a case study that ties it all together. Imagine a single machine: Ryzen 7950X, a hundred twenty-eight gigs of RAM, four twenty-terabyte hard drives, an RTX 4090. You install Proxmox as the hypervisor. You create a VM for MinIO, pass through the four hard drives directly, and configure two-plus-two erasure coding. You create an Incus container — yes, Incus running inside Proxmox, because Proxmox can host LXC containers natively — and pass through the GPU for vLLM. You create a Windows VM for gaming with GPU passthrough when the AI container isn't using it. All of this managed through a single web UI, all of it on one machine.
That's a private cloud in a box. Object storage, GPU compute, a general-purpose VM, all provisioned and managed from one interface. The incremental cost beyond the hardware is zero dollars.
The S3 API is the universal glue. Your backup software talks to MinIO. Your AI application talks to MinIO for model storage. Your phone's photo sync app talks to MinIO through an S3 gateway. Everything speaks S3. That's why object storage, not file storage, is the foundation of a private cloud.
You've built your cloud. Now imagine it's in a house you visit twice a year. How do you keep it running?
This is where the prompt gets practical. You have a machine in a property you don't live in full-time. It goes down. Not a service crash — the whole machine. No network, no SSH, no web UI. How do you even know it's down, and how do you bring it back?
Out-of-band monitoring is the answer, and the first rule is: your monitoring must not live on the thing it's monitoring. If your uptime checker runs on the same machine that might go down, you have a single point of failure that can't report on itself.
The minimum viable setup is a separate device on the same network that pings your server and alerts you if it stops responding. Uptime Kuma running on a Raspberry Pi is the entry-level version. It's a single Docker container that does ping checks, HTTP checks, and sends alerts through email, Telegram, or SMS via Twilio. For ten dollars in hardware and an hour of setup, you have basic heartbeat monitoring.
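A heartbeat check of this kind is small enough to sketch by hand. The version below uses a TCP connection attempt rather than an ICMP ping, since raw ICMP usually needs elevated privileges. It is a stand-in for what a tool like Uptime Kuma does, not its implementation, and the host and port in the usage note are hypothetical.

```python
import socket


def is_reachable(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

On the monitoring device this might be called as `is_reachable("192.168.1.50", 22)` on a schedule, with an alert sent when it returns False several times in a row.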
If the machine is hard-down — no power, no network — a ping check from a Pi on the same network tells you nothing you don't already know. You need something that can see the machine from a different angle.
Enter the Shelly Pro 3EM. It's a power monitor that clamps onto your electrical panel and measures consumption per circuit. For about a hundred dollars, you can see exactly how much power your server rack is drawing. If that number drops to zero, the machine is off — not crashed, not unresponsive, off. That's a different class of alert.
That alert needs to reach you even if the local network is down. Which means a cellular failover modem. A Cradlepoint is the enterprise option, but for a home setup, a four-G USB dongle on a separate carrier from your phone, plugged into a Raspberry Pi that lives on a different power circuit, gives you an independent path to the outside world.
The Pi Zero 2W is perfect for this. It draws less than two watts, it can run off a USB battery pack for days if the power goes out, and it has Wi-Fi and GPIO pins. You connect a four-G modem via USB, you write a script that checks the Shelly API every sixty seconds, and if power draw drops below a threshold, it fires off an SMS via the modem. Total cost: about sixty dollars plus the modem and a prepaid SIM.
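The script described here boils down to a threshold check in a loop. In this sketch the power reading and the SMS sender are passed in as plain callables, because the exact Shelly endpoint and modem commands depend on your hardware and are not specified in the episode. Requiring several consecutive low readings before alerting is an added assumption to avoid false alarms from a single bad sample.

```python
import time
from typing import Callable, Iterable, Iterator


def poll(read_watts: Callable[[], float], interval_s: float = 60.0) -> Iterator[float]:
    """Yield a fresh power reading every `interval_s` seconds, forever."""
    while True:
        yield read_watts()
        time.sleep(interval_s)


def watch_power(readings: Iterable[float], threshold_watts: float,
                send_alert: Callable[[str], None], consecutive: int = 3) -> bool:
    """Alert once after `consecutive` readings below the threshold.

    Returns True if an alert was sent, False if the readings ran out first.
    """
    low_streak = 0
    for watts in readings:
        low_streak = low_streak + 1 if watts < threshold_watts else 0
        if low_streak >= consecutive:
            send_alert(f"Server circuit at {watts:.1f} W for {low_streak} checks")
            return True
    return False
```

In deployment this might run as `watch_power(poll(read_shelly), 5.0, send_sms)`, where `read_shelly` and `send_sms` are hypothetical functions you write for your own power monitor and modem.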
That's detection. What about recovery? You know the machine is down. Now you need to turn it back on.
Remote power cycling has gotten surprisingly accessible. The enterprise answer is a smart PDU — something like the PDU8100 series that gives you per-outlet control, power metering, and a web interface. Those run three to five hundred dollars used. The home-lab answer is a Sonoff S31 smart plug flashed with ESPHome, which costs about eight dollars per outlet and can be controlled via MQTT or a simple HTTP request.
Here's where IPMI becomes non-negotiable for the server itself. IPMI — Intelligent Platform Management Interface — is a separate little computer on the motherboard that runs even when the main system is off. It gives you remote console access, power control, and sensor monitoring over the network. Server-grade boards from ASRock Rack and Supermicro have it built in. Those boards cost three to five hundred dollars.
For consumer boards without IPMI, PiKVM is the answer. PiKVM v4 shipped late last year with HDMI capture and full ATX control. It's a Raspberry Pi with an HDMI input and USB output that pretends to be a keyboard, mouse, and monitor. You plug it into your server's HDMI port and USB port, and you have full remote console access — BIOS, bootloader, everything — over a web browser. Cost is about two hundred fifty dollars for the pre-built unit.
PiKVM can physically press the power and reset buttons. It connects to the motherboard's front-panel header, so you can issue a hard reset even if the operating system is completely hung. That's the difference between "I'll fix it when I visit next month" and "I'll fix it right now from my phone."
OpenBMC on a Raspberry Pi 5 is the open-source alternative. It's more work to set up, but it gives you the same IPMI-like interface for zero software cost. The Pi 5's extra processing power means the web interface is actually responsive, unlike some of the earlier attempts.
The full out-of-band stack for a remote property looks like this: PiKVM connected to the server for console and power control, a Raspberry Pi Zero 2W with a four-G modem on a separate power circuit for heartbeat monitoring, a Shelly power monitor on the server's circuit for ground-truth power data, and a Sonoff smart plug as a last-resort power cycle if IPMI itself is unresponsive.
That whole stack costs maybe four hundred dollars. Four hundred dollars to be able to recover a machine that's in another country without getting on a plane. That's not expensive — that's the cheapest insurance policy you'll ever buy for your data.
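As a sanity check on the four-hundred-dollar figure, here is the sum of the prices quoted earlier in the episode. The cellular modem and prepaid SIM are left out because no price was given for them.

```python
# Approximate prices in US dollars, as quoted in the episode.
stack_usd = {
    "PiKVM pre-built unit": 250,
    "Pi Zero 2W heartbeat kit": 60,
    "Shelly Pro 3EM power monitor": 100,
    "Sonoff S31 smart plug": 8,
}

total_usd = sum(stack_usd.values())
print(total_usd)
```

That lands a little above four hundred dollars before the modem and SIM, consistent with the rough estimate.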
The prompt asks about using a local MSP — a managed service provider — to physically access the property. This is the human layer of out-of-band.
It's more practical than people think. You find a break-fix IT shop in the city where your property is. You sign a simple contract: twenty-four-hour response, you provide the procedure, they provide the hands. You give them a locked cabinet with a keypad code, a printed document with the static IP of the BMC, and a flowchart of what to do. "If you get an SMS from me, go to the property, enter code one-two-three-four on the cabinet, press the blue button on the PDU, wait five minutes, call me."
That's not a trust exercise in the abstract. They don't have the root password. They don't have the encryption keys. They have physical access to a locked box with a power button. The worst they can do is unplug something, and they're a registered business with a contract.
Here's a case study that ties the whole multi-property scenario together. A listener has properties in Berlin and Barcelona. Each site has a Proxmox node with IPMI, a PiKVM, and a cellular Raspberry Pi for heartbeat. Garage replicates S3 data across both sites with two-way replication. When the Berlin node lost power during a storm, the Barcelona node detected the heartbeat failure — because Garage's internal health checks stopped seeing the Berlin peer — and sent an SMS via Twilio. The listener called the Berlin MSP, who drove over, checked the PDU, found the breaker tripped, reset it, and the node came back online. Total downtime: four hours. Without that setup, it would have been down until the next visit, which could have been months.
The data was fine. Garage's replication meant every object existed in Barcelona. When Berlin came back, Garage re-synced automatically. The listener didn't lose a single byte.
That's the power of this approach. It's not just about convenience — it's about data durability across geographic fault lines. A fire in Berlin doesn't destroy the Barcelona copy. A flood in Barcelona doesn't touch Berlin. That's the promise of the public cloud, but you own the hardware.
Let's distill this into something you can actually do this week. Because the eighty-twenty rule applies hard here. Eighty percent of the benefit comes from the first node.
Start with a single-node Proxmox plus MinIO setup. Don't over-engineer. You can add Ceph or Garage later. Install Proxmox on a spare machine — even an old desktop with a couple of drives. Create a VM, install MinIO inside it with the official Docker image, expose port nine thousand and port nine thousand one. Then test S3 compatibility from your laptop: aws s3 cp, endpoint-url pointing to your server's IP on port nine thousand. If that command works, you have a private cloud.
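The smoke test mentioned here can be written out as a full command line. This sketch only builds the argument list for the AWS CLI. The file name, bucket name, and server address are hypothetical, and plain HTTP on port 9000 is assumed because no TLS setup was described.

```python
import shlex


def s3_copy_command(local_path: str, bucket: str, host: str, port: int = 9000) -> list[str]:
    """Build an `aws s3 cp` command that targets a self-hosted S3 endpoint."""
    return [
        "aws", "s3", "cp", local_path, f"s3://{bucket}/",
        # Point the CLI at the MinIO server instead of AWS.
        "--endpoint-url", f"http://{host}:{port}",
    ]


# Hypothetical file, bucket, and LAN address for illustration.
command = s3_copy_command("episode.mp3", "podcast-backups", "192.168.1.50")
print(shlex.join(command))
```

The bucket has to exist first, and the CLI needs the MinIO access key and secret configured as its credentials.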
Out-of-band is not optional. It's not phase two. It's phase one-point-one. Budget two to four hundred dollars for a PiKVM plus a smart plug plus a cellular modem. Without it, your private cloud is a single point of failure you can't recover from remotely. With it, you can fix almost anything from anywhere.
For GPU workloads specifically, use Incus containers with GPU passthrough rather than full VMs. Lower overhead, faster provisioning, and better GPU utilization. The incus config device add command I mentioned earlier is all you need. And if you're running an LLM, vLLM with four-bit quantization on an RTX 4090 will serve multiple users simultaneously. You don't need a data center GPU to have useful AI infrastructure.
The concrete next step for tonight: if you have a spare machine, install Proxmox. It's a bootable ISO, it takes twenty minutes. Spin up a MinIO container from the Proxmox template library. Test it with the AWS CLI. Then install Uptime Kuma on a Raspberry Pi or an old laptop and set up a ping check to your router. That's the minimum viable private cloud, and it costs nothing but time.
Once you've done that, the rest is incremental. Add a second node. Add Garage for geo-distribution. Add GPU passthrough. Add the cellular heartbeat monitor. Each step builds on the last, and none of them require you to re-architect anything.
The trajectory here is worth watching. Twenty-five gigabit Ethernet is becoming consumer-accessible: Intel E810 cards are now a hundred fifty dollars used. As that trickles down, the line between a home lab and an enterprise cloud gets blurrier. The question isn't "should you build a private cloud." It's "when will it be easier than renting one."
There are open questions about where this is heading. Apple's Private Cloud Compute architecture — where they offload AI processing to their own servers with verifiable privacy guarantees — could influence how self-hosted GPU clouds are designed. If Apple can prove that a third-party node can process your data without seeing it, that model trickles down.
The EU's Data Act takes effect in September. It mandates cloud portability and interoperability standards. If the major providers have to make it easy to move your data between clouds, that also makes it easier to move your data to your own cloud. The regulatory environment is slowly tilting toward self-hosting as a first-class option, not a hobbyist niche.
None of this means the public cloud is going away. AWS, Backblaze, Wasabi — they're excellent at what they do. The prompt itself acknowledges that. Backblaze storing three exabytes is a testament to how much trust people place in cloud storage. But trust and control are different things. A private cloud gives you control. The public cloud gives you convenience. The sweet spot, for the kind of setup we've been describing, is both: critical data replicated across your own nodes, with a cloud backup as the final safety net.
To wrap this into something actionable: build the first node this week. Proxmox, MinIO, Uptime Kuma. That's the foundation. Add out-of-band next month. Add a second location when you're ready. The tools are mature, the hardware is cheap, and the knowledge is out there. The only thing missing is the weekend to set it up.
Now: Hilbert's daily fun fact.
Hilbert: In the early 1500s, a Spanish navigator exploring the coast of what would later be called Tasmania sketched plans for a tide-predicting mechanical computer using interlocking brass gears and a camshaft driven by a water wheel. The device was never built — his ship sank in a storm off the coast of Chile, and the plans were lost until a partial copy surfaced in a Seville archive in 1923. Had it been constructed, it would have predated Lord Kelvin's tide-predicting machine by nearly four centuries.
...right.
This has been My Weird Prompts, produced by Hilbert Flumingtop. If you want more episodes, the archive lives at myweirdprompts.We'll be back next week.
This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.