You know, Herman, I was looking at that stack of old Dell OptiPlex computers in the corner of our living room this morning, and it hit me. We have all this silicon just sitting there, each one acting like its own little island. And it reminds me of that prompt our housemate Daniel sent over. He has been diving deep into the world of homelabbing, playing around with Proxmox and Synology servers, and he is asking a question that I think every tech enthusiast eventually hits. Why can't we just make all these separate boxes act like one giant, unified supercomputer?
Herman Poppleberry here, and Corn, you are speaking my language. Daniel is hitting on the holy grail of distributed systems. It is the dream of the Single System Image, or SSI. We have gotten really good at aggregating storage, which is what Daniel mentioned with Ceph, but aggregating the raw compute power—the actual central processing units and the random access memory—so that it looks like one massive pool to the operating system? That is a whole different beast. It is the difference between having a bunch of people sharing a library, which is storage, and trying to get ten people to share a single brain to solve a math problem.
Right, and Daniel specifically mentioned that he is looking for things besides Ceph. He wants to know about the protocols and the hardware interlinks that the big players use, the stuff you find in massive data centers. Because if you are just using a standard one gigabit ethernet cable, you are going to hit a wall pretty fast, right?
Oh, absolutely. Usually the bottleneck isn't the processor; it is the straw you are trying to suck the data through. If you want to talk about true resource aggregation, we have to look at how high-performance computing clusters actually talk to each other. We are talking about technologies like InfiniBand and Remote Direct Memory Access, or RDMA. These are the things that make the distance between two physical servers feel like the distance between two chips on the same motherboard.
So let us break this down for Daniel and everyone else listening. If we want to move beyond just a Proxmox cluster where you move virtual machines from one node to another, and we actually want to aggregate the resources, where do we start? What are the architectures that actually allow this?
Well, first, we have to acknowledge the two main paths here. There is the software approach, where you use a specialized operating system or a middleware layer to hide the fact that there are multiple machines. And then there is the hardware approach, which focuses on making the interconnects so fast that the software doesn't even realize it is talking to a different machine. On the software side, the SSI concept was huge in the late nineties and early two thousands. There were projects like Kerrighed or Mosix for Linux. The idea was that you could start a process on node A, and if node A got too busy, the operating system would just transparently migrate that process to node B without the process even knowing it happened.
That sounds like magic, but I assume there is a reason we don't see everyone running Mosix on their home labs today. What happened to those projects?
Latency happened. And cache coherency. Think about it this way. Your processor is incredibly fast. It expects to talk to its memory in nanoseconds. When you try to span that across a network, even a very fast one, you are suddenly talking in microseconds or even milliseconds. If the processor on machine A needs a piece of data that is sitting in the random access memory of machine B, it has to wait. And while it is waiting, it isn't doing any work. It is just stalling. As processors got faster and faster, the gap between the speed of the chip and the speed of the network actually got wider in relative terms. So, true Single System Image clusters mostly faded into niche research or very specific high-performance computing applications.
So if the true single brain approach is too slow because of physics, what is the middle ground? Daniel mentioned aggregating resources to make them appear as one coherent system. If it isn't a single operating system, what are we actually looking at in modern data centers here in twenty-twenty-six?
In modern data centers, we have moved toward what we call resource disaggregation. Instead of trying to make ten computers act like one computer, we take all the parts out of the computers and put them in their own boxes. You have a box full of processors, a box full of memory, and a box full of storage. Then, you connect them all with an ultra-high-speed fabric. This is where things like NVMe over Fabrics, or NVMe-oF, come into play. It allows a server to connect to a remote solid-state drive over the network and talk to it using the exact same protocol it would use if the drive were plugged directly into the motherboard. To the operating system, that remote drive looks just like local storage.
Okay, so that handles storage very well. But what about the memory and the processing power? Can I have a box of random access memory that my Proxmox node just treats as local memory?
We are finally getting there with Compute Express Link, or CXL. As of this year, CXL three-point-zero is starting to hit the enterprise market in a big way. It rides on the PCIe physical layer, using fifth-generation signaling for the earlier CXL revisions and sixth-generation signaling for CXL three-point-zero. CXL allows for true memory pooling and even memory sharing. You can have a giant chassis of RAM that multiple servers can access dynamically. If one server needs an extra hundred gigabytes for a massive database task, the CXL switch can just map that memory to that server. It is not quite "aggregating" in the sense of combining two small sticks of RAM into one big one across a network, but it is "pooling" so that resources aren't wasted.
That is fascinating. But for someone like Daniel, who is working with existing hardware like a Synology NAS and some Proxmox nodes, he is probably not going to go out and buy a CXL-enabled rack tomorrow. If he wants to connect his existing machines with something faster than ethernet, what are his options for those high-speed interlinks he mentioned?
This is where we get into the fun stuff. If you want to play with the big boy toys in a home lab, you look at the used enterprise market for InfiniBand or high-speed ethernet cards. InfiniBand is the standard for supercomputers. Unlike ethernet, which was designed to move packets of data across long distances with a lot of overhead, InfiniBand was designed from the ground up to be a system bus for a data center. It has incredibly low latency and very high throughput. You can find used Mellanox ConnectX-four or ConnectX-five cards on eBay for relatively cheap now. We are talking twenty-five, forty, or even a hundred gigabits per second.
And what does that actually give you in a Proxmox environment? Does it just make your backups faster, or does it change how the nodes interact?
It changes everything if you use Remote Direct Memory Access, or RDMA. Normally, when you send data over a network, the processor has to get involved. It has to wrap the data in a packet, send it to the network card, and then on the receiving end, the other processor has to unwrap it and move it into memory. This takes time and uses up CPU cycles. RDMA allows one computer to reach directly into the memory of another computer and pull data out, or push data in, without bothering the processor on either side.
Wait, so the network card is basically acting like a direct straw into the other machine's RAM?
Exactly. And when you combine that with something like Ceph, it becomes incredibly powerful. Ceph is a distributed storage system. Normally, it can be a bit slow in a home lab because every time you write data, it has to be replicated across the network, which adds latency. But if you run Ceph over a network that supports RDMA—like InfiniBand or RoCE, which stands for RDMA over Converged Ethernet—that latency drops significantly. It makes the distributed storage feel almost as fast as a local drive.
That is a great point. I think we should dig a bit deeper into the "besides Ceph" part of Daniel's question. If he wants to unify these resources, what are the other protocols or file systems he should be looking at? I know we have talked about GlusterFS in the past, but how does that compare?
GlusterFS is another big one. While Ceph is an object-based store that can provide block devices and file systems, Gluster is more focused on being a distributed file system. It is often easier to set up than Ceph for smaller clusters. It basically takes local directories on your servers, called bricks, and stitches them together into a single volume. If you have three servers with two terabytes each, a plain distributed volume can present that as a single six-terabyte folder to your network; add replication for safety and you trade some of that capacity away. But again, the performance of Gluster is heavily dependent on your network interconnect. If you are running Gluster over a standard one-gigabit link, your write speeds are going to be miserable because the servers have to talk to each other to keep everything in sync.
There is also MooseFS and LizardFS, right? I remember reading that those are often used in environments where you have a lot of heterogeneous hardware, which sounds like exactly what a home labber like Daniel might have.
Yeah, MooseFS is great because it is very forgiving. You can throw disks of different sizes and speeds into the pool, and it just handles it. It uses a master server to keep track of where all the data is, and then chunk servers to actually store the bits. The cool thing about MooseFS is that it is very easy to scale. You just add another chunk server, and the capacity of your unified system grows. But the downside is that master server. If it goes down, your whole cluster is essentially blind. You have to set up a high-availability master to avoid that single point of failure, which adds complexity.
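We will put a tiny toy sketch in the show notes to make that split concrete. It is not MooseFS code, just a made-up Python illustration of a metadata master that maps chunks to chunk servers, and what happens to lookups the moment that master disappears. Every name and address in it is invented.

```python
# Toy illustration of a MooseFS-style split between a metadata master and
# chunk servers. This is NOT MooseFS code; all names and ports are invented.

class MetadataMaster:
    """Keeps track of which chunk servers hold each chunk of each file."""
    def __init__(self):
        self.chunk_map = {}   # (filename, chunk_index) -> list of chunk server addresses
        self.alive = True

    def register_chunk(self, filename, chunk_index, servers):
        self.chunk_map[(filename, chunk_index)] = servers

    def locate(self, filename, chunk_index):
        if not self.alive:
            # The single point of failure: no master, no lookups, even though
            # the data itself is still sitting safely on the chunk servers.
            raise RuntimeError("master is down - the cluster is blind")
        return self.chunk_map[(filename, chunk_index)]


master = MetadataMaster()
master.register_chunk("backup.img", 0, ["chunkserver-a:9422", "chunkserver-b:9422"])

print(master.locate("backup.img", 0))   # ['chunkserver-a:9422', 'chunkserver-b:9422']

master.alive = False
# master.locate("backup.img", 0)        # would now raise: data intact, but unreachable
```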
So we have covered storage aggregation pretty well. But Daniel also asked about aggregating RAM and CPU. We talked about why a single system image is hard, but what about things like Kubernetes? Does that count as aggregating compute resources?
That is a really insightful question, Corn. Kubernetes is what I would call "orchestration-based aggregation." It doesn't make two CPUs act like one giant CPU with thirty-two cores. Instead, it treats your entire cluster as a pool of available resources. When you want to run an application, you tell Kubernetes, "I need two cores and four gigabytes of RAM." Kubernetes looks at all your servers, finds one that has those resources available, and drops the container there. To the person deploying the app, it feels like they are deploying to one giant computer. But the individual task is still limited by the physical hardware of the single node it lands on. You can't run a single process that requires sixty-four gigabytes of RAM on a cluster made of nodes that only have sixteen gigabytes each.
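For the show notes, here is a rough Python sketch of that idea. It has nothing to do with the real Kubernetes scheduler, and the node names and numbers are made up: the cluster is treated as a pool, small requests land on whichever node fits, and a request bigger than any single node simply cannot be placed even though the pool as a whole has the capacity.

```python
# Minimal sketch of "orchestration-based aggregation": the cluster is a pool,
# but each workload still has to fit on ONE physical node. Not Kubernetes code;
# node names and capacities are invented for illustration.

nodes = {
    "optiplex-1": {"free_cores": 4, "free_ram_gb": 12},
    "optiplex-2": {"free_cores": 2, "free_ram_gb": 6},
    "optiplex-3": {"free_cores": 6, "free_ram_gb": 16},
}

def schedule(request_cores, request_ram_gb):
    """Return the name of a node that can host the request, or None."""
    for name, free in nodes.items():
        if free["free_cores"] >= request_cores and free["free_ram_gb"] >= request_ram_gb:
            free["free_cores"] -= request_cores
            free["free_ram_gb"] -= request_ram_gb
            return name
    return None

print(schedule(2, 4))     # a small task fits: 'optiplex-1'
print(schedule(2, 4))     # another small task fits too
print(schedule(8, 20))    # None: the pool still has 8 free cores and 26 GB in
                          # total, but no single node can satisfy the request
```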
Ah, so that is the distinction. You can aggregate the management of the resources, but the execution is still bound by the physical limits of each box. Unless, I suppose, you are doing something like distributed computing for scientific research?
Right, like the Message Passing Interface, or MPI. This is the standard in the high-performance computing world. If you are running a massive weather simulation or training a large language model, you use MPI. It allows a single program to be split into thousands of pieces that run across hundreds of different servers. The program is specifically written to know that it is running on a cluster. It manually sends messages back and forth between the nodes to coordinate. This is the opposite of the Single System Image approach. Instead of the operating system hiding the network, the application is fully aware of the network and manages the distribution itself.
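To show how explicit that coordination is, here is a minimal example for the show notes using mpi4py, assuming you have an MPI implementation and the mpi4py package installed. Each rank sums its own slice of a big range and rank zero collects the total.

```python
# Minimal MPI example using mpi4py: the program KNOWS it is running across
# multiple processes and explicitly splits the work. Run with something like:
#   mpirun -np 4 python sum_mpi.py
# (assumes an MPI implementation such as Open MPI plus the mpi4py package)

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # this process's ID within the job
size = comm.Get_size()   # total number of processes in the job

N = 100_000_000

# Each rank takes a stride of the range: rank, rank + size, rank + 2*size, ...
local_sum = sum(range(rank, N, size))

# Explicit coordination over the interconnect: combine partial sums on rank 0.
total = comm.reduce(local_sum, op=MPI.SUM, root=0)

if rank == 0:
    print(f"sum of 0..{N-1} = {total}")
```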
So if Daniel wants to "unify" his hardware, he kind of has to decide what he is trying to achieve. If he wants to run one giant virtual machine that spans all his hardware, he is probably out of luck with current consumer tech. But if he wants a cluster where he can just throw tasks at it and not care which box they run on, then something like a Proxmox cluster with shared storage or a Kubernetes cluster is the way to go.
Exactly. And to make that experience feel seamless, he needs to look at the interconnects. We mentioned InfiniBand, but even moving to ten-gigabit or twenty-five-gigabit ethernet can be a game changer. Most people don't realize that the latency on a one-gigabit network is actually quite high, between the serialization delay of the slow link and the per-packet work the kernel has to do. When you move to ten-gigabit or higher, especially with cards that support hardware offloading, the "snappiness" of your cluster improves dramatically.
Let us talk about the physical side of those interlinks for a second. Daniel mentioned the high-speed data interlinks used in professional data centers. Beyond just the speed, there is the cabling. You see those thick orange or aqua fiber cables in server rooms; they plug into SFP-plus or QSFP transceivers sitting in the switch and network card ports.
Yeah, and for a home lab, the secret weapon is the Direct Attach Copper cable, or DAC. If your servers are in the same rack or right next to each other, you don't need expensive optical transceivers and fiber optic cables. A DAC cable is basically a twinax copper cable with the transceiver ends permanently attached. It is cheap, it has very low latency because there is no optical conversion happening, and it is very reliable. You can get a ten-gigabit DAC cable for like fifteen bucks. If Daniel wants to connect his Proxmox nodes to his Synology, and they both have SFP-plus ports, that is the absolute best way to do it.
That is a great practical tip. I actually think we should take a moment to talk about the "why" here. Why is this so much harder for RAM and CPU than for storage? I think it comes down to the frequency of access.
Spot on. Think about a hard drive. Even a fast NVMe drive is slow compared to your CPU. The CPU can do billions of things a second, while the drive might only be able to do hundreds of thousands. So, the CPU has plenty of time to wait for the data to come over the wire. But RAM is different. The CPU talks to RAM constantly. It is like the difference between waiting for a package to arrive in the mail versus waiting for your hand to move when you think about it. If there is even a tiny delay in your hand moving, you feel it immediately. That is what happens when you try to aggregate RAM across a network. The "wait state" of the processor just kills performance.
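You can actually put rough numbers on that wait state. These are ballpark latency figures, not measurements from any particular machine, but the ratios are what matter, so here is the back-of-the-envelope math for the show notes.

```python
# Back-of-the-envelope stall math. The latency figures are rough, commonly
# quoted ballpark numbers, not measurements from any particular system.

cpu_ghz = 3.0  # a 3 GHz core runs roughly 3 clock cycles per nanosecond

latencies_ns = {
    "local DRAM access": 100,               # ~100 nanoseconds
    "RDMA read, same rack": 2_000,          # ~2 microseconds
    "TCP round trip, 1 GbE LAN": 100_000,   # ~100 microseconds
}

for name, ns in latencies_ns.items():
    stalled_cycles = int(ns * cpu_ghz)
    print(f"{name}: ~{ns:,} ns, roughly {stalled_cycles:,} CPU cycles of waiting")

# Roughly: local DRAM ~300 cycles, RDMA ~6,000 cycles, 1 GbE TCP ~300,000 cycles.
# That last number is why aggregating RAM over an ordinary network kills performance.
```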
This is making me think about those multi-socket motherboards, the ones where you can have two or four Xeon or EPYC processors on a single board. Even there, they aren't perfectly "unified," right? They use something like Infinity Fabric or Ultra Path Interconnect.
Right! That is called NUMA, or Non-Uniform Memory Access. Even on a single motherboard with two processors, each processor has its own "local" RAM. It can talk to the other processor's RAM, but it takes slightly longer. The operating system has to be smart enough to try and keep a program's data in the RAM that is physically closest to the processor it is running on. If professional engineers have to work that hard to manage latency across a few inches of circuit board, you can imagine how hard it is to do it across six feet of ethernet cable.
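If you want to see that non-uniformity on a real Linux machine, the kernel exposes the NUMA distance table through sysfs. Here is a small read-only sketch for the show notes; on a single-socket box you will just see one node reporting a distance of ten to itself.

```python
# Print the Linux NUMA distance table from sysfs. On a dual-socket box you
# typically see something like 10 to local memory and a larger number, say 21,
# to the remote socket; the values are relative costs, not nanoseconds.
# Linux-only sketch.

import glob
import os

node_dirs = sorted(glob.glob("/sys/devices/system/node/node[0-9]*"))

for node_dir in node_dirs:
    node = os.path.basename(node_dir)
    with open(os.path.join(node_dir, "distance")) as f:
        distances = f.read().split()
    print(f"{node}: distances to nodes 0..{len(distances) - 1} = {distances}")
```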
So, for Daniel's setup, if he is looking to "aggregate" these resources, he should probably focus on a few specific things. One, getting a high-speed backplane for his storage. Two, using an orchestration layer like Proxmox or Kubernetes to manage the "pool" of compute. And three, maybe exploring some of the more exotic file systems if he is feeling adventurous. Have you heard of JuiceFS?
JuiceFS is really interesting! It is a bit different because it uses a database, like Redis or PostgreSQL, to store the metadata, and then it can use almost anything for the actual data storage, including a Synology NAS or even S3 cloud storage. It is designed to be high-performance and cloud-native. For a home labber, it is a cool way to have a unified file system that can span across your local machines and the cloud seamlessly.
That sounds like it would fit right into a modern homelab. I also want to touch on something Daniel mentioned about Proxmox specifically. In a Proxmox cluster, you have this thing called "High Availability." If one node dies, the others take over. But for that to work well, you need that unified storage we have been talking about.
Definitely. If your virtual machine's hard drive is only on node A, and node A dies, node B can't start that machine because it doesn't have the data. This is why everyone in the homelab community is obsessed with Ceph. It replicates that data across all the nodes so that any node can pick up the slack at any time. But, as Daniel noted, Ceph is heavy. It wants a lot of RAM and a lot of dedicated networking. If you are running on smaller nodes, like those tiny NUCs or older OptiPlexes, Ceph might eat up half your resources just managing itself.
So what is the lightweight alternative for the "Proxmox plus Synology" crowd?
The most common approach is just using NFS, the Network File System, or iSCSI, the Internet Small Computer Systems Interface, to connect to the Synology. The Synology acts as the "unified" storage, and all the Proxmox nodes talk to it. It is simple, it is reliable, and it works. But the Synology becomes your single point of failure. If the NAS goes down, your whole cluster goes down. That is the trade-off. Ceph gives you resilience but costs you resources. A central NAS is efficient but creates a bottleneck and a failure point.
It is always about the trade-offs, isn't it? I feel like we have given Daniel a lot to chew on. We have talked about the dream of the single system image, the reality of latency, the power of InfiniBand and RDMA, and the different ways to slice the storage pie.
We have. And I think the big takeaway for anyone looking at this is: don't try to fight physics. You can't turn three slow computers into one fast computer for a single task. But you can turn three computers into a very resilient, very flexible system that handles many tasks simultaneously.
That is a great way to put it. Before we wrap up, I think we should talk about the future a bit. We mentioned CXL earlier. Do you think we will ever see "CXL for the home lab"? Like, could we have a consumer-grade memory pooling switch in five or ten years?
I really hope so. As we move toward more modular computing, it makes sense. Imagine buying a "CPU pod" and a "RAM pod" and just plugging them into a high-speed fabric in your house. We are already seeing the beginnings of this with things like Thunderbolt five, which is essentially external PCIe lanes. People are already using external GPUs over Thunderbolt. Using external RAM or pooling resources over a high-speed local fabric isn't that far off. The challenge will always be the software. We need operating systems that are designed to handle memory that might suddenly disappear or have variable latency.
It is such a fascinating time to be into this stuff. Daniel, thanks for sending that in. It really got us thinking about the architecture of our own setup here in Jerusalem. Maybe we should finally get around to networking those OptiPlexes properly, Herman.
Hey, I have been telling you, we just need a couple of Mellanox ConnectX-five cards and a bit of luck with the drivers! But really, it is a rabbit hole you can fall down for months.
Well, if any of you listening have your own weird prompts or if you have actually built a crazy InfiniBand-powered cluster in your basement, we want to hear about it. You can get in touch with us through the website at my-weird-prompts-dot-com. We have a contact form there, and you can also find our full archive of episodes.
And if you are finding these deep dives into the guts of technology useful, or even just entertaining, please do us a favor and leave a review on Spotify or whatever podcast app you are using. It really does help the show grow and helps other people find us.
It really does. We have been doing this for nearly six hundred episodes now, and the community feedback is what keeps us going.
Absolutely. Well, I think that covers the "unified node" dream for today.
I think so too. This has been My Weird Prompts. I am Corn.
And I am Herman Poppleberry.
Thanks for listening, and we will talk to you next time.
Goodbye everyone!
So, Herman, about those InfiniBand cards... do you think they would fit in the small form factor cases?
We might have to use some risers and do a bit of case modification, but where there is a will, there is a way!
I figured you would say that. Alright, let's go see what's on eBay.
Already ahead of you. I have a tab open.
Of course you do. See ya.
Bye!