#4068: How to Build a Go Image Pipeline for Your Inventory App

Stop treating your inventory app like a photo dump. Here's how to build a smart image pipeline in Go.

Episode Details

Episode ID: MWP-4247
Published: Jul 2
Duration: 28:25
Audio: Direct link
Pipeline: V5
TTS Engine: chatterbox-regular
Script Writing Agent: deepseek-v4-pro
Topics: image-generation software-development open-source

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

Self-hosted inventory apps live or die by one metric: how much friction stands between you and finding what you need. If the catalog view takes three seconds to render because it's pulling five-megabyte JPEGs, you stop using the app entirely. This episode explores a practical Go-based image processing pipeline designed for exactly this problem.

The core insight is treating every uploaded image as part of a data pipeline rather than a photo dump. Each image has a job to do, and anything that doesn't serve that job gets stripped away. The pipeline downscales all uploads to two megapixels (roughly 1920x1080), burns a timestamp overlay directly into the pixel data for tamper-evident insurance documentation, converts everything to WebP for 25-35% storage savings, and generates thumbnails only for images explicitly marked as the primary photo.

The architecture uses Go's built-in concurrency — parallel upload handlers feed jobs into a buffered channel, while a worker pool of four to eight goroutines processes each job through conditional stages. A failed image never crashes the batch. The selective thumbnail generation is a database schema decision before it's a processing one: an is_primary boolean flag on the images table determines whether the thumbnail stage fires. This pattern appears across self-hosted apps like PhotoPrism, Immich, and Paperless-ngx, and the engineering challenge is knowing when a buffered channel and four goroutines are enough without reaching for Redis or Kubernetes.

Let me tell you about a moving bag. Two dollars from the building supply store, enormous capacity, absolute workhorse of a move. And the reason I added it to my home inventory isn't the bag itself. It's the label. That tiny printed sticker has the store name and the exact SKU. Photograph it once, and six months from now when I want three more, I'm not squinting at a faded logo trying to remember which aisle I found it in. I just pull up the photo. Thirty seconds, done.

That label is maybe two inches wide. The barcode is smaller than your thumbnail. Your phone camera captures twelve megapixels of noise, a gigabyte of lawn and driveway and motion blur, and somewhere in there is the one thing you actually need to read.

That mismatch is the whole problem. Everyone with a self-hosted inventory app eventually hits the same wall. Taking photos is easy. What happens after you press the shutter is where people give up. You end up with a folder full of twelve-megapixel JPEGs, each one four or five megabytes, and the app slows to a crawl every time you open the catalog view.

This matters more now than it did even two years ago, because self-hosted AI tools are going mainstream. The photos you take today — serial numbers, labels, condition shots of the GPU pins — that's not just documentation. That's training data for the personal assistant you'll be running locally in eighteen months. Store those images inefficiently and the economics collapse before you even start. You're paying for storage you don't need, your backups take forever, and your AI agent is churning through noise instead of signal.

Daniel sent us this one. He's been using HomeBox, an open-source home inventory app written in Go, available on GitHub. He forked it, customized it heavily, and now he's asking the question that comes after you commit to actually maintaining one of these things. What's the best practice for a custom image processing pipeline?

He's got specifics. He wants to downscale every uploaded image to two megapixels, add a timestamp overlay, generate thumbnails only for images explicitly selected as the primary photo, convert everything to WebP, and discard the originals entirely. All of this happening in a parallel upload pipeline, in Go.

Which sounds like a lot of moving parts, but it's really one coherent idea. Stop treating your inventory app like a photo dump. Start treating it like a data pipeline where every image has a job to do, and anything that doesn't serve that job gets stripped away.

The thing I love about this prompt is that it's not theoretical. He's in the middle of a move right now. The moving bags are literally in the room. And he's looking at this process and thinking, I can make this better for next time.

That moving bag story isn't really about the bag. It's about a question that every self-hoster eventually faces. How do you handle the photos?

The answer isn't "just use a CDN with automatic resizing." For a self-hosted app, that misses the point entirely. You're not serving images to a global audience. You're serving them to yourself, on your local network, maybe through a VPN from your phone. A CDN adds latency, cost, and a dependency on someone else's infrastructure for a problem you can solve in about two hundred lines of Go.

Right, and the CDN approach also means you're uploading full-resolution originals to a third party before any processing happens. For a home inventory app, that's a non-starter. You're photographing serial numbers, warranty cards, the inside of your electrical panel. That data never needs to leave your control.

The real problem this pipeline is solving is deceptively simple. Home inventory apps live or die by one metric: how much friction stands between you opening the app and you finding what you need. If the catalog view takes three seconds to render because it's pulling down five-megabyte JPEGs, you stop using the app. And an inventory app you don't use is worse than no inventory app at all, because it's actively misleading.

That's the quiet death of every self-hosted project. Not that it breaks. That it becomes just annoying enough that you drift away from maintaining it. Six months later you've moved house and half your boxes are unaccounted for because the last time you updated the inventory was before the move.

Daniel's specific pipeline design is smart because every stage targets a specific friction point. Downscaling to two megapixels — roughly nineteen twenty by ten eighty — solves the loading-speed problem. That resolution is more than enough to read a serial number or a SKU on a label. The moving bag label Corn mentioned? You could read that at half a megapixel.

The timestamp overlay is the one I find genuinely clever. It's not just metadata in the EXIF header, which anyone can strip or alter. It's burned into the image pixels. If you ever need to file an insurance claim, you've got a tamper-evident record of when that photo was taken and what condition the item was in. Insurance adjusters love that.

The selective thumbnail generation — that's the part where most people over-engineer. The instinct is to generate three thumbnail sizes for every single image at upload time. Daniel's asking for thumbnails only on the image explicitly marked as the primary photo. That's not a processing optimization, it's a database schema decision. You need a boolean column on the image record, something like is_primary, and the pipeline checks that flag before spawning the thumbnail generation step.

Which means the pipeline isn't just a dumb image resizer. It's reading application state. That's the difference between a script and a proper backend component.

In Go, that architecture maps cleanly to a pipeline that reads from the database, not just the filesystem. Your upload handler inserts a row with the temp file path and is_primary set to false by default. The user marks one image as primary through the frontend, which flips that flag. Then the processing worker picks up the job, checks the flag, and decides whether to generate thumbnails.

We're not really talking about one guy's HomeBox fork anymore. This pattern shows up everywhere in self-hosted apps. PhotoPrism does it for photo management. Immich does it. Even Paperless-ngx does something similar for document thumbnails. You've got user-uploaded media, you need multiple resolutions for different contexts, and you need to make decisions about what to keep and what to throw away.

The question that makes it interesting is: how do you build this so it's good enough for personal use without becoming a distributed systems thesis project? Because the temptation with Go is to reach for a job queue, a message broker, a separate worker service, maybe Kubernetes to orchestrate the whole thing. And for a home inventory app processing maybe fifty images per upload session, that's architectural cosplay.

I'm keeping that one.

The real engineering challenge is restraint. Knowing that a buffered channel and four goroutines will handle your entire workload just fine, and that adding Redis to the stack doesn't make you more professional, it makes your docker-compose file longer for no reason.

The episode is really about that line. Where does "build on components" stop and "you're just collecting dependencies" begin? And for image processing specifically, what does the right answer look like in Go?

Let's start with the thumbnail trap, because it's the design decision that cascades into everything else. The naive approach is: user uploads an image, you generate three thumbnail sizes, store all four files, move on. For a hundred images, that's four hundred files. Most of those thumbnails will never be rendered on screen. You're burning CPU cycles on thumbnails for the photo of the back of a power strip. Nobody is browsing to the power strip in the catalog view.

Daniel's instinct here is exactly right. Generate thumbnails only for images explicitly marked as the primary photo. But here's the part that's easy to miss — this is a database schema decision before it's a processing decision. You need an is_primary boolean on the images table. The pipeline checks that flag. If it's false, the thumbnail generation stage is a no-op.

Which means the frontend needs a way to set that flag. Probably a little star icon or "set as cover image" button on each photo. Not complicated, but it has to exist before the pipeline logic matters. And in Go, the cleanest implementation uses a pipeline that reads from the database. Your upload handler receives the files, writes them to a temp directory, and inserts rows with is_primary defaulting to false. The user picks the hero image through the UI. Then the processing worker queries for unprocessed images, checks the flag, and routes accordingly.

The pipeline stages are conditional. That's the key insight. Every image goes through downscale, timestamp overlay, and WebP conversion. But the thumbnail stage only fires if that boolean is true.
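As a rough illustration of the conditional routing, here is a minimal Go sketch. The ImageJob struct, its field names, and the stage names are assumptions made for this example; they are not taken from HomeBox's code or schema.

```go
package main

import "fmt"

// ImageJob mirrors one row of the images table. The field names are
// assumptions for this sketch, not HomeBox's actual schema.
type ImageJob struct {
	ID        int
	TempPath  string
	IsPrimary bool
}

// stagesFor returns the ordered stage names a job passes through.
// Every image gets the first three; the thumbnail stage is opt-in.
func stagesFor(job ImageJob) []string {
	stages := []string{"downscale", "timestamp", "webp"}
	if job.IsPrimary {
		stages = append(stages, "thumbnail")
	}
	return stages
}

func main() {
	fmt.Println(stagesFor(ImageJob{ID: 1, TempPath: "tmp/a.jpg"}))
	fmt.Println(stagesFor(ImageJob{ID: 2, TempPath: "tmp/b.jpg", IsPrimary: true}))
}
```

Keeping the decision in one small function means the worker never needs to know why an image is primary, only whether the flag is set when the job is picked up.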

Let me lay out the Go architecture, because it's surprisingly straightforward. You've got parallel uploads happening in HTTP handler goroutines, and Go gives you that concurrency for free. Each handler writes the temp file, inserts the database row, and writes a job struct to a buffered channel, something like make(chan ImageJob, 50). That buffer absorbs a burst of uploads without blocking the handlers. On the other side, you spin up a worker pool of four to eight goroutines, each running an infinite loop that reads from the channel and processes the job through all four stages. That's the fan-out/fan-in pattern.

If a worker crashes on one image?

It logs the error, updates the database row with a failed status, and moves to the next job. You never fail the whole batch because one image was corrupted. The user sees nineteen of twenty images processed successfully, and one with a red error badge they can investigate.
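The buffered channel, worker pool, and per-image failure handling could be sketched roughly as follows. This is a minimal illustration, not the HomeBox implementation: the processFunc type stands in for the real stages, and the in-memory status map stands in for the database update.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ImageJob is a hypothetical job record pushed by the upload handlers.
type ImageJob struct {
	ID       int
	TempPath string
}

// processFunc stands in for the real downscale/overlay/encode stages.
type processFunc func(ImageJob) error

// safeProcess converts a panic in one job into an ordinary error,
// so a single bad image cannot take a worker down.
func safeProcess(process processFunc, job ImageJob) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("panic: %v", r)
		}
	}()
	return process(job)
}

// runPool starts the given number of workers and drains jobs until the
// channel is closed. A failing job is recorded and skipped; the batch
// keeps going. The returned map plays the role of the status column.
func runPool(jobs <-chan ImageJob, workers int, process processFunc) map[int]string {
	var (
		mu     sync.Mutex
		wg     sync.WaitGroup
		status = make(map[int]string)
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for job := range jobs {
				result := "processed"
				if err := safeProcess(process, job); err != nil {
					result = "failed: " + err.Error()
				}
				mu.Lock()
				status[job.ID] = result
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return status
}

func main() {
	jobs := make(chan ImageJob, 50) // buffer absorbs a burst of uploads
	for id := 1; id <= 20; id++ {
		jobs <- ImageJob{ID: id}
	}
	close(jobs)

	status := runPool(jobs, 4, func(j ImageJob) error {
		if j.ID == 7 {
			return errors.New("corrupt image")
		}
		return nil
	})

	failed := 0
	for _, s := range status {
		if s != "processed" {
			failed++
		}
	}
	fmt.Printf("%d jobs, %d failed\n", len(status), failed) // 20 jobs, 1 failed
}
```

In a long-running server the channel would stay open and the workers would run for the life of the process; closing it here just lets the example terminate.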

For the actual image manipulation, what does the library stack look like?

Two standard-library packages and two third-party ones. Go's image/jpeg and image/png packages handle decoding. For resizing, github.com/disintegration/imaging is the go-to: over five thousand GitHub stars, widely used, dead simple API. You call imaging.Fit with the source image, a bounding box of nineteen twenty pixels on each side, and a resampling filter, and it handles the aspect ratio for you.
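One detail worth checking before calling a resizer: "two megapixels" and "a maximum dimension of 1920" are not the same constraint. A 4:3 photo fitted into a 1920 by 1920 box comes out at 1920x1440, about 2.8 megapixels. The sketch below takes the pixel-budget reading, which is an interpretation rather than something the episode specifies, and uses only the standard library to compute target dimensions that could then be handed to whatever resize call is used. The function name is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// fitToPixelBudget returns dimensions with the same aspect ratio as
// w x h whose area does not exceed maxPixels. Images already within
// the budget are returned unchanged (no upscaling).
func fitToPixelBudget(w, h, maxPixels int) (int, int) {
	if w <= 0 || h <= 0 || w*h <= maxPixels {
		return w, h
	}
	scale := math.Sqrt(float64(maxPixels) / float64(w*h))
	nw := int(math.Floor(float64(w) * scale))
	nh := int(math.Floor(float64(h) * scale))
	if nw < 1 {
		nw = 1
	}
	if nh < 1 {
		nh = 1
	}
	return nw, nh
}

func main() {
	// A typical 12-megapixel phone photo.
	w, h := fitToPixelBudget(4000, 3000, 2_000_000)
	fmt.Println(w, h, w*h) // 1632 1224 1997568
}
```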

The WebP encoding?

github.com/chai2010/webp. The API mirrors Go's standard image encoding packages, so you call webp.Encode with a writer, the image, and an options struct that carries the quality setting. Quality eighty is the sweet spot for label photos, visually indistinguishable from the source at a fraction of the size.

What's the actual storage savings on WebP?

Google's published benchmarks show twenty-five to thirty-five percent smaller files than JPEG at equivalent quality. For a two-megapixel image of a label, you're looking at maybe three hundred to five hundred kilobytes in WebP versus six hundred to eight hundred in JPEG. Across five hundred inventory items, that's the difference between two hundred fifty megabytes and four hundred megabytes. Not earth-shattering for a single user, but it adds up.

The encoding is slower, right? That's the tradeoff.

It is slower. WebP encoding is more computationally intensive than JPEG. But for a personal inventory app processing maybe fifty images per upload session, we're talking about a difference of seconds, not minutes. The storage savings win. If you were building a multi-user SaaS with thousands of simultaneous uploads, you'd want to make the format configurable. But for HomeBox on a home server, just use WebP and move on.

The timestamp overlay is the feature I want to dig into, because it's the one that sounds like a gimmick and turns out to be valuable.

It's a forensic feature disguised as a UI tweak. You're burning the date and time into the bottom-right corner of the image pixels. Not in the EXIF metadata, which any photo editor can strip or modify in two clicks. In the actual image. If you're filing an insurance claim and you hand the adjuster a photo of a damaged laptop with "June fifteenth, twenty twenty-six, fourteen thirty hours" visible in the corner, that carries weight. Because it's tamper-evident. You'd have to photoshop the timestamp out, and that leaves artifacts.

Implementing it is maybe fifteen lines of Go. The standard library has no text-drawing package, so you use golang.org/x/image/font, with basicfont for a built-in face or opentype for more control, and draw the timestamp string onto the decoded image after resizing but before WebP encoding. The whole overlay step sits between the downscale stage and the format conversion stage.
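To show what burning text into pixels involves without pulling in a font package, here is a standard-library-only sketch that uses a tiny hand-rolled 3x5 bitmap font. The glyph table, the burnText and blankPhoto names, and the layout choices are all assumptions for illustration; a real pipeline would more likely use a proper font face.

```go
package main

import (
	"fmt"
	"image"
	"image/color"
	"image/draw"
	"time"
)

// glyphs is a hand-rolled 3x5 bitmap font covering only the characters
// a "2006-01-02 15:04" timestamp needs. Each row is three bits wide.
var glyphs = map[rune][5]uint8{
	'0': {0b111, 0b101, 0b101, 0b101, 0b111},
	'1': {0b010, 0b110, 0b010, 0b010, 0b111},
	'2': {0b111, 0b001, 0b111, 0b100, 0b111},
	'3': {0b111, 0b001, 0b111, 0b001, 0b111},
	'4': {0b101, 0b101, 0b111, 0b001, 0b001},
	'5': {0b111, 0b100, 0b111, 0b001, 0b111},
	'6': {0b111, 0b100, 0b111, 0b101, 0b111},
	'7': {0b111, 0b001, 0b001, 0b001, 0b001},
	'8': {0b111, 0b101, 0b111, 0b101, 0b111},
	'9': {0b111, 0b101, 0b111, 0b001, 0b111},
	'-': {0b000, 0b000, 0b111, 0b000, 0b000},
	':': {0b000, 0b010, 0b000, 0b010, 0b000},
}

const (
	glyphW = 3
	glyphH = 5
)

// burnText draws text into the bottom-right corner of img, in place.
// Characters missing from glyphs (such as the space) render as blanks.
func burnText(img draw.Image, text string, scale int) {
	b := img.Bounds()
	pad := 2 * scale
	textW := len(text) * (glyphW + 1) * scale
	x0 := b.Max.X - pad - textW
	y0 := b.Max.Y - pad - glyphH*scale

	// A dark backing box keeps the text legible on bright photos.
	box := image.Rect(x0-pad, y0-pad, b.Max.X, b.Max.Y)
	draw.Draw(img, box, image.NewUniform(color.Black), image.Point{}, draw.Src)

	white := image.NewUniform(color.White)
	for i, r := range text {
		rows := glyphs[r]
		gx := x0 + i*(glyphW+1)*scale
		for row := 0; row < glyphH; row++ {
			for col := 0; col < glyphW; col++ {
				if rows[row]&(uint8(1)<<uint(glyphW-1-col)) == 0 {
					continue
				}
				cell := image.Rect(gx+col*scale, y0+row*scale, gx+(col+1)*scale, y0+(row+1)*scale)
				draw.Draw(img, cell, white, image.Point{}, draw.Src)
			}
		}
	}
}

// blankPhoto stands in for a decoded, already-resized upload.
func blankPhoto(w, h int) *image.RGBA {
	img := image.NewRGBA(image.Rect(0, 0, w, h))
	draw.Draw(img, img.Bounds(), image.NewUniform(color.Gray{Y: 128}), image.Point{}, draw.Src)
	return img
}

func main() {
	img := blankPhoto(320, 240)
	burnText(img, time.Now().Format("2006-01-02 15:04"), 2)
	fmt.Println("stamped image bounds:", img.Bounds())
}
```

Because the overlay replaces pixel values, it survives format conversion and metadata stripping, which is the property the insurance argument relies on.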

You mentioned the storage structure earlier. What does the directory layout actually look like?

A temp directory for the raw uploads — those get deleted after processing. A processed directory for the two-megapixel WebP files with timestamps. And a thumbnails directory that only gets populated for images where is_primary is true. The database stores the relative paths, so the frontend knows where to look.

If you want to swap out local disk for Cloudflare R2 later?

That's the beauty of it. R2 has an S3-compatible API, so your Go code uses the AWS SDK with a different endpoint. You abstract the storage layer behind an interface, something like type StorageBackend interface with a Store method and a Retrieve method. Local disk implements it, R2 implements it, and you can switch between them with a config flag. For a typical household inventory of five hundred images at about five hundred kilobytes each, you're talking two hundred fifty megabytes total. On R2 at one and a half cents per gigabyte per month, that's less than half a cent per month. With no egress fees.
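A minimal sketch of that storage abstraction, with only the local-disk side filled in. The method signatures and the LocalDisk type are assumptions for this example; an R2 implementation would satisfy the same interface using an S3-compatible client and is not shown here.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// StorageBackend is the seam between the pipeline and wherever the
// bytes end up. The exact signatures are an assumption.
type StorageBackend interface {
	Store(key string, data []byte) error
	Retrieve(key string) ([]byte, error)
}

// LocalDisk stores objects as files under Root.
type LocalDisk struct {
	Root string
}

// resolve maps a key to a path under Root. Cleaning the key as if it
// were rooted at "/" collapses any ".." segments so a key cannot
// escape Root (this relies on Unix-style path handling).
func (d LocalDisk) resolve(key string) string {
	return filepath.Join(d.Root, filepath.Clean("/"+key))
}

func (d LocalDisk) Store(key string, data []byte) error {
	path := d.resolve(key)
	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o644)
}

func (d LocalDisk) Retrieve(key string) ([]byte, error) {
	return os.ReadFile(d.resolve(key))
}

func main() {
	root, err := os.MkdirTemp("", "inventory-")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)

	var store StorageBackend = LocalDisk{Root: root}
	if err := store.Store("processed/item_42.webp", []byte("fake image bytes")); err != nil {
		panic(err)
	}
	data, err := store.Retrieve("processed/item_42.webp")
	fmt.Println(string(data), err)
}
```

The pipeline only ever sees StorageBackend, so which implementation gets constructed at startup can be decided by a config flag.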

The storage cost is essentially zero regardless of where you put it.

Which is why the processing pipeline matters more than the storage decision. You're optimizing for CPU and developer time, not disk space. Get the pipeline right and the storage takes care of itself.

We've got the pipeline working. But the interesting question is: what does this architecture mean for how you actually use the app day to day? Because the technical choices we just walked through — they're not just implementation details. They shape your behavior.

The moving bag label is the perfect test case. I'm photographing a two-inch sticker. I don't need twelve megapixels of color depth and bokeh. I need the SKU to be readable and the photo to load instantly when I search for it six months later. Two megapixels is more than enough for that.

This is where most people get stuck on the wrong question. They ask "what if I need the original resolution someday?" instead of asking "what am I actually going to do with these images?" For a home inventory app, the use case is catalog browsing and insurance documentation. You're photographing text, barcodes, and condition details. You're not shooting gallery walls.

There's a subtler version of that trap. It's the "build on components" philosophy taken to its logical extreme. Daniel mentioned that agentic coding makes it easy to glue libraries together, and he's right. But the trap is thinking that means importing a library for every sub-task. A timestamp overlay function in Go is maybe twenty lines. If you import a twelve-thousand-line image annotation library to do it, you've added build complexity and attack surface for something you could have written during a coffee break.

The real skill is knowing which components earn their keep. The imaging library for resizing? Absolutely worth the import — image resampling is subtle and easy to get wrong. The WebP encoder? Same thing, the compression algorithm is non-trivial. But the timestamp overlay? The file naming convention? The directory structure? Those are twenty-line functions you write yourself and never touch again.

The rule of thumb is: import for algorithmic complexity, write for glue logic. If the library contains math you'd struggle to explain, use it. If it's a for loop and a string format, type it out.

That's the heuristic. And it connects directly to the thumbnail question, because there's an even smarter approach than selective batch generation. It's called thumbnail-on-demand.

Which is exactly what it sounds like. You don't generate any thumbnails at upload time. You generate them lazily, the first time the catalog view requests one, and then you cache the result.

For a personal app, this is almost always the right tradeoff. The processing cost shifts from upload time to first-view time. Uploads stay fast, and the catalog pays a tiny one-time penalty the first time you browse to a new item. After that, the thumbnail is cached and served instantly. For a hundred images where maybe thirty ever get viewed in the catalog, you've eliminated seventy percent of your thumbnail processing entirely.

The numbers make this obvious. Batch generation of three thumbnail sizes for a hundred images — that's three hundred files created, most of which will never appear on a screen. Lazy generation means a hundred processed images, and maybe thirty thumbnails ever get generated. That's the difference between a pipeline that feels snappy and one that makes you wonder if the upload hung.

Implementing it in Go is straightforward. Your image serving handler checks if the requested thumbnail exists on disk. If it does, serve it. If it doesn't, generate it from the processed image, write it to the thumbnails directory, and serve it. The next request hits the cache. You can wrap the whole thing in a mutex to avoid generating the same thumbnail twice if two requests arrive simultaneously.
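The check-then-generate flow with a mutex might look like the following sketch. The ThumbCache type, the Generate callback (standing in for decode, resize, and encode), and the .webp naming are assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"sync"
)

// ThumbCache serves thumbnails from Dir, generating each one the first
// time it is requested. Generate stands in for the real resize step.
type ThumbCache struct {
	mu       sync.Mutex
	Dir      string
	Generate func(imageID string) ([]byte, error)
}

// Get returns the cached thumbnail for imageID, creating it on a miss.
// One mutex guards the whole check-then-generate sequence, so two
// simultaneous requests never generate the same thumbnail twice.
func (c *ThumbCache) Get(imageID string) ([]byte, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	path := filepath.Join(c.Dir, filepath.Base(imageID)+".webp")
	data, err := os.ReadFile(path)
	if err == nil {
		return data, nil // cache hit
	}
	if !errors.Is(err, fs.ErrNotExist) {
		return nil, err
	}

	data, err = c.Generate(imageID)
	if err != nil {
		return nil, err
	}
	if err := os.MkdirAll(c.Dir, 0o755); err != nil {
		return nil, err
	}
	if err := os.WriteFile(path, data, 0o644); err != nil {
		return nil, err
	}
	return data, nil
}

func main() {
	dir, err := os.MkdirTemp("", "thumbs-")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	calls := 0
	cache := &ThumbCache{Dir: dir, Generate: func(id string) ([]byte, error) {
		calls++
		return []byte("thumbnail for " + id), nil
	}}
	cache.Get("42")
	cache.Get("42")
	fmt.Printf("generated %d time(s)\n", calls) // generated 1 time(s)
}
```

A single mutex also serializes generation of unrelated thumbnails. For one user browsing a catalog that is probably acceptable; a per-key lock would be the next step if it ever became a bottleneck.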

Let's walk through the real-world workflow Daniel described, because it ties all of this together. He uploads twenty photos of a new drill. The pipeline processes all twenty — downscale to two megapixels, timestamp overlay, WebP conversion. Originals get discarded. One photo, the hero shot of the drill, gets marked as primary. No thumbnails are generated yet. When he later searches "drill" in the catalog, the thumbnail is generated on first view and cached. The other nineteen photos are available if he clicks through to the detail view, but they never needed thumbnails at all.

That hero photo of the drill — it's doing double duty. It's the catalog thumbnail, but it's also the image he's going to paste into Claude or GPT when he needs to figure out how to change the chuck. Two megapixels is the sweet spot for current OCR and visual analysis models. They can read the model number, identify the chuck type, and return instructions. You don't need more resolution for AI to work with. In fact, larger images just mean more tokens and slower responses.

That's the AI pipeline extension Daniel hinted at. Take a photo of the drill's model number, pipeline processes it, paste the image into an AI tool with "how do I change the chuck on this," and the model reads the number straight from the image. The pipeline isn't just saving storage. It's making the images AI-ready by stripping them down to exactly what the model needs.

Here's where the parameterization matters. Two megapixels works today for every major vision model. But eighteen months from now, maybe you're running a local model that handles higher resolution natively, or maybe you want to feed images into a 3D reconstruction pipeline that needs more detail. If you hard-code the max dimension, you're stuck. If you make it a config parameter — an environment variable or a database setting — you can bump it to four megapixels or eight megapixels without touching the pipeline code.

The same goes for the WebP quality setting and the thumbnail dimensions. Anything that might reasonably change as the AI landscape evolves should be configurable. Anything that's a fixed requirement — like discarding originals — can be hard-coded.
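One way to expose those knobs is a small config struct loaded from environment variables. The variable names and the thumbnail-width default here are invented for the example; the two-megapixel, quality-eighty, and four-worker defaults follow the numbers discussed in the episode.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// PipelineConfig gathers the settings that may change over time.
type PipelineConfig struct {
	MaxPixels   int
	WebPQuality int
	ThumbWidth  int
	Workers     int
}

// intSetting reads a positive integer via getenv, falling back to the
// default when the variable is unset, malformed, or not positive.
func intSetting(getenv func(string) string, key string, fallback int) int {
	n, err := strconv.Atoi(getenv(key))
	if err != nil || n <= 0 {
		return fallback
	}
	return n
}

// loadConfig takes the lookup function as a parameter so tests can
// supply their own values instead of touching the real environment.
// The variable names are hypothetical.
func loadConfig(getenv func(string) string) PipelineConfig {
	return PipelineConfig{
		MaxPixels:   intSetting(getenv, "INVENTORY_MAX_PIXELS", 2_000_000),
		WebPQuality: intSetting(getenv, "INVENTORY_WEBP_QUALITY", 80),
		ThumbWidth:  intSetting(getenv, "INVENTORY_THUMB_WIDTH", 320), // arbitrary default
		Workers:     intSetting(getenv, "INVENTORY_WORKERS", 4),
	}
}

func main() {
	fmt.Printf("%+v\n", loadConfig(os.Getenv))
}
```

Bumping the resolution later then means setting one variable and restarting, with no change to the pipeline code.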

This is the deeper point about building on components. The components you choose should expose the knobs you need. The imaging library lets you set max dimensions. The WebP encoder lets you set quality. If you wrap those behind your own config layer, you can swap the underlying library later without changing the rest of the pipeline. That's the real value of the interface pattern — not just for swapping storage backends, but for keeping your options open on every stage of the pipeline.

The pipeline isn't finished when it works. It's finished when it's parameterized for the future you can't predict but know is coming.

Let's make this concrete. If you're sitting down this weekend to build the thing, here's exactly what you implement. Four stages, in order. Stage one: parallel upload to a temp directory. Go's HTTP server handles this natively, since each request already runs in its own goroutine. Write the bytes, insert the database row with the temp path, push a job onto a buffered channel. Stage two: decode the image and downscale it to the two-megapixel target using imaging.Fit. That's one function call. Stage three: draw the timestamp in the bottom-right corner using image/draw and a font package such as golang.org/x/image/font. Stage four: encode to WebP at quality eighty using the chai2010 webp package, write to the processed directory, delete the temp file. If is_primary is true, generate thumbnails. If not, skip that step. Store everything behind an S3-compatible interface so you can point it at local disk today and Cloudflare R2 tomorrow.

The beauty of that spec is that it's maybe two hundred lines of Go, total. The libraries do the heavy lifting, your code is mostly orchestration and error handling.

And here's the eighty-twenty rule of image processing, because it keeps you from spiraling into feature creep. Eighty percent of the value comes from twenty percent of the pipeline features. Downscaling and format conversion give you the biggest storage wins — that's where the megabytes disappear. Timestamp overlays give you the biggest insurance value — that's the feature that pays for itself if you ever file a claim. Thumbnail generation gives you the biggest UX improvement — fast catalog browsing is what keeps you using the app. Everything else — facial blurring, EXIF stripping, color correction, auto-rotation — is noise for a home inventory app.

You don't need to auto-rotate a photo of a serial number sticker. If it's upside down, you'll rotate it manually when you take the next one. Don't build features for edge cases you can solve by retaking the photo.

The other thing to bake in from day one is machine-readable identifiers. Every processed image should have the timestamp and item ID in the filename, something like item_42-20260702-1430. That makes it trivial to build retrieval-augmented generation systems later. Your AI assistant can search your inventory by filename pattern matching before it even touches the image pixels.
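A naming convention is only useful if it can be parsed back reliably, so a sketch of both directions follows. The function names are hypothetical, the file extension is left to the caller, and formatting in UTC is an assumption made so that names stay comparable across time zones.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// stampLayout is the Go reference-time layout for YYYYMMDD-HHMM.
const stampLayout = "20060102-1504"

// processedName builds the base name for a processed image.
// The caller appends whatever extension the encoder produces.
func processedName(itemID int, t time.Time) string {
	return fmt.Sprintf("item_%d-%s", itemID, t.UTC().Format(stampLayout))
}

// parseProcessedName recovers the item ID and timestamp from a base
// name produced by processedName.
func parseProcessedName(name string) (int, time.Time, error) {
	if !strings.HasPrefix(name, "item_") {
		return 0, time.Time{}, fmt.Errorf("missing item_ prefix: %q", name)
	}
	idPart, stamp, ok := strings.Cut(strings.TrimPrefix(name, "item_"), "-")
	if !ok {
		return 0, time.Time{}, fmt.Errorf("missing timestamp: %q", name)
	}
	id, err := strconv.Atoi(idPart)
	if err != nil {
		return 0, time.Time{}, fmt.Errorf("bad item ID in %q: %w", name, err)
	}
	t, err := time.Parse(stampLayout, stamp)
	if err != nil {
		return 0, time.Time{}, fmt.Errorf("bad timestamp in %q: %w", name, err)
	}
	return id, t, nil
}

func main() {
	taken := time.Date(2026, time.July, 2, 14, 30, 0, 0, time.UTC)
	name := processedName(42, taken)
	fmt.Println(name) // item_42-20260702-1430

	id, t, err := parseProcessedName(name)
	fmt.Println(id, t.Format(time.RFC3339), err)
}
```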

That's not speculative. The RAG pattern is already the standard way to give AI agents access to personal data. If your images are named with timestamps and item IDs, you've done half the indexing work before you write a single line of search code. Your future self will thank you.

The fork strategy is worth naming explicitly too. HomeBox is a solid foundation — it handles the inventory data model, the web interface, the basic CRUD operations. But no open-source project can anticipate everyone's media management needs. Forking it is the right call. The key is to keep your fork mergeable.

The way you do that is by isolating your pipeline changes behind an interface. Define something like type ImageProcessor interface with a single Process method that takes an upload and returns a processed result. HomeBox's existing upload handler calls your interface. Your implementation lives in a separate package. When upstream HomeBox releases a new version, you pull the changes, and as long as the upload handler's call site hasn't changed, your pipeline code doesn't need to change at all.
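The interface seam could be sketched like this. The Upload and Processed types, the ProcessorFunc adapter, and handleUpload are all hypothetical names invented for the example; HomeBox's actual handler and types will differ.

```go
package main

import (
	"errors"
	"fmt"
)

// Upload and Processed are hypothetical shapes for what crosses the
// boundary between the upstream handler and the custom pipeline.
type Upload struct {
	ItemID   int
	Filename string
	Data     []byte
}

type Processed struct {
	Path      string
	ThumbPath string // empty when no thumbnail was generated
}

// ImageProcessor is the only thing the upstream code needs to know
// about the custom pipeline.
type ImageProcessor interface {
	Process(up Upload) (Processed, error)
}

// ProcessorFunc adapts a plain function to ImageProcessor, in the same
// spirit as http.HandlerFunc.
type ProcessorFunc func(Upload) (Processed, error)

func (f ProcessorFunc) Process(up Upload) (Processed, error) { return f(up) }

// handleUpload plays the role of the existing upload handler: a single
// call into the interface is the whole footprint inside upstream code.
func handleUpload(p ImageProcessor, up Upload) (string, error) {
	if len(up.Data) == 0 {
		return "", errors.New("empty upload")
	}
	result, err := p.Process(up)
	if err != nil {
		return "", fmt.Errorf("processing %s: %w", up.Filename, err)
	}
	return result.Path, nil
}

func main() {
	stub := ProcessorFunc(func(up Upload) (Processed, error) {
		return Processed{Path: fmt.Sprintf("processed/item_%d.webp", up.ItemID)}, nil
	})
	fmt.Println(handleUpload(stub, Upload{ItemID: 42, Filename: "drill.jpg", Data: []byte{1}}))
}
```

The smaller the patch to the upstream handler, the less there is to conflict when merging a new release.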

It's the software equivalent of building an addition onto a house without knocking down any load-bearing walls. The original structure stays intact, your custom work is self-contained, and you can renovate either side independently.

You've got your pipeline. Now let's zoom out and ask: where is this all heading? Because the thing Daniel's really building here isn't just an inventory app with nice thumbnails. He's building a knowledge base that an AI agent can reason over.

The drill chuck question is the canary in the coal mine. Right now it's "I'll paste this photo into Claude and ask how to change the chuck." But the next step is your personal AI agent querying your inventory directly. "Find the drill, tell me the model number, pull up the chuck replacement guide, and add the replacement part to my shopping list." That's not science fiction. That's a RAG pipeline with a tool-use loop.

When that's the use case, the image pipeline needs to think differently. Resolution matters less than metadata density. You want every processed image to carry its item ID, timestamp, and a text description of what's in the frame. The AI doesn't need twelve megapixels. It needs to know this is the serial number sticker for item forty-two, captured on July second.

The open question is whether we'll even be browsing catalogs manually in five years. If the AI can answer "where's the spare air filter for the HVAC" by searching your inventory photos, the thumbnail gallery becomes a fallback interface, not the primary one. The pipeline you build today has to serve both.

Which is why the parameterization matters so much. Build the pipeline so you can add metadata extraction stages later — OCR, object detection, label reading — without rewriting the core flow. The four-stage pipeline we described is the skeleton. The AI features are the organs you'll attach later.

If you're running a self-hosted app and your image pipeline is "dump everything into an uploads folder," spend one evening this week implementing those four stages. Downscale, timestamp, WebP, selective thumbnails. Your future self — and your future AI assistant — will thank you.

Now: Hilbert's daily fun fact.

Hilbert: In nineteen eighty-three, a Chinese highway crew widening a mountain pass in eastern Tibet uncovered a perfectly preserved Roman-era road segment — complete with drainage ditches and mile markers in Latin — suggesting a Roman expeditionary force may have reached the Tibetan Plateau nearly two thousand years before any documented European contact. The road was promptly paved over to meet a construction deadline, and no archaeological survey was conducted.

...they paved over it.

This has been My Weird Prompts. Our producer is Hilbert Flumingtop. If you enjoyed this, rate us five stars and tell a friend who's moving.

I'm Herman Poppleberry.

I'm Corn. Talk to you next time.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.
