#3024: How to Incrementally Back Up Google Photos to Your NAS

Build a quarterly backup pipeline for Google Photos using the Library API, hash deduplication, and your NAS.

Episode Details

Episode ID: MWP-3194
Published: May 23
Duration: 30:10
Audio: Direct link
Pipeline: V5
TTS Engine: chatterbox-regular
Script Writing Agent: deepseek-v4-pro
Topics: backup-strategies data-redundancy data-integrity

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

The Google Photos Library API is still active and supported as of May 2026, but it has frustrating limitations: no push notifications, no webhooks, and no way to query recently added items. To build an incremental backup, you must poll the API on a schedule, compare results against a local SQLite database storing SHA-256 hashes of every photo you've already downloaded, and download only the new ones.

A NAS running a Python script acts as the orchestrator. The script authenticates via OAuth, lists media items in a specific album, checks each item against the hash database, downloads new files, stores them in dated folders, and then pushes copies to Wasabi and Backblaze B2 using rclone. M-Disc burning remains a semi-manual step triggered when unburned photos exceed 90GB. The entire pipeline runs in under five minutes for a typical quarter with 50-100 new photos.

Google Takeout is not a viable alternative — it's a full export with no deduplication, no album structure preserved in the default download, and generation times ranging from hours to days. The hash-based approach is the key innovation: SHA-256 hashes of file bytes create a universal fingerprint that works across albums, accounts, and backup destinations, making the SQLite database the source of truth for what's already been saved.

Daniel sent us this one — he's been building a Google Photos album of his son Ezra, over a thousand photos and videos now, beautiful keepsake stuff. And he's realized something genuinely terrifying: there's no real incremental backup. If he loses access to that Google account, everything's gone. He wants a quarterly backup pipeline — pull new photos from a specific album, push copies to his NAS, to Wasabi, to Backblaze, and burn M-Discs. He's technical enough to build something, but the API situation is, well, let's just say Google didn't exactly make this easy.

They really didn't. And the stakes here are the kind that keep you up at night. These aren't memes or screenshots — this is a child's entire documented life. Every tooth, every first step, every video of Ezra doing something adorable that Hannah captured. The prompt is basically asking: how do I not become the parent who loses all of that to a password reset or an account lockout?

Which is not hypothetical. Google's inactivity policy kicks in after two years — if something happens, an accident, a medical situation, anything that keeps you from logging in, the account and everything in it can get purged. I've seen it happen. Someone I know lost fifteen years of email and photos because they were in the hospital for eighteen months and nobody thought to log into their Gmail.

Even short of catastrophe, account lockouts from suspicious activity flags happen all the time. Google's automated security systems are aggressive — sometimes correctly, sometimes not. If you're traveling internationally and Google decides your login pattern looks unusual, you could be locked out for days or weeks while you fight through recovery. Now imagine that happening and you have no backup of the photos that matter most.

We've established the problem. Now let's look under the hood at the API that makes this so frustrating.

The Google Photos Library API is, honestly, one of the more baffling pieces of Google's developer ecosystem. It's not deprecated — and that's important to clarify right away, because there were rumors swirling in early twenty twenty-five that it was getting sunset. That didn't happen. The API is still active, still supported, and as of May twenty twenty-six, the quota is ten thousand requests per day per project. But here's the core limitation: it's read-only for media items when it comes to albums. You can list what's in an album, you can download individual media items, but you cannot programmatically add photos to albums, and you absolutely cannot get a webhook or any kind of push notification when new photos are added.

No event-driven architecture. You can't say "tell me every time something new appears in this album." You have to poll.

You have to poll. And polling means you're making API calls on a schedule, asking "what's in this album now?" and then comparing that to what you already have. The album itself doesn't remember what changed. It's a flat list of media item IDs with no concept of "added since last Tuesday."

Which is wild when you think about it. Google Photos has a "recently added" view in the UI. The data exists. They just don't expose it through the API.

And that gap between what the UI shows you and what the API lets you do is the entire story of why this is hard. The UI has search, it has facial recognition, it has "photos of Ezra," it has recently added — all of these are powered by internal systems that Google has chosen not to make available programmatically. What we get is a pretty bare-bones REST interface: you authenticate with OAuth, you hit the mediaItems.search endpoint with an album ID filter, and you get back a paginated list of media items. Each item has an ID, a base URL for downloading the actual bytes, a filename, and some metadata like creation time and media type.

Let's walk through what that actually looks like. If I query the album, what comes back?

A JSON response. For a single photo, you'd see something like: media item ID, which is a long alphanumeric string, a product URL that links to the photo in the Google Photos web interface, a base URL that's the actual downloadable file — that base URL expires, by the way, so you have to use it within about sixty minutes — plus the filename, the MIME type, and a media metadata object that tells you the width, height, and whether the photo was taken with a camera or created some other way. If it's a video, you get slightly different metadata. If it was uploaded from a phone, you get the original filename. If it was shared from someone else's library, things get complicated.
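To make that response shape concrete, here is a minimal sketch of one media item and of turning its base URL into a download link. The sample values are invented placeholders. The `=d` and `=dv` suffixes are the download parameters the Library API documentation describes for photos and videos. The helper name `download_url` is hypothetical.

```python
# Illustrative media item with the fields described above.
# Every value here is a made-up placeholder, not real API output.
sample_item = {
    "id": "EXAMPLE_MEDIA_ITEM_ID",
    "productUrl": "https://photos.google.com/lr/photo/EXAMPLE",
    "baseUrl": "https://lh3.googleusercontent.com/lr/EXAMPLE",
    "mimeType": "image/jpeg",
    "filename": "IMG_0001.jpg",
    "mediaMetadata": {
        "creationTime": "2026-04-01T09:30:00Z",
        "width": "4032",
        "height": "3024",
        "photo": {},
    },
}


def download_url(item):
    """Build a download link from the short-lived base URL.

    Appends '=dv' for videos and '=d' for photos. The base URL expires,
    so build and use this link soon after listing the item.
    """
    is_video = item.get("mimeType", "").startswith("video/")
    return item["baseUrl"] + ("=dv" if is_video else "=d")


photo_link = download_url(sample_item)
video_link = download_url({"baseUrl": "https://example.invalid/v", "mimeType": "video/mp4"})
```

Because the base URL is short-lived, a script would normally list and download in the same run rather than storing these links.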

Shared album items sometimes don't have a direct download URL through your own API credentials. You might need to use the partner sharing API, which is a whole separate thing. But for a straightforward album of photos you uploaded yourself, the flow is: authenticate, list media items in the album, iterate through the pages, download each one using the base URL.
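The list-and-paginate part of that flow can be sketched as follows. This assumes your OAuth credentials are actually permitted to read the album in question; which scopes and which media an app may access is something to confirm against the current Library API documentation. The HTTP call is passed in as `post_json`, a hypothetical callable that sends an authenticated POST and returns the decoded JSON, so the paging logic can be shown and tested without a network.

```python
SEARCH_URL = "https://photoslibrary.googleapis.com/v1/mediaItems:search"


def list_album_items(album_id, post_json, page_size=100):
    """Collect every media item in an album by following nextPageToken."""
    items = []
    page_token = None
    while True:
        body = {"albumId": album_id, "pageSize": page_size}
        if page_token:
            body["pageToken"] = page_token
        response = post_json(SEARCH_URL, body)
        # An empty page may omit the "mediaItems" key entirely.
        items.extend(response.get("mediaItems", []))
        page_token = response.get("nextPageToken")
        if not page_token:
            return items


def fake_post_json(url, body):
    """Stand-in for the real HTTP call: serves two canned pages."""
    pages = {
        None: {"mediaItems": [{"id": "a"}, {"id": "b"}], "nextPageToken": "p2"},
        "p2": {"mediaItems": [{"id": "c"}]},
    }
    return pages[body.get("pageToken")]


demo_items = list_album_items("EXAMPLE_ALBUM_ID", fake_post_json)
```

In a real script, `post_json` would wrap whatever HTTP client you prefer and attach the bearer token.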

If I do that every quarter, I'm re-downloading a thousand photos every single time.

Which is absurd. At roughly ten megabytes per photo (a reasonable average for a modern smartphone shot, maybe higher for portrait mode or HDR) you're looking at about ten gigabytes per full download. On a hundred megabit connection, each photo takes a little under a second to download, so the math works out to roughly thirteen to fifteen minutes for a thousand photos. Not terrible, but it's wasteful, it burns through your API quota, and it completely defeats the purpose of "incremental."

The obvious question: how do you avoid re-downloading what you already have?

This is where the hash-based deduplication comes in, and it's really the key innovation that makes the whole thing viable. The idea is beautifully simple: maintain a local database that stores a cryptographic hash of every photo you've already downloaded. When you poll the album, you download only the files whose hashes aren't in the database. New photos get downloaded, their hashes get stored, and next quarter you skip everything that's already in there.

You're hashing the actual file bytes, not just relying on the media item ID from Google?

Has to be the file bytes. Google's media item IDs are stable — they don't change — but relying on them alone creates a single point of failure. If Google ever re-indexes your library, or if you're combining photos from multiple sources, the ID approach falls apart. A SHA-256 hash of the file content is deterministic and portable. You can verify the same photo across different albums, different accounts, different backup destinations. It's the universal fingerprint.

You end up with a SQLite database that's basically a lookup table: hash, filename, download date, album ID, maybe the file path on the NAS.

That's exactly the schema. Four or five columns. You could add a file size column for an extra sanity check, but the hash alone is sufficient for deduplication. The database lives on the NAS, it's tiny (even for tens of thousands of photos, we're talking megabytes), and it becomes the source of truth for "do I already have this file?"
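A minimal sketch of that lookup table, using only the Python standard library. The table name `photos` and the column names are assumptions chosen to match the columns listed above. The file is hashed in chunks so large videos do not need to fit in memory.

```python
import hashlib
import os
import sqlite3
import tempfile


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash the file's bytes in chunks and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def open_database(db_path):
    """Open (or create) the lookup table described above."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS photos (
               sha256        TEXT PRIMARY KEY,
               filename      TEXT NOT NULL,
               downloaded_on TEXT NOT NULL,
               album_id      TEXT NOT NULL,
               nas_path      TEXT
           )"""
    )
    return conn


def already_have(conn, sha256_hex):
    """Answer the 'do I already have this file?' question."""
    row = conn.execute(
        "SELECT 1 FROM photos WHERE sha256 = ?", (sha256_hex,)
    ).fetchone()
    return row is not None


# Small demonstration with a throwaway file and an in-memory database.
with tempfile.TemporaryDirectory() as scratch:
    sample = os.path.join(scratch, "IMG_0001.jpg")
    with open(sample, "wb") as handle:
        handle.write(b"pretend these are JPEG bytes")
    conn = open_database(":memory:")
    digest = sha256_of_file(sample)
    seen_before = already_have(conn, digest)
    conn.execute(
        "INSERT INTO photos VALUES (?, ?, ?, ?, ?)",
        (digest, "IMG_0001.jpg", "2026-04-05", "EXAMPLE_ALBUM_ID", sample),
    )
    seen_after = already_have(conn, digest)
```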

What about the API quota? Ten thousand requests a day sounds generous, but if you're listing media items with pagination, how fast does that burn?

The list endpoint returns up to one hundred items per page. So for a thousand photos, you're making about ten API calls to enumerate the album. Plus one call per download. So if you're downloading a hundred new photos in a given quarter, you're using roughly one hundred and ten API calls out of your ten thousand daily quota. It's a rounding error. Even if you had ten thousand photos with a thousand new per quarter, you'd still be well under the limit. The quota is not the bottleneck here.
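The quota arithmetic above is easy to check. This sketch follows the same counting as the discussion, one call per page of a hundred items plus one call per download; the function name is hypothetical.

```python
import math


def api_calls_needed(total_items, new_items, page_size=100):
    """Listing calls (one per page) plus one call per new download."""
    listing_calls = math.ceil(total_items / page_size)
    return listing_calls + new_items


typical_quarter = api_calls_needed(total_items=1000, new_items=100)
large_library = api_calls_needed(total_items=10000, new_items=1000)
```

Both figures, 110 and 1,100 calls, sit far below a ten-thousand-per-day quota.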

What about Google Takeout? A lot of people assume that's the solution.

Let's address that misconception directly. Google Takeout is not a viable incremental backup solution. It's a full export every time. You request your data, Google generates enormous zip files — for a thousand photos, expect the export to take anywhere from a few hours to forty-eight hours to generate — and then you download everything. There's no deduplication, no album structure preserved in the default export, and no way to say "just give me what's new since last time." It's a blunt instrument. Useful if you're migrating away from Google entirely, or if you want one full archive to stash somewhere, but for quarterly incremental backup? Completely wrong tool.

The album structure thing is worth pausing on. Even if you get all the files, you don't get the organization.

Google Photos albums are not folders. They're more like playlists. A photo can be in multiple albums, and the album structure exists only as metadata — associations between media item IDs and album IDs. When you download through Takeout, you get flat folders organized by date, not by album. So your carefully curated "Ezra" album with hand-picked photos from across three years? That becomes a scattered mess across dozens of date-based folders. The metadata export from Takeout does include album membership in JSON files, but reconstructing the albums from that is a whole separate project.

Which brings us to the actual pipeline. You've got the API polling, the hash database, the incremental download. Now how do you make this run itself every quarter?

That covers the theory. Let's get practical and build the actual pipeline.

The NAS is the orchestrator. That's the key architectural decision.

The NAS runs the show. Whether it's a Synology or a QNAP — both can run Python scripts and both support cron jobs. You write a Python script, maybe two hundred lines, that does the following: authenticates with OAuth using a refresh token stored securely on the NAS, queries the Google Photos Library API for all media items in the specified album, computes the SHA-256 hash of each media item's ID plus its filename as a lightweight pre-check, then for any item not already in the local SQLite database, downloads the file, computes the full SHA-256 hash of the downloaded bytes, stores the file in a dated folder — something like "ezra-backup-2026-Q2" — and inserts the hash, filename, download date, and album ID into the database.

The pre-check with the ID plus filename hash — that's faster than downloading and then hashing?

You can check a thousand entries against the database in milliseconds without downloading a single byte. Only the new photos trigger a download. For a typical quarter where you've added maybe fifty to a hundred new photos, the entire script runs in under five minutes.
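One way to sketch that pre-check: hash the media item ID together with the filename and look the result up before downloading anything. The `seen_items` table and the `precheck_key` column are assumptions made for illustration, since the episode only lists the content-hash columns.

```python
import hashlib
import sqlite3


def precheck_key(item):
    """Cheap fingerprint from metadata only; no file bytes involved."""
    raw = f"{item['id']}\n{item['filename']}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def select_new_items(conn, items):
    """Return only the items whose pre-check key is not in the database."""
    new_items = []
    for item in items:
        seen = conn.execute(
            "SELECT 1 FROM seen_items WHERE precheck_key = ?",
            (precheck_key(item),),
        ).fetchone()
        if seen is None:
            new_items.append(item)
    return new_items


# Demonstration: one item was handled last quarter, one is new.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seen_items (precheck_key TEXT PRIMARY KEY)")
album = [
    {"id": "a1", "filename": "IMG_0001.jpg"},
    {"id": "b2", "filename": "IMG_0002.jpg"},
]
conn.execute("INSERT INTO seen_items VALUES (?)", (precheck_key(album[0]),))
to_download = select_new_items(conn, album)
```

The pre-check only decides what to download. The hash of the downloaded bytes remains the real deduplication key.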

Then the script triggers replication to the cloud destinations.

After the download phase, the script calls rsync or rclone to push the new dated folder to Wasabi and Backblaze B2. Both of those are S3-compatible object stores, so rclone handles them natively. The command is essentially: rclone sync the new folder to the Wasabi bucket, then rclone sync to the Backblaze bucket. The hash database itself also gets backed up — that's a tiny SQLite file that you absolutely do not want to lose, because it's the key to incremental behavior.
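The replication step might look like the sketch below, which shells out to rclone. The remote names `wasabi` and `b2`, the bucket name, and the NAS path are all hypothetical and would come from your own `rclone config`. Note that `rclone sync` makes the destination match the source, including deletions, so each dated folder is synced to its own matching destination path.

```python
import subprocess
from pathlib import PurePosixPath


def rclone_sync_command(local_folder, remote, bucket, dry_run=False):
    """Build the rclone invocation for one destination."""
    folder_name = PurePosixPath(local_folder).name
    command = ["rclone", "sync", local_folder, f"{remote}:{bucket}/{folder_name}"]
    if dry_run:
        command.append("--dry-run")
    return command


def replicate(local_folder, destinations, dry_run=False):
    """Push the folder to every (remote, bucket) pair; raise on failure."""
    for remote, bucket in destinations:
        subprocess.run(
            rclone_sync_command(local_folder, remote, bucket, dry_run),
            check=True,
        )


example_command = rclone_sync_command(
    "/volume1/photos/ezra-backup-2026-Q2", "wasabi", "ezra-archive"
)
```

A first run with `dry_run=True` shows what would be transferred without touching the buckets. `rclone copy` is an alternative if you never want anything removed at the destination.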

The M-Disc burning?

That's the final step, and it's the one that requires physical intervention. You can't fully automate optical media burning: someone has to insert the disc. But the script can prepare everything. You set a threshold: when the cumulative size of new, unburned photos exceeds, say, ninety gigabytes, leaving room on a hundred-gigabyte M-Disc, the script generates a burn folder, sends you a notification, and you pop in a disc. A thousand photos at roughly ten megabytes each is only about ten gigabytes, a fraction of one disc, so it's video that drives the disc count. With video in the mix you might reach ten to fifteen M-Discs total over the lifetime of the archive, not per quarter.
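The threshold check and the split into discs can be sketched like this. The ninety-gigabyte figure comes from the discussion above. The first-fit packing strategy and the function names are assumptions, one simple way to do it rather than the only one.

```python
GB = 1000 ** 3


def needs_burn(unburned_sizes, threshold_bytes=90 * GB):
    """True once the unburned files together exceed the threshold."""
    return sum(unburned_sizes.values()) > threshold_bytes


def plan_discs(unburned_sizes, capacity_bytes=90 * GB):
    """Assign files to discs, largest first, using first-fit packing."""
    discs = []  # each entry: {"free": bytes remaining, "files": [names]}
    ordered = sorted(unburned_sizes.items(), key=lambda pair: pair[1], reverse=True)
    for name, size in ordered:
        if size > capacity_bytes:
            raise ValueError(f"{name} does not fit on a single disc")
        for disc in discs:
            if disc["free"] >= size:
                disc["files"].append(name)
                disc["free"] -= size
                break
        else:
            # No existing disc has room, so start a new one.
            discs.append({"free": capacity_bytes - size, "files": [name]})
    return [disc["files"] for disc in discs]


# Exaggerated sizes so the example spans more than one disc.
pending = {"a.mp4": 60 * GB, "b.mp4": 50 * GB, "c.jpg": 30 * GB, "d.jpg": 10 * GB}
burn_now = needs_burn(pending)
disc_plan = plan_discs(pending)
```

Logging the resulting plan gives you the record of which files went on which disc.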

Per quarter you're probably burning one disc, maybe two, depending on how trigger-happy you are with the camera.

That cadence is actually optimal. The M-Disc specification — Millenniata's rated lifespan — is a thousand years under standard archival conditions. That's cool, dry, dark storage. But the point is, you don't want to be burning discs every week. Quarterly or even biannual burning aligns with archival best practices. You batch up enough new photos to fill a meaningful portion of a disc, you verify the burn, and you store it properly.

What about verification? I've burned discs that looked fine and then wouldn't read six months later.

After burning, you run a checksum verification — compare the SHA-256 hash of every file on the disc against the hash database. If even one file doesn't match, you re-burn. Most burning software has a verify-after-write option, but doing it programmatically against your own hash database gives you an extra layer of confidence. And this is where the hash database proves its value again: it's not just for deduplication, it's your verification manifest.
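Verifying a burned disc against the database could be sketched as follows. `expected` stands for a mapping from relative path to SHA-256 that you would read out of the hash database. The mount point, the file names, and the function names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash the file's bytes in chunks and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_disc(mount_point, expected):
    """Return a list of (relative_path, problem); empty means a clean burn."""
    problems = []
    root = Path(mount_point)
    for relative, wanted in expected.items():
        target = root / relative
        if not target.is_file():
            problems.append((relative, "missing"))
        elif sha256_of_file(target) != wanted:
            problems.append((relative, "hash mismatch"))
    return problems


# Demonstration with a temporary directory standing in for the disc.
with tempfile.TemporaryDirectory() as disc:
    (Path(disc) / "good.jpg").write_bytes(b"intact")
    (Path(disc) / "bad.jpg").write_bytes(b"corrupted")
    manifest = {
        "good.jpg": hashlib.sha256(b"intact").hexdigest(),
        "bad.jpg": hashlib.sha256(b"original").hexdigest(),
        "lost.jpg": hashlib.sha256(b"never burned").hexdigest(),
    }
    report = verify_disc(disc, manifest)
```

Any non-empty report is the signal to re-burn.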

Let's talk about the tools people might have heard of that don't actually solve this problem. rclone comes up a lot.

Rclone is fantastic for what it does. But it uses the Google Drive API, not the Google Photos Library API. That means rclone can see your Google Drive files, and it can see the Google Photos folder that appears in Drive — but that folder is a limited view. It doesn't include albums. It doesn't include photos shared with you. It's essentially a flat dump of your uploaded photos with none of the organizational metadata. If you're just trying to sync everything, rclone can do that. If you're trying to incrementally back up a specific curated album? It can't.

That was a promising open-source project that used the actual Photos Library API and did exactly what we're describing — incremental sync with a local database. But it was abandoned. The last commit was in twenty twenty-two, and it doesn't support the current API version. It also had limited album support — it could download albums, but the deduplication was based on filename and date, not content hashing, so it was fragile.

The landscape is basically: Google Takeout for full exports, rclone for drive-level sync without albums, an abandoned open-source tool, or roll your own.

Roll your own is the only option that actually meets all the requirements. But "roll your own" sounds scarier than it is. We're talking about a Python script that's maybe two hundred lines, using the official Google API client library. The OAuth setup is the most annoying part, and that's a one-time pain.

Let's talk about the OAuth setup, because that's where a lot of technical-but-not-a-developer types get stuck.

You need to create a project in the Google Cloud Console, enable the Photos Library API, and create OAuth credentials. The tricky part is that Google requires your app to go through verification if you're using sensitive scopes — but for personal use, you can skip verification by marking the app as "internal" or by using the test user flow. The refresh token is what matters: once you have it, your script can authenticate indefinitely without you having to re-authorize every quarter. Store that refresh token in an environment variable or a config file on the NAS with restricted permissions.
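Exchanging the stored refresh token for a short-lived access token can be sketched with the standard library alone. The token endpoint and form fields follow Google's OAuth 2.0 documentation. The environment variable names are hypothetical. Google's documentation also describes conditions under which refresh tokens expire, for example for apps left in testing status, so it is worth confirming that yours really is long-lived.

```python
import json
import os
import urllib.parse
import urllib.request

TOKEN_URL = "https://oauth2.googleapis.com/token"


def credentials_from_env(env=os.environ):
    """Read the secrets from the environment rather than the script."""
    return (
        env["GPHOTOS_CLIENT_ID"],
        env["GPHOTOS_CLIENT_SECRET"],
        env["GPHOTOS_REFRESH_TOKEN"],
    )


def build_refresh_request(client_id, client_secret, refresh_token):
    """Prepare the form-encoded POST that trades a refresh token for an access token."""
    form = urllib.parse.urlencode(
        {
            "client_id": client_id,
            "client_secret": client_secret,
            "refresh_token": refresh_token,
            "grant_type": "refresh_token",
        }
    ).encode("ascii")
    return urllib.request.Request(TOKEN_URL, data=form, method="POST")


def fetch_access_token(request):
    """Send the request and return the access token (needs network access)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["access_token"]


# Built with placeholder values only; nothing is sent here.
demo_request = build_refresh_request("example-id", "example-secret", "example-token")
```

The returned access token then goes into an `Authorization: Bearer ...` header on each API call.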

Then the script just runs. You set a cron job for the first Sunday of every third month, and you forget about it until you get a notification that there's a new M-Disc to burn.
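"First Sunday of every third month" is awkward to express in cron alone, because standard cron runs a job when either the day-of-month or the day-of-week field matches once both are restricted. One workaround, sketched here, is to schedule the script for every Sunday in the quarter months and let it exit early unless the date qualifies. The choice of January, April, July and October is an assumption, as is the script path in the comment.

```python
import datetime

# Assumed quarter-start months; adjust to taste.
QUARTER_MONTHS = {1, 4, 7, 10}

# Example crontab entry (hypothetical path), 3 AM every Sunday in those months:
#   0 3 * 1,4,7,10 0 /volume1/scripts/photos_backup.py


def is_first_sunday_of_quarter(day):
    """True only for the first Sunday of a quarter-start month."""
    return (
        day.month in QUARTER_MONTHS
        and day.weekday() == 6  # Monday is 0, Sunday is 6
        and day.day <= 7
    )


first_sunday = is_first_sunday_of_quarter(datetime.date(2026, 4, 5))
second_sunday = is_first_sunday_of_quarter(datetime.date(2026, 4, 12))
wrong_month = is_first_sunday_of_quarter(datetime.date(2026, 5, 3))
```

The script would call this with `datetime.date.today()` at startup and return immediately when it is false.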

That's the peace-of-mind model. The script emails you or sends a push notification when it runs: "Downloaded 47 new photos, 3 new videos. Total archive: 1,247 files, 12.Next M-Disc burn: 23 gigabytes until threshold." You glance at it, you go back to your life.

What about videos? They change the math considerably.

A one-minute 4K video from a modern phone can easily be four hundred megabytes. If Daniel's been capturing videos of Ezra — and with a ten-month-old, you absolutely are — the storage requirements jump fast. But the pipeline handles it identically. The hash-based dedup works on any file type. The M-Disc chunking logic just needs to account for larger file sizes. And the download time goes up: a four hundred megabyte video on a hundred megabit connection takes about thirty-two seconds. Not a problem for a quarterly cron job that runs unattended at three in the morning.
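The thirty-two second figure follows directly from the sizes involved. A minimal sketch of the arithmetic, ignoring protocol overhead and assuming the link is the only bottleneck:

```python
def transfer_seconds(size_megabytes, link_megabits_per_second):
    """Ideal transfer time: megabytes to megabits, divided by link speed."""
    return size_megabytes * 8 / link_megabits_per_second


video_seconds = transfer_seconds(400, 100)
```

Real downloads will run somewhat slower, which matters little for an unattended overnight job.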

You've got the script running. Let's talk about the destinations and the quarterly cadence.

The destination strategy follows a modified three-two-one rule. Three copies: the original in Google Photos, the copy on the NAS, and the M-Disc. Two different media types: magnetic storage on the NAS, optical on the M-Disc. One off-site copy: that's where Wasabi and Backblaze come in. Technically that's two off-site copies, which is even better.

Why both Wasabi and Backblaze?

Belt and suspenders. Wasabi has no egress fees, which is great for periodic verification downloads. Backblaze B2 has slightly lower storage costs but charges for egress. Having both means a pricing change or a service outage at one provider doesn't leave you exposed. And they're both S3-compatible, so the rclone configuration is nearly identical.

There's something almost perverse about backing up cloud photos to two different clouds. It's like we've come full circle on the "cloud is just someone else's computer" thing.

Cloud is just someone else's computer with a better SLA. And when the someone else is Google — a company that famously kills products and has automated account lockout mechanisms that are nearly impossible to appeal — you want your eggs in multiple baskets.

Speaking of Google killing products, what's the risk that the Photos Library API gets deprecated or restricted?

It's a real risk. Google has a long history of deprecating APIs with minimal notice — the Google Photos API itself went through a major version change in twenty eighteen that broke a lot of existing integrations. The current API is stable, but there's no guarantee. The mitigation is twofold. First, the hash database and the local file store are completely independent of Google. If the API disappears tomorrow, you still have everything you've downloaded up to that point. Second, this is why the quarterly cadence matters — you're not relying on continuous access, you're doing periodic checkpoints. If the API changes between quarters, you have time to adapt.

You still have Google Takeout as a fallback. It's not incremental, but if the API goes away entirely, you can do one last full export and then figure out a new ingestion pipeline.

Takeout is the emergency parachute. Not your daily driver, but it's there.

Let's address the "is this overkill for a thousand photos?" question, because I can hear some listeners thinking it.

It's a fair question. A thousand photos at ten megabytes each is ten gigabytes. That fits on a free tier of almost any cloud storage provider. You could just manually download the album once a quarter and be done in fifteen minutes. Why build a whole automated pipeline?

Because manual processes fail. That's the entire answer. You'll do it the first quarter. Maybe the second. By the fourth quarter, you'll be busy, you'll put it off, it'll slip to six months, then nine, then suddenly Ezra is three years old and you haven't backed up anything in two years and Google locks your account.

The automation is the point. The hash database and the cron job and the rclone sync — the whole purpose is to remove human fallibility from the equation. You set it up once, you test it, and then it just works. The M-Disc burning is the only manual step, and that's by design: you want to physically handle and verify your optical media. Everything else should be invisible.

Like adopting a feral cat.

Not sure that analogy holds, but I appreciate the sentiment.

You feed it once and it keeps coming back. The script is the cat.

The script is the cat that lives in your NAS and occasionally leaves M-Discs on your doorstep.

Now I'm picturing a Synology with fur.

Let's maybe move on from that image. Let's talk about a real-world scenario. Imagine a listener with five thousand photos across twelve albums — some of Ezra, some of family vacations, some of a home renovation project. They set up this pipeline with a single script that accepts an album ID as a parameter. They configure twelve cron jobs, one per album, staggered across the first week of each quarter. Each run checks its own hash database, downloads only new photos, and pushes to the same cloud destinations. The total runtime across all twelve albums is maybe fifteen minutes per quarter. Fifteen minutes of automated processing for complete peace of mind across five thousand irreplaceable photos.

The hash database is shared across all albums, so if the same photo appears in multiple albums — which it will, because that's how Google Photos works — it only gets downloaded once.

That's the elegance of content-based deduplication. The hash doesn't care which album the photo is in. It just knows "I've seen these bytes before." The database records which albums each hash is associated with, but the file only exists once on disk.

Which also means you're not burning the same photo to multiple M-Discs if it appears in multiple albums.

The burn logic groups by hash, not by album. You get one canonical copy of each unique photo on optical media, with metadata that maps it back to whichever albums it belongs to.
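A database shared across albums needs somewhere to record that one file belongs to several albums. One possible layout, sketched here with hypothetical table and column names, keeps a single row per unique hash and a separate membership table.

```python
import sqlite3


def open_catalog(path=":memory:"):
    """One row per unique file, plus a table linking hashes to albums."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS files (
            sha256   TEXT PRIMARY KEY,
            nas_path TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS album_membership (
            sha256   TEXT NOT NULL REFERENCES files(sha256),
            album_id TEXT NOT NULL,
            PRIMARY KEY (sha256, album_id)
        );
        """
    )
    return conn


def record(conn, sha256_hex, nas_path, album_id):
    """Store the file once, and note which album it was seen in."""
    conn.execute("INSERT OR IGNORE INTO files VALUES (?, ?)", (sha256_hex, nas_path))
    conn.execute(
        "INSERT OR IGNORE INTO album_membership VALUES (?, ?)",
        (sha256_hex, album_id),
    )


# The same photo turns up in two albums.
catalog = open_catalog()
record(catalog, "abc123", "/volume1/photos/abc123.jpg", "ezra")
record(catalog, "abc123", "/volume1/photos/abc123.jpg", "vacation")
file_count = catalog.execute("SELECT COUNT(*) FROM files").fetchone()[0]
albums = [
    row[0]
    for row in catalog.execute(
        "SELECT album_id FROM album_membership WHERE sha256 = ? ORDER BY album_id",
        ("abc123",),
    )
]
```

Burn planning can then iterate over `files` alone, so each unique photo is queued once no matter how many albums reference it.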

Let's talk about the "quarterly" cadence specifically. Why not monthly? Why not weekly?

Quarterly balances three competing forces. First, the effort: even automated, you want to minimize API calls and storage operations. Second, the risk window: if Google deletes your account, you lose at most three months of new photos. That's painful but not catastrophic — especially for a baby, where three months is a meaningful chunk of development but not the entire archive. Third, the M-Disc economics: burning a disc every month for a handful of new photos is wasteful. Quarterly batches fill discs more efficiently.

If you're really paranoid, you can run it monthly and just skip the M-Disc burn until the threshold is hit. The script doesn't care.

The script is indifferent to your anxiety levels. It just does what it's told.

Which is exactly what you want from infrastructure.

Before we wrap, let's distill this into actionable steps you can implement this weekend.

Step one: create a Google Cloud project, enable the Photos Library API, and get your OAuth refresh token. This is the most annoying hour of the whole process. Do it on a Saturday morning with coffee.

Step two: write the Python script. Start with a single album, the Ezra album, and get the authentication and listing working. Use the Google API Python client library. The mediaItems.search method is your entry point. Test it with a small album first, maybe twenty photos, to make sure everything works before pointing it at a thousand.

Step three: implement the hash database. SQLite is perfect for this. The schema is four columns: hash, filename, download date, album ID. Add a unique constraint on the hash column. Every time you download a file, compute the SHA-256 and insert it. If the insert fails because of the unique constraint, you already have that photo — skip it.

Step four: set up the rclone destinations. Configure Wasabi and Backblaze B2 as remotes. Test with a manual sync of the dated folder. Make sure the credentials are stored securely on the NAS, not in the script itself.

Step five: add the cron job. Run it on the first Sunday of the quarter at 3 AM. Have it log output to a file so you can check what happened. Add a simple notification — even just an email using the NAS's built-in mail server — so you know it ran.

Step six: configure the M-Disc burn logic. Set a size threshold — I recommend ninety gigabytes for hundred-gigabyte discs to leave room for filesystem overhead. When the unburned folder exceeds that threshold, the script creates a burn directory, logs which files go on which disc, and sends you a notification. You burn the disc, verify the checksums, and move the verified files to an "archived" folder so they don't get queued for burning again.

Step seven: test the recovery. This is the step everyone skips. Actually delete a photo from your local copy and verify you can restore it from the NAS, from Wasabi, from Backblaze, and from the M-Disc. If you haven't tested recovery, you don't have a backup — you have a hope.

That's the most important sentence in this entire episode. If you haven't tested recovery, you don't have a backup. You have a hope. And hope is not a data integrity strategy.
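Step three's unique-constraint trick can be sketched as follows. The table and column names are assumptions based on the four columns listed in that step. The insert either succeeds, meaning the photo is new, or raises an integrity error, meaning the hash is already recorded.

```python
import sqlite3


def try_record(conn, sha256_hex, filename, downloaded_on, album_id):
    """Insert the hash; return False if the unique constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO photos (sha256, filename, downloaded_on, album_id) "
                "VALUES (?, ?, ?, ?)",
                (sha256_hex, filename, downloaded_on, album_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


db = sqlite3.connect(":memory:")
db.execute(
    """CREATE TABLE photos (
           sha256        TEXT NOT NULL UNIQUE,
           filename      TEXT NOT NULL,
           downloaded_on TEXT NOT NULL,
           album_id      TEXT NOT NULL
       )"""
)
first_insert = try_record(db, "abc", "IMG_0001.jpg", "2026-04-05", "ezra")
duplicate_insert = try_record(db, "abc", "IMG_0001 (1).jpg", "2026-07-05", "ezra")
```

By the time this check runs, the bytes have already been downloaded, so it acts as a safety net. Avoiding the download in the first place is the job of a metadata pre-check.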

There's an open question that hangs over all of this: will Google ever add native incremental backup to Google Photos? They've added partner sharing, they've added the lockbox feature, they've improved search — but programmatic album sync remains conspicuously absent.

I'm not optimistic. Google's incentives point toward keeping you in their ecosystem, not making it easy to leave. Every feature that simplifies exporting your data is a feature that reduces switching costs. The Photos Library API exists because of regulatory pressure and developer demand, not because it aligns with Google's business model. I don't expect it to get worse — the API is stable — but I don't expect it to get significantly better either.

As photo libraries grow — 4K video, RAW files, burst mode shots — the hash-based approach becomes more critical, not less. A thousand photos today becomes ten thousand in five years. The incremental logic scales linearly with the number of new photos, not the total library size. That's the beauty of it.

The alternative is re-downloading your entire library every quarter, which goes from ten minutes to an hour to "this job didn't finish before I woke up and now my ISP is throttling me."

To answer the question directly: yes, there is a way to do incremental quarterly backups of a Google Photos album. It requires a custom Python script, but it's not a massive development project. The hash-based deduplication is the key insight. The NAS is the orchestration hub. And the destination strategy — NAS plus two cloud providers plus M-Disc — gives you defense in depth against almost any failure mode.

The code for this — the Python script, the SQLite schema, the rclone configuration templates — it's all available. We'll put the GitHub link in the show notes. You can clone it, configure your album ID and OAuth credentials, and have a working pipeline in an afternoon.

Start with a small test album. Verify everything works. Then point it at the Ezra album and sleep better.

If Google ever changes the API, the hash database and the local files don't care. You've already got your copies. The pipeline might need updating, but the photos are safe.

That's the whole game. The photos are safe.

Now: Hilbert's daily fun fact.

Hilbert: In the late Victorian period, a British linguist stationed in Nepal proposed that Inuktitut, the Inuit language of the Canadian Arctic, was related to the languages of the Himalayas through a shared system of polysynthetic morphology — where a single word can express what takes an entire sentence in English. The theory was mainstream for roughly a decade before being thoroughly debunked.

A guy in Nepal looked at a language from Nunavut and said "these feel similar" and academics ran with it for ten years.

Comparative linguistics before the comparative method was really, truly a vibes-based discipline.

This has been My Weird Prompts. Our producer is Hilbert Flumingtop. If you've got a technical problem that's been keeping you up at night, especially one where the official tools don't quite do what you need, send it to us at myweirdprompts. We'll dig into the APIs, we'll write the script, we'll figure out the pipeline.

Find us on Spotify, Apple Podcasts, or wherever you listen. Leave a review if you found this useful — it helps other people with the same problem find the show.

Until next time.

Keep your photos safe.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.
