#2398: Your Taste, Your Data: Owning Your AI Preferences

Why can’t you describe your perfect movie—but you’d know it if you saw it? A vision for portable, user-owned AI taste profiles.

Featuring

Daniel

Corn

Herman

Episode Details

Episode ID: MWP-2556
Published: Apr 24
Duration: 24:11
Audio: Direct link
Pipeline: V5
TTS Engine: chatterbox-regular
Script Writing Agent: DeepSeek v3.2
Topics: data-sovereignty local-ai digital-privacy

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

The Paradox of Preferences: Why You Can’t Describe Your Perfect Movie

Try describing your ideal movie to a friend. You might say something like, “I want something atmospheric but fast-paced, with deep characters but a tight plot.” It’s a meaningless string of words—yet if a service suggested the right film, you’d know instantly. This gap between recognition and articulation is the core challenge of recommendation systems. Today, platforms like Netflix and Spotify solve it by mining your behavioral data (what you watch, skip, or rewind) to build models that predict your tastes. But there’s a catch: that data isn’t yours. It’s siloed in corporate vaults, making you a permanent tenant in their algorithmic worlds.

The Case for Portable Taste Profiles

Imagine a different approach: a small, local database—say, a SQLite file—that stores your movie ratings, watch history, and even niche affinities (like a love for 1970s Italian political comedies). This file lives on your device, and you grant temporary access to recommendation services. They analyze your data, suggest options, and disconnect without retaining copies. The model is inverted: instead of your data going to the AI, the AI comes to your data.

This isn’t just theoretical. One developer built a prototype from a Python script, a SQLite database and a lightweight machine learning library, all running on his laptop. He reported that, for his niche interests, it beat general streaming services at surfacing obscure foreign films, because it learned only from his own ratings, not the crowd’s. The trade-off? You lose the power of mass data but gain perfect alignment with your quirks.

The Roadblocks—and Blueprints

The hurdles are less about technology than about the ecosystem. A standardized schema (like iCal for calendars) would let services interoperate, while tools to import existing histories (Netflix exports, Letterboxd ratings) could seed your profile effortlessly. The real challenge is adoption: convincing platforms to support user-owned data portability. But with AI infiltrating every digital experience, the stakes are high. Without this shift, we risk a future where our preferences are forever locked in corporate walled gardens, and we are left decorating a rented apartment we can never own.

Mentions

Airtable Spreadsheet-database hybrid tool
DBeaver Universal database management tool
Goodreads Book catalog and community platform
iCal Standard calendar data format
Letterboxd Social network for movie lovers
Notion All-in-one workspace with databases
Obsidian Local Markdown note-taking app
Sign in with Apple Privacy-focused authentication service
SQLite Lightweight SQL database for local data
vCard Standard contact data format

Featured In

Creator's Picks 304 episodes

Daniel sent us this one. He's been thinking about the strange difficulty of defining our own preferences. You know, you can't really tell someone exactly what kind of movie you want to watch tonight, but you'd recognize it if a service suggested it. And this extends to food, wine, travel, all sorts of personal data. His vision is for people to own that preference data themselves, decouple it from the big platforms, and have their own little AI memory they can plug into different services. The question is, what do we call this model? And how do we actually build it so we keep control? It's not a big data problem—this could all fit in a simple SQLite file.

That's such a good prompt. It gets at something I've been feeling for years. I have this entire mental catalog of movies I love, but if you ask me to describe my taste, I just spout nonsense about mood and pacing. I’ll say, “I want something atmospheric, but not slow, with a tight plot… but also character-driven.” Which is a meaningless string of words.

You end up just handing over your entire viewing history to Netflix and hoping their black box gets you. Which, by the way, it increasingly does not. I saw a Reuters piece last week noting that over seventy percent of the personal data used in recommendation engines right now is siloed and owned by those centralized platforms. We're training their models for free.

Losing the thread of our own tastes in the process. It’s like… I know I love the film Children of Men, but I couldn’t tell you the precise combination of elements that makes it work for me. Is it the long-take cinematography? The bleak-but-hopeful tone? The fact that it’s a dystopia that feels utterly mundane? An external system that observed my reactions could pinpoint it, but I can’t articulate it. Oh, and fun fact for today: DeepSeek v3.2 is writing our script. So maybe it has some opinions on data ownership.

I'm sure it's thrilled. So, the core paradox here is that we are terrible at articulating what we want, but we're brilliant at recognizing it. An ideal system wouldn't ask you to define your preferences; it would help you discover them, and then remember them for you, in a place you control.

That's the hook. Imagine an AI that knows your movie tastes better than you do, but you own all the data that lets it do that. Not Amazon, not Google.

Why this matters now more than ever is that AI is getting woven into everything. If we don't figure out ownership and portability of our personal data—our preferences, our quirks, our memories—we're just building a future where we're permanent tenants in someone else's smart world. We’re decorating a rented apartment.

How do we solve this paradox of preferences?

That’s where we start.

The paradox is we're unreliable narrators of our own tastes. I can tell you I love slow-burn political thrillers, but then I'll absolutely hate the next one Netflix serves me. My stated preference is wrong.

The platforms know this. That's why they've moved almost entirely to implicit signals—what you actually watch, how long you watch it, what you skip. Your explicit five-star ratings are basically decorative now. The real model is built from behavioral data you generate passively.

Which they then lock in their vault. So even if you wanted to leave, you can't take your taste profile with you. You'd have to start from zero somewhere else, re-teaching a new algorithm from scratch. It’s a huge switching cost.

That's the moat. And it's incredibly effective. Think about music. My Spotify Discover Weekly is scarily good, because it's had fifteen years of my listening habits. If I tried to move to Apple Music tomorrow, the recommendations would be useless for months. I'm a prisoner of my own data. But what if that data wasn’t in Spotify’s vault? What if it was in a file on my phone, and I could just… point Apple Music at it?

The current solution isn't a solution for us at all. It's a solution for them. Centralized models that require centralized data. What if we flipped it? What if the core data—the record of what I've liked, what I've skipped, my weird little niches—lived in a file on my device?

Then the recommendation service becomes a temporary consultant. It asks permission to read my preference file, does its analysis, suggests a movie, and that's it. It doesn't get to keep a copy. The data never leaves my control.

Which sounds like a pipe dream, until you remember that technically, this is a tiny dataset. We're not talking about training a large language model from scratch. We're talking about a few thousand data points—movie titles, maybe some genre tags, my watch history, my ratings. A SQLite database can handle a hundred and forty terabytes. I think my movie preferences will fit.

The challenge isn't storage. It's the model. Can a small, federated AI—one that runs locally or as a service I temporarily grant access to—actually do a better job than the Netflix monolith?

Maybe not better in raw power, but better in alignment. It works for me, not for Netflix's shareholder goal of keeping me subscribed at all costs. Its only job is to find me a movie I'll like tonight. That's a cleaner incentive—and actually, the SQLite approach I mentioned earlier could make that happen.

That alignment is the key. So how would this actually work? Picture a simple database file on your phone or computer. One table for movies you've watched, with columns for title, year, maybe an IMDB ID, and a personal rating. Another table for interactions—started watching but stopped after ten minutes, rewound to watch a scene twice.
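A minimal sketch of that two-table layout, using Python's built-in sqlite3 module. The table and column names, the 0 to 5 rating scale and interaction labels such as 'stopped_early' are assumptions for illustration; no standard schema for this exists yet.

```python
import sqlite3
from typing import Optional

# Hypothetical layout for a personal taste-profile file. Names and the
# 0-5 rating scale are illustrative assumptions, not a standard.
SCHEMA = """
CREATE TABLE IF NOT EXISTS movies (
    id         INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    year       INTEGER,
    imdb_id    TEXT UNIQUE,                   -- optional IMDb title ID
    rating     REAL CHECK (rating BETWEEN 0 AND 5),
    watched_on TEXT                           -- ISO 8601 date, e.g. '2024-03-01'
);
CREATE TABLE IF NOT EXISTS interactions (
    id         INTEGER PRIMARY KEY,
    movie_id   INTEGER NOT NULL REFERENCES movies(id),
    kind       TEXT NOT NULL,                 -- e.g. 'stopped_early', 'rewound'
    at_seconds INTEGER,                       -- playback position, if relevant
    logged_at  TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def open_profile(path: str = "taste.db") -> sqlite3.Connection:
    """Open (or create) the preference file and make sure both tables exist."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # reject interactions for unknown films
    conn.executescript(SCHEMA)
    return conn


def log_interaction(conn: sqlite3.Connection, movie_id: int, kind: str,
                    at_seconds: Optional[int] = None) -> None:
    """Record one behavioural signal, e.g. abandoning a film after ten minutes."""
    conn.execute(
        "INSERT INTO interactions (movie_id, kind, at_seconds) VALUES (?, ?, ?)",
        (movie_id, kind, at_seconds),
    )
    conn.commit()


conn = open_profile(":memory:")  # use a real path such as "taste.db" to persist
film = conn.execute(
    "INSERT INTO movies (title, year, rating) VALUES (?, ?, ?)",
    ("Children of Men", 2006, 5),
).lastrowid
log_interaction(conn, film, "rewound", at_seconds=3600)
```

Keeping interactions as separate rows preserves the raw behavioural signal (the ten-minute abandonment, the rewind) instead of collapsing it into one score, so any future recommender can weight it however it likes.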

And you'd build it up over time. You could even have it pull in data from your existing services—export your Netflix viewing history, your Letterboxd ratings—to seed it. The point is, it becomes your canonical source. The single source of truth for Corn's questionable cinematic tastes.
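Seeding could look something like this sketch. It assumes the ratings file in a Letterboxd data export has a header row with Date, Name, Year and Rating columns; check the header of your own export, since export formats can change. The movies table here is a cut-down version of the layout described above.

```python
import csv
import io
import sqlite3
from typing import Optional


def _or_none(value: Optional[str], cast):
    """Convert a CSV cell, treating blank cells as missing."""
    return cast(value) if value not in (None, "") else None


def import_letterboxd_ratings(conn: sqlite3.Connection, csv_text: str) -> int:
    """Seed the movies table from a Letterboxd-style ratings export.

    Assumes a header row containing 'Date', 'Name', 'Year' and 'Rating';
    adjust the keys if your export differs. Returns the number of rows imported.
    """
    count = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        conn.execute(
            "INSERT INTO movies (title, year, rating, watched_on) VALUES (?, ?, ?, ?)",
            (row["Name"], _or_none(row.get("Year"), int),
             _or_none(row.get("Rating"), float), _or_none(row.get("Date"), str)),
        )
        count += 1
    conn.commit()
    return count


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT NOT NULL,"
             " year INTEGER, rating REAL, watched_on TEXT)")
sample = "Date,Name,Year,Letterboxd URI,Rating\n2024-01-05,Children of Men,2006,,5\n"
import_letterboxd_ratings(conn, sample)
```

For a real export, open the CSV file with newline="" (as the csv module documentation recommends), read it, and pass the text in.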

Now, the federated AI model. That's the service layer. You'd have a recommendation engine—maybe a small neural net, maybe a simpler algorithm—that you can point at your database. It reads the data, processes it, and spits out suggestions. But the model itself doesn't store your data. It's like a chef you invite into your kitchen to use your ingredients. They don't get to take your pantry home with them.

The technical architecture is inverted. Instead of my data going to the model, the model comes to my data. That's the API-like integration Daniel mentioned. I grant a service temporary, read-only access to my preference file via a secure token. It does its computation, gives me my recommendations, and the connection closes.
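A rough local illustration of "temporary, read-only access via a secure token". Everything here (the ProfileGate class, the five-minute default lifetime) is hypothetical; a real system would need a proper authorization protocol. The sketch relies on SQLite's URI filename support, where mode=ro opens the database read-only.

```python
import secrets
import sqlite3
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Dict


@dataclass
class Grant:
    service: str        # who asked for access, for the owner's own records
    expires_at: float   # Unix timestamp after which the token stops working


class ProfileGate:
    """Hypothetical gatekeeper for a local preference file.

    Issues short-lived tokens and gives token holders a read-only SQLite
    connection. A local illustration only, not OAuth or any real protocol.
    """

    def __init__(self, db_path: str):
        self.db_path = db_path
        self._grants: Dict[str, Grant] = {}

    def grant(self, service: str, ttl_seconds: float = 300) -> str:
        token = secrets.token_urlsafe(32)  # unguessable random token
        self._grants[token] = Grant(service, time.time() + ttl_seconds)
        return token

    def revoke(self, token: str) -> None:
        self._grants.pop(token, None)

    def open_readonly(self, token: str) -> sqlite3.Connection:
        grant = self._grants.get(token)
        if grant is None or time.time() >= grant.expires_at:
            self.revoke(token)
            raise PermissionError("token is unknown, revoked or expired")
        # mode=ro asks SQLite to open the file read-only, so writes fail.
        uri = Path(self.db_path).resolve().as_uri() + "?mode=ro"
        return sqlite3.connect(uri, uri=True)
```

Read-only mode stops a service from altering the file, but it cannot stop the service from copying what it reads; that enforcement gap comes up again later in the conversation.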

This is where it fundamentally differs from big data approaches. Netflix's model is trained on hundreds of millions of users' data. It finds patterns in the crowd. Our hypothetical local model would be trained only on my data. It's hyper-personalized by definition, because its entire world is my preferences.

Which addresses the "cold start" problem for new services, right? I don't have to teach every new app from scratch. I just say, "Here, read my file. Now you know me."

The technical hurdle, though, isn't the database or even the local model. It's standardization. We'd need a common schema—an agreed-upon set of data fields—so that different recommendation services can all read the same file. Otherwise, you're just creating a new kind of lock-in with a proprietary preference format.

That's the classic interoperability challenge. But it's solvable. Look at what happened with calendar data with iCal. Or contact information with vCard. Open standards emerged because the demand for portability was so high. We could do the same for taste profiles. A fun fact here: the vCard format, which powers your contacts, was first proposed in the 1990s and is still going strong because it solved a simple, universal need. We have the blueprint.
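What a shared schema might minimally pin down, sketched in Python: a format name, a version, required and optional fields, and a rule for vendor extensions borrowed from vCard's X- properties. The field names, the "taste-profile" identifier and version "0.1" are invented; nothing like this has been standardized.

```python
import json
from typing import Any, Dict, List

# Hypothetical interchange format; no such standard exists today.
FORMAT = "taste-profile"
VERSION = "0.1"
REQUIRED = {"title": str, "rating": (int, float)}
OPTIONAL = {"year": int, "imdb_id": str, "watched_on": str, "tags": list}


def _is(value: Any, kind) -> bool:
    # bool is a subclass of int in Python; don't let True pass as a rating.
    return isinstance(value, kind) and not isinstance(value, bool)


def validate(entry: Dict[str, Any]) -> List[str]:
    """Return the problems with one entry (an empty list means valid)."""
    problems = []
    for field, kind in REQUIRED.items():
        if field not in entry:
            problems.append(f"missing '{field}'")
        elif not _is(entry[field], kind):
            problems.append(f"'{field}' has the wrong type")
    for field, kind in OPTIONAL.items():
        if field in entry and not _is(entry[field], kind):
            problems.append(f"'{field}' has the wrong type")
    for field in entry:
        # Like vCard's X- properties: vendor extensions must be marked as such.
        if field not in REQUIRED and field not in OPTIONAL and not field.startswith("x-"):
            problems.append(f"unknown field '{field}'")
    return problems


def export_profile(movies: List[Dict[str, Any]]) -> str:
    """Serialise entries in the shared format, refusing anything invalid."""
    for i, entry in enumerate(movies):
        problems = validate(entry)
        if problems:
            raise ValueError(f"entry {i}: {'; '.join(problems)}")
    return json.dumps({"format": FORMAT, "version": VERSION, "movies": movies}, indent=2)


def import_profile(text: str) -> List[Dict[str, Any]]:
    """Parse a shared-format document, checking the format name and major version."""
    doc = json.loads(text)
    major = str(doc.get("version", "")).split(".")[0]
    if doc.get("format") != FORMAT or major != VERSION.split(".")[0]:
        raise ValueError("unsupported taste-profile document")
    return doc["movies"]
```

Checking only the major version on import is one common convention: minor revisions can add optional fields that older readers simply ignore.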

There's a case study I was thinking of that's almost a prototype for this. A developer built a movie recommendation tool for himself using a Python script and a SQLite database. He scraped his own viewing history, stored it locally, and used a lightweight machine learning library to find patterns and suggest similar films from a separate database of movie metadata. The entire thing ran on his laptop.

He reported it was surprisingly good for his niche interests—better than general streaming services at surfacing obscure foreign films he loved, because it was based entirely on his own ratings, not the crowd's. The model wasn't as good at predicting blockbusters he'd like, but he didn't care. That's the trade-off. You lose the power of the crowd, but you gain perfect alignment with your own eccentricities. For him, the system nailed it because it noticed his deep affinity for, say, 1970s Italian political comedies—a signal that gets drowned out in a global model.
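The episode doesn't name the library that developer used, so this sketch swaps in a deliberately simple content-based technique rather than any particular machine learning package: learn how much you like each tag from your own ratings alone, then score unseen films from a separate metadata catalogue. The tags and the tiny catalogue are made up.

```python
from collections import defaultdict
from typing import Dict, List, Set


def tag_affinities(my_ratings: Dict[str, float],
                   catalogue: Dict[str, Set[str]]) -> Dict[str, float]:
    """For each tag, the average amount by which my ratings of films with that
    tag sit above (positive) or below (negative) my own overall mean."""
    if not my_ratings:
        return {}
    mean = sum(my_ratings.values()) / len(my_ratings)
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for title, rating in my_ratings.items():
        for tag in catalogue.get(title, set()):
            totals[tag] += rating - mean
            counts[tag] += 1
    return {tag: totals[tag] / counts[tag] for tag in totals}


def recommend(my_ratings: Dict[str, float],
              catalogue: Dict[str, Set[str]], k: int = 3) -> List[str]:
    """Rank films I haven't rated by the summed affinity of their tags."""
    affinity = tag_affinities(my_ratings, catalogue)
    scores = {
        title: sum(affinity.get(tag, 0.0) for tag in tags)
        for title, tags in catalogue.items()
        if title not in my_ratings
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


catalogue = {
    "Film A": {"italian", "political", "comedy", "1970s"},
    "Film B": {"italian", "drama", "1970s"},
    "Film C": {"blockbuster", "action"},
    "Film D": {"italian", "political", "1970s"},
    "Film E": {"blockbuster", "action", "sequel"},
}
mine = {"Film A": 5.0, "Film B": 4.5, "Film C": 1.5}
print(recommend(mine, catalogue, k=2))  # ['Film D', 'Film E']
```

Because the mean is taken over your ratings only, a niche signal like the 1970s Italian tags is never diluted by anyone else's taste. The flip side matches the trade-off described above: the sketch knows nothing about tags you have never rated.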

Which, for a lot of us, is the whole point. I don't want recommendations for what the average person who watched one slow sloth documentary might like. I want recommendations for what I, specifically, will like. The technical hurdle seems less about raw compute power and more about building that initial ecosystem—the standard, a few key services that support it, and tools to easily build your own database.

Getting the data in. The manual entry would be a nightmare. But if you can import from existing sources, or if browsers and streaming apps could be convinced to export logs in this standard format, it becomes frictionless. Your preference file just grows passively in the background as you live your digital life. Think of it like a Fitbit for your taste. It logs your activity automatically.

The mechanism is feasible. The model is possible. The real question is whether anyone will bother to build it outside of a hobbyist's weekend project.

Which is the perennial question for anything that puts power back in users' hands. But let's assume the will exists. We've got this local database, this federated model that visits it. What do we even call this thing? "My local movie preference SQLite file" isn't exactly catchy.

We need a name for the model, the approach. I've been kicking around a couple. "Personal AI Memories" feels right—it's your curated digital memory of taste, something an AI can reference.

I like that. It frames the data as something experiential and valuable, not just behavioral logs. Though it sounds a bit… therapeutic. "And how does that memory make you feel, Corn?"

More technical: "Federated Preference Nodes." Emphasizes the decentralized network aspect—each user is a node with their own data.

That's accurate but sounds like a whitepaper title. Maybe we just call it a "Taste Profile," and the innovation is that it's portable and ownable. The "what" is simple; the "how" is the breakthrough.

The naming matters for adoption, though. It needs to be understandable. I keep coming back to the Google Calendar analogy Daniel mentioned. That's a perfect case study. My calendar data lives in my Google account, but I can grant access to other apps—a flight booking site, a project management tool. They can read my calendar, even add events, via an API. But they don't get a permanent copy of my entire schedule. I can revoke access anytime.

Crucially, the calendar data format is standardized—iCal, CalDAV. Any app that wants to integrate knows how to read and write those events. That's the model. My "Taste Profile" becomes just another personal data stream I can choose to expose, in a standard format, to services that ask nicely. It’s not a radical new concept; it’s applying an old, proven pattern to a new type of data.

So the practical implications go way beyond movies. Once you have this framework, this concept of a user-owned, portable data pod, you can apply it to anything preference-based. Food and drink. You log meals you loved, wines you tried, restaurants. A recommendation app for your trip to Lisbon asks for read access to your food pod and instantly knows you seek out family-run tascas, not Michelin stars.

Travel is the big one. My travel pod would have all the places I've been, hotels I liked, types of vacations I enjoy: museums versus beaches, hiking versus lounging. A new travel site doesn't need my entire booking history from Expedia; I just connect my pod. It could even include granular things: "prefers Airbnb with a kitchen," "will pay more for a quiet room," "always visits a local bookshop."

It extends to shopping, reading, even clothing sizes and style preferences. The vision is a constellation of these small, personal datasets that together form a complete picture of you, owned by you. Services become plugins to your life, not platforms you live inside. But hold on—doesn’t that just create a different, maybe more dangerous, central repository? If all my eggs are in this one "pod" basket...

That’s a critical pushback. The challenges are massive. If I update my movie pod after watching a film on a plane, how does my home recommendation service know? It needs to be able to poll the latest version, or get push updates.
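One way a service could "poll the latest version" is a change log kept inside the file itself. In this sketch, which is an assumption rather than a design described in the episode, SQLite triggers append a row to a changes table on every insert, update or delete; a service remembers the last sequence number it saw and asks only for newer changes.

```python
import sqlite3
from typing import List, Tuple

SETUP = """
CREATE TABLE IF NOT EXISTS movies (
    id INTEGER PRIMARY KEY, title TEXT NOT NULL, rating REAL
);
CREATE TABLE IF NOT EXISTS changes (
    seq      INTEGER PRIMARY KEY AUTOINCREMENT,   -- strictly increasing, never reused
    movie_id INTEGER NOT NULL,
    op       TEXT NOT NULL                        -- 'insert', 'update' or 'delete'
);
CREATE TRIGGER IF NOT EXISTS movies_ins AFTER INSERT ON movies
BEGIN INSERT INTO changes (movie_id, op) VALUES (NEW.id, 'insert'); END;
CREATE TRIGGER IF NOT EXISTS movies_upd AFTER UPDATE ON movies
BEGIN INSERT INTO changes (movie_id, op) VALUES (NEW.id, 'update'); END;
CREATE TRIGGER IF NOT EXISTS movies_del AFTER DELETE ON movies
BEGIN INSERT INTO changes (movie_id, op) VALUES (OLD.id, 'delete'); END;
"""


def changes_since(conn: sqlite3.Connection, last_seq: int) -> Tuple[List[tuple], int]:
    """Return (changes newer than last_seq, new high-water mark).

    Each row shows the movie's current values; rows for movies that no
    longer exist come back with title and rating set to None.
    """
    rows = conn.execute(
        """SELECT c.seq, c.op, c.movie_id, m.title, m.rating
           FROM changes AS c LEFT JOIN movies AS m ON m.id = c.movie_id
           WHERE c.seq > ? ORDER BY c.seq""",
        (last_seq,),
    ).fetchall()
    return rows, (rows[-1][0] if rows else last_seq)
```

A service that polls with its saved high-water mark only ever transfers what changed, for example the rating you logged on the plane, rather than re-reading the whole file.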

Privacy becomes a different kind of challenge, too. Right now, a data breach at Netflix exposes what movies you watched. In this model, a breach of your local device or your cloud-synced pod exposes everything—your movies, your travel, your food quirks. The attack surface consolidates, but so does the value of the target. It’s a single point of failure, but it’s your point of failure. You’d need serious, user-friendly encryption by default.

We'd need a whole ecosystem of apps that agree to play nice with this open standard. The big platforms have zero incentive to support reading from a user-owned pod when their entire business is based on owning the pod themselves. Why would Netflix agree to be a guest in my kitchen when they’ve spent billions building their own restaurant?

That's the biggest hurdle. You'd likely see adoption start with indie developers, niche services that can't afford to build massive user graphs themselves. They'd leap at the chance to offer personalized service from day one by tapping into existing user data. The big guys might only comply if forced by regulation or overwhelming user demand. Think of it like the "Sign in with Apple" or "Sign in with Google" buttons—they emerged because developers and users wanted frictionless sign-ups. A "Connect your Taste Profile" button could follow the same path.

Which circles back to the incentive alignment. A small wine recommendation app powered by user-owned data has one goal: make amazing wine suggestions so you keep using it. Its success is tied to the quality of its model, not its data hoard. That's a healthier market. It also changes the data dynamics. In today's model, creepy ads follow you because platforms sell your inferred intent. In this model, you're not broadcasting your data. You're presenting a certified record of your past preferences to a service for a specific task. It's a more intentional, less leaky form of sharing.

Assuming you can trust the model not to scrape and squirrel away a copy, which is a whole other layer of technical and legal enforcement. You’d need ways to audit the model’s behavior.

It’s not just about an API handshake. You'd need verifiable execution environments, maybe even blockchain-style attestations for the models themselves, proving they're running the code they claim and not exfiltrating data. It gets complex fast. But the principle—user ownership as the default—is the necessary starting point. We have to architect for that principle first, then solve the attendant problems.

The name might be less important than the architecture. Call it a "Preference Pod," a "Taste File," whatever. The core idea is the inversion: my data is my hub, and services are temporary spokes. Which raises the question—if the architecture is clear but the name is debatable, what's stopping someone from using this today? Can we actually do anything about this now, or is it just futuristic speculation?

The challenges are real. You can't exactly download "Personal AI Memories" from an app store yet. So what does a listener who's intrigued actually do next?

You start small. Exactly as Daniel suggested. The data size is trivial. If you're technically inclined, create a SQLite database on your computer. Just a simple table with movie titles, your rating, maybe the date you watched it. Use a tool like DBeaver or even a Python script to add entries. That's your prototype pod. It's not about building the whole federated system on day one; it's about proving to yourself that the core data is yours and manageable. You could even write a simple script that queries it: "Show me all the sci-fi movies I rated above 4 stars."
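The "sci-fi above 4 stars" query as a minimal Python sketch: one simple table holding title, genre, rating and watch date. The genre column and the sample rows other than Children of Men are invented for illustration.

```python
import sqlite3
from typing import List


def open_pod(path: str = ":memory:") -> sqlite3.Connection:
    # Pass a real file name such as "movies.db" to keep the data between runs.
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS movies (
               title TEXT NOT NULL, genre TEXT, rating REAL, watched_on TEXT)"""
    )
    return conn


def add_movie(conn: sqlite3.Connection, title: str, genre: str,
              rating: float, watched_on: str) -> None:
    conn.execute("INSERT INTO movies VALUES (?, ?, ?, ?)",
                 (title, genre, rating, watched_on))
    conn.commit()


def rated_above(conn: sqlite3.Connection, genre: str, threshold: float) -> List[str]:
    """Titles in a genre rated strictly above the threshold, best first."""
    rows = conn.execute(
        "SELECT title FROM movies WHERE genre = ? AND rating > ? "
        "ORDER BY rating DESC, title",
        (genre, threshold),
    )
    return [title for (title,) in rows]


conn = open_pod()
add_movie(conn, "Children of Men", "sci-fi", 5, "2024-03-01")
add_movie(conn, "Film B", "sci-fi", 3.5, "2024-04-12")
add_movie(conn, "Film C", "drama", 4.5, "2024-05-20")
print(rated_above(conn, "sci-fi", 4))  # ['Children of Men']
```

"Above 4 stars" is read literally here, so a film rated exactly 4 is left out; switch the comparison to >= if you mean "4 or more".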

For the less technical? There are already tools nudging in this direction. The "Your Data" export functions on platforms like Letterboxd or Goodreads—downloading your reviews and ratings as a CSV file. That's a seed. Store that file somewhere you control. It's a passive first step toward ownership. Even using a note-taking app like Obsidian or Notion to keep a dedicated "Movies I Loved" page is a move in the right direction. The format is less important than the act of curation.

The actionable insight is to begin curating a primary source. When you love a movie, don't just leave the thumbs-up on Netflix. Open a note on your phone, or a spreadsheet, and log it. You're building your own canonical record, independent of any platform's continued existence or goodwill. It's digital hygiene with a purpose. It’s like keeping your own recipe book instead of just bookmarking links that might go dead.

It shifts your mindset from "my data lives on Netflix" to "Netflix is one viewer of my data." That's the philosophical win before the technical one. And you’ll start to notice the gaps—like, I wish I remembered why I loved that movie, or what I ordered at that restaurant. So you start adding more fields.

The future vision this points toward is a world where you have a dashboard of these personal datasets. Your movie pod, your travel log, your culinary hit list. You grant temporary access to services—a weekend trip planner, a streaming service aggregator—and they compete on the quality of their insights using your data, not the size of their data hoard. The best AI isn’t the one with the most data; it’s the one that best interprets your data.

The call to action is to experiment with one domain. Pick movies, or books, or restaurants. Find a way to own that list. It might be a markdown file, a Notion database, an Airtable base. The format isn't sacred; the principle is. You're taking the first step out of the walled garden. And honestly, it’s satisfying. There’s a joy in querying your own private dataset.

That experiment teaches you what you'd want from a true standard. You'll immediately feel the pain points—how do I easily add entries? How do I tag things? How do I handle duplicates?—which informs what a good, user-centric schema should look like. You become a stakeholder in the solution.
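The duplicate and tagging questions are exactly what a schema has to settle. One possible answer, sketched with Python's sqlite3 under assumptions of our own (a film is identified by title plus year; tags are lower-cased free text): a UNIQUE constraint with SQLite's upsert clause for duplicates, and a separate tags table.

```python
import sqlite3
from typing import Iterable

SCHEMA = """
CREATE TABLE IF NOT EXISTS movies (
    id     INTEGER PRIMARY KEY,
    title  TEXT NOT NULL,
    year   INTEGER NOT NULL,   -- NOT NULL: SQLite treats NULLs as distinct in UNIQUE
    rating REAL,
    UNIQUE (title, year)       -- same title and year = same film
);
CREATE TABLE IF NOT EXISTS tags (
    movie_id INTEGER NOT NULL REFERENCES movies(id),
    tag      TEXT NOT NULL,
    PRIMARY KEY (movie_id, tag) -- each tag stored once per film
);
"""


def log_movie(conn: sqlite3.Connection, title: str, year: int,
              rating: float, tags: Iterable[str] = ()) -> int:
    """Add a film, or update its rating if it is already logged; merge tags."""
    # Upsert (INSERT ... ON CONFLICT DO UPDATE) needs SQLite 3.24 or newer.
    conn.execute(
        """INSERT INTO movies (title, year, rating) VALUES (?, ?, ?)
           ON CONFLICT (title, year) DO UPDATE SET rating = excluded.rating""",
        (title, year, rating),
    )
    # Look the id up explicitly: lastrowid is not reliable after the UPDATE branch.
    movie_id = conn.execute(
        "SELECT id FROM movies WHERE title = ? AND year = ?", (title, year)
    ).fetchone()[0]
    conn.executemany(
        "INSERT OR IGNORE INTO tags (movie_id, tag) VALUES (?, ?)",
        [(movie_id, tag.strip().lower()) for tag in tags],  # normalise case and spacing
    )
    conn.commit()
    return movie_id
```

Logging the same film twice then updates its rating and merges its tags instead of creating a second row.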

It also makes the value tangible. When you can query your own data to answer, "What was that amazing Portuguese red I had in 2023?" and find it instantly in your own system, the benefit of ownership clicks. It's not just ideological; it's useful. It solves an immediate personal problem.

The goal isn't to overthrow Netflix by Tuesday—it's to cultivate the personal infrastructure so that when the tools and services that respect that infrastructure emerge, you're ready. You own the foundation. And that raises the obvious question: will the big platforms ever play along? Or is this destined to be a parallel, niche ecosystem?

Why would Netflix agree to read from my file when its entire advantage is writing to its own? That’s what haunts this whole idea. Their business model is engagement, and their secret sauce is the collective data.

I don't think they would, voluntarily. Not unless they faced regulatory pressure to interoperate, or a critical mass of users started demanding data portability as a non-negotiable feature. Their moat is their data. But think about email. You can use Gmail, but you can still send mail to anyone on any provider, and read or move your mail with any client, because of open protocols like IMAP and SMTP. That interoperability was fought for. It didn’t emerge from corporate benevolence.

Which means the future of this model likely starts at the edges. With the indie streaming guide, the boutique travel planner, the sommelier app built by two people. They’re the ones who’d benefit from tapping into pre-existing, rich user data without having to build a surveillance empire first. Success for them is proving that a user-aligned model can work, creating pressure and a working example.

That’s okay. The final thought has to be that the future of AI—the one that actually serves individuals—should prioritize user ownership. Not as an afterthought, but as the architectural cornerstone. My data, my hub. Everything else is a guest. The technical path exists. The data is small. The real gap is will, standardization, and initial experiments.

It’s a provocative shift. From platforms that know you, to a you that knows yourself, and chooses what to share. The path starts with a single list, in a place you control. Thanks, as always, to our producer, Hilbert Flumingtop, for keeping the audio leaves crunchy. And thanks to Modal, whose serverless GPUs could probably run a million of these little federated models without breaking a sweat.

This has been My Weird Prompts. If you enjoyed thinking about owning your digital self, leave us a review wherever you listen. It helps other self-sovereign data types find the show. Let us know if you start your own Taste Pod.

Until next time.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.