#3271: LLMs as Parsers, Not Calculators

Stop letting LLMs do math. Use them to parse messy text, then let deterministic code handle the numbers.

Featuring

Listen

0:00

Episode Details

Episode ID: MWP-3441
Published: Jun 5
Duration: 30:52
Audio: Direct link
Pipeline: V5
TTS Engine: chatterbox-regular
Script Writing Agent: deepseek-v4-pro
Topics: large-language-models prompt-engineering model-context-protocol

AI-Generated Content: This podcast is created using AI personas. Please verify any important information independently.

Comparing apartment costs sounds simple until you're juggling monthly rent, yearly maintenance fees, realtor costs quoted ex-VAT, and municipal taxes with inconsistent periods. The cognitive load of normalizing all these units — while a realtor waits for your answer — is where the real friction lives.

The core insight is that LLMs are terrible at arithmetic but excellent at parsing messy natural language into structured data. The right architecture separates these concerns completely. A pipeline approach works best: the user types or pastes raw text into a single input field, an LLM with function calling extracts four key variables (rent, realtor fee, maintenance, Arnona) along with their quoted periods, and deterministic code handles all the math — converting yearly figures to monthly, applying VAT to the correct line item, and computing the total.

Function calling (or tool use) is critical here because it constrains the LLM's output to match a precise schema. The model never sees the arithmetic or produces a total. Its only job is faithful transcription: outputting the numbers and periods exactly as the user stated them. Edge cases like "no realtor fee" or "maintenance included" are handled through nullable fields and a notes parameter. The system prompt explicitly instructs the model not to convert between periods or perform any calculations.

This pattern extends well beyond rental math. Any problem involving messy natural language inputs that need reliable numerical outputs — subscription comparisons, international salary analysis, expense tracking — benefits from treating the LLM as a parser rather than a calculator. The MCP connection for saving results to a spreadsheet adds persistence and comparability across viewings.

Downloads

Episode Audio

Download the full episode as an MP3 file

Download MP3

Transcript (TXT)

Plain text transcript file

Transcript (PDF)

Formatted PDF with styling

#3271: LLMs as Parsers, Not Calculators

Daniel sent us this one — he and Hannah are apartment hunting, and he's trying to compare the actual monthly cost of different rentals. The problem is the numbers come in quoting different units. Monthly rent, sure. But the realtor fee might be a fixed amount quoted ex-VAT, so you need to add eighteen percent. Building maintenance might be yearly. Arnona — municipal tax — might be monthly or yearly. Sometimes there's no realtor fee at all. The arithmetic is dead simple. But the input is messy. He wants to build a calculator that uses an LLM to handle that messiness, accessible on mobile and desktop, with the icing on the cake being an MCP connection to save results to a spreadsheet. His question is: what's the right architecture for this that isn't just a general-purpose chatbot?

This is one of those problems where the gap between how clean it looks on paper and how annoying it is in real life is enormous. I've been apartment hunting. You get a WhatsApp message that says "five thousand a month, realtor is four plus VAT, va'ad bayit is twelve hundred a year, Arnona two forty a month." And then the next one says "forty-eight hundred, no realtor, maintenance included, Arnona twenty-four hundred yearly." You're standing there trying to convert everything to monthly in your head while the realtor is asking if you want to put down a deposit.

That's exactly the friction. The math is fourth-grade arithmetic. The cognitive load is all in the normalization — what period is this quoted in, does this include VAT, is this fee even present. By the time you've done three of these in a row your brain is soup.

Let's unpack why this seemingly simple problem is actually a perfect case study for structured LLM use. The core insight is that an LLM is the wrong tool for arithmetic, but it's the right tool for parsing messy natural language and extracting structured data from it. The right architecture is not a chatbot. It's a pipeline.

The prompt asks specifically for something that isn't just a general assistant. And that instinct is correct. If you throw this at a general chatbot, it'll try to do the math itself, and sometimes it'll get it right and sometimes it'll add VAT to the wrong number or forget to annualize the maintenance fee. It's stochastic. You don't want stochastic math on your rent.

The key insight is that we need to separate the LLM's job from the computer's job. Here's how that pipeline works. Stage one: the user types or speaks raw text into a single input field. "Rent is five thousand a month, realtor wants four thousand plus VAT, maintenance is twelve hundred a year, Arnona is twenty-four hundred yearly." Stage two: that text hits an LLM with a system prompt that says, in essence, "extract these four variables and normalize their periods — do not do any arithmetic." The LLM outputs structured JSON. Stage three: deterministic code — JavaScript in the browser, Python on a server, doesn't matter — takes that JSON and does the actual math. Five thousand plus four thousand times one point one eight plus twelve hundred divided by twelve plus twenty-four hundred divided by twelve. And the user sees the monthly total.

The LLM is a parser, not a calculator.

And this is where function calling — or what Anthropic calls tool use — becomes the mechanism. You define a function called something like "compute_monthly_rental_cost" with a JSON schema for its parameters. The schema specifies exactly the fields you need: rent amount, rent period, realtor fee ex-VAT, maintenance amount, maintenance period, Arnona amount, Arnona period. The LLM receives the user's messy text and its job is to populate that schema. It never sees the arithmetic. It never produces a total. It just fills in the blanks.

If a field isn't present in the user's input — no realtor fee, or maintenance is included in rent — the LLM can output null or zero and flag it. That's something a spreadsheet can't do. A spreadsheet can't read "no realtor" and know to set that field to zero.

Let me walk through a concrete example. User types: "Rent is five thousand a month, realtor wants four thousand plus VAT, maintenance is twelve hundred a year, Arnona is twenty-four hundred yearly." The LLM outputs JSON. Rent amount: five thousand. Rent period: monthly. Realtor fee ex-VAT: four thousand. Maintenance amount: twelve hundred. Maintenance period: yearly. Arnona amount: twenty-four hundred. Arnona period: yearly. Then the deterministic code runs. Monthly total is five thousand plus four thousand times one point one eight — that's the realtor fee with VAT — plus twelve hundred divided by twelve plus twenty-four hundred divided by twelve. That's five thousand plus four thousand seven hundred twenty plus one hundred plus two hundred. Total: ten thousand twenty shekels per month.

For comparison, if the same user had typed that into a general chatbot, it might have added VAT to the rent instead of the realtor fee. Or it might have divided the realtor fee by twelve as if it were a recurring cost. These are exactly the kinds of errors LLMs make when they try to do math with context.

The misconception most people have is that LLMs can handle arithmetic reliably. They can't. They're stochastic. They predict tokens. They don't execute deterministic operations. When you ask an LLM to compute something, it's essentially doing pattern-matched estimation. It's usually right, but "usually" is not good enough for your rent. Off-by-one errors in unit conversion, forgetting to annualize, misapplying VAT — these are all documented failure modes.

There's a phrase I like for this. The LLM is the musical equivalent of beige wallpaper when it comes to math. It looks right until you actually pay attention.

That's a perfect way to put it. And the solution is beautifully simple: never let the LLM do math. Use it only for parsing and normalization. Deterministic code handles arithmetic. This separation of concerns is the architectural principle that makes the whole thing reliable.

Let's talk about the actual implementation. The prompt is asking for something accessible on mobile and desktop, quick to build. What's the stack?

The simplest approach is a single-page web app. Plain HTML, CSS, and JavaScript. No framework necessary, though something like Svelte or even just a lightweight reactive library can make the UI nicer. But the core is dead simple: a text area for input, a results card below it, and an API call to an LLM provider in between.

On the LLM provider side, you've got options. OpenAI's function calling API has been the standard pattern for structured extraction since June twenty twenty-three. Anthropic's tool use works the same way. You can even run a local model through Ollama if you want everything to stay on-device.

Let me dig into the prompt engineering specifics, because this is where a lot of people get it wrong. The system prompt needs to do a few things very clearly. First, it tells the model its role: "You extract rental cost information from user messages and output structured JSON. You do not perform arithmetic. You do not compute totals. Your only job is to identify the four variables and their periods." Second, it defines the output schema explicitly, with types and descriptions for each field. Third, it includes examples of edge cases.

The edge cases are where it gets interesting. "No realtor fee." "Maintenance included in rent." "Arnona is bimonthly" — which does happen in some municipalities. The system prompt should show the model how to handle each of these.

Let me sketch what the function definition looks like. The tool is called "extract_rental_costs." Its input schema has these properties. Rent amount: number, required. Rent period: string, enum of monthly, yearly, weekly — required. Realtor fee ex-VAT: number, nullable — because there might not be one. Maintenance amount: number, nullable. Maintenance period: string, same enum, nullable. Arnona amount: number, required. Arnona period: string, required. And then one more field: "notes" — a string for anything the model wants to flag, like "maintenance appears to be included in the quoted rent.

The system prompt gives the model explicit instructions for normalization. "Always output the period as quoted by the user. Do not convert between periods. The application code will handle all period conversions." That's the key instruction that prevents the model from trying to be helpful and dividing a yearly figure by twelve.

You want the model to be a faithful transcriber of what the user said, with the period preserved. The deterministic code then does the normalization. Yearly amounts get divided by twelve. Weekly amounts get multiplied by fifty-two and divided by twelve. VAT gets added to the realtor fee. The code handles all of that.

There's a subtler point here about why function calling specifically is better than just asking the model to output JSON in a chat message. Function calling — tool use — constrains the output to match the schema exactly. You're not hoping the model produces valid JSON. You're not parsing it out of a markdown code block. The API guarantees the output conforms to the schema you defined. That's what makes it production-grade rather than a hack.

This pattern — LLM parses, code computes — applies far beyond rental math. Subscription cost comparisons across different billing cycles. International salary comparisons with different tax treatments. Anywhere the input is messy natural language and the output needs to be a reliable number.

The prompt mentions Israel's VAT rate at eighteen percent. That's been the standard rate since twenty fifteen. Worth noting that if someone builds this for a different country, they'd swap in their local VAT rate in the computation code — not in the LLM prompt. The LLM shouldn't know about VAT rates. That's computation logic.

The LLM's job is to extract "realtor fee is four thousand plus VAT" and output realtor_fee_ex_vat as four thousand. The application code knows that VAT means multiply by one point one eight. If you're in the UK, the code multiplies by one point two. The LLM doesn't need to know.

Let's talk about the mobile experience, because the prompt specifically asks for mobile and desktop access. A single-page web app deployed to something like Vercel or Netlify is inherently mobile-responsive if you style it correctly. Big input field. Big touch targets. Results displayed in a card that's readable on a phone screen.

The UX flow I'd design is: one text area at the top. The user pastes or types whatever the realtor sent them. Below that, a button that says "calculate." The app sends the text to the LLM, gets back the structured JSON, runs the math, and displays a results card. The card shows each line item normalized to monthly, the VAT calculation broken out, and the total at the bottom in a larger font. Below the card, an optional field for "address or viewing notes" and a "save to spreadsheet" button.

That's where MCP comes in. Once we have that clean pipeline, the next question is how to make the results persistent and comparable. That's where MCP comes in.

MCP — the Model Context Protocol — was open-sourced by Anthropic in November twenty twenty-four and has since been adopted by OpenAI and Google. It's essentially a standardized way for LLMs to interact with external tools and data sources. In this context, it lets the user say "save this for the viewing at 123 Main Street" and have the app append a row to a Google Sheet.

The way it works is you define a tool — in this case, something like "append_rental_row" — with an input schema that includes columns for address, rent, realtor fee, maintenance, Arnona, total monthly, and notes. When the user says "save this," the LLM receives that tool definition and can call it with the structured data from the previous calculation.

Here's the clever part. The LLM doesn't need to re-extract the numbers from the user's save command. The application state already has the computed results. The LLM's job at the save step is just to extract the address and any notes from the user's natural language — "for the viewing on Ben Yehuda Street tomorrow at three PM" becomes address "Ben Yehuda Street" and notes "viewing tomorrow at 3pm" — and then the application fills in the financial fields from its own state.

The MCP tool definition might look like this. " Description: "Append a row to the rental comparison spreadsheet." Input schema: an object with properties for address, total monthly, rent, realtor fee, maintenance, Arnona, and notes. The financial fields are populated by the app from the calculation state. The LLM only needs to extract address and notes from the user's save command.

And the MCP server — which is just a small piece of middleware — receives that tool call and uses the Google Sheets API to append the row. Or it writes to a local CSV file. Or it sends a webhook to Airtable. The protocol abstracts the destination.

There's a misconception worth addressing here. Some people think you need a full app with a database to save rental comparisons. You don't. MCP plus Google Sheets or a simple CSV file is sufficient for a single-user comparison tool. You're not building Airbnb. You're building something for yourself and maybe your spouse to use while apartment hunting. A spreadsheet is actually the ideal persistence layer — it's already a comparison tool.

You can view and sort and filter the spreadsheet on your phone. It's the perfect backend for a personal productivity tool. MCP just bridges the gap between the natural language interface and the structured storage.

Let's talk about the privacy angle, because rental data is financial data. You're sharing how much you can afford, where you're looking, your budget. You probably don't want that going to a third-party API if you can avoid it.

This is where running a local LLM becomes compelling. If you use something like Llama three through Ollama on your laptop, the parsing step happens entirely on your device. The arithmetic is client-side JavaScript. The only thing that needs network access is the MCP spreadsheet sync — and even that could be a local CSV file if you don't need cloud access.

For mobile access, you could run the parsing through a privacy-respecting API with a good data usage policy, or you could host a small LLM on a cheap VPS that you control. The point is you have options. The architecture doesn't force you to send your rental data to OpenAI.

Let me talk about what the minimal viable version looks like, because I think the prompt is really asking "what can I build in an afternoon." Start without MCP. Build the calculator first. One HTML file. A text area. A script that calls the LLM API with a function definition, gets back the structured JSON, runs the math, and displays the result. That's maybe two hundred lines of code, most of it boilerplate. You can deploy it to Vercel or Netlify in five minutes and it works on your phone immediately.

The prompt engineering is the part that takes the most iteration, but even that is straightforward. You write the system prompt, you test it with a dozen variations of how a realtor might phrase things, you tweak the edge case handling. The function calling API guarantees valid JSON, so you're not debugging parsing errors.

Here's a concrete system prompt that would work. "You are a rental cost extractor. Given a user message about rental costs, extract the following fields and output them as structured JSON via the provided function. Do not perform any arithmetic. Do not compute totals. Do not convert between time periods. Simply identify and extract the values as stated." Then you list the fields with descriptions and examples.

You include a few-shot example in the prompt. User says: "The place on Dizengoff is forty-five hundred a month, no realtor, va'ad bayit is nine hundred a year, Arnona is eighteen hundred yearly." Expected output: rent amount forty-five hundred, rent period monthly, realtor fee ex-VAT null, maintenance amount nine hundred, maintenance period yearly, Arnona amount eighteen hundred, Arnona period yearly. That gives the model a template.

The few-shot examples are crucial for edge cases. Show it how to handle "no realtor." Show it how to handle "maintenance included." Show it how to handle "realtor is one month's rent plus VAT" — which is actually a common phrasing in Israel. The model needs to know that "one month's rent" means the rent amount from the same input, not a number to hallucinate.

That's a great edge case. "Realtor wants one month's rent plus VAT." The LLM needs to output the realtor fee as the same number as the monthly rent. That's extraction, not computation — it's identifying that "one month's rent" refers to the five thousand stated earlier. But it requires the model to do a bit of coreference resolution, which is exactly what LLMs are good at.

And this is why a general spreadsheet fails. You can't type "one month's rent" into a spreadsheet cell and have it resolve to the value in another cell. The LLM handles the linguistic reasoning, and then the deterministic code handles the math. Each does what it's good at.

Let's go back to the MCP integration for a moment, because it's the part that turns this from a calculator into a workflow. Once you've got the core calculator working, adding MCP is maybe another hundred lines of code. You define a tool, you set up an MCP server — there are open-source implementations for Google Sheets — and you add a "save" button to the UI.

The UX flow I'd build: user calculates the monthly cost. Below the result, there's a small text field labeled "address or notes" and a "save to spreadsheet" button. The user types "Ben Yehuda 42, viewing Thursday 3pm" and hits save. The app sends a request to the LLM with the save tool defined, the user's note text, and the current calculation state in the context. The LLM calls the tool with the address extracted and the financial fields copied from state. The MCP server appends the row.

Now you've got a spreadsheet with a row for every apartment you've viewed. Columns for address, rent, realtor fee, maintenance, Arnona, total monthly, notes. You can sort by total. You can add a column for your subjective rating. You can share it with your spouse. It's a comparison tool that grew organically from a calculator.

The broader pattern here is what I find really exciting. MCP turns a one-shot query into a persistent workflow without building a full backend. You don't need a database. You don't need user authentication. You don't need a server. You just need a tool definition and something that implements the tool — in this case, a Google Sheets MCP server that someone else already wrote.

This pattern is going to become more common as MCP matures. We're going to see a proliferation of tiny, purpose-built LLM tools that replace spreadsheets for specific workflows. The "app store for prompts," essentially. A calculator for rental costs. A tool for comparing contractor quotes. A subscription cost normalizer. Each one is maybe three hundred lines of code, does one thing well, and uses MCP to persist results.

The misconception some people have is that this is too niche to be worth building. But the pattern — LLM parses messy input, code computes output — applies to dozens of everyday financial calculations. Once you've built one, you can adapt it to others in an hour.

The other misconception is that you need to be a serious developer to build this. You don't. If you can write a basic HTML page and make an API call, you can build this. The LLM handles the hard part — understanding messy human language. The math is arithmetic. The deployment is copying a file to Netlify.

Let me address one more technical detail. When you're using function calling with OpenAI or tool use with Anthropic, you have a choice about how to handle the response. The simplest approach is to set the tool as required — meaning the model must call it. This guarantees you always get structured output. The alternative is to let the model decide whether to call the tool, which is useful if the user might say something that isn't a rental cost input. But for this use case, you probably want to require the tool call.

If the user says something that genuinely can't be parsed — "hey, what's the weather like" — the model will do its best, and you'll get a JSON object with default or null values. Your application code should check that at least the rent amount is present and reasonable before computing a total.

You want some basic validation on the application side. If rent amount is null or zero or negative, show an error message: "I couldn't find a rent amount in that text. Can you rephrase?" Don't just silently compute nonsense.

What does this mean for you, the listener, who might want to build something similar this weekend? Let's get concrete.

Here's the actionable blueprint. Step one: create a single HTML file with a text area, a button, and a results div. Style it with basic CSS so it looks decent on mobile. Step two: sign up for a free API key from OpenAI or Anthropic — both give you credits to start. Step three: write the system prompt and function definition we've been describing. Step four: wire up the button to send the text area content to the API, receive the structured JSON, run the math in JavaScript, and display the result. Step five: deploy to Vercel or Netlify — drag and drop the file, you get a URL. That's your calculator. It works on your phone. It works on your laptop. You built it in an afternoon.

Step six, optional but recommended: add MCP integration. Find an open-source MCP server for Google Sheets. Define the append_rental_row tool. Add a save button and a notes field to your UI. Now you've got a persistent comparison tool.

The entire thing is maybe three hundred to five hundred lines of code, most of which you're copying from documentation examples. The hard part is the prompt engineering, and we've basically given you the prompt.

The core architectural principle to remember — and this applies far beyond rental math — is never let the LLM do math. Use it only for parsing and normalization. Deterministic code handles arithmetic. That separation of concerns is the insight that makes the whole thing reliable.

MCP is the missing link for personal productivity tools. It turns a one-shot query into a persistent workflow without building a full backend. You don't need a database. You don't need a server. You just need a tool definition and something that implements the tool.

Build the calculator first without MCP. Get the parsing and math working reliably. Test it with real messages from realtors. Once the core works, add spreadsheet logging. The user's real need is the monthly total — the spreadsheet is a bonus.

Test edge cases aggressively. "Rent is forty-five hundred, everything included." "Realtor is one month plus VAT, no maintenance fee." "Arnona is paid every two months." The system prompt needs to handle all of these gracefully, and the only way to get that right is to test with real-world phrasing.

The other thing worth mentioning is that this pattern is inherently extensible. Once you've built the rental calculator, you can adapt it to compare contractor quotes — "labor is two thousand a day for five days, materials are eight thousand including VAT, disposal fee is five hundred.

Or subscription cost comparisons. "Netflix is fifteen ninety-nine a month, HBO is one forty-nine a year, Apple TV is ninety-nine a month but free for the first three months." The LLM extracts the amounts and periods, the code normalizes everything to monthly or yearly, and you get an honest comparison.

International salary comparisons too. "Offer A is a hundred twenty thousand dollars in New York, Offer B is ninety thousand euros in Berlin, Offer C is fifteen million yen in Tokyo." The LLM extracts the numbers and currencies, the code does the conversion and tax estimation. Again, same pattern.

The pattern is universal: messy natural language in, structured data out, deterministic computation on the structured data. The LLM handles the mess. The code handles the math. Neither does the other's job.

For the prompt's specific question — what's the recommendation for something that isn't a general assistant — the answer is a structured extraction pipeline with function calling. The LLM is a parser and normalizer. The math is JavaScript. The UI is a single-page web app. The persistence is MCP plus Google Sheets. Accessible on mobile and desktop by virtue of being a web page.

You can build it in an afternoon. That's the part I want to emphasize. This isn't a weekend project. It's an afternoon project. The barrier to entry for building purpose-specific LLM tools has dropped to nearly zero.

There's an open question worth sitting with. What other quotidian financial calculations suffer from the same unit-mixing problem? We mentioned contractor quotes and subscriptions and salaries. But I bet every listener has at least one calculation they do regularly where the inputs are always quoted in incompatible units and they're doing mental normalization every time.

The interesting ones are where the mismatch isn't just time periods. It could be currencies. It could be tax treatments — ex-VAT versus including VAT, pre-tax versus post-tax. It could be bundled versus unbundled — "maintenance included" versus "maintenance separate." Each of these requires the LLM to understand what's being said and the code to apply the right normalization.

The more I think about it, the more I think the "app store for prompts" idea is the natural endpoint. Not giant monolithic AI assistants that try to do everything, but hundreds of tiny, focused tools that each do one calculation perfectly. You don't ask a general chatbot to compare rental costs. You open your rental calculator, paste the text, and get a reliable answer in two seconds.

The giant AI assistants will still exist for open-ended tasks. But for anything where the output is a number you're going to act on — your rent, your budget, your salary comparison — you want determinism. You want the same input to produce the same output every time. And that means separating parsing from computation.

The listener who sent this in — try building this. The blueprint is on the table. Three stages: parse, normalize, compute. One HTML file. An API key you can get for free. Deploy in five minutes. Share your version with us, or send your own weird prompt at myweirdprompts.

If you're not a coder, the architectural principle is still worth understanding. When you use a general chatbot for financial calculations, you are trusting a stochastic system with deterministic math. That's a category error. The right approach is to let the LLM do what it's good at — understanding language — and let code do what it's good at — computing numbers.

And now: Hilbert's daily fun fact.

Now: Hilbert's daily fun fact.

Hilbert: In the eighteen-tens, British colonial administrators in India attempted to ban kabaddi after observing a variant in what is now Turkmenistan where raiders would hold their breath while chanting — they mistakenly believed the practice induced a trance state that made players immune to pain, which they classified as a public order risk. The ban was never enforced because local officers couldn't distinguish kabaddi from children's playground games.

The British Empire, defeated by the ambiguity of children's games.

They really did have a knack for banning things they didn't understand.

This has been My Weird Prompts. Thanks to our producer Hilbert Flumingtop. If you build this rental calculator — or adapt the pattern to your own messy financial calculation — we'd love to hear about it. You can find us at myweirdprompts.com, or share your own weird prompt while you're there. If you enjoyed the episode, leave us a review — it helps other curious listeners find the show. Until next time.

Until next time.

This episode was generated with AI assistance. Hosts Herman and Corn are AI personalities.

#3271: LLMs as Parsers, Not Calculators

Downloads

You Might Also Like

#3271: LLMs as Parsers, Not Calculators