Daniel sent us this one — he's been doing AI projects for about a year and a half now, self-funding most of it, and he's running into that classic problem where your costs are scattered across Open Router, maybe FAL or Replicate for image generation, different API keys per project, and each platform has its own way of showing you what you spent. What he really wants to know is whether there are any aggregated cost tracking platforms that let you batch all your API keys into one place, get a unified picture of spend across services, and handle the fact that every API exposes cost data differently — different endpoints, different latency, different granularity. And if there aren't great tools, what approaches actually work.
This is one of those problems where the surface area of the question is small but the underlying mess is enormous. And before we even get to tools, I think we should talk about why this is hard, because the answer to "is there a unified dashboard that just works" is basically no, and understanding why it's no tells you what you should actually do instead.
I was afraid you were going to say that. All right, walk me through the mess.
The first thing is that every provider has a completely different approach to exposing cost data. Open Router gives you per-key dashboards with daily granularity, and their API exposes usage by model, by key, with token counts and dollar amounts that update within a few minutes. Anthropic's own API gives you usage tiers and monthly invoices but their per-request cost tracking is less granular unless you're pulling from their usage endpoint. Google's AI Studio and Vertex have separate billing systems entirely, and if you're using Gemini through Open Router, you're seeing Open Router's pricing layer on top of Google's underlying costs. OpenAI has their usage API but it's rate-limited and the cost endpoint sometimes lags by hours.
Even if someone built a unified dashboard, they'd be dealing with five different data schemas and five different refresh latencies before they could show you a single number.
And that's just the text models. Daniel mentioned FAL and Replicate for image generation, and those are an entirely different beast. FAL bills by compute time — GPU seconds — not by tokens. Their cost tracking is built around queue depth, cold start time, and instance uptime. Replicate charges by the second of GPU usage, with different rates for different hardware. So your "unified spend dashboard" isn't just aggregating dollars across providers, it's translating between fundamentally different units of consumption.
Tokens versus GPU seconds. That's almost like trying to track your transportation budget when some trips charge by the mile and others charge by the minute.
That's actually a perfect way to think about it. And there's no universal exchange rate. A hundred thousand tokens on Claude Sonnet costs something very different from a hundred thousand tokens on Gemini Flash, and neither of those has any direct relationship to what a minute of A100 compute costs on Replicate.
What do people actually do? I'm assuming there are tools that try to solve this, even if they can't solve it perfectly.
There are a few categories. The first is what I'd call the observability platforms that have bolted on cost tracking. Langfuse is the big one here — they're an open source LLM observability platform, and they added cost tracking that spans multiple providers. You instrument your code once with their SDK, and then every call to OpenAI, Anthropic, Google, whatever, gets logged with token counts and estimated cost. The key word there is estimated.
Estimated because they're calculating cost based on published pricing, not pulling actual invoice data from each provider.
Langfuse doesn't have access to your Open Router dashboard or your Google Cloud billing console. It knows you called Claude Sonnet, it knows how many input and output tokens you used, and it multiplies by the published per-token rate. But if you have negotiated pricing, or if you're hitting a provider through Open Router, which adds its own markup, or if you're using batch APIs, which are cheaper, Langfuse's estimate might be off by ten or twenty percent.
Which for a hobby project might be fine, but if Daniel's trying to track his actual spend to the dollar, that drift matters.
It matters a lot. And the lag is another issue. Langfuse processes usage as your application runs, but if you want to reconcile against your actual Open Router invoice at the end of the month, you're doing manual spreadsheet work regardless. The tool gives you real-time awareness, not accounting-grade accuracy.
What about the other category? You said there were a few.
The second category is API gateways that naturally aggregate because all your traffic flows through them. Open Router itself is actually the best example of this for Daniel's use case. He's already using it for Gemini models specifically because it gives him per-key cost visibility. If he routes as much of his LLM traffic as possible through Open Router, he gets a single dashboard that covers OpenAI, Anthropic, Google, Meta, DeepSeek, and dozens of other providers — all with the same cost tracking interface, the same latency, the same per-key breakdown.
That's smart but it doesn't solve the image generation side. Open Router doesn't do FAL or Replicate.
It doesn't, and that's the gap. There are other gateways that try to cover more ground — Portkey comes to mind, they have a gateway that supports something like two hundred models across multiple providers, and they've built cost tracking and budgeting features on top of that. But even Portkey's image generation support is limited, and they're not going to give you GPU-second billing from Replicate in the same dashboard.
We've got observability platforms that estimate, and gateways that aggregate but don't cover everything. Is there a third category?
The third category is the one I think Daniel should actually use, and it's going to sound profoundly unsexy. It's a spreadsheet with a few API calls wired up.
You're a retired pediatrician who DJs on weekends and you're telling me the solution to modern AI cost tracking is a spreadsheet.
I know how it sounds. But hear me out. Almost every major provider now exposes some kind of usage or cost endpoint. Open Router has a straightforward API for key-level usage. OpenAI has their usage endpoint. Anthropic has their usage API. Replicate exposes billing data through their API. The data's all there, it's just in different formats and different places. What Daniel needs is a lightweight aggregator that pulls from these endpoints once a day and dumps everything into a single view.
The reason this beats a commercial tool is...
Accuracy and coverage. When you pull from the actual billing endpoints, you're getting the real numbers — the ones that will appear on your invoice. Not estimates based on published pricing. Not token counts multiplied by rates that might not match your actual plan. And you can pull from FAL and Replicate just as easily as from Open Router, because you're not waiting for a third-party platform to add support for every service you use.
You're also not adding another dependency. Daniel's already managing API keys for half a dozen services. Adding a Langfuse or a Portkey means one more account, one more SDK to instrument, one more thing that can break or change pricing.
One more thing that costs money. Langfuse has a generous free tier but at some scale you're paying for observability features you might not need if all you want is cost tracking. Portkey's pricing is based on requests routed through their gateway. If Daniel's primary need is spend visibility, not request routing or caching or fallback logic, he's paying for a lot of infrastructure he won't use.
What would a lightweight homegrown aggregator actually look like? Paint me a picture.
I'd build it as a simple script — Python, maybe fifty lines — that runs once a day, probably on a cron job or a GitHub Action. It hits the Open Router API for per-key costs, the OpenAI usage endpoint, the Anthropic usage endpoint if he's using their API directly, and the Replicate billing endpoint. Each one returns JSON with some notion of daily or monthly cost. The script normalizes all of that into a single format — date, provider, model or service, cost in dollars — and appends it to a CSV file or writes it to a Google Sheet.
Suddenly you've got a time series of actual spend across every service, updated daily, with zero ongoing cost beyond the compute to run the script.
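To make that concrete, here's a minimal sketch of the aggregator skeleton in Python. The unified schema (date, provider, service, project, cost in dollars) comes from the discussion above; the endpoint URL and response fields in the stubbed fetcher are assumptions, so check each provider's current API docs before wiring them up.

```python
import csv
import datetime as dt
import os

CSV_PATH = "spend.csv"  # hypothetical output location
FIELDS = ["date", "provider", "service", "project", "cost_usd"]

def normalize(date, provider, service, project, cost_usd):
    """Coerce one provider's record into the unified schema."""
    return {"date": date, "provider": provider, "service": service,
            "project": project, "cost_usd": round(float(cost_usd), 4)}

def fetch_openrouter(api_key):
    """Stub: pull per-key usage from Open Router and normalize it.
    The endpoint path and response shape below are assumptions --
    verify against the current API reference before relying on them."""
    # resp = requests.get("https://openrouter.ai/api/v1/...",
    #                     headers={"Authorization": f"Bearer {api_key}"})
    # return [normalize(...) for entry in resp.json()["data"]]
    return []

def append_rows(path, rows):
    """Append normalized rows to the CSV, writing a header on first run."""
    is_new = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerows(rows)
```

A cron entry like `0 6 * * * python aggregate.py` (or a scheduled GitHub Action) runs it once a day; each additional provider is just one more fetcher function that returns rows in the same shape.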
And here's why this approach is actually better than a commercial dashboard in one specific way that matters to Daniel. He mentioned he likes creating new API keys for each app so he can track per-app spend in the Open Router dashboard. A homegrown aggregator can preserve that granularity. You tag each cost entry with the app name or the key name, and now your daily CSV has a column for "which project spent this money." Most unified dashboards flatten that out — they show you total spend by provider, not spend by project across providers.
That's actually a really good point. If I'm testing three different apps and they're all hitting different combinations of Open Router and Replicate, I want to know that App A cost me forty dollars this week while App B cost me twelve. A provider-level view doesn't tell me that.
Provider-level is all most commercial tools give you, because they're designed for teams where the question is "how much are we spending on OpenAI versus Anthropic," not "how much did my experimental podcast generation project cost this month versus my image pipeline."
The recommendation is basically: don't look for a unified dashboard, build a unified data pipeline. Pull the numbers yourself, own the aggregation, get exactly the view you want.
I'd go a step further and say there's a spectrum here depending on how much effort Daniel wants to invest. The zero-effort approach is route everything possible through Open Router and accept that image generation costs live in a separate tab. The medium-effort approach is the daily cron job I just described. The high-effort approach — and this is what I'd do if I were really serious about this — is to push all that data into a simple dashboard with a couple of charts.
How simple are we talking?
A Streamlit app or even just a Google Sheet with some sparklines. The goal isn't a beautiful Grafana dashboard with alerting thresholds, it's answering the question "what did I spend today and what's my monthly trend" in under five seconds. If you have to log into four different consoles to answer that question, you'll stop checking, and then you'll be surprised by a big bill.
That's the real risk here, right? Not that the tools don't exist, but that the friction of checking multiple dashboards means you just don't check, and then suddenly you've spent three hundred dollars on an image generation experiment you forgot to turn off.
I've seen this happen. There was a thread on Hacker News a few months ago — someone left a Replicate model running in a loop overnight and woke up to a four hundred dollar bill because they weren't watching the GPU-second accumulation. Replicate doesn't have spend caps by default the way Open Router does, and their billing cycle doesn't give you a daily digest unless you set it up yourself.
Open Router's spend caps are actually underrated as a safety feature. Daniel mentioned he's had a couple of unanticipated spends mostly related to Google — with Open Router you can set a hard limit per key and it just stops serving requests when you hit it. That's not cost tracking, that's cost prevention, and it's arguably more important.
And it's worth saying explicitly: if you're using any provider that doesn't offer hard spend limits, set up your own kill switch. A simple check in your application code that queries your aggregator before making an API call — if today's spend is above your threshold, don't make the request. It's a few lines of code and it'll save you from the four hundred dollar nightmare scenario.
Let's talk about latency for a second, because Daniel specifically asked about it. If you're pulling from all these different endpoints, how long until you have a truly unified picture? Is this real-time, or are you always looking at yesterday's numbers?
It depends on the provider. Open Router updates their usage data within minutes. OpenAI's usage endpoint can lag by several hours. Replicate's billing data is typically updated daily, not in real time. So if you're running your aggregator once a day, you're looking at yesterday's numbers with high accuracy, plus whatever partial data you can pull from the faster endpoints if you want a same-day estimate.
The "unified picture" is inherently a daily picture, not a real-time one.
And for most use cases, that's fine. If you're spending enough on APIs that you need minute-by-minute cost visibility, you're probably running a business with a finance team, not a personal project. For Daniel's use case — self-funded projects, wanting to keep an eye on things — daily granularity is more than enough.
Are there any other tools worth mentioning that try to do the unified real-time thing?
Helicone is popular, especially in the developer community — they focus on request logging and cost attribution, and they've got a pretty clean interface. But again, they're estimating costs based on token counts, not pulling from billing APIs. And their image generation support is minimal. There's also Lunary, which is open source and takes a similar approach to Langfuse — observability first, cost tracking as a feature. Same fundamental limitation.
What about just using the billing alerts that most platforms offer? Every provider lets you set some kind of budget alert — isn't that enough for most people?
It's enough to prevent disasters, but it's not enough to understand your spending. A budget alert tells you "you've hit eighty percent of your monthly limit," but it doesn't tell you which project drove that spend, or whether your costs are trending up because you're using more expensive models or just making more requests. For Daniel, who's running multiple projects and wants to understand which ones are efficient and which ones are burning money, alerts are a safety net, not an answer.
They don't solve the aggregation problem. If you've got alerts set up in Open Router, OpenAI, Google Cloud, and Replicate, you're still getting four different emails with four different thresholds using four different definitions of "spend." You haven't unified anything.
You've just made the fragmentation slightly louder.
If Daniel's listening and he wants to actually build the aggregator you described, what are the specific endpoints he should be hitting? Give me the concrete details.
For Open Router, it's straightforward — they have a usage endpoint, and you can filter by key. The response includes total cost, total tokens, and a breakdown by model. For OpenAI, you'd use their usage endpoint which gives you daily costs broken down by model and by API key. Anthropic's usage API gives you monthly-to-date costs with per-model breakdowns. Replicate exposes billing through their account API — you get total charges, credits, and a per-day breakdown.
FAL is the trickiest one. Their API exposes usage in terms of compute time per function, and you have to map that to their pricing page to get dollar amounts. Different functions have different per-second rates depending on the hardware. So for FAL specifically, you might need to maintain a small lookup table in your aggregator that maps function name plus hardware type to cost per second.
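The lookup table described here could be as simple as a dictionary keyed on function name and hardware type. The function names and per-second rates below are made-up placeholders, not FAL's actual pricing; the real numbers come from their pricing page and change over time.

```python
# Hypothetical per-second rates keyed by (function name, hardware type).
# Real values must be copied from FAL's pricing page and kept current.
FAL_RATES = {
    ("flux-dev", "A100"): 0.00111,
    ("flux-dev", "H100"): 0.00190,
    ("sdxl", "A100"): 0.00111,
}

def fal_cost(function_name, hardware, compute_seconds):
    """Convert FAL compute time into dollars via the lookup table."""
    try:
        rate = FAL_RATES[(function_name, hardware)]
    except KeyError:
        # Fail loudly rather than silently under-counting spend.
        raise ValueError(f"no rate for {function_name} on {hardware}; "
                         "update FAL_RATES from the pricing page")
    return round(rate * compute_seconds, 4)
```

Failing loudly on an unknown function is deliberate: a missing table entry should surface immediately, not silently drop a cost from the daily total.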
That sounds annoying but not impossible.
It's exactly the kind of thing that a fifty-line Python script handles easily and a commercial dashboard probably gets wrong because they haven't updated their pricing lookup in three months.
There's something almost philosophical here about the state of the AI tools ecosystem. We've got this explosion of APIs and providers and models, but the infrastructure for managing the business side — the actual dollars and cents — is still incredibly fragmented. It feels like we're in the early days of cloud computing, when everyone was building their own cost tracking because AWS's billing dashboard was incomprehensible.
That's exactly the right analogy. Before CloudHealth and CloudZero and the whole FinOps industry emerged, every engineering team had a janky spreadsheet pulling from the AWS billing API, and they all thought they were the only ones doing it. We're at that same moment with AI APIs. The FinOps for AI companies are being founded right now, but they're focused on enterprise customers spending millions a month. The individual developer with five different API keys and a hundred dollars a month in spend isn't their target market yet.
Which means the spreadsheet is still the right answer for Daniel's scale.
And I'd argue it might always be the right answer for solo developers and small teams, because the commercial tools will inevitably optimize for the enterprise use case — team budgets, department-level cost allocation, procurement workflows. If all you want is "how much did I spend today across everything," a ten-line cron job will serve you better than a platform with fifty features you don't need.
What about open source projects that are trying to fill this gap? Is there anything on GitHub that's close to what Daniel needs?
There are a few. There's a project called OpenCost that some folks have extended to work with LLM APIs, though it was originally built for Kubernetes cost monitoring. There's also a tool called LLM Cost that's literally just a Python library for calculating costs across providers — you pass it the model name and token counts, it returns dollar amounts based on current pricing. It's not a dashboard, but it's the calculation engine you'd use to build one.
You could combine LLM Cost with your usage data from each provider and get pretty accurate numbers without having to maintain pricing tables yourself.
And that's the beauty of the do-it-yourself approach — you can compose small, focused tools into exactly the pipeline you need, rather than waiting for a single platform to solve every edge case.
I want to circle back to something Daniel said in his prompt. He mentioned that he prefers using Open Router even for Gemini models because it makes spend tracking easier on a per-key basis. That's a really interesting insight — he's choosing a routing layer not for model access or pricing, but for operational visibility. The cost tracking feature is the product.
That's a pattern I'm seeing more and more. Developers are routing through Open Router or similar gateways specifically because the operational tooling — cost tracking, rate limiting, key management — is better than what the underlying providers offer. The model access becomes almost secondary. It's a weird inversion where the middleware is more usable than the platform.
It means the model providers are leaving value on the table. If Google's AI Studio had per-key cost tracking as good as Open Router's, Daniel might use it directly and save the Open Router markup.
They're all leaving value on the table in different ways. OpenAI's dashboard is decent but their usage API is rate-limited and sometimes lags by hours. Anthropic's console is clean but their cost breakdown is monthly, not daily. Google's billing is tied to Google Cloud, which is an entire separate universe of complexity. None of them treat cost visibility as a first-class feature the way Open Router does.
Which brings us back to the aggregator approach. If the providers aren't going to solve this, and the commercial tools aren't targeting Daniel's use case, then owning your own cost pipeline is not just a workaround — it's the actual solution.
I'd go further: it's a skill worth building. Understanding how to pull data from APIs, normalize it, and visualize it is going to be useful for a lot more than cost tracking. The same pipeline that tracks your AI spend today could track your cloud spend, your SaaS subscriptions, your domain renewals — all the scattered costs of running multiple projects.
All right, let's get practical. If someone listening wants to set this up this weekend, what's the step-by-step?
Step one, inventory your API keys. Make a list of every service you're using — Open Router, OpenAI, Anthropic, Google, Replicate, FAL, whatever. For each one, find their usage or billing API documentation and generate an API key with read-only access to billing data. Step two, write a script that hits each endpoint and extracts the daily cost. If the endpoint returns token counts instead of dollars, use a pricing table or the LLM Cost library to convert. Step three, dump everything into a CSV with columns for date, provider, model, project name, and cost. Step four, set it to run daily.
If you want to get fancy, step five is pointing a simple dashboard at that CSV.
A Google Sheet with a few SUMIF formulas and a sparkline will get you ninety percent of the way there. You don't need Grafana. You don't need a database. You need to answer the question "what did I spend today" in under five seconds, and a spreadsheet does that perfectly.
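If the CSV lives locally instead of in a Google Sheet, the same "what did I spend today" question is a few lines of standard-library Python against the schema sketched earlier (date, provider, service, project, cost_usd):

```python
import csv
from collections import defaultdict
from datetime import date

def spend_summary(csv_path, day=None):
    """Sum cost_usd per project for one day from the aggregator's CSV.

    Assumes the date / project / cost_usd columns described above;
    defaults to today's date in ISO format."""
    day = day or date.today().isoformat()
    totals = defaultdict(float)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            if row["date"] == day:
                totals[row["project"]] += float(row["cost_usd"])
    return dict(totals)
```

Calling it from a shell alias or a one-cell Streamlit app keeps the answer under the five-second threshold.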
Daniel also asked specifically about whether any of these platforms expose a cost-per-day endpoint. What's the state of that across the major providers?
Open Router's API gives you usage for arbitrary date ranges, so you can query per-day trivially. OpenAI's usage endpoint supports a date parameter, so daily queries work. Anthropic's usage API is more monthly-focused — you can get daily granularity but it takes a bit more effort. Replicate gives you a daily breakdown natively. FAL doesn't really have a cost endpoint at all — you're calculating from usage data.
The "unified picture" is achievable, it just requires effort proportional to the number of providers who don't make it easy.
And as the ecosystem matures, I'd expect more providers to expose proper cost endpoints with daily granularity, because they're hearing this feedback from developers constantly. But we're not there yet, and in the meantime, the script is the answer.
One last thing before we wrap the core discussion — you mentioned spend caps and kill switches. Is there a simple pattern for that if you're routing through multiple providers?
The simplest pattern is to centralize your API calls through a single function in your code that checks a daily spend counter before making any request. If today's spend plus the estimated cost of the request you're about to make exceeds your daily budget, the function raises an exception or returns an error. You don't make the call. The counter resets at midnight. It's maybe fifteen lines of code and it works across every provider.
That daily spend counter is just reading from the CSV your aggregator produced this morning.
Or even simpler — it's an in-memory counter that you increment after each successful API call based on the cost you calculate from the response metadata. Most LLM APIs return token counts in the response. You multiply by the published rate, add it to the counter, and you've got real-time spend tracking with a hard cutoff. No external aggregator needed for the kill switch part.
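The in-memory counter pattern could be sketched like this. The per-million-token rates are whatever you copy from the provider's pricing page; everything here is a generic illustration, not any particular provider's SDK.

```python
import threading

class SpendGuard:
    """Thread-safe daily spend counter with a hard cutoff."""

    def __init__(self, daily_budget_usd):
        self.budget = daily_budget_usd
        self.spent = 0.0
        self._lock = threading.Lock()

    def check(self, estimated_cost_usd=0.0):
        """Raise before a call that would push spend past the budget."""
        with self._lock:
            if self.spent + estimated_cost_usd > self.budget:
                raise RuntimeError(
                    f"daily budget hit: ${self.spent:.2f} of ${self.budget:.2f}")

    def record(self, input_tokens, output_tokens, in_rate, out_rate):
        """Add actual cost from response token counts (rates are $ per 1M tokens)."""
        cost = input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate
        with self._lock:
            self.spent += cost
        return cost

    def reset(self):
        """Call at midnight (e.g. from a scheduler) to start a new day."""
        with self._lock:
            self.spent = 0.0
```

Every API call goes `guard.check()` before the request and `guard.record(...)` with the token counts from the response metadata after it, so the cutoff works identically across providers.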
That's elegant. The aggregator gives you the historical view and the trend analysis, but the kill switch is just a few lines in your application code.
If you're using Open Router, you get both for free — their per-key spend limits are the kill switch, and their dashboard is the historical view. It's genuinely one of the best developer experiences in the space right now.
Now: Hilbert's daily fun fact.
A group of flamingos is called a flamboyance.
Time for practical takeaways. First, if you're a solo developer or running small projects, route as much of your LLM traffic as possible through a single gateway like Open Router. You get unified cost tracking, per-key visibility, and spend caps without building anything. Second, for the services that can't go through that gateway — image generation on Replicate or FAL — build a simple daily aggregator script that pulls from each provider's billing API and dumps everything into a CSV. Third, add a kill switch to your application code that checks spend before making API calls, so you're never surprised by a runaway process. Fourth, resist the urge to sign up for a commercial observability platform unless you actually need the observability features. If all you want is cost visibility, the homegrown approach is cheaper, more accurate, and more flexible.
I'd add a fifth: treat your API key inventory as a living document. Every time you create a new key for a new project, add it to your aggregator script. Every time you retire a project, revoke the key and remove it from the script. The biggest source of surprise bills isn't expensive models — it's forgotten keys for projects you stopped working on six months ago.
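One way to keep that inventory honest is to make it the aggregator's source of truth, so a key that isn't in the file simply isn't polled. The project names, env-var names, and dates below are illustrative placeholders:

```python
# Hypothetical key inventory: one entry per live project/key pair.
# The aggregator iterates this list; retiring a project means revoking
# the key with the provider AND deleting its entry here.
KEY_INVENTORY = [
    {"project": "podcast-gen", "provider": "openrouter",
     "key_env": "OPENROUTER_KEY_PODCAST", "created": "2024-11-02"},
    {"project": "image-pipeline", "provider": "replicate",
     "key_env": "REPLICATE_KEY_IMAGES", "created": "2025-01-15"},
]

def active_keys(provider=None):
    """List inventory entries, optionally filtered by provider."""
    return [entry for entry in KEY_INVENTORY
            if provider is None or entry["provider"] == provider]
```

Storing only environment-variable names here, never the secrets themselves, keeps the inventory safe to commit alongside the aggregator script.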
The forward-looking question I keep coming back to is whether any of the major providers will actually step up and offer the kind of unified cost visibility that developers clearly want. Open Router is doing it from the middleware position, but they're limited to the providers they integrate with. If OpenAI or Google built a truly great cost dashboard with per-project breakdowns and daily granularity, they'd win loyalty from exactly the kind of developer Daniel represents — the solo builder who's choosing tools based on operational experience, not just model quality.
I think we're going to see more of this, but it'll come from the tooling layer, not the model providers. The model providers are incentivized to make spending easy, not to make cost tracking easy. The gateways and the observability platforms are the ones who compete on developer experience, and cost visibility is a big part of that. I'd watch Open Router, Portkey, and Helicone more than I'd watch OpenAI or Google on this front.
This has been My Weird Prompts. Thanks to our producer Hilbert Flumingtop for the daily fun fact and for keeping the lights on. You can find every episode at myweirdprompts dot com or search for My Weird Prompts on Spotify. If you've got a question like Daniel's — something about the tools and workflows of building with AI — send it our way. We'll dig into it.
By the way, today's episode was powered by DeepSeek V four Pro. All right, we'll catch you next time.
See you then.