EP 013

Inference Margins, Real Numbers

with CFO, Series B AI infra company

A CFO opens the books on what it actually costs to serve tokens at scale — and why most pricing models are wrong.

  • 01 Cost-per-1k-tokens, audited
  • 02 Why the API markup tier is collapsing
  • 03 Reserved capacity vs spot, in practice

Most conversations about AI economics start with training costs. This episode is about the other side of the ledger: inference. My guest is the CFO of a Series B AI infrastructure company, and they brought actual numbers — not projections, not estimates, but audited cost-per-token figures from production workloads.

The picture that emerges is different from what most pitch decks assume. The margin structure of an AI product is more fragile than it looks, and the companies that figure out the unit economics first will have a durable advantage over those still guessing.

The API markup tier is collapsing. The companies that figure out reserved capacity economics first will own the next margin structure.

Key Moments

  • 05:20 — Cost-per-1k-tokens, audited and broken down by component
  • 14:45 — Why the API markup tier is collapsing faster than anyone expected
  • 28:10 — Reserved capacity vs. spot pricing — what the real tradeoffs look like in production
  • 35:30 — The unit economics that make or break a Series A AI startup

Who Should Listen

If you are pricing an AI product or building a financial model for an AI company, this episode will save you months of wrong assumptions. The numbers are real, and they are different from what the public pricing pages suggest.

Recorded Feb 9, 2026 · Published Feb 11, 2026

I write to think.
You can read along.

I'm Rijul. I write essays, host a podcast, and build small things on the web — all of it in service of one question: how do we leverage AI in the next decade without giving away what mattered in the last? New work lands here when it's ready. Subscribe and I'll send it once.