EP 013

Inference Margins, Real Numbers

with CFO, Series B AI infra company

A CFO opens the books on what it actually costs to serve tokens at scale — and why most pricing models are wrong.

  • 01 Cost-per-1k-tokens, audited
  • 02 Why the API markup tier is collapsing
  • 03 Reserved capacity vs spot, in practice

Most conversations about AI economics start with training costs. This episode is about the other side of the ledger: inference. My guest is the CFO of a Series B AI infrastructure company, and they brought actual numbers — not projections, not estimates, but audited cost-per-token figures from production workloads.

The picture that emerges is different from what most pitch decks assume. The margin structure of an AI product is more fragile than it looks, and the companies that figure out the unit economics first will have a durable advantage over those still guessing.

The API markup tier is collapsing. The companies that figure out reserved capacity economics first will own the next margin structure.

Key Moments

  • 05:20 — Cost-per-1k-tokens, audited and broken down by component
  • 14:45 — Why the API markup tier is collapsing faster than anyone expected
  • 28:10 — Reserved capacity vs. spot pricing — what the real tradeoffs look like in production
  • 35:30 — The unit economics that make or break a Series A AI startup

Who Should Listen

If you are pricing an AI product or building a financial model for an AI company, this episode will save you months of wrong assumptions. The numbers are real, and they are different from what the public pricing pages suggest.

Recorded Feb 9, 2026 · Published Feb 11, 2026

I write to think.
You can read along.

I'm Rijul. I write essays, host a podcast, and build small things on the web — all of it in service of one question: how do we leverage AI in the next decade without giving away what mattered in the last? New work lands here when it's ready. Subscribe and I'll send it once.