EP 012 · 47:31 · Jan 28, 2026

Building Agents That Almost Work

with Founder, autonomous-agent startup

Honest lessons from six months of building agents that mostly did not work — and the substrate that emerged underneath.

Show Notes

01 Why retrieval is the actual product
02 The eval harness that finally caught regressions
03 What to deprecate from your 2024 stack

My guest is the founder of an autonomous-agent startup. They have been building agents full-time for six months, and the conversation is refreshingly honest about what does not work. The punchline: the agents are not ready, but the infrastructure they forced the team to build — the retrieval layer, the eval harness, the routing logic — is the actual product.

“We set out to build an autonomous agent. We ended up building an eval harness with a chatbot on top. That turned out to be the right product.”

Key Moments

06:30 — Why retrieval is the actual product, not the agent
18:45 — The eval harness that finally caught regressions before users did
31:20 — What to deprecate from your 2024 agent stack
39:00 — The substrate thesis: build the rails, not the train

This episode pairs well with the “Agents Aren’t Ready” essay. The founder arrived at the same substrate thesis independently, from a completely different starting point, which makes me more confident that it is directionally correct.

The most important takeaway: if you are building an agent product right now, spend 80% of your engineering time on eval and retrieval, and 20% on the agent itself. Most teams have this ratio inverted.

Recorded Jan 26, 2026 · Published Jan 28, 2026

I write to think. You can read along.