Building Agents That Almost Work
with the founder of an autonomous-agent startup
Honest lessons from six months of building agents that mostly did not work — and the substrate that emerged underneath.
- 01 Why retrieval is the actual product
- 02 The eval harness that finally caught regressions
- 03 What to deprecate from your 2024 stack
My guest is the founder of an autonomous-agent startup. They have been building agents full-time for six months, and the conversation is refreshingly honest about what does not work. The punchline: the agents are not ready, but the infrastructure the team had to build to support them (the retrieval layer, the eval harness, and the routing logic) is the actual product.
We set out to build an autonomous agent. We ended up building an eval harness with a chatbot on top. That turned out to be the right product.
Key Moments
- 06:30 — Why retrieval is the actual product, not the agent
- 18:45 — The eval harness that finally caught regressions before users did
- 31:20 — What to deprecate from your 2024 agent stack
- 39:00 — The substrate thesis: build the rails, not the train
Related Reading
This episode pairs well with the “Agents Aren’t Ready” essay. The founder independently arrived at the same substrate thesis from a completely different starting point, which made me more confident it is directionally correct.
The most important takeaway: if you are building an agent product right now, spend 80% of your engineering time on eval and retrieval, and 20% on the agent itself. Most teams have this ratio inverted.