EP 012

Building Agents That Almost Work

with Founder, autonomous-agent startup

Honest lessons from six months of building agents that mostly did not work — and the substrate that emerged underneath.

  • 01 Why retrieval is the actual product
  • 02 The eval harness that finally caught regressions
  • 03 What to deprecate from your 2024 stack

My guest is the founder of an autonomous-agent startup. They have been building agents full-time for six months, and the conversation is refreshingly honest about what does not work. The punchline: the agents are not ready, but the infrastructure they forced the team to build — the retrieval layer, the eval harness, the routing logic — is the actual product.

“We set out to build an autonomous agent. We ended up building an eval harness with a chatbot on top. That turned out to be the right product.”

Key Moments

  • 06:30 — Why retrieval is the actual product, not the agent
  • 18:45 — The eval harness that finally caught regressions before users did
  • 31:20 — What to deprecate from your 2024 agent stack
  • 39:00 — The substrate thesis: build the rails, not the train

This episode pairs well with the “Agents Aren’t Ready” essay. The founder independently arrived at the same substrate thesis from a completely different starting point, which made me more confident it is directionally correct.

The most important takeaway: if you are building an agent product right now, spend 80% of your engineering time on eval and retrieval, and 20% on the agent itself. Most teams have this ratio inverted.

Recorded Jan 26, 2026 · Published Jan 28, 2026

I'm Rijul. I write essays, host a podcast, and build small things on the web — all of it in service of one question: how do we leverage AI in the next decade without giving away what mattered in the last? New work lands here when it's ready. Subscribe and I'll send it once.