Public · v0.4

Atlas

A reference architecture for retrieval-grounded agents that actually ship to production — includes the eval harness, the routing layer, and the parts I would not build again.

TypeScript · pgvector · Modal · Anthropic

Atlas started as a question: what does a production-grade agent architecture actually look like when you strip away the demo magic? After six months of building, breaking, and rebuilding autonomous agents, the answer turned out to be less about the agent and more about everything around it.

The repository is organized around three layers: retrieval, routing, and evaluation. Most agent frameworks hand-wave the first and ignore the third. That is backwards. The retrieval layer is the product. The eval harness is what keeps the product honest. The agent itself is the least interesting part.

The agent is the least interesting part of an agent architecture. The eval harness is where the actual engineering lives.

The Three Layers

The retrieval layer uses pgvector for semantic search with a hybrid BM25 fallback. The routing layer decides which model to call, how to chunk the context, and when to bail out and ask for human input. The eval harness runs continuously in CI — every commit gets scored against a regression suite of 200+ test cases.
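The hybrid fallback means two rankings have to be merged into one. A common way to do that is reciprocal rank fusion; the sketch below shows the fusion step in isolation, assuming each retriever returns an ordered list of document ids. The function name and shape are illustrative, not Atlas's actual API.

```typescript
// Reciprocal rank fusion: merge a pgvector ranking with a BM25 ranking
// into a single ordered list. Hypothetical helper; the repo's merge
// logic may differ.
function fuseRankings(
  vectorIds: string[],
  bm25Ids: string[],
  k = 60, // conventional RRF damping constant
): string[] {
  const scores = new Map<string, number>();
  for (const [rank, id] of vectorIds.entries()) {
    scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
  }
  for (const [rank, id] of bm25Ids.entries()) {
    scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

Documents that appear high in both rankings win; documents that only one retriever found still survive, which is the point of the fallback.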

This is not a framework. It is a reference — a record of what actually worked in production, with comments on why, and what I tried first that did not work. The architecture is opinionated because the problems it solves are specific.

What I Would Not Build Again

The first version had a planning layer that generated multi-step plans before execution. It looked impressive in demos. In production, it added latency without improving outcomes — the model was better at deciding the next step given the current state than at planning five steps ahead. We ripped it out in v0.3.
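What replaced the planner is a single-step decision loop: ask the model for the next action given the history so far, execute it, repeat. A minimal sketch, with illustrative names (`Action`, `decideNext`, `execute`) that are assumptions rather than Atlas's actual interfaces:

```typescript
// Single-step decision loop: no upfront multi-step plan, just "what
// next, given the current state" until the model says it is done.
type Action =
  | { tool: string; input: string }
  | { done: true; answer: string };

async function runAgent(
  decideNext: (history: string[]) => Promise<Action>, // model call
  execute: (tool: string, input: string) => Promise<string>, // tool call
  maxSteps = 10, // bail-out budget instead of a plan
): Promise<string> {
  const history: string[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const action = await decideNext(history);
    if ("done" in action) return action.answer;
    const observation = await execute(action.tool, action.input);
    history.push(`${action.tool}(${action.input}) -> ${observation}`);
  }
  throw new Error("step budget exhausted");
}
```

The `maxSteps` cap does the job the plan used to do: it bounds the work without committing to steps the model would have revised anyway.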

The second thing I would skip is the custom memory system. We built an elaborate episodic memory with vector-indexed recall. It turned out that a simple conversation buffer with aggressive summarization worked just as well for our use cases. The complexity was not free.
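The buffer-plus-summarization approach fits in a few lines. In the sketch below, `summarize` stands in for a model call that folds dropped messages into a running summary; the types and parameter names are assumptions for illustration, not the repo's actual code.

```typescript
// Conversation buffer with aggressive summarization: keep the last
// `keep` messages verbatim, fold everything older into one summary.
interface Buffer {
  summary: string;
  recent: string[];
}

function appendMessage(
  buf: Buffer,
  message: string,
  summarize: (summary: string, dropped: string[]) => string,
  keep = 4,
): Buffer {
  const recent = [...buf.recent, message];
  if (recent.length <= keep) return { summary: buf.summary, recent };
  const overflow = recent.length - keep;
  return {
    summary: summarize(buf.summary, recent.slice(0, overflow)),
    recent: recent.slice(overflow),
  };
}
```

Compared to vector-indexed episodic recall, the only moving part here is the summarizer prompt, which is why the complexity of the original memory system was hard to justify.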

Stack

  • TypeScript end-to-end — the type safety pays for itself in a system with this many moving parts
  • pgvector on Supabase — good enough, and the operational overhead is near zero
  • Modal for compute — serverless GPU inference with cold starts under 2 seconds
  • Anthropic Claude as the primary model — the instruction-following quality is meaningfully better for agentic workflows

Atlas v0.4 is public on GitHub. It is opinionated, undercommented, and probably not what you expect. That is the point — it is a record of what actually worked, not what should have worked in theory.

Rijul · Last updated Feb 2026

I write to think.
You can read along.

I'm Rijul. I write essays, host a podcast, and build small things on the web — all of it in service of one question: how do we leverage AI in the next decade without giving away what mattered in the last? New work lands here when it's ready. Subscribe and I'll send it once.