Quorum
Six models argue a question. A seventh judges. The truth is somewhere in the disagreement.
Quorum is a multi-model debate harness. You give it a question — usually something with genuine uncertainty, where reasonable people disagree — and it orchestrates a structured debate between six different language models. A seventh model acts as judge, scoring each argument on logical coherence, evidence use, and novelty.
I built it as a research tool for my own essays. When I am writing about a topic where I have a strong prior, Quorum forces me to encounter the strongest versions of the opposing arguments. It is uncomfortably effective at this.
The most useful thing about a multi-model debate is not the conclusion. It is finding out which argument you cannot refute.
How It Works
- OpenRouter for model access — debate rounds cycle through Claude, GPT-4, Gemini, Llama, Mistral, and Command R
- DSPy for prompt orchestration — each round’s prompts are compiled, not hand-written
- Python for the harness — simple, synchronous, easy to debug
- Structured output for scoring — the judge model returns JSON with per-argument scores and reasoning
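The pieces above can be sketched as a single synchronous loop. This is a minimal illustration, not Quorum's actual code: the model slugs, prompt wording, and the judge's JSON schema are all assumptions, and in the real harness the prompts are compiled by DSPy rather than hand-written as here. Only the OpenRouter chat-completions endpoint itself is real.

```python
# Sketch of one debate round plus judging via OpenRouter's chat-completions
# API. Model slugs, prompts, and the score schema are illustrative
# assumptions, not Quorum's real configuration.
import json
import os
import urllib.request

API_URL = "https://openrouter.ai/api/v1/chat/completions"

DEBATERS = [  # hypothetical slugs; any six OpenRouter models would do
    "anthropic/claude-3.5-sonnet", "openai/gpt-4", "google/gemini-pro",
    "meta-llama/llama-3-70b-instruct", "mistralai/mistral-large",
    "cohere/command-r",
]
JUDGE = "openai/gpt-4"

def build_prompt(question: str, transcript: list[str]) -> str:
    """Fold the debate so far into the next debater's prompt."""
    history = "\n\n".join(transcript) or "(no arguments yet)"
    return (f"Question: {question}\n\nArguments so far:\n{history}\n\n"
            "Give your strongest argument. Rebut prior points you disagree with.")

def call_model(model: str, prompt: str) -> str:
    """One blocking completion call -- the harness is deliberately synchronous."""
    body = json.dumps({"model": model,
                       "messages": [{"role": "user", "content": prompt}]}).encode()
    req = urllib.request.Request(API_URL, data=body, headers={
        "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
        "Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

def run_round(question: str, transcript: list[str]) -> list[str]:
    """Cycle every debater once, appending each argument to the transcript."""
    for model in DEBATERS:
        transcript.append(f"[{model}]\n"
                          + call_model(model, build_prompt(question, transcript)))
    return transcript

def judge_round(question: str, transcript: list[str]) -> list[dict]:
    """Ask the judge for per-argument JSON scores (schema is an assumption)."""
    prompt = (f"Question: {question}\n\nArguments:\n" + "\n\n".join(transcript)
              + "\n\nScore each argument 1-10 on coherence, evidence, and "
                "novelty. Reply with only a JSON array of objects: "
                '{"model": str, "coherence": int, "evidence": int, '
                '"novelty": int, "reasoning": str}')
    return json.loads(call_model(JUDGE, prompt))
```

Keeping the loop synchronous trades speed for debuggability: each round is a plain list of strings you can print and inspect.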
What I Learned
The surprising finding from running hundreds of debates is that model diversity matters more than model quality. A debate between six copies of the best model produces homogeneous, less informative results than a debate between six different models of varying quality. The disagreements are where the signal lives.
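One crude way to make "where the signal lives" concrete is to measure how much the six answers actually differ. The lexical metric below is an illustrative assumption on my part, not Quorum's method (the real judge is itself a model), but it shows the shape of the idea: identical answers carry no disagreement signal.

```python
# Illustrative proxy for debate diversity: mean pairwise lexical
# disagreement between answers. Not Quorum's actual scoring.
from itertools import combinations

def jaccard_distance(a: str, b: str) -> float:
    """1 - |shared words| / |all words|, over lowercase word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 0.0
    return 1.0 - len(wa & wb) / len(wa | wb)

def mean_disagreement(answers: list[str]) -> float:
    """Average Jaccard distance over all answer pairs: 0 = clones, 1 = disjoint."""
    pairs = list(combinations(answers, 2))
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)
```

Six copies of one model drift toward a score near zero; six heterogeneous models stay measurably apart, which is exactly the regime where the judge has something to adjudicate.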
Quorum is open source under MIT. It is rough around the edges but functional. Several of the essays on this site — particularly the sovereign AI thesis — were stress-tested through it before publication.