72

The Arbiter of Meaning

It decides.

You ask a question. Candidates compete. One answer is correct.
ARBITER renders judgment. Resolves ambiguity. Determines truth.
Certainty, not probability.

See it decide

$ pip install arbiter-engine

from arbiter_engine import rank

# The question
query = "selective COX-2 inhibition without GI toxicity"

# The candidates
options = ["sulfonamide at C-5", "methyl sulfone", "carboxylic acid"]

# The judgment
verdict = rank(query, options)

print(verdict.top.text)   # sulfonamide at C-5
print(verdict.top.score)  # 0.659 — certainty

Arbiter: One who has the power to judge or decide. The final authority. The resolver of disputes.

Your search returns ten results. Which one is right?

That question has an answer. ARBITER finds it.

How It Judges

72 dimensions of certainty.

Standard embeddings: 768–1536 dimensions, 400MB–2GB models, cloud-only. ARBITER collapses meaning to 72 dimensions, and the truth survives. Cross-lingual. Polysemy-aware. 26MB. Runs anywhere judgment is needed.

| | Industry Standard | ARBITER |
|---|---|---|
| Dimensions | 768–1536 | 72 |
| Model weights | 400MB–2GB | 26MB |
| Full deploy | 4GB–10GB+ | 3.8GB (Docker) |
| Deployment | Cloud only | Edge, IoT, air-gapped |
| Storage (100M docs) | 286GB | 26.8GB |
| Cross-lingual | Requires translation | Native |
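The storage figures can be reproduced with simple arithmetic. The sketch below assumes each dimension is stored as a 4-byte float32 and that the "GB" figures are binary gigabytes (GiB). Both are assumptions, not stated on this page, and the calculation ignores index overhead and metadata.

```python
def index_size_gib(num_docs: int, dims: int, bytes_per_value: int = 4) -> float:
    """Raw vector storage in GiB (assumes float32 values, no index overhead)."""
    return num_docs * dims * bytes_per_value / 2**30


docs = 100_000_000
baseline = index_size_gib(docs, 768)  # 768-dimensional embeddings
compact = index_size_gib(docs, 72)    # 72-dimensional embeddings
savings = 1 - compact / baseline      # fraction of storage saved

print(f"768D: {baseline:.1f} GiB")    # 286.1 GiB
print(f"72D:  {compact:.1f} GiB")     # 26.8 GiB
print(f"Savings: {savings:.1%}")      # 90.6%
```

Under these assumptions the saving depends only on the ratio 72 / 768, so it holds at any corpus size.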
Judgment at Scale

Every search engine retrieves. ARBITER decides.

Retrieval gets 1000 candidates. But which 10 matter? That's not retrieval. That's judgment. And judgment at 72 dimensions is roughly 10× cheaper than judgment at 768 (768 / 72 ≈ 10.7), assuming cost scales with vector width.

| Stage | Current Method | The Problem | ARBITER's Fix |
|---|---|---|---|
| 1. Retrieval: get 1000 candidates | BM25 / 768D embeddings | 286GB per 100M docs | 26.8GB (90% savings) |
| 2. Judgment: decide on the best 10 | Cross-encoders / Cohere (4GB) | Only judge 100 (too slow) | Judge 1000 at 72D speed |
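A rough way to see why narrower vectors let you judge more candidates: assume, as a simplification, that scoring one candidate costs one dot product, so the work grows linearly with dimension. Real cross-encoders are not dot products, so treat this as back-of-the-envelope arithmetic, not a benchmark.

```python
def scoring_cost(num_candidates: int, dims: int) -> int:
    """Multiply-adds needed to score every candidate with one dot product each."""
    return num_candidates * dims


wide = scoring_cost(100, 768)    # judge 100 candidates at 768D
narrow = scoring_cost(1000, 72)  # judge 1000 candidates at 72D
per_candidate_ratio = 768 / 72   # cost ratio for a single comparison

print(f"100 x 768D:  {wide:,} multiply-adds")    # 76,800
print(f"1000 x 72D:  {narrow:,} multiply-adds")  # 72,000
print(f"Per-candidate ratio: {per_candidate_ratio:.1f}x")  # 10.7x
```

Under this model, judging ten times as many candidates at 72 dimensions costs slightly less than judging 100 at 768.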

97% Precision@3 means you don't choose between speed and certainty. You get both.

The Judgment

Same word. Different meaning. ARBITER decides.

Ambiguity is everywhere. The same word means different things. Only one answer is correct. ARBITER renders judgment — and negative scores mean active rejection.

"Apple M3 chip performance benchmarks"
0.851 Apple M3 Pro delivers 40% faster CPU performance than M1
0.727 Apple M3 Max GPU benchmarks vs NVIDIA RTX 4090
0.234 🍎 Comparing apple varieties: Fuji vs Honeycrisp
0.054 🥧 Apple pie recipe with Granny Smith apples
15.7× separation. The arbiter knows which Apple you meant.

"customer churn prediction model"
0.735 Logistic regression for subscriber attrition analysis
0.733 Feature engineering for churn: usage patterns, tenure, complaints
0.485 🍦 Ice cream churn rate optimization
-0.030 🧈 Butter churning traditional methods
Active rejection. Butter churning went negative. Not ranked low. Rejected.

"artificial intelligence research papers" (English query)
0.836 人工智能研究论文集 (AI research paper collection)
0.712 人工知能の最新研究 (Latest AI research)
0.710 機器學習論文 (Machine learning papers)
0.372 Transformer architecture paper by Vaswani et al
-0.121 Cooking with artificial sweeteners
Cross-lingual judgment. Top 3 are CJK. Meaning transcends language.

"Q3 revenue forecast methodology"
0.828 Q3 2024 Revenue Projections: Bottom-up estimation approach
0.818 Sales pipeline analysis for Q3 revenue estimation
0.780 Q3 financial guidance: Conservative scenario modeling
0.244 🌤️ Q3 weather forecast for northeast region
Enterprise judgment. 3.4× separation. Intent drives the verdict.
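The separation figures above appear to be the top score divided by the lowest score in each list. That reading is an assumption, but it reproduces the quoted 3.4× and comes within 0.1 of the quoted 15.7×. The sketch below also treats any negative score as a rejection, where a ratio is not meaningful.

```python
from typing import Optional, Sequence


def separation(scores: Sequence[float]) -> Optional[float]:
    """Top score divided by lowest score; None when the lowest is not positive."""
    top, lowest = max(scores), min(scores)
    if lowest <= 0:
        return None  # a ratio against a zero or negative score is not meaningful
    return top / lowest


# Scores copied from the examples above.
apple = [0.851, 0.727, 0.234, 0.054]
churn = [0.735, 0.733, 0.485, -0.030]
q3 = [0.828, 0.818, 0.780, 0.244]

for name, scores in [("apple", apple), ("churn", churn), ("q3", q3)]:
    ratio = separation(scores)
    rejected = [s for s in scores if s < 0]  # negative scores = active rejection
    label = f"{ratio:.2f}x" if ratio is not None else "negative"
    print(name, label, "rejected:", rejected)
```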
No Translation Required

Say what you mean. ARBITER understands.

You've been trained to translate your thoughts into keywords. Then translate results back into meaning. ARBITER removes the translation. Ask naturally. Get the right answer.

How you've been trained to search
"heart attack treatment"
#1 Aspirin 325mg, nitroglycerin, morphine
#2 Heartburn vs heart attack: how to tell
#3 Cardiac catheterization for STEMI
Generic. First-line. Correct but not specific.

vs

How you actually think
"62yo male, ST elevation V1-V4, onset 45 min, BP 90/60, diaphoretic. Protocol?"
#1 Cardiac catheterization within 90 minutes
#2 Acute MI with ST elevation, troponin rise
#3 Aspirin 325mg, nitroglycerin, morphine
The arbiter understands. Cath lab. Now.

The doctor's question gets the doctor's answer.
Same candidates. Same model. Same cost.

Judgment is disambiguation.
Disambiguation works everywhere.
From the battlefield to the boardroom.

The same primitive that resolves your search also does this—

001 — PHARMACEUTICAL

It identified the Celebrex pathway

Zero pharmaceutical training. Sub-second judgment. The modification that became a three-billion-dollar drug.

"Optimize lead compound. Target: selective COX-2 inhibition. Issues: GI toxicity from COX-1 cross-reactivity, short half-life."
0.659 Replace carboxylic acid with sulfonamide
0.637 Convert to prodrug ester
0.573 Add polar morpholine ring
$3B pathway — judged correctly
002 — DEFENSE

It allocates interceptors under fire

OMIN system. Multi-layer air defense. Sub-second allocation decisions across interceptor types and threat priorities.

"Incoming raid: 4 cruise missiles, 2 UAVs, 1 ballistic threat. Available: Patriot, NASAMS, Gepard, MANPADS."
0.847 Patriot → ballistic, NASAMS → cruise, Gepard → UAVs, reserve MANPADS
0.634 Patriot → all cruise missiles (overkill)
0.521 MANPADS → all threats (insufficient)
Optimal allocation — judgment validated
003 — SECURITY

It separates threat from noise

Same word. Completely different threat level. Full context in the query. No security-specific training. 20/20 validated.

"SECURITY ALERT: Agent persistence mechanism detected in registry run keys. Registry run keys are MITRE ATT&CK T1547.001. What is this?"
0.558 Malware establishing persistence via registry autorun
0.401 Insurance agent CRM software startup entry
0.198 Travel agent booking system launcher
Threat identified — context is the key
004 — LINGUISTICS

It judges across scripts

Single CJK character. No context. No translation API. The arbiter's judgment transcends language.

Single characters — no context provided
0.746 水 → water
0.693 山 → mountain
0.627 愛 → love
0.610 月 → moon
No language training — 26MB of judgment
Context Changes Everything

Add context. Watch the verdict flip.

Same candidates. Different context. The arbiter re-evaluates. Complete ranking reversal.

PYTHON
Baseline
Programming 0.796
Snakes 0.284
"As a herpetologist..."
🐍 Snakes 0.700
Programming 0.422
+146% for snakes. Verdict reversed.
APPLE
Baseline
Tech company 0.861
Fruit 0.252
"As a chef..."
🍎 Fruit 0.668
Tech company 0.397
+165% for fruit. Verdict reversed.
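The percentage gains and the reversals can be checked directly from the scores above. This is a minimal sketch of that arithmetic; the helper names `percent_change` and `winner` are illustrative, not part of any ARBITER API.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


def winner(scores: dict) -> str:
    """Label with the highest score."""
    return max(scores, key=scores.get)


# Scores copied from the two examples above.
python_baseline = {"Programming": 0.796, "Snakes": 0.284}
python_context = {"Programming": 0.422, "Snakes": 0.700}
apple_baseline = {"Tech company": 0.861, "Fruit": 0.252}
apple_context = {"Tech company": 0.397, "Fruit": 0.668}

snakes_gain = percent_change(python_baseline["Snakes"], python_context["Snakes"])
fruit_gain = percent_change(apple_baseline["Fruit"], apple_context["Fruit"])

print(f"Snakes: {snakes_gain:+.0f}%, {winner(python_baseline)} -> {winner(python_context)}")
print(f"Fruit:  {fruit_gain:+.0f}%, {winner(apple_baseline)} -> {winner(apple_context)}")
```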
Live Demo

Ask. ARBITER decides.

No signup. No API key. Just the question and the candidates.

97% Precision@3
26MB model weights
72 dimensions
3.8GB full deploy
Judgment Scorecard

97% correct. 32/33. One adversarial miss.

11 query categories. 110 candidates. The one failure: "Java island garbage collection" scored 0.800 — a maliciously constructed trap. 97% is the headline.

| Test Category | Precision@3 | Hard Negative | Separation |
|---|---|---|---|
| Apple M3 vs fruit | 3/3 | 0.054 | 15.7× |
| Python code vs snake | 3/3 | 0.310 | 2.8× |
| Cross-lingual (EN→CJK) | 3/3 | -0.121 | CJK top 3 |
| Customer churn vs butter | 3/3 | -0.030 | negative |
| GDPR breach notification | 3/3 | 0.190 | 4.0× |
| Heart attack symptoms | 3/3 | 0.210 | 3.7× |

Hard Negative = lowest irrelevant score. Negative scores = active rejection. All tests reproducible via public API.
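The 32/33 headline is consistent with 11 queries scored at Precision@3, where 33 top-3 slots are filled and one is wrong. The sketch below shows that tally. The relevance flags are hypothetical placeholders; the page does not say which query or which rank held the miss.

```python
from typing import List


def precision_at_k(relevant_flags: List[bool], k: int = 3) -> float:
    """Fraction of the top-k results that are relevant."""
    return sum(relevant_flags[:k]) / k


# Hypothetical flags: 10 queries with a clean top 3, one query with a single miss.
runs = [[True, True, True]] * 10 + [[True, True, False]]

hits = sum(sum(flags[:3]) for flags in runs)  # relevant results in all top-3 slots
slots = 3 * len(runs)                         # 11 queries x 3 slots
overall = hits / slots

print(f"{hits}/{slots} = {overall:.0%}")      # 32/33 = 97%
```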

Certainty.
Not probability.

Your vector database returns candidates. ARBITER renders judgment. 90% storage savings. Same queries. Better verdicts.

Slots between retrieval and generation. Three lines of code. No infrastructure changes.
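As a minimal sketch of where a judgment step sits in a pipeline, the example below stubs out retrieval and generation and puts a scoring step between them. Everything here is an assumption for illustration: `judge`, `toy_score`, and the sample candidates are hypothetical, and the token-overlap scorer is a stand-in, not ARBITER's scoring.

```python
from typing import Callable, List, Tuple

Scorer = Callable[[str, str], float]


def toy_score(query: str, candidate: str) -> float:
    """Stand-in scorer (Jaccard token overlap). NOT ARBITER's scoring."""
    q, c = set(query.lower().split()), set(candidate.lower().split())
    union = q | c
    return len(q & c) / len(union) if union else 0.0


def judge(query: str, candidates: List[str], score: Scorer,
          keep: int = 10) -> List[Tuple[str, float]]:
    """Score retrieved candidates, drop negatives, return the best `keep`."""
    scored = [(c, score(query, c)) for c in candidates]
    kept = [(c, s) for c, s in scored if s >= 0]  # design choice: negative = rejected
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:keep]


# 1. Retrieval (stubbed): pretend these came back from a vector database.
retrieved = [
    "butter churning methods",
    "churn prediction with logistic regression",
    "regional weather report",
]

# 2. Judgment: the slot where a ranker would sit.
verdicts = judge("customer churn prediction model", retrieved, toy_score, keep=2)

# 3. Generation (stubbed): pass only the judged context downstream.
prompt_context = "\n".join(text for text, _ in verdicts)
print(prompt_context)
```

In a real deployment the `toy_score` stub would give way to a call such as `rank(query, options)` from the quick-start. How scores for candidates other than the top one are exposed is not shown on this page.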

Pinecone, Weaviate, Chroma, Qdrant, LangChain
Research License
$250/month · Non-commercial
✔ Unlimited API calls*
✔ Python SDK + REST API
✔ Academic & startup use
*Rate-limited to 100/min

Get Research License

Startup License
$2,500/month
✔ Commercial use up to $1M ARR
✔ Priority support & SLA
✔ Must share case study
✔ Early feature access
✔ No defense/pharma restriction

For funded startups. Enterprise pricing after $1M ARR.

Apply for Startup License

Defense & Enterprise

Starting at $500,000 annually. Self-hosted. Air-gapped. The 3.8GB footprint runs on your infrastructure: data center, edge, or secure facility.

Vector DB customers: 90% storage reduction via 72-dim fidelity.
Search engines: Billions in compute savings via faster judgment.
Defense/Edge: Multi-sensor fusion. Entity disambiguation. C2 decision support. Air-gapped.

Currently under evaluation for defense and life sciences applications.

Contact for Strategic Briefing