ARBITER — The Final Word on What You Meant

pip install arbiter-engine

from arbiter_engine import rank

# The question
query = "selective COX-2 inhibition without GI toxicity"

# The candidates
options = ["sulfonamide at C-5", "methyl sulfone", "carboxylic acid"]

# The judgment
verdict = rank(query, options)

print(verdict.top.text)   # sulfonamide at C-5
print(verdict.top.score)  # 0.659 — certainty

How It Judges

72 dimensions of certainty.

Standard embeddings: 768–1536 dimensions, 400MB–2GB models, cloud-only. ARBITER collapses meaning to 72 dimensions, and the truth survives. Cross-lingual. Polysemy-aware. 26MB of model weights. Runs anywhere judgment is needed.

                       Industry Standard       ARBITER

Dimensions             768–1536                72

Model weights          400MB–2GB               26MB

Full deploy            4GB–10GB+               3.8GB (Docker)

Deployment             Cloud only              Edge, IoT, air-gapped

Storage (100M docs)    286GB                   26.8GB

Cross-lingual          Requires translation    Native
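
The storage figures in the comparison above are consistent with raw float32 vectors (4 bytes per value) measured in GiB. That encoding is an assumption, not something the page states; the sketch below just reproduces the arithmetic and ignores index overhead and metadata.

```python
GIB = 2**30  # bytes per GiB


def index_size_gib(num_docs: int, dims: int, bytes_per_value: int = 4) -> float:
    """Raw vector storage in GiB, assuming one float32 per dimension."""
    return num_docs * dims * bytes_per_value / GIB


standard_gib = index_size_gib(100_000_000, 768)
arbiter_gib = index_size_gib(100_000_000, 72)
savings = 1 - arbiter_gib / standard_gib

print(f"768D: {standard_gib:.1f} GiB")  # 768D: 286.1 GiB
print(f"72D:  {arbiter_gib:.1f} GiB")   # 72D:  26.8 GiB
print(f"savings: {savings:.1%}")        # savings: 90.6%
```

The saving depends only on the ratio of dimensions (72 / 768), so it holds for any document count as long as both indexes use the same value width.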

Judgment at Scale

Every search engine retrieves. ARBITER decides.

Retrieval gets 1000 candidates. But which 10 matter? That's not retrieval. That's judgment. And judgment at 72 dimensions touches roughly a tenth of the values per vector (72 vs 768), so it is about 10× cheaper than judgment at 768.

1. Retrieval: get 1000 candidates
   Industry standard: BM25 / 768D embeddings, 286GB per 100M docs
   ARBITER: 26.8GB (90% savings)

2. Judgment: decide on the best 10
   Industry standard: cross-encoders / Cohere (4GB), only judge 100 (too slow)
   ARBITER: judge 1000 at 72D speed

97% Precision@3 means you don't choose between speed and certainty. You get both.
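
The two stages above can be sketched as a generic retrieve-then-judge loop. This is only an illustration of the pattern: `judge_top_k` and `token_overlap` are hypothetical names, and the toy word-overlap scorer is a stand-in for whatever scoring call is actually used, not ARBITER's own scoring.

```python
from typing import Callable, List, Tuple


def judge_top_k(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
    k: int = 10,
) -> List[Tuple[float, str]]:
    """Score every retrieved candidate and keep the k best."""
    scored = [(score_fn(query, text), text) for text in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def token_overlap(query: str, text: str) -> float:
    """Toy stand-in scorer: Jaccard overlap of lowercase tokens."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q | t) if q | t else 0.0


# Stage 1 would normally return ~1000 candidates; three are enough to show the shape.
retrieved = [
    "Apple pie recipe with Granny Smith apples",
    "Apple M3 Pro delivers 40% faster CPU performance than M1",
    "Comparing apple varieties: Fuji vs Honeycrisp",
]

# Stage 2: judge all candidates, keep the best k.
top = judge_top_k("Apple M3 chip performance benchmarks", retrieved, token_overlap, k=2)
for score, text in top:
    print(f"{score:.3f} {text}")
```

Keeping the scorer behind a plain callable means the retrieval stage does not need to change when the judging model does.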

The Judgment

Same word. Different meaning. ARBITER decides.

Ambiguity is everywhere. The same word means different things. Only one answer is correct. ARBITER renders judgment — and negative scores mean active rejection.

"Apple M3 chip performance benchmarks"

0.851 Apple M3 Pro delivers 40% faster CPU performance than M1

0.727 Apple M3 Max GPU benchmarks vs NVIDIA RTX 4090

0.234 🍎 Comparing apple varieties: Fuji vs Honeycrisp

0.054 🥧 Apple pie recipe with Granny Smith apples

15.7× separation. The arbiter knows which Apple you meant.
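
The separation figure appears to be the top score divided by the lowest irrelevant score; that reading is inferred from the numbers shown, not from a stated formula. A minimal sketch under that assumption, with `separation` as a hypothetical helper name:

```python
from typing import List


def separation(top_score: float, irrelevant_scores: List[float]) -> float:
    """Ratio of the top score to the lowest irrelevant score."""
    hard_negative = min(irrelevant_scores)
    if hard_negative <= 0:
        # A ratio is not meaningful once the hard negative is zero or negative.
        raise ValueError("separation ratio undefined for non-positive scores")
    return top_score / hard_negative


apple = separation(0.851, [0.234, 0.054])
print(f"{apple:.2f}x")
```

With the rounded scores displayed, this gives about 15.76, close to the quoted 15.7×; the small gap is presumably due to truncation or to unrounded underlying scores.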

"customer churn prediction model"

0.735 Logistic regression for subscriber attrition analysis

0.733 Feature engineering for churn: usage patterns, tenure, complaints

0.485 🍦 Ice cream churn rate optimization

-0.030 🧈 Butter churning traditional methods

Active rejection. Butter churning went negative. Not ranked low — rejected.
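
One way to act on a negative score is to drop those candidates before they reach the next stage. The sketch below uses the scores shown above; using zero as a hard cutoff and the `split_rejected` helper are assumptions for illustration, not documented behavior.

```python
from typing import List, Tuple

Scored = Tuple[float, str]

scored: List[Scored] = [
    (0.735, "Logistic regression for subscriber attrition analysis"),
    (0.733, "Feature engineering for churn: usage patterns, tenure, complaints"),
    (0.485, "Ice cream churn rate optimization"),
    (-0.030, "Butter churning traditional methods"),
]


def split_rejected(items: List[Scored], threshold: float = 0.0):
    """Separate candidates scored below the threshold from the rest."""
    accepted = [item for item in items if item[0] >= threshold]
    rejected = [item for item in items if item[0] < threshold]
    return accepted, rejected


accepted, rejected = split_rejected(scored)
print(len(accepted), "accepted;", len(rejected), "rejected")
```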

"artificial intelligence research papers" (English query)

0.836 人工智能研究论文集 (AI research paper collection)

0.712 人工知能の最新研究 (Latest AI research)

0.710 機器學習論文 (Machine learning papers)

0.372 Transformer architecture paper by Vaswani et al

-0.121 Cooking with artificial sweeteners

Cross-lingual judgment. Top 3 are CJK. Meaning transcends language.

"Q3 revenue forecast methodology"

0.828 Q3 2024 Revenue Projections: Bottom-up estimation approach

0.818 Sales pipeline analysis for Q3 revenue estimation

0.780 Q3 financial guidance: Conservative scenario modeling

0.244 🌤️ Q3 weather forecast for northeast region

Enterprise judgment. 3.4× separation. Intent drives the verdict.

No Translation Required

Say what you mean. ARBITER understands.

You've been trained to translate your thoughts into keywords. Then translate results back into meaning. ARBITER removes the translation. Ask naturally. Get the right answer.

How you've been trained to search

"heart attack treatment"

#1 Aspirin 325mg, nitroglycerin, morphine

#2 Heartburn vs heart attack: how to tell

#3 Cardiac catheterization for STEMI

Generic. First-line. Correct but not specific.

How you actually think

"62yo male, ST elevation V1-V4, onset 45 min, BP 90/60, diaphoretic. Protocol?"

#1 Cardiac catheterization within 90 minutes

#2 Acute MI with ST elevation, troponin rise

#3 Aspirin 325mg, nitroglycerin, morphine

The arbiter understands. Cath lab. Now.

The doctor's question gets the doctor's answer.
Same candidates. Same model. Same cost.

001 — PHARMACEUTICAL

It identified the Celebrex pathway

Zero pharmaceutical training. Sub-second judgment. The modification that became a three billion dollar drug.

"Optimize lead compound. Target: selective COX-2 inhibition. Issues: GI toxicity from COX-1 cross-reactivity, short half-life."

0.659 Replace carboxylic acid with sulfonamide

0.637 Convert to prodrug ester

0.573 Add polar morpholine ring

$3B pathway — judged correctly

002 — DEFENSE

It allocates interceptors under fire

OMIN system. Multi-layer air defense. Sub-second allocation decisions across interceptor types and threat priorities.

"Incoming raid: 4 cruise missiles, 2 UAVs, 1 ballistic threat. Available: Patriot, NASAMS, Gepard, MANPADS."

0.847 Patriot → ballistic, NASAMS → cruise, Gepard → UAVs, reserve MANPADS

0.634 Patriot → all cruise missiles (overkill)

0.521 MANPADS → all threats (insufficient)

Optimal allocation — judgment validated

003 — SECURITY

It separates threat from noise

Same word. Completely different threat level. Full context in the query. No security-specific training. 20/20 validated.

"SECURITY ALERT: Agent persistence mechanism detected in registry run keys. Registry run keys are MITRE ATT&CK T1547.001. What is this?"

0.558 Malware establishing persistence via registry autorun

0.401 Insurance agent CRM software startup entry

0.198 Travel agent booking system launcher

Threat identified — context is the key

004 — LINGUISTICS

It judges across scripts

Single CJK character. No context. No translation API. The arbiter's judgment transcends language.

Single characters — no context provided

0.746 水 → water

0.693 山 → mountain

0.627 愛 → love

0.610 月 → moon

No language training — 26MB of judgment

Context Changes Everything

Add context. Watch the verdict flip.

Same candidates. Different context. The arbiter re-evaluates. Complete ranking reversal.

PYTHON

Baseline

Programming 0.796

Snakes 0.284

→

"As a herpetologist..."

🐍 Snakes 0.700

Programming 0.422

+146% for snakes. Verdict reversed.

APPLE

Baseline

Tech company 0.861

Fruit 0.252

→

"As a chef..."

🍎 Fruit 0.668

Tech company 0.397

+165% for fruit. Verdict reversed.
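
The percentage gains quoted above follow from the relative change in score, and a "flip" simply means the top-ranked label changed. A short sketch that reproduces both from the displayed numbers (the helper names are illustrative):

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100


def verdict_flipped(baseline: dict, contextual: dict) -> bool:
    """True when the top-scoring label differs between the two runs."""
    return max(baseline, key=baseline.get) != max(contextual, key=contextual.get)


python_base = {"Programming": 0.796, "Snakes": 0.284}
python_ctx = {"Snakes": 0.700, "Programming": 0.422}
apple_base = {"Tech company": 0.861, "Fruit": 0.252}
apple_ctx = {"Fruit": 0.668, "Tech company": 0.397}

snake_gain = percent_change(python_base["Snakes"], python_ctx["Snakes"])
fruit_gain = percent_change(apple_base["Fruit"], apple_ctx["Fruit"])

print(f"snakes: +{snake_gain:.0f}%, flipped: {verdict_flipped(python_base, python_ctx)}")
print(f"fruit:  +{fruit_gain:.0f}%, flipped: {verdict_flipped(apple_base, apple_ctx)}")
```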

Judgment Scorecard

97% correct. 32/33. One adversarial miss.

11 query categories. 110 candidates. The one failure: "Java island garbage collection" scored 0.800 — a maliciously constructed trap. 97% is the headline.

Test                        Precision@3   Hard negative   Separation

Apple M3 vs fruit           3/3            0.054          15.7×

Python code vs snake        3/3            0.310          2.8×

Cross-lingual (EN→CJK)      3/3           -0.121          CJK top 3

Customer churn vs butter    3/3           -0.030          negative

GDPR breach notification    3/3            0.190          4.0×

Heart attack symptoms       3/3            0.210          3.7×

Hard Negative = lowest irrelevant score. Negative scores = active rejection. All tests reproducible via public API.
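
The 32/33 headline is consistent with Precision@3 pooled over 11 categories (3 top slots each). The sketch below shows that arithmetic; the per-category relevance flags are illustrative, since the page does not say which category or slot contained the miss.

```python
from typing import List


def precision_at_k(relevance: List[bool], k: int = 3) -> float:
    """Fraction of the top-k results that are relevant."""
    return sum(relevance[:k]) / k


# Assumed layout: 10 categories with a clean top 3, one with a single miss.
categories = [[True, True, True]] * 10 + [[True, True, False]]

hits = sum(sum(flags[:3]) for flags in categories)
slots = 3 * len(categories)
overall = hits / slots

print(f"{hits}/{slots} = {overall:.1%}")  # 32/33 = 97.0%
```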

Certainty.
Not probability.

Your vector database returns candidates. ARBITER renders judgment. 90% storage savings. Same queries. Better verdicts.

Slots between retrieval and generation. Three lines of code. No infrastructure changes.

Pinecone Weaviate Chroma Qdrant LangChain

Research License

$250

Per month · Non-commercial

Unlimited API calls*
Python SDK + REST API
Academic & startup use
*rate‑limited 100/min

Get Research License

Defense & Enterprise

Starting at $500,000 annually. Self‑hosted. Air‑gapped. 3.8GB footprint runs on your infrastructure — data center, edge, or secure facility.

Vector DB customers: 90% storage reduction via 72-dim fidelity.
Search engines: Billions in compute savings via faster judgment.
Defense/Edge: Multi-sensor fusion. Entity disambiguation. C2 decision support. Air-gapped.

Currently under evaluation for defense and life sciences applications.

Contact for Strategic Briefing
