The Unified Gateway & Routing System for AI Models

One API. One key. Access all major AI models. We turn model access into infrastructure — so you can focus on building your product.

All Major Models
OpenAI, Anthropic, Google, DeepSeek & more
One Standard API
Single interface, unified billing
Enterprise-Grade
Backed by Sequoia China (HongShan)

"Commonstack is building the AI infrastructure for the agentic era — making it dramatically easier for developers and enterprises to build AI agents and intelligent applications, while significantly reducing the cost and complexity of using AI."

Everything You Need to Power AI at Scale

Nine core capabilities that turn fragmented AI model access into a unified, reliable, and cost-efficient infrastructure layer.

01

Unified API Gateway

One standard API and one key to access all models — OpenAI, Anthropic, Google, DeepSeek. Eliminates differences in APIs, error formats, and billing structures.

02

Intelligent Routing (Coming Soon)

Configure routing policies to automatically select the most reliable, lowest latency, or lowest cost route. Dynamically balance quality, speed, and cost.
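
The policy selection described above can be sketched as minimizing a per-route objective. The route metadata, provider names, and policy names below are illustrative stand-ins, not the real configuration schema:

```python
# Hypothetical routing-policy sketch: pick a route by the configured objective.
# Route metadata and policy names are illustrative, not a real config schema.
ROUTES = [
    {"name": "provider-a", "latency_ms": 320, "cost_per_1m": 15.0, "error_rate": 0.002},
    {"name": "provider-b", "latency_ms": 180, "cost_per_1m": 25.0, "error_rate": 0.010},
    {"name": "provider-c", "latency_ms": 450, "cost_per_1m": 3.0,  "error_rate": 0.004},
]

# Each policy maps a route to the quantity it wants to minimize.
POLICIES = {
    "lowest_latency": lambda r: r["latency_ms"],
    "lowest_cost":    lambda r: r["cost_per_1m"],
    "most_reliable":  lambda r: r["error_rate"],
}

def select_route(policy: str) -> str:
    """Return the name of the best route under the given policy."""
    return min(ROUTES, key=POLICIES[policy])["name"]
```

A real gateway would also weight these objectives against each other; this sketch shows only the single-objective case.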

03

Automatic Failover

Automatically switches to backup routes on timeouts, rate limits, or outages. Complex routing logic for rate limits and stability fluctuations is handled for you.
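
Conceptually, failover is a priority-ordered fallback chain. A minimal sketch of that logic, assuming the gateway runs it server-side; the route functions and error type here are hypothetical:

```python
# Minimal failover sketch: try routes in priority order, fall back on failure.
class RouteError(Exception):
    """Stands in for a timeout, rate limit, or provider outage."""

def call_with_failover(routes, request):
    """Try each route in priority order; return the first successful response."""
    last_error = None
    for route in routes:
        try:
            return route(request)
        except RouteError as exc:
            last_error = exc  # fall through to the next backup route
    raise last_error  # every route failed

def primary(req):
    raise RouteError("429: rate limited")  # simulated provider failure

def backup(req):
    return {"route": "backup", "echo": req}

# The primary route is rate-limited, so the backup serves the request.
result = call_with_failover([primary, backup], "hello")
```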

04

Continuous Quality Assurance

Regular benchmark evaluations across all platform models with in-depth quality reports — quantifying performance, accuracy, and stability.

05

Unified Management Dashboard

Centralized interface for usage, billing, logs, and quotas. Monitor consumption and limits by organization, user, or API key.

06

Minimal Integration Effort

Fully compatible with the OpenAI SDK format. Existing applications migrate by simply replacing the Base URL and API key — integration completed in minutes.
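
A minimal migration sketch, assuming an OpenAI-SDK-style client. The endpoint below is a placeholder (use the one from your dashboard), the model identifier format is assumed, and the SDK call is shown but left commented out:

```python
# Drop-in migration sketch: the only changes are the base URL and the key.
base_url = "https://api.example-gateway.com/v1"  # placeholder, not the real endpoint
api_key = "YOUR_GATEWAY_KEY"

# Requests keep the standard OpenAI chat-completions shape; the model field
# selects any provider's model through the same interface.
request = {
    "model": "anthropic/claude-opus-4.6",  # assumed model-ID format
    "messages": [{"role": "user", "content": "Hello"}],
}

# With the OpenAI Python SDK this becomes:
#   from openai import OpenAI
#   client = OpenAI(base_url=base_url, api_key=api_key)
#   resp = client.chat.completions.create(**request)
```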

07

Prompt Caching

Supports upstream provider caching capabilities. Repeated prompts and contexts are billed at lower rates, significantly reducing costs for repetitive workflows.

08

Full Features & Multimodal

Streaming, function calling, structured outputs, vision, image generation, PDF input, and reasoning — all supported across providers.

09

OpenClaw Integration

Preconfigured toolkits for OpenClaw agents: Web search (Brave), Web scraping (Firecrawl), Voice (ElevenLabs), and more — out of the box.

UncommonRoute
Open Source

A lightning-fast local LLM smart router for Cursor, Claude Code, Codex & OpenAI SDK. Route by difficulty. Refuse habitual waste.

$ pip install uncommon-route
The "Overkill" Problem

Most AI coding tools use a one-size-fits-all strategy — every request, no matter how trivial, goes to the most expensive model. A single developer's monthly bill can balloon 3–5× unnecessarily.

"Design a fault-tolerant distributed database"→ Top model ✓
"Calculate 2 + 2"→ Top model ✗ (wasteful)
A Lightning-Fast Local Judge
[Diagram: input clients → UncommonRoute → upstream APIs]
Runs entirely locally — ~0.5ms decision
Does NOT delegate to an external model
Transparent — no code changes needed
Works with Cursor, Claude Code, Codex
4 Difficulty Tiers

Every prompt is classified in real time — matched to the right model, not the most expensive one.

SIMPLE

Greetings, short queries, basic translation, simple formatting.

moonshot/kimi-k2.5
MEDIUM

Basic code generation, principle explanation, content summarization.

moonshot/kimi-k2.5
COMPLEX

Multi-constraint design, complex architecture implementation, refactoring.

google/gemini-3.1-pro
REASONING

Mathematical proofs, hardcore logical reasoning, deep debugging.

xai/grok-4-1-fast
COMPUTE & COST INCREASE →
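
The tier-to-model mapping above reduces to a lookup table. The model IDs come from this page; the keyword classifier below is a toy stand-in for the real local judge:

```python
# Lookup table for the four tiers above (model IDs as listed on this page).
TIER_MODELS = {
    "SIMPLE":    "moonshot/kimi-k2.5",
    "MEDIUM":    "moonshot/kimi-k2.5",
    "COMPLEX":   "google/gemini-3.1-pro",
    "REASONING": "xai/grok-4-1-fast",
}

def classify(prompt: str) -> str:
    """Toy keyword heuristic; the real router uses a local ~0.5ms judge."""
    p = prompt.lower()
    if any(k in p for k in ("prove", "debug", "logic")):
        return "REASONING"
    if any(k in p for k in ("design", "architecture", "refactor")):
        return "COMPLEX"
    if any(k in p for k in ("implement", "explain", "summarize")):
        return "MEDIUM"
    return "SIMPLE"

def route(prompt: str) -> str:
    """Match the prompt to the right model, not the most expensive one."""
    return TIER_MODELS[classify(prompt)]
```

With this table, "Calculate 2 + 2" routes to the cheap tier while an architecture-design prompt routes to the complex tier, mirroring the examples above.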
5 Routing Strategies

Choose the strategy that fits your needs — or let auto decide for you.

BALANCED MODE

Default choice — intelligently balances cost and quality for every request.

Self-Evolving Router

The AI model landscape changes every few weeks. When a new model drops, how does the router know whether to use it — and for which tasks? The answer is a continuous feedback loop built directly into the UncommonStack platform.

The Challenge

Static routing rules become stale. A model that was "best for complex code" last month may be surpassed by a cheaper alternative today. Without a feedback mechanism, the router can't adapt.

New model released
Router has no benchmark data for it
Defaults to the expensive SOTA model
Our Solution

UncommonStack collects real-world routing feedback from every request on the platform. When a new model appears, it enters a live evaluation pipeline — automatically benchmarked against existing routing decisions. The router's difficulty classifier and model-tier mappings are updated continuously, without manual intervention.

The Feedback Loop
01
User Requests on UncommonStack

Every routed request generates a signal: which model was used, what was the task difficulty, and what was the outcome.

02
RouterBench Evaluation Pipeline

New models are automatically inserted into RouterBench — our 4-phase benchmark system — and scored against the existing model pool.

03
Difficulty Classifier Update

Ground truth routing labels are refreshed. The classifier learns which new model handles which task tier best, at what cost.

04
Router Weights Deployed

Updated routing policies are pushed to UncommonRoute. Users automatically benefit from the new model without changing any code.

Loop repeats with every new model release — zero manual effort
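
The four steps above can be sketched end to end with stubs. Every function here is a stand-in for a platform component, and the scores and threshold are invented for illustration:

```python
# Toy end-to-end sketch of the self-evolving loop; nothing here is a real API.
def routerbench_evaluate(model_id):
    """Step 2: score the new model per difficulty tier (stubbed scores)."""
    return {"SIMPLE": 0.95, "MEDIUM": 0.90, "COMPLEX": 0.60, "REASONING": 0.40}

def refresh_tier_mapping(mapping, model_id, scores, threshold=0.85):
    """Step 3: reassign tiers where the new model clears the quality bar."""
    updated = dict(mapping)
    for tier, score in scores.items():
        if score >= threshold:
            updated[tier] = model_id  # new model takes over this tier
    return updated

# Steps 1 and 4: live traffic triggers evaluation, and the updated mapping is
# what gets pushed to UncommonRoute with no user code changes.
mapping = {t: "incumbent/model" for t in ("SIMPLE", "MEDIUM", "COMPLEX", "REASONING")}
new_mapping = refresh_tier_mapping(mapping, "new/model",
                                   routerbench_evaluate("new/model"))
```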
RouterBench
v2.0

A rigorous benchmark pipeline that generates ground-truth routing labels for multi-step agent tasks. It powers both the Self-Evolving Router and our published evaluation results.

4-Phase Pipeline
PHASE 1
Dual SOTA Baseline

Run both GPT-5.4 and Claude Opus 4.6 on each task. Select the cheaper successful trajectory as the baseline.

PHASE 2
Trajectory Router Analysis

A SOTA model acts as a routing judge — analyzing each step of the trajectory and labeling which steps can be downgraded.

PHASE 3
Sequential Lock-in Search

For each downgradable step, test cheaper model tiers in order. Lock in the cheapest tier that still passes. Repeat for all steps.

PHASE 4
Ground Truth Assembly

The optimal mixed-model path becomes the routing supervision signal — one labeled sample per LLM call in the trajectory.
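
Phase 3, the Sequential Lock-in Search, is the algorithmic core of the pipeline. Here is a sketch under one reading of "test cheaper tiers in order": downgrade step by step and lock in the last tier that still passes. The `passes` check stands in for actually re-running the trajectory step:

```python
# Sketch of Phase 3: for each downgradable step, walk down from the Phase 1
# baseline tier and lock in the cheapest tier that still passes.
TIERS = ["HIGH", "MID-HIGH", "MID", "LOW"]  # most to least expensive

def lock_in_search(steps, passes):
    """steps: (step_id, downgradable) pairs from the Phase 2 judge.
    passes(step_id, tier) -> bool re-runs the step at that tier (stubbed).
    Returns the locked-in tier for each step."""
    assignment = {}
    for step_id, downgradable in steps:
        tier = "HIGH"  # Phase 1 baseline tier
        if downgradable:
            for candidate in TIERS[1:]:  # progressively cheaper tiers
                if passes(step_id, candidate):
                    tier = candidate      # still passes: keep downgrading
                else:
                    break                 # first failure: lock in previous tier
        assignment[step_id] = tier
    return assignment

def passes(step_id, tier):
    # Stub: step "s1" still passes at MID-HIGH and MID but fails at LOW.
    return step_id == "s1" and tier in ("MID-HIGH", "MID")

plan = lock_in_search([("s1", True), ("s2", False)], passes)
```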

3 Evaluation Metrics
Pass Rate
Passed cases / Total valid cases

Did the routed path complete the task above the target threshold?

Cost Savings Score
100 × Σ save_test / Σ save_gt

How much of the ground-truth cost savings did the router actually capture? Scored 0–100.

Money Saved
baseline.cost − cost_test (USD)

Nominal dollar savings using fixed tier prices — comparable across experiments and time.
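
The three metrics reduce to small formulas. A worked sketch with illustrative numbers, assuming `save_test` and `save_gt` are per-case dollar savings:

```python
# The three RouterBench metrics as defined above; numbers are illustrative.
def pass_rate(passed, total_valid):
    """Passed cases / total valid cases."""
    return passed / total_valid

def cost_savings_score(save_test, save_gt):
    """100 * sum(save_test) / sum(save_gt): the share of ground-truth
    savings the router actually captured, scored 0-100."""
    return 100 * sum(save_test) / sum(save_gt)

def money_saved(baseline_cost, cost_test):
    """Nominal dollar savings at fixed tier prices (USD)."""
    return baseline_cost - cost_test

# Example: the router captured $0.50 of a possible $0.75 in savings.
score = cost_savings_score([0.40, 0.10], [0.50, 0.25])
```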

Model Pool — 4 Tiers (output price)

HIGH: Claude Opus 4.6 ($25/1M), GPT-5.4 ($15/1M)
MID-HIGH: Claude Haiku 4.5 ($5/1M), Gemini 3 Flash ($3/1M)
MID: MiniMax M2.5 ($1.15/1M), Qwen3.5-27B ($1.56/1M)
LOW: DeepSeek V3.2 ($0.38/1M), GLM-4.5-Air ($0.85/1M)
Validated on SWE-bench — 10/10 tasks found cheaper mixed paths, avg 52% cost reduction
Proven With Real Data

Not theoretical — validated across 131 agent coding sessions, 763 handwritten prompts, and 200+ model routing experiments.

API COST SAVINGS

Simulated 131 agent coding sessions vs. using Claude Opus throughout.

ROUTING ACCURACY

763 handwritten prompts, multilingual blind testing.

TASK QUALITY RETENTION

Same output quality at dramatically lower cost.

REAL-WORLD COST SAVINGS

Validated across 200+ model routing experiments.

~0.5ms Routing Latency

Imperceptible, ultra-fast decisions made locally. The routing engine runs entirely on your machine — no network round-trip, no API call, no latency overhead.


"Systems thinking beats model maximalism."

Stop paying for the brand. Start paying for the result. Find hidden-gem models that are fast, well priced, and have high success rates.

Built by World-Class Engineers

A team of competitive programmers, industry veterans, and researchers — united by a shared mission to build the AI infrastructure for the agentic era.

🏆
ACM World Finalists

Competitive programming excellence

🎓
Top Universities

UC Berkeley, CMU, Columbia, Imperial College London, University of Toronto

🏢
Industry Veterans

Experience across Google, Apple, Microsoft, and ByteDance

💰
Tens of Millions Raised

Backed by Sequoia China (HongShan)

Our Vision

From multimodal model routing, to Skills and MCP tool ecosystems, to model fine-tuning, reinforcement learning, and inference acceleration — we are building a complete AI stack for the next generation of intelligent systems.

Our vision is to make it dramatically easier for developers and enterprises to build AI agents and intelligent applications, while significantly reducing the cost and complexity of using AI.

@JoySong_J

Writing about AI · Building @commonstack_ai & @Gradient_HQ · Angel Investor · Ex-@ABCDELabs, @Bybit_Official, @IBM

FOLLOW ON X / TWITTER →