Downsizing

A place of abundance

Tokens should be cheap, fast, and smart - for everyone. We get there through efficiency: reducing what goes in, reducing what comes out, and routing where it performs best.

Downsize your token costs

Works best with agents and AI coding tools

Takes 2 minutes or one prompt to set up.

Reduces inference costs for agents and AI coding tools. Uses Anthropic / OpenAI by default — or routes to alternative models at equivalent quality to reduce further.

Talk to us →

Under the hood

We optimize at all levels

You point to our gateway. We handle routing, compression, and model selection.
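As a rough illustration of what "pointing to the gateway" can look like: many SDKs and coding tools let you override the API base URL through an environment variable. The sketch below assumes the gateway accepts Anthropic-style requests. The gateway address, the helper name `gateway_env`, and the key value are placeholders invented for this example, so treat the CLI quickstart as the source of truth for the real setup.

```python
import os

# Placeholder address: the real gateway URL is not given on this page.
GATEWAY_URL = "https://gateway.example.invalid"


def gateway_env(api_key: str, base_url: str = GATEWAY_URL) -> dict[str, str]:
    """Return a copy of the current environment with the API base URL overridden.

    ANTHROPIC_BASE_URL and ANTHROPIC_API_KEY are the variables the Anthropic
    SDKs read. That the gateway is configured this way is an assumption.
    """
    env = dict(os.environ)
    env["ANTHROPIC_BASE_URL"] = base_url
    env["ANTHROPIC_API_KEY"] = api_key
    return env


env = gateway_env("sk-placeholder")
print(env["ANTHROPIC_BASE_URL"])
```

The resulting mapping could be passed to a subprocess that launches an agent or coding tool, so the tool's requests go to the gateway instead of the provider directly.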

Pipeline diagram: Input → Reduce → Filter → Route → Output. Routing targets shown: Kimi, DeepSeek, Fable, Sonnet, Opus.
01 Input

The full token grid arrives: data, prompt, and context.

02 Reduce

We reduce input token volume by removing low-signal and redundant units.

03 Filter

When feasible, we cherry-pick only the most salient pieces.

04 Route

The smart router sends the bundle to the model the grader recommends as best.

05 Output

The response is compressed too.
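The steps above can be sketched in a few lines. This is a toy model under stated assumptions, not the gateway's actual logic: "reduce" is approximated by dropping blank and repeated units, "filter" by keyword overlap with the request, and "route" by picking the cheapest model that meets a required capability level (standing in for the grader's recommendation). The capability scores and all function names are invented for illustration; the model names and input prices are taken from the pricing list on this page.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    input_price: float  # USD per 1M input tokens
    capability: int     # invented score for this sketch; higher is stronger


MODELS = [Model("Star", 0.25, 1), Model("Galaxy", 0.50, 2), Model("Universe", 1.00, 3)]


def words(text: str) -> set[str]:
    """Lowercased alphanumeric words in text."""
    return set(re.findall(r"\w+", text.lower()))


def reduce_units(units: list[str]) -> list[str]:
    """Step 02: drop blank and exactly repeated units, keeping first occurrences."""
    seen: set[str] = set()
    kept = []
    for unit in units:
        key = unit.strip()
        if key and key not in seen:
            seen.add(key)
            kept.append(key)
    return kept


def filter_units(units: list[str], query: str, top_k: int) -> list[str]:
    """Step 03: keep the top_k units sharing the most words with the query.

    Expects already-reduced (unique) units and preserves their original order.
    """
    query_words = words(query)
    ranked = sorted(units, key=lambda u: len(query_words & words(u)), reverse=True)
    chosen = set(ranked[:top_k])
    return [u for u in units if u in chosen]


def route(required_capability: int) -> Model:
    """Step 04: cheapest model whose capability meets the requirement."""
    capable = [m for m in MODELS if m.capability >= required_capability]
    if not capable:
        raise ValueError("no model meets the required capability")
    return min(capable, key=lambda m: m.input_price)


context = [
    "def add(a, b): return a + b",
    "",
    "def add(a, b): return a + b",
    "README: project history",
    "bug: add fails on strings",
]
reduced = reduce_units(context)
salient = filter_units(reduced, "fix the add bug", top_k=2)
print(salient, route(required_capability=2).name)
```

A real system would score salience and capability with learned models rather than word overlap and fixed integers; the point here is only the order of operations.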

Try it out. Install the CLI and connect coding tools or agents in seconds.

CLI quickstart →

Same Quality. Better prices.

Use Anthropic and OpenAI models at discounted prices

Or use our models for maximum cost savings

| Provider | Model | Description | Input / 1M tok (list → discounted) | Output / 1M tok (list → discounted) |
| --- | --- | --- | --- | --- |
| Anthropic | Claude Fable 5 | Mythos-class, most capable publicly available | $10 → $7.14 | $50 → $38.46 |
| Anthropic | Claude Opus 4.8 | Most intelligent, for agents & coding | $5 → $3.57 | $25 → $19.23 |
| Anthropic | Claude Sonnet 4.6 | Balance of intelligence, cost & speed | $3 → $2.14 | $15 → $11.54 |
| Anthropic | Claude Haiku 4.5 | Fastest, most compact Anthropic model | $0.8 → $0.72 | $4 → $3.60 |
| OpenAI | GPT-5.5 | Frontier model for complex reasoning | $5 → $3.57 | $30 → $23.08 |
| OpenAI | GPT-5.4 | Affordable model for coding & work | $2.50 → $1.79 | $15 → $11.54 |
| OpenAI | GPT-5.4 mini | Strongest mini for coding & sub-agents | $0.75 → $0.675 | $4.50 → $4.05 |
| Downsizing | Universe | Maximum capability, multimodal & deep reasoning | $1 | $4 |
| Downsizing | Galaxy | Balanced intelligence at a fraction of the cost | $0.5 | $2 |
| Downsizing | Star | Lightning-fast, ultra-low cost | $0.25 | $1 |

*These figures represent typical compression gains observed across diverse applications and agents.
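To make the per-million-token prices concrete, here is a small worked calculation. It treats the first figure in each price pair as the list price and the second as the discounted price, which is how the page appears to present them. The workload (100M input tokens and 10M output tokens) and the function name `cost_usd` are made up for illustration.

```python
def cost_usd(input_tokens: int, output_tokens: int,
             input_price: float, output_price: float) -> float:
    """Total cost in USD, with prices quoted per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical monthly workload.
INPUT_TOKENS = 100_000_000
OUTPUT_TOKENS = 10_000_000

# Claude Sonnet 4.6 prices as listed above.
list_cost = cost_usd(INPUT_TOKENS, OUTPUT_TOKENS, 3.00, 15.00)
discounted_cost = cost_usd(INPUT_TOKENS, OUTPUT_TOKENS, 2.14, 11.54)
saved = list_cost - discounted_cost

print(f"list:       ${list_cost:.2f}")        # list:       $450.00
print(f"discounted: ${discounted_cost:.2f}")  # discounted: $329.40
print(f"saved:      ${saved:.2f} ({saved / list_cost:.1%})")  # saved:      $120.60 (26.8%)
```

Actual savings depend on the input/output mix, since the input and output discounts differ.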

Running a custom pipeline? Share your setup and we'll figure out what fits.

contact@downsizing.dev →

Coding Subscription Pricing

Bring your subscription, and we'll double its capacity

Bring your existing Claude or Codex subscription — we optimize on top.

Bring your own

  • First 30,000 tokens / mo: Free
  • Beyond 30,000 tokens: $5/mo (flat monthly cap, never more than $5)

Bring your existing Claude or Codex subscription and we'll extend your effective limits — at no extra cost for lighter usage, and a flat $5/mo for heavier workloads.

  • Input & output token compression
  • Smart routing across providers
  • Focused attention mode
  • Usage dashboard & analytics

Get started free

Our subscription (coming soon)

Pro: $15/mo
Max: $70/mo

Full access — no external subscription required. We provide the tokens with doubled capacity compared to direct Anthropic or OpenAI plans.

  • Everything in Bring your own
  • 2× token capacity vs. Claude Code or Codex
Join waitlist

No setup fees · Cancel anytime · Charged only when you exceed 30,000 tokens
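The "Bring your own" billing rule can be written as a one-line function. This sketch reads the plan as a flat fee (not metered usage) once the free allowance is exceeded, which matches the "flat $5/mo" wording but is still an interpretation. The function and constant names are hypothetical.

```python
FREE_TOKENS_PER_MONTH = 30_000
FLAT_MONTHLY_FEE_USD = 5


def monthly_charge_usd(tokens_used: int) -> int:
    """Charge for one month: free up to the allowance, flat fee beyond it."""
    if tokens_used < 0:
        raise ValueError("tokens_used must be non-negative")
    return 0 if tokens_used <= FREE_TOKENS_PER_MONTH else FLAT_MONTHLY_FEE_USD


for used in (12_000, 30_000, 30_001, 5_000_000):
    print(used, monthly_charge_usd(used))
```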

From users

Loved by developers

We switched our code-generation pipelines to Downsizing and started seeing savings right away.

Paula

Staff Engineer · Samsung

Went from $8k/month on Claude Code to $5k. Didn't change a single prompt.

Aleh

Senior Software Engineer · Nvidia

Extends my Claude Code subscription at least by 2x on the free version. Love it.

Darshan

Senior Software Engineer · Google

Downsizing replaced multiple plugins with better outcomes, 2 minutes to set up.

Anton

Senior Software Engineer · Revolut

Analytics

Numbers speak for themselves

Actual Claude Code sessions

downsizing.dev/dashboard/savings

  • Input tokens: 3.5B received
  • Input saved: 1.44B of 3.5B total
  • Requests: 14,077 total
  • Output tokens: 284M received
  • Output saved: 99.4M of 284M total
  • Routed: 8,602 (61% of total)

Input tokens by model

| Model | Requests | Tokens in | Saved |
| --- | --- | --- | --- |
| Sonnet | 8,241 | 2.1B | 891M |
| Opus | 1,034 | 412M | 176M |
| Haiku | 4,802 | 980M | 372M |
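The dashboard figures can be turned into savings rates with simple division. The snippet below uses only the per-model numbers shown above (rounded as displayed, so the derived percentages are approximate). The variable and function names are illustrative.

```python
# (requests, tokens_in, tokens_saved) per model, as displayed on the dashboard.
BY_MODEL = {
    "Sonnet": (8_241, 2_100_000_000, 891_000_000),
    "Opus": (1_034, 412_000_000, 176_000_000),
    "Haiku": (4_802, 980_000_000, 372_000_000),
}
ROUTED_REQUESTS = 8_602


def saved_share(tokens_in: int, tokens_saved: int) -> float:
    """Fraction of received input tokens that were saved."""
    return tokens_saved / tokens_in


for model, (_, tokens_in, tokens_saved) in BY_MODEL.items():
    print(f"{model}: {saved_share(tokens_in, tokens_saved):.1%} of input saved")

total_requests = sum(requests for requests, _, _ in BY_MODEL.values())
total_in = sum(tokens_in for _, tokens_in, _ in BY_MODEL.values())
total_saved = sum(tokens_saved for _, _, tokens_saved in BY_MODEL.values())

print(f"overall: {saved_share(total_in, total_saved):.1%} of input saved")
print(f"routed:  {ROUTED_REQUESTS / total_requests:.1%} of {total_requests:,} requests")
```

The per-model rows sum to the headline totals: 14,077 requests, roughly 3.5B input tokens, and roughly 1.44B saved.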

Recent requests

| Model | Saved | When |
| --- | --- | --- |
| Sonnet | +18,432 | 2m ago |
| Haiku | +4,211 | 4m ago |
| Sonnet | +34,209 | 7m ago |
| Haiku | +2,847 | 11m ago |

Controls

Full control in real-time

Tune compression, routing, and attention for each workload — from a single config.

Input compression

Redundancy stripped from context before it reaches the model.

Output compression

Models instructed to be concise and eliminate scaffolding.

Smart routing

Match each request to the cheapest capable model.

Focused attention

Steers the model to attend only to the task-critical parts of context.

Adjust the controls above to configure your proxy — then sign up to save.
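One way to picture "a single config" is a small settings object with one switch per control. The sketch below is hypothetical: the field names, defaults, and the wording of the conciseness instruction are assumptions, not the product's actual schema. It also shows one plausible reading of output compression, namely appending a brevity instruction to the system prompt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProxyConfig:
    """Hypothetical per-workload switches mirroring the four controls above."""
    input_compression: bool = True
    output_compression: bool = True
    smart_routing: bool = True
    focused_attention: bool = False


# Illustrative wording only.
CONCISE_INSTRUCTION = "Be concise. Omit preamble, restated context, and scaffolding."


def build_system_prompt(base_prompt: str, config: ProxyConfig) -> str:
    """Append a conciseness instruction when output compression is enabled."""
    if not config.output_compression:
        return base_prompt
    return f"{base_prompt}\n\n{CONCISE_INSTRUCTION}".strip()


batch_jobs = ProxyConfig(focused_attention=True)
verbose_reports = ProxyConfig(output_compression=False)

print(build_system_prompt("You are a code reviewer.", batch_jobs))
print(build_system_prompt("You write detailed reports.", verbose_reports))
```

Keeping the config immutable (`frozen=True`) makes it safe to share one instance across concurrent requests for the same workload.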

Benchmarked

Compression doesn't reduce quality.

In large contexts, it often improves it.

| Model | Baseline | Compressed | Delta |
| --- | --- | --- | --- |
| Claude Fable 5 | 95% | 95.7% | +0.7% |
| Claude Opus 4.8 | 88.6% | 89.3% | +0.7% |
| Claude Sonnet 4.6 | 79.6% | 80.8% | +1.2% |
| GPT-5.5 | 88.7% | 89.5% | +0.8% |