A place of
abundance

Tokens should be cheap, fast, and smart - for everyone. We get there through efficiency: reducing what goes in, reducing what comes out, and routing where it performs best.

Book a call Join Discord

Downsize your Tokens costs

Works best with agents and AI coding tools

Takes 2 minutes or one prompt to set up.

Reduces inference costs for agents and AI coding tools. Uses Anthropic / OpenAI by default — or routes to alternative models at equivalent quality to reduce further.

Talk to us →

Under the hood

We optimize at all levels

You point to our gateway. We handle routing, compression, and model selection.

InputReduceFilterRouteOutput

KimiDeepSeekFableSonnetOpus

Input token blockReduced tokensFiltered tokensRouted output

01Input

Full token grid arrives - data, prompt and context.

02Reduce

We reduce input token volume by removing low-signal and redundant units.

03Filter

When feasible, we cherry-pick only the most salient pieces.

04Route

Smart router sends bundle to the best model the grader recommends.

05Output

Response compressed too.

Try it out. Install the CLI and connect coding tools or agents in seconds.

CLI quickstart →

Same Quality. Better prices.

Use Anthropic and OpenAI models at discounted prices

Or use our models for maximum cost savings

Processing mode

Model	Input / 1M tok	Output / 1M tok
Anthropic
Claude Fable 5 Mythos-class, most capable publicly available	$10$7.14	$50$38.46
Claude Opus 4.8 Most intelligent, for agents & coding	$5$3.57	$25$19.23
Claude Sonnet 4.6 Balance of intelligence, cost & speed	$3$2.14	$15$11.54
Claude Haiku 4.5 Fastest, most compact Anthropic model	$0.8$0.72	$4$3.60
OpenAI
GPT-5.5 Frontier model for complex reasoning	$5$3.57	$30$23.08
GPT-5.4 Affordable model for coding & work	$2.50$1.79	$15$11.54
GPT-5.4 mini Strongest mini for coding & sub-agents	$0.75$0.675	$4.50$4.05
Downsizing
Universe Maximum capability, multimodal & deep reasoning	$1	$4
Galaxy Balanced intelligence at a fraction of the cost	$0.5	$2
Star Lightning-fast, ultra-low cost	$0.25	$1

^*These figures represent typical compression gains observed across diverse applications and agents.

Running a custom pipeline? Share your setup and we'll figure out what fits.

contact@downsizing.dev →

Coding Subscription Pricing

Bring your subscription, and we'll double its capacity

Bring your existing Claude or Codex subscription — we optimize on top.

Bring your own

First 30,000 tokens / mo

Free

Beyond 30,000 tokens

Flat monthly cap — never more than $5

$5/mo

Bring your existing Claude or Codex subscription and we'll extend your effective limits — at no extra cost for lighter usage, and a flat $5/mo for heavier workloads.

Input & output token compression
Smart routing across providers
Focused attention mode
Usage dashboard & analytics

Get started free

Coming soon

Our subscription

Pro$15/mo

Max$70/mo

Full access — no external subscription required. We provide the tokens with doubled capacity compared to direct Anthropic or OpenAI plans.

Everything in Bring your own
2× token capacity vs. Claude Code or Codex

Join waitlist

No setup fees · Cancel anytime · Charged only when you exceed 30,000 tokens

From users

Loved by developers

“We switched our code-generation pipelines to Downsizing and started seeing savings right away.”

Paula

Staff Engineer · Samsung

“Went from $8k/month on Claude Code to $5k. Didn't change a single prompt.”

Aleh

Senior Software Engineer · Nvidia

“Extends my Claude Code subscription at least by 2x on the free version. Love it.”

Darshan

Senior Software Engineer · Google

“Downsizing replaced multiple plugins with better outcomes, 2 minutes to set up.”

Anton

Senior Software Engineer · Revolut

Analytics

Numbers speak for themselves

Actual Claude Code sessions

downsizing.dev/dashboard/savings

Input tokens

3.5B

received

Input saved

1.44B

of 3.5B total

Requests

14,077

total

Output tokens

284M

received

Output saved

99.4M

of 284M total

Routed

8,602

61% of total

Input tokens by model

Model	Requests	Tokens in	Saved
Sonnet	8,241	2.1B	891M
Opus	1,034	412M	176M
Haiku	4,802	980M	372M

Recent requests

Model	Saved	When
Sonnet	+18,432	2m ago
Haiku	+4,211	4m ago
Sonnet	+34,209	7m ago
Haiku	+2,847	11m ago

Controls

Full control in real-time

Tune compression, routing, and attention for each workload — from a single config.

Input compression

Redundancy stripped from context before it reaches the model.

Output compression

Models instructed to be concise and eliminate scaffolding.

Smart routing

Match each request to the cheapest capable model.

Focused attention

Steers the model to attend only to the task-critical parts of context.

Adjust the controls above to configure your proxy — then sign up to save.

Benchmarked

Compression doesn't reduce quality.

In large contexts, it often improves it.

ModelBaselineCompressedDelta

Claude Fable 5

Baseline

95%

Compressed

95.7%

Delta

+0.7%

Baseline

Compressed

Claude Fable 595%95.7%+0.7%

Baseline

Compressed

Claude Opus 4.8

Baseline

88.6%

Compressed

89.3%

Delta

+0.7%

Baseline

Compressed

Claude Opus 4.888.6%89.3%+0.7%

Baseline

Compressed

Claude Sonnet 4.6

Baseline

79.6%

Compressed

80.8%

Delta

+1.2%

Baseline

Compressed

Claude Sonnet 4.679.6%80.8%+1.2%

Baseline

Compressed

GPT-5.5

Baseline

88.7%

Compressed

89.5%

Delta

+0.8%

Baseline

Compressed

GPT-5.588.7%89.5%+0.8%

Baseline

Compressed

Integrations

Works with Claude Code, Codex and many others

Zero changes to your existing tools or workflows.

Coding & agentic tools

We silently compress redundant context so you can code longer and spend significantly less.

Personal AI Assistants

We prune stale context and use cheaper models for simple tasks so your agents run longer for less.

Live

A place of
abundance

Downsize your Tokens costs

We optimize at all levels

Use Anthropic and OpenAI models at discounted prices

Bring your subscription, and we'll double its capacity

Loved by developers

Numbers speak for themselves

Full control in real-time

Compression doesn't reduce quality.

Works with Claude Code, Codex and many others

Claude Code

Codex

Junie

Hermes

Aider

Amp

Cline

Crush

Droid

Eigent

Gemini CLI

Goose

Pi

A place of abundance

Downsize your OpenClawTokens costs

We optimize at all levels

Use Anthropic and OpenAI models at discounted prices

Bring your subscription, and we'll double its capacity

Loved by developers

Numbers speak for themselves

Full control in real-time

Compression doesn't reduce quality.

Works with Claude Code, Codex and many others

Claude Code

Codex

Junie

Hermes

Aider

Amp

Cline

Crush

Droid

Eigent

Gemini CLI

Goose

Pi

A place of
abundance

Downsize your Tokens costs