The real cost of Claude at scale in 2026
Anthropic cut Opus pricing 66% earlier this year. Most builders haven't redone the math. What Claude actually costs in production — and where the savings really are.
The headline event in Claude pricing this year was Anthropic cutting Opus 4.7 by roughly 66% — from $15/$75 per million tokens to $5/$25. Sonnet 4.6 stayed at $3/$15. Haiku 4.5 stayed at $1/$5.
The pricing change ran through the news cycle in a week and most builders moved on. They shouldn't have. The drop fundamentally rewrites what Claude costs at production volume, and almost nobody has redone the math against their actual workloads.
Here's what Claude actually costs to run an agent in 2026, with concrete numbers from real workflows.
The list-price math nobody quotes correctly
Vendor blog posts about LLM pricing always quote per-token rates. Builders pricing their actual workloads need three things vendors rarely lead with: tokens per task, tasks per month, and the input/output split.
Take a customer-support reply agent — one of the most common production workloads we see. Realistic per-task numbers:
- ~2,500 input tokens (system prompt, conversation history, customer message)
- ~400 output tokens (the reply)
- 5,000 tasks per month (a small B2B SaaS)
That's 12.5M input tokens and 2M output tokens per month. The per-model cost:
| Model | Input ($/M) | Output ($/M) | Total cost/month |
|---|---|---|---|
| Claude Opus 4.7 | $5 | $25 | $113 |
| Claude Sonnet 4.6 | $3 | $15 | $68 |
| Claude Haiku 4.5 | $1 | $5 | $23 |
| GPT-5.5 | $5 | $30 | $123 |
| Gemini 2.5 Pro | $1.25 | $10 | $36 |
| DeepSeek V4 Flash | $0.14 | $0.28 | $2.31 |
Three things in that table that don't make it into vendor pricing pages:
- Sonnet handles ~90% of these tasks at roughly 60% of Opus's cost ($68 vs $113). The right-sizing question is the biggest cost lever in any workload, and most teams skip past it.
- Haiku is roughly 3× cheaper than Sonnet for the same task class ($23 vs $68). Customer-support replies don't need frontier reasoning. Haiku is fine for the bulk of them.
- The frontier-vs-value gap is roughly 50× between Opus and DeepSeek V4 Flash. Models in the same product category have wildly different unit economics.
The pricing data lives at /api-pricing if you want to slot your own workloads in.
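If you'd rather check the arithmetic locally first, the whole table reduces to three inputs and one multiply-add per model. A minimal sketch in Python using the list prices above; swap in your own token counts:

```python
# Monthly cost = input tokens x input rate + output tokens x output rate,
# with rates quoted in dollars per million tokens.
TASKS_PER_MONTH = 5_000
INPUT_PER_TASK = 2_500   # system prompt + history + customer message
OUTPUT_PER_TASK = 400    # the reply

RATES = {  # (input $/M, output $/M), list prices as quoted above
    "Claude Opus 4.7":   (5.00, 25.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Haiku 4.5":  (1.00, 5.00),
    "DeepSeek V4 Flash": (0.14, 0.28),
}

input_m = TASKS_PER_MONTH * INPUT_PER_TASK / 1e6    # 12.5M tokens/month
output_m = TASKS_PER_MONTH * OUTPUT_PER_TASK / 1e6  # 2.0M tokens/month

for model, (rate_in, rate_out) in RATES.items():
    print(f"{model}: ${input_m * rate_in + output_m * rate_out:,.2f}/month")
```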
What "at scale" actually means
The above is small-team scale. Production agents at serious volume run an order of magnitude higher.
A real example we've sized: a customer support deflection agent processing 50,000 tickets/month. Same per-task profile, 10× the volume.
| Model | Cost/month | Per ticket |
|---|---|---|
| Claude Opus 4.7 | $1,125 | $0.0225 |
| Claude Sonnet 4.6 | $675 | $0.0135 |
| Claude Haiku 4.5 | $225 | $0.0045 |
| Sonnet w/ prompt caching | ~$405 | ~$0.0081 |
Two notes on that table:
The prompt-caching row is the one that matters. Anthropic's prompt caching (and OpenAI's, and DeepSeek's) drops the price of cached input tokens by roughly 90%. Output tokens are never cached and always bill at full rate, which is why the row doesn't fall by 90% overall. For a customer support agent where the system prompt is identical across every reply, you're caching ~80% of your input tokens; real-world cache hit rates land in the 60–80% range for well-designed agents.
After caching, Sonnet 4.6 costs roughly $0.008 per ticket at 50k tickets/month. For comparison: a fully loaded human support rep runs roughly $4,500/month and realistically closes perhaps 1,000 tickets in that time, about $4.50 per ticket. The cost gap isn't 5× or 10×; it's more than 500×.
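The caching math behind those numbers, as a sketch. Assumptions: ~80% of input tokens served from cache, cache reads billed at roughly 10% of the list input rate, and cache-write surcharges ignored for simplicity:

```python
# Caching discounts only input tokens; output tokens always bill at full rate.
TICKETS = 50_000
INPUT_M = TICKETS * 2_500 / 1e6   # 125M input tokens/month
OUTPUT_M = TICKETS * 400 / 1e6    # 20M output tokens/month
IN_RATE, OUT_RATE = 3.00, 15.00   # Sonnet 4.6 list prices, $/M
CACHED_SHARE = 0.80               # share of input tokens served from cache
READ_DISCOUNT = 0.90              # cached reads cost ~10% of list

input_cost = INPUT_M * IN_RATE * (1 - CACHED_SHARE * READ_DISCOUNT)  # $105
output_cost = OUTPUT_M * OUT_RATE                                    # $300
total = input_cost + output_cost                                     # ~$405
print(f"${total:,.0f}/month, ${total / TICKETS:.4f}/ticket")
```

The output line is the floor; no amount of caching touches it.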
The right model for production isn't always the cheapest. If your tickets need correct technical reasoning, Sonnet beats Haiku on quality enough to justify the cost difference. If they're routine routing decisions, Haiku is fine. The calculator on /calculator shows the spread for your specific workflow shape.
The Claude Pro / Max / API decision
For builders specifically, Anthropic's pricing structure has three tiers worth understanding:
- Claude Pro ($20/month): Claude.ai web access, modest Claude Code usage, no Opus
- Claude Max ($100/month): more usage, Opus access, full Claude Code access
- API pricing (per-token): $5/$25 Opus, $3/$15 Sonnet, $1/$5 Haiku, no subscription
Three rules of thumb for picking:
Solo builders shipping personal projects: Claude Pro for browser-based work, plus Sonnet API direct for any production agent you build. Total ~$50/month including modest API usage. Claude Code on Pro covers ~80% of what coding-focused builders need.
Serious operators running agents in production: Claude Max for the development workflow, plus API direct for the production traffic. The Max tier's Claude Code allowance is the lever — it removes the "am I about to hit my limit?" friction that breaks long agentic loops.
Teams running agents at scale (50k+ tasks/month): API direct, no subscription. Per-token pricing is built for production volume; subscription tiers are designed for individual usage and you'll burn through their caps in days.
The distinction nobody makes clearly: subscriptions price for human-driven usage (you typing in Claude.ai or Claude Code). API pricing prices for machine-driven usage (your agent running). Use both for what they're priced for.
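One way to make the subscription-vs-API call concrete: price a month of your human-driven dev usage at API rates and compare it to the flat fee. A sketch; the daily token volumes are placeholder guesses, not measurements:

```python
# Compare a flat subscription against paying API rates for the same
# human-driven usage. Daily volumes below are placeholders - use your own.
MAX_PRICE = 100.0                    # Claude Max, $/month
IN_RATE, OUT_RATE = 3.00, 15.00      # Sonnet API list prices, $/M tokens
DAILY_IN_M, DAILY_OUT_M = 1.5, 0.3   # guessed dev-day volume, millions of tokens
WORK_DAYS = 22

api_equivalent = WORK_DAYS * (DAILY_IN_M * IN_RATE + DAILY_OUT_M * OUT_RATE)
print(f"Same usage at API rates: ${api_equivalent:.0f} vs ${MAX_PRICE:.0f} flat")
```

If the API-equivalent number comes out well above the flat fee, the subscription is the cheaper way to buy your own keystrokes; your agents' traffic still belongs on the API.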
The 5 cost levers that actually move the bill
After a year of helping builders right-size their Claude bills, here are the cost levers that actually matter, ranked by impact:
1. Drop to a cheaper model
The single biggest lever. Most teams default to Opus or Sonnet and never test Haiku on the bulk of their workload. The cost difference between Claude tiers runs 2–5× ($113 vs $68 vs $23 on the workload above), and far more once you step outside the family. The quality difference is task-dependent — sometimes catastrophic, often imperceptible.
The right test: take 100 real production tasks, run them on each tier, measure quality with a rubric. If Haiku passes the rubric on 90% of tasks, route those tasks to Haiku and reserve Sonnet/Opus for the 10% that need it. That single split typically cuts the bill 60–80%.
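In code, that split is just a router in front of the model call. A sketch; the task labels, the eval results, and the model ids are all hypothetical stand-ins for your own:

```python
# Route each task type to the cheapest tier that passed the offline rubric.
# RUBRIC_PASS is a stand-in for your own eval results over ~100 real tasks.
RUBRIC_PASS = {"faq": True, "order_status": True, "refund_escalation": False}

def pick_model(task_type: str) -> str:
    """Cheapest-first routing: Haiku where the rubric cleared it, else Sonnet."""
    if RUBRIC_PASS.get(task_type, False):
        return "claude-haiku-4-5"   # hypothetical model id
    return "claude-sonnet-4-6"      # hypothetical model id

for task in ("faq", "order_status", "refund_escalation"):
    print(task, "->", pick_model(task))
```

At the 50k-ticket volume above, a 90/10 Haiku/Sonnet split lands around $270/month against $675 all-Sonnet and $1,125 all-Opus.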
2. Enable prompt caching
For any agent with a stable system prompt and repeated conversation context — which is most production agents — Anthropic's prompt caching drops input costs ~90% for cached portions. At a 70% cache hit rate, your effective input cost is roughly 37% of list price. Free leverage.
Caching costs slightly more on the first call (cache writes bill above the list rate) and saves on every subsequent hit. For high-volume workflows, the math is overwhelming.
3. Trim the prompt
Most production prompts have unused context, redundant examples, or stale instructions that nobody has audited in months. Halving prompt length halves your input cost, with no quality impact (often improvement, since the model has less to wade through).
A 30-minute prompt audit on a high-traffic agent often pays for itself the same week.
4. Use batch APIs where applicable
Anthropic's Batch API runs at 50% the standard per-token rate. Useful for any non-real-time workflow: nightly research jobs, scheduled briefings, bulk classification, weekly reports. The 24-hour delivery window is fine for these use cases.
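The batch discount is a flat multiplier, so the decision reduces to one question: can the job wait 24 hours? A sketch for a nightly bulk-classification job; the volumes are illustrative:

```python
# Batch pricing bills at 50% of the standard per-token rate.
DOCS_PER_NIGHT = 10_000
IN_TOK, OUT_TOK = 1_200, 50      # per-document tokens for classification
IN_RATE, OUT_RATE = 1.00, 5.00   # Haiku 4.5 list prices, $/M

standard = DOCS_PER_NIGHT * (IN_TOK * IN_RATE + OUT_TOK * OUT_RATE) / 1e6
print(f"standard ${standard:.2f}/night, batch ${standard * 0.5:.2f}/night")
```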
5. Mix models within a workflow
Lead generation is the canonical example. The naive approach runs Opus on every page scraped. The mature pattern: Haiku or DeepSeek V4 Flash for the bulk parsing, Opus only for the final synthesis decision.
We covered this pattern in Where AI agents actually deliver ROI in 2026. The shape of the numbers: $340/month all-Opus drops to $28/month routed, drops to $8/month aggressively routed. Same workflow, different unit economics.
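A sketch of the routing pattern itself. The page counts and token sizes here are illustrative, not the figures from that article; the point is how much of the spend sits in the bulk-parsing stage:

```python
# Tiered lead-gen pipeline: cheap model for bulk page parsing, Opus only for
# the final synthesis call. All volumes below are illustrative.
PAGES, PARSE_IN, PARSE_OUT = 20_000, 3_000, 200   # bulk parsing stage
SYNTHS, SYN_IN, SYN_OUT = 500, 8_000, 1_000       # synthesis stage

def stage_cost(calls, tok_in, tok_out, rate_in, rate_out):
    return calls * (tok_in * rate_in + tok_out * rate_out) / 1e6

synthesis  = stage_cost(SYNTHS, SYN_IN, SYN_OUT, 5.00, 25.00)                  # Opus
all_opus   = stage_cost(PAGES, PARSE_IN, PARSE_OUT, 5.00, 25.00) + synthesis
routed     = stage_cost(PAGES, PARSE_IN, PARSE_OUT, 1.00, 5.00) + synthesis    # Haiku bulk
aggressive = stage_cost(PAGES, PARSE_IN, PARSE_OUT, 0.14, 0.28) + synthesis    # DeepSeek bulk
print(f"all-Opus ${all_opus:.0f} | routed ${routed:.0f} | aggressive ${aggressive:.0f}")
```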
When Claude is the wrong default
We're a Claude-friendly publication. That's because Claude has been the right default for most builders most of the time. But it's not always:
For high-volume, simple tasks: DeepSeek V4 Flash is roughly 10× cheaper than Haiku and handles classification, simple summaries, and routine routing fine. If you're running 100k+ tasks/month of simple work, run the math.
For latency-critical applications: Gemini 2.5 Flash tends to be faster than Claude for shorter responses. Voice agents and real-time UIs benefit.
For long-context work (>500k tokens): Gemini 2.5 Pro has a 2M context window — twice Claude's. If you're feeding entire codebases or document collections into a single call, Gemini wins on context.
For local/private deployment: Claude doesn't run locally. If data residency or offline operation matters, Llama 3.3 70B self-hosted is the move.
For everything else — agent workflows, code generation, research synthesis, customer-facing chat, the bulk of what builders are actually shipping — Claude is the right default. Just not always the highest-tier Claude.
What changed and what didn't
Two facts to anchor your 2026 Claude pricing decisions:
- Opus dropping 66% changed the math at the top. Frontier-tier reasoning that used to cost $15/$75 now costs $5/$25. For the workflows that genuinely need it — complex agentic decisions, deep research synthesis, code generation on hard problems — Opus is now a budget option, not a luxury.
- The right-sizing imperative didn't change. The biggest cost mistake teams make isn't picking the wrong vendor. It's defaulting to a model tier higher than the work requires. That mistake is bigger than any pricing change.
The cost calculator at /calculator does the per-token math against real workflow shapes. The pricing data is verified daily and lives at /api-pricing. If you haven't redone the math against your actual production workloads since the Opus price drop, that's the highest-leverage hour of work you'll do this month.
Most builders are paying 3–10× more than they need to. The leverage is sitting there.
About the author

Lucas Powell
Founder, Growth 8020. Started Agent Shortlist as the publication he wished existed when his team had to pick AI tools.