Question 1

How much does the Claude API cost?

Accepted Answer

Claude API pricing depends on the model. Claude Opus 4.8 is $5 per million input tokens and $25 per million output tokens. Claude Sonnet 4.6 is $3 input and $15 output. Claude Haiku 4.5 is $1 input and $5 output. Anthropic cut Opus pricing 66% in early 2026, making frontier-tier reasoning much more accessible. Prompt caching can drop input costs another 50–90% on repeat queries.

Question 2

How much does GPT-5 cost?

Accepted Answer

GPT-5 is $1.25 per million input tokens and $10 per million output tokens. GPT-5 mini is $0.25 input and $2 output — about 5× cheaper. OpenAI's frontier line was rebranded from GPT-4 to GPT-5 in 2025. Cached input pricing is roughly 50% of standard input.

Question 3

What is the cheapest LLM API?

Accepted Answer

DeepSeek V4 Flash is currently the cheapest credible model at $0.14 per million input tokens and $0.28 per million output tokens — roughly 35× cheaper than Claude Sonnet 4.6 for input. Gemini 2.5 Flash and Claude Haiku 4.5 are the cheapest options from major US vendors. For open-weight models, Llama 3.3 70B via Together AI is competitive at the value tier. The right cheap model depends on your task — value-tier models handle classification and short replies well but struggle with complex reasoning.

Question 4

How do you keep this pricing data accurate?

Accepted Answer

An automated audit runs daily. Small drift (under 25% on both input and output) is auto-applied and published the same day. Larger changes — re-pricings, model deprecations, tier consolidations — go through human editorial review against the live vendor pricing page before publication. Vendor pages remain the canonical source of truth; this dataset is a clean machine-readable mirror, not a substitute. The exact verification date is shown on this page and embedded in the public dataset's JSON.

Question 5

Can I use this pricing data in my own product?

Accepted Answer

Yes. The full dataset is published as pricing.json in a public GitHub repo (lucaspowell8020/ai-agent-pricing) under CC-BY-4.0. You can fetch it directly from the raw GitHub URL, no API key required. Use it in commercial products, comparison tools, calculators, or blog posts — attribution to agentshortlist.com is appreciated but not required by the licence.

Question 6

Why not just check the vendor pricing pages directly?

Accepted Answer

You can, and you should verify before committing budget. The reason this dataset exists: vendor pricing pages disagree on units (some quote per token, some per thousand, some per million), use struck-through old prices for SEO, and change format frequently. Aggregating into one normalised file makes side-by-side comparison and programmatic use possible. We treat the vendor pages as authoritative — this dataset is a clean, machine-readable mirror.

Question 7

How does AI API pricing actually work?

Accepted Answer

Nearly every LLM API charges per token — small pieces of text the model reads (input) and writes (output). A token is roughly 3/4 of an English word. Pricing is usually quoted per million tokens, split between input and output rates. Output tokens are 2–6× more expensive than input across most vendors. A simple chat turn (1,000 input + 200 output tokens) on Claude Sonnet costs about $0.006. Subscription tiers (Claude Pro, ChatGPT Plus) bundle a fixed amount of usage with a flat monthly fee — useful for individual usage, not for production agent traffic where direct API pricing is correctly sized.

Question 8

What are typical API pricing tiers?

Accepted Answer

Most frontier vendors offer three tiers: frontier (best reasoning — Claude Opus 4.8, GPT-5.5, Gemini 3 Pro), balanced (the sweet spot for most production work — Claude Sonnet 4.6, GPT-5.4, Gemini Flash), and value (cheap and fast for high-volume mechanical work — Claude Haiku 4.5, GPT-5.4 mini, Gemini Flash-Lite). Cross-tier price gaps are large: balanced is typically 5–10× cheaper than frontier; value is another 3–5× cheaper than balanced. Most teams over-spec by defaulting to frontier — picking the right tier is the single biggest cost lever in production agent workflows.

Question 9

Is this an AI API pricing tracker?

Accepted Answer

Yes — this page is updated daily against vendor sources, with prices verified against an industry catalog before publication. Small drift (under 25% on input and output) is auto-applied the same day; larger changes go through human editorial review. The dataset is also published as an open JSON file you can fetch programmatically — see the 'Open dataset' section below. If you're building a comparison tool or calculator, the raw JSON is the path; if you're just checking prices for budget planning, this page is the path.

Question 10

What's the difference between AI cloud pricing and direct API pricing?

Accepted Answer

Direct API pricing is what you pay the model vendor (Anthropic, OpenAI, Google) per token. Cloud pricing is what you pay a cloud provider (AWS Bedrock, Azure OpenAI, Vertex AI) for the same models, typically with a 10–30% markup in exchange for unified billing, identity management, and compliance certifications. For most builders, direct API is cheaper and simpler. For enterprises with strict procurement, identity, or compliance requirements, the cloud markup is the cost of doing business. The tier rates we publish on this page are direct API rates.

Tier	Use case	Lowest input cost	Frontier example
Frontier	Hardest reasoning, ambiguous decisions, long-context synthesis	$1.25/M (Grok 4.3)	$10/M (Claude Fable 5)
Balanced	Most production agent work — drafting, planning, multi-step workflows	$0.5/M (Gemini 3 Flash Preview)	$3/M (Claude Sonnet 4.6)
Value	Classification, extraction, routing, high-volume mechanical work	$0.0938/M (DeepSeek V4 Flash)	$1/M (Claude Haiku 4.5)

Model	Inputper 1M tokens	Outputper 1M tokens	Context	Tier
Claude Fable 5Released 2026-06-09 Anthropic's premium creative-writing and roleplay model. Priced 2x above Opus 5. Different design target than Opus — reach for it when character consistency, prose quality, or long-form fiction matters more than raw reasoning throughput.	$10	$50	1M	frontier
Claude Mythos 5 Limited-availability frontier model (Glasswing program). Same pricing as Fable 5. Not generally accessible — access via Anthropic's Glasswing program.	$10	$50	1M	frontier
Claude Opus 5 Anthropic's current flagship reasoning model. Same $5/$25 pricing as the entire Opus 4.x line — no price bump for the generation upgrade. Best pick for the hardest agentic decisions where reasoning quality compounds.	$5	$25	1M	frontier
Claude Opus 5 Fast Low-latency variant of Opus 5 — 2x the price for guaranteed throughput and faster time-to-first-token. Right pick for production agentic loops where latency matters more than cost.	$10	$50	1M	frontier
Claude Opus 4.8Released 2026-05-28 Previous flagship — superseded by Opus 5 at the same $5/$25 price. Still accessible via API for backwards compatibility. No reason to pick over Opus 5 unless you've pinned to this version.	$5	$25	1M	frontier
Claude Opus 4.8 FastReleased 2026-05-28 Low-latency variant of Opus 4.8. Superseded by Opus 5 Fast at the same price. Right pick only if you've pinned to Opus 4.8 for other reasons.	$10	$50	1M	frontier
Claude Opus 4.7Released 2026-04-16 Previous flagship — superseded by Opus 4.8 (2026-05-28) at the same price. Still accessible via API for backwards compatibility.	$5	$25	1M	frontier
Claude Sonnet 5 Anthropic's current balanced-tier flagship. Introductory pricing of $2 input / $10 output per million tokens is in effect through Aug 31, 2026; standard pricing of $3/$15 kicks in Sept 1, 2026. Either way, the best price-quality point in the Claude lineup for the bulk of production agent workloads.	$2	$10	1M	balanced
Claude Sonnet 4.6Released 2026-02-17 Previous balanced-tier model. Superseded by Sonnet 5 at introductory $2/$10 pricing (Sonnet 5 will match Sonnet 4.6 at $3/$15 after Aug 31, 2026). Still fully accessible via API for backwards compatibility.	$3	$15	1M	balanced
Claude Haiku 4.5Released 2025-10-15 Fast and cheap. Best for high-volume classification and short replies.	$1	$5	200k	value

Model	Inputper 1M tokens	Outputper 1M tokens	Context	Tier
GPT-5.5Released 2026-04-24 OpenAI's frontier line. Marketed for coding and professional work.	$5	$30	400k	frontier
GPT-5.4Released 2026-03-05 The full GPT-5.4 — not the mini. Comparable pricing to Claude Sonnet with OpenAI's tool-use and computer-use stack.	$2.5	$15	1.05M	balanced
GPT-5.4 miniReleased 2026-03-17 OpenAI's value tier. Strong on coding, computer use, and subagent workloads.	$0.75	$4.5	128k	value
GPT-5.4 nanoReleased 2026-03-17 OpenAI's cheapest model. ~4x cheaper than GPT-5.4 mini. Best for high-volume mechanical work — classification, extraction, format conversion — where Haiku-tier output is sufficient.	$0.20	$1.25	128k	value
GPT-5.3 Codex OpenAI's coding-specific model — powers the Codex CLI and ChatGPT cloud-runner agent. Tuned for software engineering tasks; competitive with Claude Sonnet on coding-only benchmarks at lower input cost.	$1.75	$14	400k	balanced

Model	Inputper 1M tokens	Outputper 1M tokens	Context	Tier
Grok 4.5Released 2026-07-06 xAI's flagship as of July 2026 — the model xAI now points everyone to for everything, including code. It costs more than Grok 4.3 ($2/$6 vs $1.25/$2.50) and drops to a 500k context (half of 4.3's 1M), the trade for what xAI bills as its most intelligent and fastest model. Reach for 4.3 instead when you need the 1M window or the lower price, and for cheap agentic coding, Grok Build 0.1 is still the value pick. xAI also ships separately-priced Voice and Imagine APIs — see the Voice & Media APIs section below.	$2	$6	500k	frontier
Grok 4.3Released 2026-04-30 The previous xAI flagship, still worth it: a 1M context — double Grok 4.5's 500k — at half the price ($1.25/$2.50), which makes it the better call for long-document or high-volume work where 4.5's extra reasoning isn't worth 2–3× the token cost. Live web access via X integration, agentic tool calling, non-reasoning mode by default. xAI also lists dated Grok 4.20 reasoning, non-reasoning, and multi-agent snapshots at this same price; we track the headline models here.	$1.25	$2.5	1M	frontier
Grok Build 0.1Released 2026-05-20 xAI's coding-specific model — trained for agentic coding workflows. xAI now steers general users to Grok 4.5 for code, but Build 0.1 stays the cheaper, faster option: a fraction of 4.5's output price with a 256k context (enough for most repo work). Direct competitor to GPT-5 mini and Claude Haiku at the value tier for coding tasks.	$1	$2	256k	value

Model	Inputper 1M tokens	Outputper 1M tokens	Context	Tier
Gemini 2.5 ProReleased 2025-06-17 Largest context window on the list. Pricing shown is for prompts ≤200k tokens; rises to $2.50/$15 above that.	$1.25	$10	2M	balanced
Gemini 2.5 FlashReleased 2025-06-17 Hybrid reasoning model with 1M context and tunable thinking budgets.	$0.30	$2.5	1M	value
Gemini 3.5 FlashReleased 2026-05-19 Google's current Flash flagship — 'most intelligent model built for speed' per their docs. Combines frontier intelligence with native Google Search and Maps grounding (5k free prompts/month, then $14 per 1k queries). Context caching at $0.15/M.	$1.5	$9	1M	balanced
Gemini 3.1 Pro PreviewReleased 2026-02-19 Multimodal, agentic, strong on coding. Pricing shown is for prompts ≤200k tokens; rises to $4/$18 above that.	$2	$12	2M	balanced
Gemini 3 Flash PreviewReleased 2025-12-17 Google's high-speed thinking model for agentic workflows. Strong on coding and multi-turn chat. Top 5 on public model marketplaces this week.	$0.50	$3	1M	balanced
Gemini 3.1 Flash-Lite PreviewReleased 2026-03-03 Google's most cost-efficient model. Optimised for high-volume agentic tasks and simple data processing.	$0.25	$1.5	1M	value

AI Model API Pricing (2026).

2026 AI API pricing at a glance.

Anthropic

OpenAI

xAI

Google

Meta / Together AI

DeepSeek

Moonshot AI

z.ai

Mistral

Voice & Media APIs.

xAI

Pricing is just the start. The real question is what your workflow actually costs.

Use this pricing data in your own product.

What builders ask before they pick a model.