Reference
AI API Pricing.
Per-token pricing for every frontier model worth running an agent on. Filter by vendor or tier, verify against the source, plug into the cost calculator. Backed by an open dataset you can use in your own tools.
Anthropic
Vendor pricing page ↗

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Tier |
|---|---|---|---|
| Claude Opus 4.7: Best-in-class reasoning. Anthropic's flagship for the hardest agentic decisions. | $5 | $25 | frontier |
| Claude Sonnet 4.6: The sweet spot. Handles ~90% of agent workflows at a fraction of Opus pricing. | $3 | $15 | balanced |
| Claude Haiku 4.5: Fast and cheap. Best for high-volume classification and short replies. | $1 | $5 | value |
OpenAI
Vendor pricing page ↗

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Tier |
|---|---|---|---|
| GPT-5.5: OpenAI's frontier line. Marketed for coding and professional work. | $5 | $30 | frontier |
| GPT-5.4: The full GPT-5.4, not the mini. Comparable pricing to Claude Sonnet with OpenAI's tool-use and computer-use stack. | $2.50 | $15 | balanced |
| GPT-5.4 mini: OpenAI's value tier. Strong on coding, computer use, and subagent workloads. | $0.75 | $4.50 | value |
xAI

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Tier |
|---|---|---|---|
| Grok 4.20: xAI's current flagship. 2M context with live web access via X integration. xAI has consolidated its lineup: the earlier Grok 4 and Grok 4.1 Fast are no longer featured in the catalog. Grok 4.3 ships at the same $1.25/$2.50 price. | $1.25 | $2.50 | frontier |
Google

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Tier |
|---|---|---|---|
| Gemini 2.5 Pro: One of the largest context windows on the list. Pricing shown is for prompts ≤200k tokens; rises to $2.50/$15 above that. | $1.25 | $10 | balanced |
| Gemini 2.5 Flash: Hybrid reasoning model with 1M context and tunable thinking budgets. | $0.30 | $2.50 | value |
| Gemini 3.1 Pro Preview: Multimodal, agentic, strong on coding. Pricing shown is for prompts ≤200k tokens; rises to $4/$18 above that. | $2 | $12 | balanced |
| Gemini 3 Flash Preview: Google's high-speed thinking model for agentic workflows. Strong on coding and multi-turn chat. #5 on OpenRouter this week. | $0.50 | $3 | balanced |
| Gemini 3.1 Flash-Lite Preview: Google's most cost-efficient model. Optimised for high-volume agentic tasks and simple data processing. | $0.25 | $1.50 | value |
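The ≤200k-token threshold on the two Gemini Pro rows means a single headline rate understates the cost of long prompts. A minimal Python sketch of that two-step pricing follows. The `TieredRate` class and `request_cost` function are hypothetical names, and the sketch assumes the prompt size selects the rate for the whole request (input and output alike); verify that billing rule against the vendor page before relying on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TieredRate:
    """Two-step rate card. All prices are USD per 1M tokens."""
    threshold_tokens: int       # prompt size above which the higher rate applies
    base: tuple[float, float]   # (input, output) at or below the threshold
    long: tuple[float, float]   # (input, output) above the threshold


# Rates copied from the table rows above.
GEMINI_2_5_PRO = TieredRate(200_000, (1.25, 10.0), (2.50, 15.0))
GEMINI_3_1_PRO_PREVIEW = TieredRate(200_000, (2.0, 12.0), (4.0, 18.0))


def request_cost(rate: TieredRate, prompt_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request.

    Assumption: the prompt length picks the rate for the entire request.
    """
    if prompt_tokens > rate.threshold_tokens:
        in_price, out_price = rate.long
    else:
        in_price, out_price = rate.base
    return (prompt_tokens * in_price + output_tokens * out_price) / 1_000_000


short = request_cost(GEMINI_2_5_PRO, 100_000, 1_000)
long_ = request_cost(GEMINI_2_5_PRO, 300_000, 1_000)
print(f"100k-token prompt: ${short:.3f}")
print(f"300k-token prompt: ${long_:.3f}")
```

Under that assumption, tripling the prompt from 100k to 300k tokens raises the cost by more than 3×, because the request crosses into the higher rate.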
Meta / Together AI
Vendor pricing page ↗

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Tier |
|---|---|---|---|
| Llama 3.3 70B (Together AI): Open weights. The price shown reflects Together AI; other inference providers host the same weights at different rates. | $0.88 | $0.88 | value |
DeepSeek
Vendor pricing page ↗

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Tier |
|---|---|---|---|
| DeepSeek V4 Flash: DeepSeek's value tier. Cache-hit pricing drops input to $0.0028/M on repeat queries. | $0.10 | $0.20 | value |
| DeepSeek V3.2: DeepSeek's capable open-weight model. Extraordinary value: frontier-class performance at $0.23/M input. #3 on OpenRouter this week. | $0.23 | $0.34 | value |
Moonshot AI
Vendor pricing page ↗

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Tier |
|---|---|---|---|
| Kimi K2.6: Moonshot AI's current flagship, with 841% week-on-week growth on OpenRouter. Strong on long-context and agentic work. | $0.68 | $3.41 | balanced |
z.ai

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Tier |
|---|---|---|---|
| GLM 5.1: Successor to GLM-4.6. z.ai's current flagship with improved reasoning and tool-use benchmarks. | $0.98 | $3.08 | balanced |
Mistral
Vendor pricing page ↗

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Tier |
|---|---|---|---|
| Mistral Large 2.1: European frontier model. Stronger data residency story than US providers, useful for EU compliance-heavy workflows. | $2 | $6 | balanced |
Run your own numbers
Pricing is just the start. The real question is what your workflow actually costs.
The cost calculator combines this pricing data with realistic per-task token estimates across ten builder workflows — from ticket classification to code review — so you can see the monthly bill at your real volume, not just the headline rate.
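For a rough sense of how volume turns a per-token rate into a monthly bill, here is a small Python sketch. It is not the calculator's actual method: the `monthly_cost` function, the 50,000-tickets-per-month volume, and the 800-input/50-output token estimate are all illustrative assumptions. Only the per-million prices come from the Anthropic table on this page.

```python
def monthly_cost(tasks_per_month: int, input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Monthly bill in USD. Prices are USD per 1M tokens."""
    tokens_cost = input_tokens * input_price + output_tokens * output_price
    return tasks_per_month * tokens_cost / 1_000_000


# (input, output) prices per 1M tokens, from the Anthropic table on this page.
PRICES = {
    "Claude Haiku 4.5": (1.0, 5.0),
    "Claude Sonnet 4.6": (3.0, 15.0),
    "Claude Opus 4.7": (5.0, 25.0),
}

# Hypothetical workload: ticket classification at 50,000 tickets per month,
# roughly 800 input tokens and 50 output tokens per ticket.
for name, (inp, out) in PRICES.items():
    bill = monthly_cost(50_000, 800, 50, inp, out)
    print(f"{name}: ${bill:,.2f}/month")
```

With these assumed numbers the same workload runs from about $52 to about $262 a month depending on tier, which is why the per-task token estimate matters as much as the headline rate.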
Open the cost calculator →
Open dataset
Use this pricing data in your own product.
The full dataset is published as pricing.json in a public GitHub repo. CC-BY-4.0. No API key, no rate limit, no auth. Updated automatically when prices change.
GitHub
lucaspowell8020/ai-agent-pricing ↗
README, methodology, JSON Schema, JavaScript and Python examples.
Raw JSON
raw.githubusercontent.com/.../pricing.json ↗
Single file, no auth. Fetch directly from your code or CI.
curl
curl -O https://raw.githubusercontent.com/lucaspowell8020/ai-agent-pricing/main/pricing.json
JavaScript
const data = await fetch(
  "https://raw.githubusercontent.com/lucaspowell8020/ai-agent-pricing/main/pricing.json"
).then((r) => r.json());
const opus = data.models.find((m) => m.slug === "claude-opus-4-7");
console.log(opus.inputPricePerMillion, opus.outputPricePerMillion);
Python
import json, urllib.request
data = json.loads(urllib.request.urlopen(
"https://raw.githubusercontent.com/lucaspowell8020/ai-agent-pricing/main/pricing.json"
).read())
opus = next(m for m in data["models"] if m["slug"] == "claude-opus-4-7")
print(opus["inputPricePerMillion"], opus["outputPricePerMillion"])
Common questions
What builders ask before they pick a model.
How much does the Claude API cost?
Claude API pricing depends on the model. Claude Opus 4.7 is $5 per million input tokens and $25 per million output tokens. Claude Sonnet 4.6 is $3 input and $15 output. Claude Haiku 4.5 is $1 input and $5 output. Anthropic cut Opus pricing 66% in early 2026, making frontier-tier reasoning much more accessible. Prompt caching can drop input costs another 50–90% on repeat queries.
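To see what a 50–90% caching discount does to the blended input rate, the sketch below treats the discount as applying only to the share of input tokens served from cache. The `effective_input_price` function is a hypothetical helper, the 80% cache-hit share is an illustrative assumption, and the sketch ignores any cache-write surcharge a vendor may charge.

```python
def effective_input_price(base_price: float, cached_fraction: float,
                          cache_discount: float) -> float:
    """Blended input price in USD per 1M tokens.

    cached_fraction: share of input tokens read from cache (0.0 to 1.0).
    cache_discount:  discount on those cached tokens (0.9 means 90% off).
    Assumption: uncached tokens pay the full base price and there is no
    cache-write surcharge.
    """
    if not 0.0 <= cached_fraction <= 1.0 or not 0.0 <= cache_discount <= 1.0:
        raise ValueError("fractions must be between 0 and 1")
    return base_price * (1.0 - cached_fraction * cache_discount)


# Claude Sonnet 4.6 input at $3 per 1M tokens, with 80% of input tokens cached.
print(round(effective_input_price(3.0, 0.8, 0.9), 2))  # 90% discount on cached tokens
print(round(effective_input_price(3.0, 0.8, 0.5), 2))  # 50% discount on cached tokens
```

The saving scales with the cache-hit share, so a workload with mostly unique prompts sees far less than the headline discount.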
How much does GPT-5 cost?
GPT-5 is $1.25 per million input tokens and $10 per million output tokens. GPT-5 mini is $0.25 input and $2 output — about 5× cheaper. OpenAI's frontier line was rebranded from GPT-4 to GPT-5 in 2025. Cached input pricing is roughly 50% of standard input.
What is the cheapest LLM API?
DeepSeek V4 Flash is currently the cheapest credible model at $0.10 per million input tokens and $0.20 per million output tokens, roughly 30× cheaper than Claude Sonnet 4.6 for input. Gemini 2.5 Flash and Claude Haiku 4.5 are the cheapest options from major US vendors. For open-weight models, Llama 3.3 70B via Together AI is competitive at the value tier. The right cheap model depends on your task: value-tier models handle classification and short replies well but struggle with complex reasoning.
How do you keep this pricing data accurate?
An automated audit runs daily. Small drift (under 25% on both input and output) is auto-applied and published the same day. Larger changes — re-pricings, model deprecations, tier consolidations — go through human editorial review against the live vendor pricing page before publication. Vendor pages remain the canonical source of truth; this dataset is a clean machine-readable mirror, not a substitute. The exact verification date is shown on this page and embedded in the public dataset's JSON.
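The 25% rule can be sketched as a small routing function. This is an illustration of the policy as described, not the audit's actual code: `classify_change` is a hypothetical name, and measuring drift relative to the previously published price is an assumption.

```python
def classify_change(old_input: float, old_output: float,
                    new_input: float, new_output: float,
                    threshold: float = 0.25) -> str:
    """Route a detected price change to auto-apply or human review.

    Assumption: drift is the absolute change relative to the old price,
    and BOTH input and output must drift by less than the threshold.
    """
    def drift(old: float, new: float) -> float:
        if old <= 0:
            raise ValueError("old price must be positive")
        return abs(new - old) / old

    small = (drift(old_input, new_input) < threshold
             and drift(old_output, new_output) < threshold)
    return "auto-apply" if small else "human-review"


print(classify_change(3.0, 15.0, 3.3, 15.0))   # 10% input drift
print(classify_change(15.0, 75.0, 5.0, 25.0))  # two-thirds cut on both rates
```

Requiring both rates to stay under the threshold means a large change to either input or output alone is enough to trigger review.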
Can I use this pricing data in my own product?
Yes. The full dataset is published as pricing.json in a public GitHub repo (lucaspowell8020/ai-agent-pricing) under CC-BY-4.0. You can fetch it directly from the raw GitHub URL, no API key required. Use it in commercial products, comparison tools, calculators, or blog posts. CC-BY-4.0 does require attribution, so credit agentshortlist.com when you reuse the data.
Why not just check the vendor pricing pages directly?
You can, and you should verify before committing budget. The reason this dataset exists: vendor pricing pages disagree on units (some quote per token, some per thousand, some per million), use struck-through old prices for SEO, and change format frequently. Aggregating into one normalised file makes side-by-side comparison and programmatic use possible. We treat the vendor pages as authoritative — this dataset is a clean, machine-readable mirror.
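The unit mismatch is the easiest of these problems to show in code. The sketch below converts a price quoted per token, per thousand, or per million into a per-million figure. The unit labels and the `to_per_million` helper are hypothetical and are not taken from the dataset's schema.

```python
# Multiplier that converts a price in the given unit to USD per 1M tokens.
UNIT_MULTIPLIER = {
    "per_token": 1_000_000,
    "per_thousand": 1_000,
    "per_million": 1,
}


def to_per_million(price: float, unit: str) -> float:
    """Normalise a quoted price to USD per 1M tokens."""
    try:
        return price * UNIT_MULTIPLIER[unit]
    except KeyError:
        raise ValueError(f"unknown unit: {unit!r}") from None


# The same $3-per-million rate as three vendors might quote it.
quotes = [(0.000003, "per_token"), (0.003, "per_thousand"), (3.0, "per_million")]
for price, unit in quotes:
    print(unit, round(to_per_million(price, unit), 6))
```

Rounding after conversion keeps floating-point noise out of the normalised value; a production pipeline might prefer `decimal.Decimal` for exact arithmetic.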
How does AI API pricing actually work?
Nearly every LLM API charges per token: small pieces of text the model reads (input) and writes (output). A token is roughly 3/4 of an English word. Pricing is usually quoted per million tokens, split between input and output rates. Output tokens are 2–6× more expensive than input across most vendors. A simple chat turn (1,000 input + 200 output tokens) on Claude Sonnet costs about $0.006. Subscription tiers (Claude Pro, ChatGPT Plus) bundle a fixed amount of usage with a flat monthly fee. That suits individual usage, not production agent traffic, which is better served by pay-per-token API pricing.
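The $0.006 figure can be reproduced in a few lines of Python. The function names are illustrative, the prices are the Claude Sonnet 4.6 rates from this page ($3 input, $15 output per million tokens), and the word-to-token conversion is only the 3/4-word rule of thumb, not a real tokenizer.

```python
def tokens_from_words(words: int) -> int:
    """Rough token estimate, assuming one token is about 3/4 of a word."""
    return round(words / 0.75)


def chat_turn_cost(input_tokens: int, output_tokens: int,
                   input_price: float, output_price: float) -> float:
    """Cost in USD of one request. Prices are USD per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# 1,000 input + 200 output tokens at Claude Sonnet 4.6 rates.
cost = chat_turn_cost(1_000, 200, 3.0, 15.0)
print(f"${cost:.4f}")          # prints $0.0060
print(tokens_from_words(750))  # prints 1000
```

Here the 200 output tokens cost as much as the 1,000 input tokens ($0.003 each), which is the output premium at work.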
What are typical API pricing tiers?
Most frontier vendors offer three tiers: frontier (best reasoning: Claude Opus 4.7, GPT-5.5, Gemini 3 Pro), balanced (the sweet spot for most production work: Claude Sonnet 4.6, GPT-5.4, Gemini Flash), and value (cheap and fast for high-volume mechanical work: Claude Haiku 4.5, GPT-5.4 mini, Gemini Flash-Lite). Cross-tier price gaps compound: at the Anthropic and OpenAI rates listed on this page, balanced is roughly 1.7–2× cheaper than frontier, and value is roughly another 3× cheaper than balanced. Most teams over-spec by defaulting to frontier; picking the right tier is the single biggest cost lever in production agent workflows.
Is this an AI API pricing tracker?
Yes. This page is updated daily against vendor sources, with prices verified against an industry catalog before publication. Small drift (under 25% on input and output) is auto-applied the same day; larger changes go through human editorial review. The dataset is also published as an open JSON file you can fetch programmatically; see the 'Open dataset' section above. If you're building a comparison tool or calculator, use the raw JSON; if you're just checking prices for budget planning, use this page.
What's the difference between AI cloud pricing and direct API pricing?
Direct API pricing is what you pay the model vendor (Anthropic, OpenAI, Google) per token. Cloud pricing is what you pay a cloud provider (AWS Bedrock, Azure OpenAI, Vertex AI) for the same models, typically with a 10–30% markup in exchange for unified billing, identity management, and compliance certifications. For most builders, direct API is cheaper and simpler. For enterprises with strict procurement, identity, or compliance requirements, the cloud markup is the cost of doing business. The tier rates we publish on this page are direct API rates.