Agent Shortlist

Reference

AI API Pricing.

Per-token pricing for every frontier model worth running an agent on. Filter by vendor or tier, verify against the source, plug into the cost calculator. Backed by an open dataset you can use in your own tools.

Verified: 2026-06-09·18 models·9 vendors·Open dataset on GitHub ↗
Always verify pricing directly with the vendor before committing budget. Vendors change prices, ship new tiers, and apply per-account discounts that a public dataset can't track. Each vendor block below links to its official pricing page.

Vendor

Tier

Showing 18 of 18 models
ModelInputper 1M tokensOutputper 1M tokensTier
Claude Opus 4.7

Best-in-class reasoning. Anthropic's flagship for the hardest agentic decisions.

$5$25frontier
Claude Sonnet 4.6

The sweet spot. Handles ~90% of agent workflows at a fraction of Opus pricing.

$3$15balanced
Claude Haiku 4.5

Fast and cheap. Best for high-volume classification and short replies.

$1$5value
ModelInputper 1M tokensOutputper 1M tokensTier
GPT-5.5

OpenAI's frontier line. Marketed for coding and professional work.

$5$30frontier
GPT-5.4

The full GPT-5.4 — not the mini. Comparable pricing to Claude Sonnet with OpenAI's tool-use and computer-use stack.

$2.5$15balanced
GPT-5.4 mini

OpenAI's value tier. Strong on coding, computer use, and subagent workloads.

$0.75$4.5value
ModelInputper 1M tokensOutputper 1M tokensTier
Grok 4.20

xAI's current flagship. 2M context with live web access via X integration. xAI has consolidated its lineup — earlier Grok 4 and Grok 4.1 Fast are no longer featured on the catalog. For an even-cheaper variant, Grok 4.3 ships at the same $1.25/$2.5 price.

$1.25$2.5frontier
ModelInputper 1M tokensOutputper 1M tokensTier
Gemini 2.5 Pro

Largest context window on the list. Pricing shown is for prompts ≤200k tokens; rises to $2.50/$15 above that.

$1.25$10balanced
Gemini 2.5 Flash

Hybrid reasoning model with 1M context and tunable thinking budgets.

$0.30$2.5value
Gemini 3.1 Pro Preview

Multimodal, agentic, strong on coding. Pricing shown is for prompts ≤200k tokens; rises to $4/$18 above that.

$2$12balanced
Gemini 3 Flash Preview

Google's high-speed thinking model for agentic workflows. Strong on coding and multi-turn chat. #5 on OpenRouter this week.

$0.50$3balanced
Gemini 3.1 Flash-Lite Preview

Google's most cost-efficient model. Optimised for high-volume agentic tasks and simple data processing.

$0.25$1.5value

Meta / Together AI

Vendor pricing page ↗
ModelInputper 1M tokensOutputper 1M tokensTier
Llama 3.3 70B (Together AI)

Open weights. The price shown reflects Together AI; other inference providers host the same weights at different rates.

$0.88$0.88value
ModelInputper 1M tokensOutputper 1M tokensTier
DeepSeek V4 Flash

DeepSeek's value tier. Cache-hit pricing drops input to $0.0028/M on repeat queries.

$0.10$0.20value
DeepSeek V3.2

DeepSeek's capable open-weight model. Extraordinary value — frontier-class performance at $0.25/M input. #3 on OpenRouter this week.

$0.23$0.34value
ModelInputper 1M tokensOutputper 1M tokensTier
Kimi K2.6

Moonshot AI's current flagship — 841% week-on-week growth on OpenRouter. Strong on long-context and agentic work.

$0.68$3.41balanced
ModelInputper 1M tokensOutputper 1M tokensTier
GLM 5.1

Successor to GLM-4.6. z.ai's current flagship with improved reasoning and tool-use benchmarks.

$0.98$3.08balanced
ModelInputper 1M tokensOutputper 1M tokensTier
Mistral Large 2.1

European frontier model. Stronger data residency story than US providers — useful for EU compliance-heavy workflows.

$2$6balanced

Run your own numbers

Pricing is just the start. The real question is what your workflow actually costs.

The cost calculator combines this pricing data with realistic per-task token estimates across ten builder workflows — from ticket classification to code review — so you can see the monthly bill at your real volume, not just the headline rate.

Open the cost calculator →

Open dataset

Use this pricing data in your own product.

The full dataset is published as pricing.json in a public GitHub repo. CC-BY-4.0. No API key, no rate limit, no auth. Updated automatically when prices change.

curl

curl -O https://raw.githubusercontent.com/lucaspowell8020/ai-agent-pricing/main/pricing.json

JavaScript

const data = await fetch(
  "https://raw.githubusercontent.com/lucaspowell8020/ai-agent-pricing/main/pricing.json"
).then((r) => r.json());

const opus = data.models.find((m) => m.slug === "claude-opus-4-7");
console.log(opus.inputPricePerMillion, opus.outputPricePerMillion);

Python

import json, urllib.request

data = json.loads(urllib.request.urlopen(
    "https://raw.githubusercontent.com/lucaspowell8020/ai-agent-pricing/main/pricing.json"
).read())

opus = next(m for m in data["models"] if m["slug"] == "claude-opus-4-7")
print(opus["inputPricePerMillion"], opus["outputPricePerMillion"])

Common questions

What builders ask before they pick a model.

How much does the Claude API cost?

Claude API pricing depends on the model. Claude Opus 4.7 is $5 per million input tokens and $25 per million output tokens. Claude Sonnet 4.6 is $3 input and $15 output. Claude Haiku 4.5 is $1 input and $5 output. Anthropic cut Opus pricing 66% in early 2026, making frontier-tier reasoning much more accessible. Prompt caching can drop input costs another 50–90% on repeat queries.

How much does GPT-5 cost?

GPT-5 is $1.25 per million input tokens and $10 per million output tokens. GPT-5 mini is $0.25 input and $2 output — about 5× cheaper. OpenAI's frontier line was rebranded from GPT-4 to GPT-5 in 2025. Cached input pricing is roughly 50% of standard input.

What is the cheapest LLM API?

DeepSeek V4 Flash is currently the cheapest credible model at $0.14 per million input tokens and $0.28 per million output tokens — roughly 35× cheaper than Claude Sonnet 4.6 for input. Gemini 2.5 Flash and Claude Haiku 4.5 are the cheapest options from major US vendors. For open-weight models, Llama 3.3 70B via Together AI is competitive at the value tier. The right cheap model depends on your task — value-tier models handle classification and short replies well but struggle with complex reasoning.

How do you keep this pricing data accurate?

An automated audit runs daily. Small drift (under 25% on both input and output) is auto-applied and published the same day. Larger changes — re-pricings, model deprecations, tier consolidations — go through human editorial review against the live vendor pricing page before publication. Vendor pages remain the canonical source of truth; this dataset is a clean machine-readable mirror, not a substitute. The exact verification date is shown on this page and embedded in the public dataset's JSON.

Can I use this pricing data in my own product?

Yes. The full dataset is published as pricing.json in a public GitHub repo (lucaspowell8020/ai-agent-pricing) under CC-BY-4.0. You can fetch it directly from the raw GitHub URL, no API key required. Use it in commercial products, comparison tools, calculators, or blog posts — attribution to agentshortlist.com is appreciated but not required by the licence.

Why not just check the vendor pricing pages directly?

You can, and you should verify before committing budget. The reason this dataset exists: vendor pricing pages disagree on units (some quote per token, some per thousand, some per million), use struck-through old prices for SEO, and change format frequently. Aggregating into one normalised file makes side-by-side comparison and programmatic use possible. We treat the vendor pages as authoritative — this dataset is a clean, machine-readable mirror.

How does AI API pricing actually work?

Nearly every LLM API charges per token — small pieces of text the model reads (input) and writes (output). A token is roughly 3/4 of an English word. Pricing is usually quoted per million tokens, split between input and output rates. Output tokens are 2–6× more expensive than input across most vendors. A simple chat turn (1,000 input + 200 output tokens) on Claude Sonnet costs about $0.006. Subscription tiers (Claude Pro, ChatGPT Plus) bundle a fixed amount of usage with a flat monthly fee — useful for individual usage, not for production agent traffic where direct API pricing is correctly sized.

What are typical API pricing tiers?

Most frontier vendors offer three tiers: frontier (best reasoning — Claude Opus 4.7, GPT-5.5, Gemini 3 Pro), balanced (the sweet spot for most production work — Claude Sonnet 4.6, GPT-5.4, Gemini Flash), and value (cheap and fast for high-volume mechanical work — Claude Haiku 4.5, GPT-5.4 mini, Gemini Flash-Lite). Cross-tier price gaps are large: balanced is typically 5–10× cheaper than frontier; value is another 3–5× cheaper than balanced. Most teams over-spec by defaulting to frontier — picking the right tier is the single biggest cost lever in production agent workflows.

Is this an AI API pricing tracker?

Yes — this page is updated daily against vendor sources, with prices verified against an industry catalog before publication. Small drift (under 25% on input and output) is auto-applied the same day; larger changes go through human editorial review. The dataset is also published as an open JSON file you can fetch programmatically — see the 'Open dataset' section below. If you're building a comparison tool or calculator, the raw JSON is the path; if you're just checking prices for budget planning, this page is the path.

What's the difference between AI cloud pricing and direct API pricing?

Direct API pricing is what you pay the model vendor (Anthropic, OpenAI, Google) per token. Cloud pricing is what you pay a cloud provider (AWS Bedrock, Azure OpenAI, Vertex AI) for the same models, typically with a 10–30% markup in exchange for unified billing, identity management, and compliance certifications. For most builders, direct API is cheaper and simpler. For enterprises with strict procurement, identity, or compliance requirements, the cloud markup is the cost of doing business. The tier rates we publish on this page are direct API rates.

Related