What's New
What changed across AI agents this week.
Model releases with vendor-verified dates. Platform pricing and verdict changes from our daily audit. New editorial coverage. One feed, updated as it happens.
Weekly digest
Get the digest of what changed
Model releases, platform updates, and pricing shifts — summarised weekly. No spam, unsubscribe in one click.
This week
17 changes in the last 7 days.
Claude Code vs OpenAI Codex (2026): Which Coding Agent Wins
Claude Code or OpenAI Codex in 2026? Pricing, ChatGPT bundling, agent autonomy, IDE integration, and the specific tasks where each one wins after side-by-side testing.
Claude Opus vs Sonnet 4.6 (2026): When Each Model Actually Wins
Claude Opus 4.8 costs 1.6x more than Sonnet 4.6 but isn't always 1.6x better. The 70/20/10 routing pattern, specific task types where each wins, and the cost math at production volume.
Hermes — pricing
Free and open-source under MIT. You pay only for model API tokens (200+ models accessible through its marketplace integration — Claude, GPT, Gemini, DeepSeek, Kimi, GLM, local models) plus your own hosting. Hosting on a $5-$20/month VPS ha…
Hermes — verdict
The most technically sophisticated open-source agent harness in 2026. Server-deployed, model-agnostic, and the only platform with a genuine self-improvement loop that compounds over months of use. Right pick when you have technical capacit…
Claude Pro vs Max vs API: When Each One Actually Wins
Claude Pro at $20, Max at $100, or pay-per-token API — which one is right for your usage? Cost math at three usage tiers, the crossover points, and the hybrid pattern most serious users run.
Hermes vs Aider: when each one is the right pick
Hermes and Aider are both open-source and model-agnostic, but they answer different questions. The decision rule, the cost math, and when each is the right pick.
Hermes vs Cline: which open-source agent fits your workflow
Hermes and Cline are open-source and model-agnostic but built for different work. The honest decision rule, cost math, and when each is the right pick.
Hermes vs OpenHands: the open-source agent comparison that matters
Hermes and OpenHands are both top-tier open-source agents — but they're built for different jobs. Decision rule, cost math, and when each is the right pick.
How to Avoid Hitting Claude Usage Limits (2026 Builder's Guide)
The patterns that let serious builders use Claude 3-5x more without hitting limits. Prompt caching, model routing, subagents, hooks, and when to move work to the API.
Windsurf — pricing
Free tier with meaningful daily usage — generous enough for individual evaluation, no credit card required. Pro at $15/month unlocks unlimited Supercomplete and higher Cascade usage. Teams at ~$30/user/month adds collaboration and admin co…
Windsurf — verdict
The fastest AI IDE for developers who want autonomous execution baked into the editor. Cascade and Flows handle multi-file refactors and goal-driven work end-to-end. Owned by Cognition and actively developed. The right pick when speed and IDE…
Evals as PRDs: How AI Teams Are Replacing Specs With Tests
Evals are becoming the new PRD for AI agent development — quantifiable tests that define 'done' for coding agents. The frameworks, the patterns, and the failure modes.
How to Create an AI Agent: A Tested Builder's Guide (2026)
How to create an AI agent in 2026: four real paths from no-code to fully custom, with the platform we'd pick for each, time to first agent, and what it actually costs.
Loop Engineering: How to Design Self-Prompting AI Agents
Loop engineering: the four patterns that turn AI agents from manual chat tools into autonomous systems. Heartbeats, crons, hooks, and goals, with the platforms that ship each.
Multi-Agent AI: When to Use It, When to Skip It, What Actually Works
Multi-agent AI compared honestly: the three patterns that work in production, the four that don't, and the cost math that decides which is right.
Kilo Code — pricing
Free tier (Kilo Auto — no credit card, no token limit on the free model tier). Paid plans run $20–$50/month for higher-tier model access. BYOK is the recommended path for serious use: connect your own API key to any of 500+ models via Kilo…
Kilo Code — verdict
The best-priced agentic coding tool for developers who need JetBrains support or want to switch models mid-session. Apache 2.0, BYOK, multi-IDE. The right pick when Cursor's editor lock-in and Claude Code's model lock-in both bother you.
Earlier (last 60 days)
23 prior changes.
GLM 5.2 released
z.ai · $0.98/$3.08 per 1M tokens · 1M context · balanced tier
Kimi K2.7 Code released
Moonshot AI · $0.612/$3.069 per 1M tokens · 262k context · value tier
The best AI coding agents in 2026
The 12 AI coding agents builders deploy in 2026: Claude Code, Cursor, Cline, Windsurf, Aider, Amp, and more. Honest verdicts on which to pick for which work.
The best AI voice agents in 2026
The four voice AI platforms builders deploy in 2026 — Retell, Vapi, Bland, ElevenLabs. Pricing, latency, languages, and which to pick for which use case.
The best no-code AI agent builders in 2026
The no-code AI agent platforms non-technical builders ship with in 2026 — Lindy, Relevance AI, Stack AI, Manus. Pricing, capabilities, which to pick.
Claude Fable 5 released
Anthropic · $10/$50 per 1M tokens · 1M context · frontier tier
Claude Opus 4.8 released
Anthropic · $5/$25 per 1M tokens · 1M context · frontier tier
Claude Opus 4.8 Fast released
Anthropic · $10/$50 per 1M tokens · 1M context · frontier tier
Grok Build 0.1 released
xAI · $1/$2 per 1M tokens · 256k context · value tier
Gemini 3.5 Flash released
Google · $1.5/$9 per 1M tokens · 1M context · balanced tier
The ARR framework: which tasks should you actually give to an AI agent?
A short mental model for deciding which tasks belong with an AI agent and which don't. Three letters. Autonomous, Recurring, Reviewable. Skip the rest.
Director vs doer: the mindset shift that separates working AI agents from broken ones
Stop prompting. Start directing. The mindset change builders need to make once they move from chatbots to agents, and the practices that come with it.
Hermes vs Cursor: a comparison nobody else makes, and why it matters
Hermes and Cursor get compared by people who don't realize they sit in different categories. What each does, why the question matters, and which to pick.
How much does it cost to build an AI agent in 2026?
AI agent development costs in 2026: no-code ($30–$300/mo), low-code ($50–$300/mo), custom builds ($2k–$50k first month). Cost per task and the hidden line items.
The lethal trifecta: the AI agent security trap nobody warns you about
Three capabilities safe alone, catastrophic combined: private data, internet access, untrusted input. How the AI agent security trap works and how to break it.
Self-hosted AI is bigger than you think
Three of the top productivity AI tools by usage are self-hosted and open source. That's not the narrative. Here's the usage data behind it.
Why two open-source agents quietly own the productivity category in 2026
Two open-source agents own 95% of productivity tokens on the public model-marketplace leaderboards. Here's why the market concentrated so fast, and what it means for builders.
Grok 4.3 released
xAI · $1.25/$2.5 per 1M tokens · 1M context · frontier tier
OpenClaw vs Hermes: which open-source agent should you self-host in 2026?
OpenClaw vs Hermes head to head: the two open-source agents builders actually run. Trade-offs that matter and the usage data behind the choice.
Skills vs MCP servers vs subagents: the architectural map for builders
Five concepts in Claude Code overlap with each other. Most explainers stop at definitions. Here's when to actually use which.
GPT-5.5 released
OpenAI · $5/$30 per 1M tokens · 400k context · frontier tier
DeepSeek V4 Flash released
DeepSeek · $0.09/$0.18 per 1M tokens · 128k context · value tier
What Claude Skills actually are (and why most people are getting them wrong)
Most builders think Claude Skills are saved prompts. The architecture is different, and it's the reason Skills are actually useful.
How this feed works
Three sources feed this page, all automated. Model releases come from our daily pricing audit cross-referenced with vendor announcement pages. Platform updates come from our platform-change detector, which compares today's review against yesterday's and surfaces changes to pricing or verdict. New articles come from this site's editorial cluster.
Nothing here is hand-curated — the same daily-audit infrastructure that runs the open pricing dataset and the platform reviews also publishes this feed. The point: a single trusted place where you can see what changed without hunting across 27 vendor pages.
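The platform-change detector described above amounts to a field-by-field diff between two daily snapshots. Below is a minimal sketch in Python, assuming each review can be reduced to a pricing string and a verdict string. `Review`, `detect_changes`, and the sample platforms are hypothetical illustrations, not this site's actual code or data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Review:
    # Hypothetical snapshot of one platform review; only the two fields
    # the feed says it watches (pricing and verdict) are modelled.
    pricing: str
    verdict: str

WATCHED_FIELDS = ("pricing", "verdict")

def detect_changes(yesterday: dict[str, Review],
                   today: dict[str, Review]) -> list[tuple[str, str]]:
    """Return (platform, field) pairs whose watched field differs between snapshots.

    Platforms absent from yesterday's snapshot are skipped, since there is
    nothing to diff them against.
    """
    changes = []
    for platform in sorted(today):
        old = yesterday.get(platform)
        if old is None:
            continue
        new = today[platform]
        for field in WATCHED_FIELDS:
            if getattr(old, field) != getattr(new, field):
                changes.append((platform, field))
    return changes

# Invented sample snapshots, for illustration only.
yesterday = {
    "platform-a": Review(pricing="$15/month", verdict="recommended"),
    "platform-b": Review(pricing="free", verdict="niche"),
}
today = {
    "platform-a": Review(pricing="$20/month", verdict="recommended"),
    "platform-b": Review(pricing="free", verdict="niche"),
    "platform-c": Review(pricing="free", verdict="new"),
}

print(detect_changes(yesterday, today))  # [('platform-a', 'pricing')]
```

Each `(platform, field)` pair would then become one feed entry such as "platform-a — pricing". Skipping platforms with no prior snapshot is a design choice in this sketch: a brand-new review is a new article, not a change.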