Agent Shortlist

AI Agent Workflow Patterns: The Six Designs That Work in Production

Six proven AI agent workflow patterns — with real examples, cost shapes, and failure modes. Skip the theory. Build the one that fits your use case.

By Lucas Powell · March 9, 2026 · 10 min read · 2,232 words

Most AI agent projects fail before they ship. Not because the model is wrong. Because the workflow design is wrong.

People reach for complex multi-agent architectures when a single-step loop would do the job. Or they wire up a research agent when what they actually need is a parser. Wrong pattern, wasted build time, flaky results in production.

There are six AI agent workflow patterns that actually hold up. Know them. Pick the right one.


Pattern 1: Classify-then-Route

The workflow: Agent reads the input — a support ticket, a lead form, a document — assigns it to a category, then routes it to the right handler or sub-agent.

Why it works: You run a cheap, fast model on the classify step. GPT-4o mini or Haiku for the routing decision; your heavy model only touches the complex categories. A support ticket system handling 10,000 tickets a month might spend $3 on classification and $40 on the 8% of tickets that need real reasoning.

Real example: B2B SaaS company routes inbound support tickets to five queues — billing, bugs, onboarding, feature requests, and "other." The classifier runs on a 3-sentence system prompt. The onboarding queue triggers a multi-step troubleshooting agent. The billing queue pings a human. Total cost per ticket: under $0.002.

One support team running this pattern processes 3,000 tickets a month. Haiku handles the classify step at $0.0005 per ticket — $1.50 total. Sonnet only runs on the 8% of tickets that score "complex". The misclassification rate is about 2%, caught in the weekly human review. The CFO approved the workflow when they saw the bill was $18/month.
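The arithmetic behind that bill is easy to check. The sketch below uses only the figures quoted above (3,000 tickets, $0.0005 per classification, 8% complex, $18 total). The per-ticket cost of the complex tier is not stated in the article; it is derived here from the remainder, on the assumption that the $18 is entirely model spend. The function name is illustrative.

```python
def classify_route_costs(tickets, classify_cost, complex_share, total_bill):
    """Split a monthly bill into classify spend and implied spend on complex tickets."""
    classify_total = tickets * classify_cost
    complex_tickets = round(tickets * complex_share)
    # Assumption: everything not spent on classification went to the heavy model.
    complex_total = total_bill - classify_total
    return {
        "classify_total": classify_total,
        "complex_tickets": complex_tickets,
        "complex_total": complex_total,
        "cost_per_complex_ticket": complex_total / complex_tickets,
    }


costs = classify_route_costs(3000, 0.0005, 0.08, 18.0)
print(costs)
```

On these numbers, roughly $16.50 of the bill goes to 240 complex tickets, or about seven cents each. The classify step is a rounding error, which is the point of the pattern.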

Use our cost calculator to model classify-and-route spend before you build.

Failure mode: The classifier's prompt or training is too thin. It collapses edge cases into the wrong category, and your expensive downstream agent runs on garbage. Fix it with a confidence threshold: anything below 0.85 routes to "human review."
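A minimal sketch of that threshold gate follows. The 0.85 cut-off and the five queue names come from the examples above; the `Classification` type and the idea that your classifier returns a usable confidence score are assumptions. Many model APIs do not return a calibrated confidence, so you may need to derive one yourself.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.85  # cut-off quoted in the article
KNOWN_QUEUES = {"billing", "bugs", "onboarding", "feature_requests", "other"}


@dataclass
class Classification:
    """Hypothetical classifier output: a label plus a 0-1 confidence score."""
    label: str
    confidence: float


def route(result: Classification) -> str:
    """Return the target queue, falling back to human review when unsure."""
    if result.label not in KNOWN_QUEUES:
        # An unexpected label is treated the same as a low-confidence one.
        return "human_review"
    if result.confidence < CONFIDENCE_THRESHOLD:
        return "human_review"
    return result.label


print(route(Classification("billing", 0.93)))
print(route(Classification("bugs", 0.61)))
```

Treating unknown labels as low confidence keeps a malformed classifier response from reaching an expensive handler.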


Pattern 2: Extract-then-Act

The workflow: Agent reads a document, email, or web page. Extracts structured data. Writes it to a database, triggers an action, or hands off to the next step.

Why it works: This is the "agent as structured parser" pattern. It's reliable because you're not asking the model to reason — you're asking it to read and format. Consistent inputs produce consistent outputs.

Real example: Accounts payable team runs every inbound invoice through an extraction agent. It pulls vendor name, line items, amounts, due date, and PO number. Writes to Xero. Flags mismatches for human review. What took 12 minutes per invoice now takes 30 seconds. Error rate dropped from 4% to under 1%.

The pattern breaks when document quality degrades. One AP team found their extraction accuracy dropped from 94% to 71% when a vendor switched from PDF invoices to photographed paper ones. The fix was a pre-processing step to detect image-based invoices and route them to human extraction instead. Worth knowing before you go live.

Failure mode: Unstructured inputs break your schema. Give the model a scanned PDF with a crooked table and it can hallucinate line items. Build a validation step: if extracted totals don't match the sum of line items, route to human review automatically.
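The totals check can be sketched in a few lines. The field name `amount`, the one-cent tolerance, and the decision to ignore tax and shipping lines are all assumptions made for illustration; adapt them to your own extraction schema.

```python
from decimal import Decimal


def needs_human_review(line_items, stated_total, tolerance=Decimal("0.01")):
    """Return True when the line items do not add up to the extracted total."""
    # Decimal avoids float rounding noise when summing currency amounts.
    computed = sum(
        (Decimal(str(item["amount"])) for item in line_items), Decimal("0")
    )
    return abs(computed - Decimal(str(stated_total))) > tolerance


invoice_items = [{"amount": "120.00"}, {"amount": "30.50"}]
print(needs_human_review(invoice_items, "150.50"))
print(needs_human_review(invoice_items, "155.50"))
```

The check is cheap and deterministic, so it can run on every invoice before anything is written to the accounting system.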


Pattern 3: Research-then-Synthesise

The workflow: One or more agents gather information from multiple sources — web search, internal docs, databases, APIs. A synthesis agent reads all of it and produces the final output.

Why it works: You parallelise the gathering. Five research sub-agents run simultaneously, each pulling from a different source. The synthesis step is where model quality matters most — use your best model here.
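Here is a rough sketch of the parallel gather step using Python's `asyncio`. The `fetch_source` and `synthesise` functions are stand-ins, not real agent calls: in practice each would wrap a model or API request. The source names are hypothetical.

```python
import asyncio


async def fetch_source(name: str) -> dict:
    """Stand-in for one research sub-agent; a real one would call a model or API."""
    await asyncio.sleep(0.01)  # simulates network latency
    return {"source": name, "notes": f"findings from {name}"}


async def gather_research(sources: list[str]) -> list[dict]:
    # return_exceptions=True keeps one failed source from sinking the whole run.
    results = await asyncio.gather(
        *(fetch_source(s) for s in sources), return_exceptions=True
    )
    return [r for r in results if not isinstance(r, Exception)]


def synthesise(findings: list[dict]) -> str:
    """Placeholder for the single synthesis call to your strongest model."""
    return "\n".join(f"{f['source']}: {f['notes']}" for f in findings)


findings = asyncio.run(gather_research(["pricing", "press", "jobs", "reviews"]))
brief = synthesise(findings)
print(brief)
```

Because the sub-agents run concurrently, total gather time is close to the slowest source rather than the sum of all of them.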

Real example: Competitive intelligence team runs a weekly report on three competitors. Research agents pull from their pricing pages, recent press releases, LinkedIn job postings, and G2 reviews. Synthesis agent writes a structured 500-word brief. What took a half-day of analyst time now runs overnight and lands in Slack every Monday morning.

One competitive intel analyst runs this weekly: three cheap Haiku agents scrape pricing pages, release notes, and job listings for five competitors. One Sonnet agent reads all three outputs and writes a two-page brief. The analyst reviews and adds judgment. The job went from half a day to 40 minutes. The brief is also better, because the agent never skips the job listings, which the analyst routinely did.

For multi-agent coordination, see how AI agent orchestration handles the handoffs between gather and synthesise steps.

Failure mode: Research agents return conflicting data. The synthesis agent averages them instead of flagging the conflict. Build an explicit conflict-detection step: if two sources disagree on a material fact, surface both and let the human decide.
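One way to make conflict detection explicit is to have each research agent emit its claims as (source, fact, value) tuples and compare them before synthesis. That tuple format and the sample data below are hypothetical; the article does not prescribe a structure.

```python
from collections import defaultdict


def find_conflicts(claims):
    """Group claims by fact and return only the facts where sources disagree.

    `claims` is a list of (source, fact, value) tuples.
    """
    by_fact = defaultdict(dict)
    for source, fact, value in claims:
        by_fact[fact][source] = value
    return {
        fact: values
        for fact, values in by_fact.items()
        if len(set(values.values())) > 1
    }


# Hypothetical sample data for illustration only.
claims = [
    ("pricing_page", "pro_plan_price", "$49"),
    ("g2_reviews", "pro_plan_price", "$59"),
    ("pricing_page", "free_tier", "yes"),
    ("press_release", "free_tier", "yes"),
]
conflicts = find_conflicts(claims)
print(conflicts)
```

Anything returned here goes to the human with both values attached, instead of being blended into the brief.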


Pattern 4: Draft-then-Review

The workflow: Agent produces a draft. A second agent — or a human — reviews and approves before any output leaves the system.

Why it works: It makes agents safe in customer-facing roles. The drafter can be fast and cheap. The reviewer catches tone issues, factual errors, and brand violations before they hit a customer inbox.

Real example: E-commerce brand uses a draft-then-review loop for customer service replies. Draft agent handles 800 tickets a day. A review agent checks each response against a 12-point quality rubric. Anything below threshold gets flagged for a human agent. Human workload dropped 70%. Brand complaints about tone dropped 40%.

One content team tracks edit rate — how often the human materially changes the agent's draft before sending. They started at 65% edit rate. After three months of refining the skills file with tone examples and format rules, they're at 28%. That's the metric that shows whether your skills file is working.
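Edit rate is simple to compute once you log each draft next to what was actually sent. The sketch below uses `difflib` from the standard library to decide what counts as a material change. The 0.9 similarity floor is an arbitrary assumption, not a figure from the team described above, and the sample pairs are invented.

```python
from difflib import SequenceMatcher


def is_material_edit(draft: str, sent: str, similarity_floor: float = 0.9) -> bool:
    """Treat a reply as materially edited when it drifts below the similarity floor."""
    return SequenceMatcher(None, draft, sent).ratio() < similarity_floor


def edit_rate(pairs) -> float:
    """Share of (draft, sent) pairs where the human materially changed the draft."""
    if not pairs:
        return 0.0
    edited = sum(is_material_edit(draft, sent) for draft, sent in pairs)
    return edited / len(pairs)


# Hypothetical log entries: (agent draft, message actually sent).
pairs = [
    ("Thanks for reaching out.", "Thanks for reaching out."),
    ("Your refund is on its way.", "We cannot offer a refund on this order."),
]
print(edit_rate(pairs))
```

Character-level similarity is a crude proxy. A reviewer tag such as "sent as-is" or "rewritten" is more reliable if your tooling supports it.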

Failure mode: Review criteria drift. You define the rubric once, it ages, and the reviewer starts approving things that would have failed six months ago. Schedule a monthly audit of flagged-then-approved tickets to catch drift early.


Pattern 5: Monitor-then-Alert

The workflow: Agent runs on a schedule. Reads a feed — transactions, brand mentions, metrics, logs. Flags anomalies. Pings a human only when something needs attention.

Why it works: Humans don't need to see the normal. They need to see the exceptions. This pattern scales a single person's attention across hundreds of data streams.

Real example: Fintech startup monitors 15,000 daily transactions for fraud signals. Agent runs every 15 minutes. Checks each batch against 22 behavioral rules. Pages the on-call analyst when it finds a cluster. Before: two analysts reviewed every transaction. After: the same two analysts handle six escalations a day and spend the rest of their time on higher-value work.

Alert fatigue is the death of this pattern. One team configured their monitoring agent with a threshold too sensitive — it was firing 40 alerts a day, mostly noise. Within a week, the team stopped reading the alerts. The agent was running. Nobody was watching. Calibrate the threshold before you assume the pattern is working.

Failure mode: Alert fatigue. The agent flags too many false positives, humans start ignoring the pings, and a real incident slips through. Tune aggressively. Track your false positive rate. If it's above 20%, the model's threshold is wrong.
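Tracking that number takes very little code if someone marks each alert as actionable or not. In this sketch the "false positive rate" is the share of fired alerts that turned out to be noise, which is an assumption about how the 20% figure is meant. The `actionable` field is hypothetical.

```python
def false_positive_rate(alerts) -> float:
    """Share of fired alerts that a human marked as not actionable."""
    if not alerts:
        return 0.0
    noise = sum(1 for alert in alerts if not alert["actionable"])
    return noise / len(alerts)


def threshold_needs_tuning(alerts, max_fp_rate: float = 0.20) -> bool:
    """True when the noise share exceeds the 20% ceiling quoted in the article."""
    return false_positive_rate(alerts) > max_fp_rate


# Hypothetical day of alerts: 40 fired, 6 of them real.
noisy_day = [{"actionable": i < 6} for i in range(40)]
print(false_positive_rate(noisy_day))
print(threshold_needs_tuning(noisy_day))
```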


Pattern 6: Plan-then-Execute

The workflow: Agent receives a complex goal. Breaks it into a sequence of steps. Executes each step. Handles failures and replans when something breaks.

Why it works: It's the only pattern that can handle genuinely open-ended tasks — "research this company and write a full investment memo" or "refactor this codebase to remove the deprecated API."

Real example: Engineering team uses a plan-then-execute agent to handle database migrations. Agent reads the migration spec, generates a step-by-step plan, runs each step against a staging environment, checks outputs, and flags steps that fail before touching production. Migrations that took a senior engineer three hours now take 40 minutes of human review time.

This is where frameworks earn their keep. LangGraph, CrewAI, and the newer options handle the state management and replanning logic you'd otherwise build yourself. If you're running multiple plan-then-execute workflows across a team, Paperclip handles the orchestration layer.

This is the only pattern where we'd say: don't build it yourself unless your team has engineered multi-step agent systems before. The failure modes compound. One agency built a Plan-then-Execute agent for client reporting — it planned correctly, executed step 1 correctly, then hit an unexpected API rate limit on step 3 and silently produced a partial report that looked complete. No error. No flag. The client received a report missing two sections. The fix was a completion-check step at the end. It took an incident to discover it was needed.
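A completion check of the kind that agency added can be very small. This is a sketch under assumptions: the report is a dict of section name to text, and the section names are invented for the example.

```python
def check_completion(report: dict, required_sections: list[str]) -> list[str]:
    """Return the required sections that are missing or empty."""
    return [
        section
        for section in required_sections
        if not (report.get(section) or "").strip()
    ]


def deliver(report: dict, required_sections: list[str]) -> dict:
    """Refuse to hand over a report that only looks complete."""
    missing = check_completion(report, required_sections)
    if missing:
        raise ValueError(f"Report incomplete, missing sections: {missing}")
    return report


# Hypothetical partial report, as if a later step had hit a rate limit.
partial = {"summary": "Traffic up this month.", "traffic": ""}
print(check_completion(partial, ["summary", "traffic", "spend"]))
```

Raising an error, instead of logging a warning, turns the silent partial result into a loud failure.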

Failure mode: The plan is wrong and the agent executes confidently anyway. Add a plan-review step before execution starts. Show the plan to a human (or a critic agent) and require approval before any irreversible actions run.
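The approval gate might look like the sketch below. `Step`, `execute_plan`, and the `approve` callback are illustrative names; `approve` could wrap a human sign-off or a critic agent. The example reviewer that rejects any plan containing an irreversible step is an assumption about what a cautious default could be.

```python
from dataclasses import dataclass


@dataclass
class Step:
    description: str
    irreversible: bool = False


def execute_plan(plan, run_step, approve):
    """Show the whole plan to a reviewer first; run nothing if it is rejected."""
    if not approve(plan):
        return []
    return [run_step(step) for step in plan]


def reject_irreversible(plan) -> bool:
    """Example critic: only approve plans with no irreversible steps."""
    return not any(step.irreversible for step in plan)


plan = [Step("create backup"), Step("drop legacy table", irreversible=True)]
executed = execute_plan(plan, lambda step: step.description, reject_irreversible)
print(executed)
```

The gate runs before the first step, so a rejected plan leaves the system untouched.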


Choosing the Right Pattern

Don't start with the most complex pattern you can imagine. Start with the simplest one that solves the problem.

Decision prompt:

  1. Does the input need to be sorted into categories? Start with classify-then-route.
  2. Does the input need to be parsed into structured data? Use extract-then-act.
  3. Do you need information from multiple sources before writing anything? Use research-then-synthesise.
  4. Is the output customer-facing or high-stakes? Add a draft-then-review layer.
  5. Are you watching for exceptions in a continuous data stream? Build monitor-then-alert.
  6. Is the goal genuinely open-ended — too complex for the first five patterns? Then reach for plan-then-execute.
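
The decision prompt above can be written down as a small function. Reading the list as "first match wins", with draft-then-review treated as an add-on layer and not a core pattern, is one interpretation; the flag names are invented for the sketch.

```python
def choose_pattern(
    needs_categories=False,
    needs_structured_data=False,
    needs_multiple_sources=False,
    continuous_stream=False,
    high_stakes_output=False,
):
    """Return (core_pattern, extra_layers) following the decision prompt order."""
    if needs_categories:
        core = "classify-then-route"
    elif needs_structured_data:
        core = "extract-then-act"
    elif needs_multiple_sources:
        core = "research-then-synthesise"
    elif continuous_stream:
        core = "monitor-then-alert"
    else:
        # Nothing simpler fits, so treat the goal as open-ended.
        core = "plan-then-execute"
    layers = ["draft-then-review"] if high_stakes_output else []
    return core, layers


print(choose_pattern(needs_multiple_sources=True, high_stakes_output=True))
```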

Most production workflows are a combination of two or three of these — a classify-then-route outer shell with a research-then-synthesise core, for instance. But build the core pattern first. Composition comes later.

If you're building without code, Lindy handles patterns 1 through 5 well. For patterns that need custom logic, n8n gives you the workflow builder without locking you into a black box.

The teams who ship reliable agents aren't the ones with the most sophisticated architectures. They're the ones who picked the right pattern for the job and built it simply.

Pick the pattern. Build it lean. Add complexity only when the simple version breaks.

Frequently asked questions

What is an AI agent workflow?

An AI agent workflow is the sequence of steps an agent takes to complete a task — including how it ingests input, calls models or tools, makes decisions, and produces output. A simple workflow is one model call. A complex one involves multiple agents, conditional branching, retry logic, and human-in-the-loop checkpoints. The six patterns in this article cover the workflow shapes most production agents converge on.

What are the most common AI agent workflow patterns?

By our estimate, six patterns cover roughly 95% of production deployments: classify-then-route (categorise input, send to the right handler), extract-then-act (parse structured data, do something with it), research-then-synthesise (gather from multiple sources, produce a unified answer), draft-then-review (write a candidate, have something check it), monitor-then-alert (watch a stream, flag exceptions), and plan-then-execute (decompose a goal into steps and run them). Most real-world workflows combine two or three.

How do I choose the right AI agent workflow pattern?

Start with the simplest pattern that could plausibly work — usually classify-then-route or extract-then-act. Only escalate to more complex patterns (research-then-synthesise, plan-then-execute) when you've validated that the simple version genuinely can't do the job. Building plan-then-execute when extract-then-act would have worked is the most common over-engineering mistake.

What's the difference between a workflow and an agent?

A workflow is a defined sequence of steps. An agent is a decision-maker that can choose which steps to run. Some workflows have no agent at all (deterministic pipelines). Some agents have no fixed workflow (autonomous exploration). Most production AI deployments are agents inside structured workflows — the workflow defines the bounds; the agent makes the in-bound decisions.

What's the simplest AI agent workflow pattern?

Classify-then-route. One model call categorises the input. A simple routing rule sends it to the right handler. No state, no multi-step reasoning, no failure modes beyond "the classification was wrong." If you're building your first agent, start here unless your use case explicitly needs something more complex.

Should I build a multi-agent workflow or a single-agent one?

Start single-agent. The most common failure pattern: building multi-agent orchestration before validating that any single agent works reliably. One agent doing one thing well is harder than it looks. Nail that, then add the second agent when you have a real coordination problem to solve. The orchestration article covers what to do once you cross that threshold.

How do I know if my AI agent workflow is failing in production?

Three signals: (1) cost is climbing faster than usage (workflow is calling expensive models unnecessarily — see model routing); (2) latency is creeping up (an agent or tool is slower than expected); (3) output quality is drifting (success rates on your eval set are dropping). All three are surfaced by agent observability — instrument it from day one, not day 90.
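The first signal is the easiest to automate. A minimal sketch, assuming you already record spend and request counts per period: the 10-point margin is an arbitrary choice, and the previous-period values must be non-zero.

```python
def growth(previous: float, current: float) -> float:
    """Fractional change between two periods; `previous` must be non-zero."""
    return (current - previous) / previous


def cost_outpacing_usage(prev_cost, cost, prev_requests, requests, margin=0.10):
    """True when spend grew more than `margin` faster than request volume."""
    return growth(prev_cost, cost) - growth(prev_requests, requests) > margin


# Hypothetical month-over-month figures.
print(cost_outpacing_usage(100.0, 150.0, 1000, 1100))
print(cost_outpacing_usage(100.0, 110.0, 1000, 1100))
```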

About the author

Lucas Powell

Founder, Growth 8020 · Editor, Agent Shortlist

Founder of Growth 8020, an AI-first B2B marketing studio. Editor of Agent Shortlist — the publication he wished existed when his team had to pick AI tools.