Article · foundations

Director vs doer: the mindset shift that separates working AI agents from broken ones

Stop prompting. Start directing. The mindset change builders need to make once they move from chatbots to agents, and the practices that come with it.

By Lucas Powell·May 17, 2026·8 min read·1,735 words

Most of what was written about "prompt engineering" in 2023 and 2024 is actively misleading advice in 2026. It taught builders to perfect a single prompt as if the model were a vending machine: drop in the right tokens, get the right output. That worked for chatbots. It's broken thinking for agents.

The shift that separates builders who get value from AI agents from builders who don't is a mental one: stop being the doer and start being the director. Stop optimizing prompts; start engineering context. The work isn't writing the perfect instruction — it's building the right environment for the agent to operate in.

Here's what changes when you make the shift, and the practices that come with it.

What the doer mindset gets wrong

When you treat an agent like a chatbot, you optimize for the single best instruction. You workshop the prompt. You add few-shot examples. You spend an hour tuning the system message. You get an output. You either accept it or workshop the prompt some more.

This approach works fine for one-off tasks. It collapses when you're trying to build something that runs unattended. Three failure modes show up immediately:

The prompt is fine for the demo task and broken for everything else. The agent works perfectly when given the example you tuned it on, then degrades on adjacent tasks you didn't think to test.
You're the bottleneck for every operation. Because you're driving each prompt manually, the "automation" is really just you typing faster.
Context resets every session. The agent has no memory of previous decisions, your house style, your team's standards, your acceptable failure modes. You're re-teaching every conversation.

These aren't problems you fix with better prompts. They're problems you fix by changing what you do.

The director mindset

A director defines the outcome and the standards, sets up the environment for the work to happen, reviews the result, and gives feedback that makes the next round better. They don't write the script word by word. They don't operate the camera. They make decisions about what counts as good enough and what doesn't.

For AI agents, this means:

You define the outcome and the quality bar, not the step-by-step instructions for how to get there
You load the agent with the context it needs, your organization's voice, your acceptable patterns, your house rules — so it doesn't have to be retold every session
You review work, not method — focus on whether the output is right, not whether the process matched what you would have done
You build evaluation criteria that let you catch regressions automatically rather than checking every output by hand
You write reverse prompts — instructions to the agent about what kind of output it should produce — instead of forward prompts that explain step by step what to do

The mental shift is from "how do I get this single output right?" to "how do I make the production environment for this kind of output reliable?"

Context engineering, the practice

The discipline that replaces prompt engineering is context engineering. Instead of trying to encode everything the agent needs in one prompt, you build a layered context system around it.

The layers, roughly:

1. The system prompt — short, identity-level

What the agent is, who it's serving, what tone and standards it works to. Not a manual. Not a step-by-step. A persona file. The point isn't to tell the agent what to do — it's to tell the agent who it is when it decides what to do.

A good system prompt is 100–300 words and almost never changes. A system prompt that keeps growing is a system prompt that's doing the wrong job. Move the growing parts to the next layer.

2. Operating procedures — modular, retrievable, evergreen

Specific knowledge the agent needs for specific tasks. Customer-support tone guides. House style rules for writing. Decision trees for routing tickets. Reference docs for the product.

These don't go in the system prompt. They get retrieved on demand when relevant, either through a skills system or a retrieval layer (RAG). Storing them inline in every prompt wastes tokens and confuses the agent on tasks where they don't apply.

3. Task context — situational, generated fresh each run

The actual data the agent needs for this specific task: the customer's ticket, the document being summarized, the codebase the agent is editing, the user's current state.

This is what makes each agent run different from the last. Most teams put 80% of their attention on this layer because it varies the most.

4. Feedback memory — accumulating over time

The pattern most teams skip and the one that separates a static agent from one that gets better. Agents need a way to encode "we tried this approach last week, here's what didn't work" without you having to repeat it every session.

Skills files written by you, summaries of past decisions written by the agent, post-task retrospectives, all of these are forms of feedback memory. They turn "I tell the agent everything every time" into "the agent already knows."

We covered the architecture in AI agent skills and memory.

What this looks like in practice

A customer-support reply agent built with the doer mindset and the director mindset, side by side:

Doer-built version:

A single 2,000-word system prompt with every house rule, every escalation policy, every product fact crammed in
Builder writes a new prompt template for each ticket category
Output reviewed every time before send, because the builder doesn't trust the agent's judgment in general
After 30 days: the prompt has grown to 4,000 words, half of which is obsolete; the agent confuses two of the ticket categories; the builder spends as much time managing the agent as they used to spend writing replies

Director-built version:

A 200-word system prompt establishing the agent's role and voice
House rules and tone guides stored as retrievable skills files; agent loads only what's relevant per ticket
A small set of evaluation criteria (rubric: does the reply solve the problem, does it sound like our brand, does it include the right next step?) that lets the builder review 10 replies at a time instead of one at a time
Failure modes get added to the skills file as they're discovered; agent improves over time without prompt-by-prompt babysitting
After 30 days: setup is stable, builder reviews a sample weekly, agent handles 90% of tickets without intervention

The work the second builder did wasn't more skilled. It was different. Less typing-the-perfect-prompt; more building-the-right-system.

The four habits that come with the shift

If you're trying to make the change in your own practice, four habits do most of the heavy lifting:

1. Define the output, not the process

Stop telling the agent "first do X, then Y, then Z." Tell it what the result should look like. "Output a reply in the following structure, hitting these criteria." The agent figures out the steps. This is harder than it sounds because most builders are over-specifying out of anxiety, not necessity.

2. Write evaluation criteria before you write the prompt

Before you draft the system prompt for a new agent, write down what would make its output "right." Three to seven concrete checks. Now your iteration loop is: try a prompt, run it against examples, see how many checks pass, improve the prompt, repeat. Without the checks you're tuning by vibes.

3. Build a feedback loop into the agent itself

Every time you correct the agent, ask whether the correction is a one-off or a pattern. If it's a pattern, write it into a skills file so the agent doesn't make the same mistake next time. This is the single biggest leverage point in director-mode work, your corrections compound instead of evaporating.

4. Stop reviewing every output

Once you have evaluation criteria and a feedback loop, you don't need to read every output. Sample 10% and check that the criteria are still being met. The other 90% goes through. If quality drops, you'll see it in the sample and you can investigate.

This step is where most builders get stuck. The instinct is to "just check one more", but at scale, that instinct is what prevents the agent from actually saving you time. Trust the sampling, build better evaluation, and let go of full review.

When the doer mindset is still right

The doer mindset isn't always wrong. Two cases where it's correct:

One-off tasks. You're not building automation, you're using AI to do a single thing once. Workshop the prompt, get the output, move on. Most of the chatbot-shaped use of AI lives here, and that's fine.
Building the first version of a workflow. Before you can direct, you have to understand. The first dozen runs of a new agent are reasonably done in doer-mode while you learn what the failure modes are. Then you switch to director-mode and build the system around it.

The mistake is staying in doer-mode after you've learned what you needed to learn. That's where most builders plateau.

What director-mode unlocks

When you make the shift, three things change at once:

The agent gets better over time instead of staying constant or degrading. Feedback loops compound.
You stop being the bottleneck for every run. Sampling and evaluation replace per-output review.
You can run more agents. The capacity that prompt-tuning ate is now available to direct a portfolio of agents instead of one.

The builders who are getting real productivity gains from AI in 2026 are almost all in director-mode. The ones who are still struggling are almost all in doer-mode, often without realizing there's a different way to work.

If you're trying to figure out where to start, the ARR framework helps you pick which tasks to direct first. The skills and memory article covers the feedback-loop architecture. The picker recommends platforms that support director-mode workflows (skills files, persistent memory, evaluation tooling) by default.

The vocabulary will keep shifting — "prompt engineering" was the right term in 2023, "context engineering" is more accurate in 2026, something else will replace it in 2027. The underlying shift is the one that lasts: stop being the doer, start being the director. That's the only mindset change that actually matters.

About the author

Lucas Powell

Founder, Growth 8020 · Editor, Agent Shortlist

Founder of Growth 8020, an AI-first B2B marketing studio. Editor of Agent Shortlist — the publication he wished existed when his team had to pick AI tools.

Full bio →Growth 8020 ↗GitHub ↗

Liked this one? Get the next.

One issue every two weeks. New reviews, tools I've built, and one interesting thing shipped by someone else. Unsubscribe in one click.

← All articles