Shipping Faster with Agentic AI Workflows

The first wave of AI coding tools was a smarter autocomplete. The current wave is something different: an agent that plans, runs commands, edits files, checks its own work, and coordinates other agents to get a job done. Used well, agentic workflows compress hours of mechanical work into minutes. Used carelessly, they produce confident nonsense at scale. Here's how to get the upside — the patterns that work, where they pay off, and the guardrails that keep them safe.

From one prompt to an orchestrated team

A single AI chat turn is a soloist: one context window, one train of thought, everything competing for the same attention. An agentic workflow is an orchestra. A coordinator decomposes the task, spins up specialized sub-agents — each with its own fresh context, tools, and permissions — runs them in parallel or in sequence, and keeps the bulky intermediate work out of the main thread so only conclusions flow back. The modern agentic terminals make this a first-class feature (we compared two of them in Claude Code vs. OpenAI Codex CLI). The leverage comes from three things context windows alone can't give you: parallelism, isolation (a sub-agent's 50 messages of searching don't pollute the main context), and independent verification.
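
To make the coordinator idea concrete, here is a minimal sketch in Python using asyncio. The `SubAgent` class, the `coordinate` function, and the stubbed "search" steps are all hypothetical stand-ins (a real sub-agent would call a model and tools); the point is only that each worker keeps its bulky transcript to itself and hands back a one-line conclusion.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class SubAgent:
    """A hypothetical worker with its own private context."""
    name: str
    transcript: list[str] = field(default_factory=list)  # never leaves the agent

    async def run(self, task: str) -> str:
        # Stand-in for many model/tool turns; a real agent would call an LLM here.
        for step in range(3):
            self.transcript.append(f"{self.name} step {step}: searching for {task!r}")
            await asyncio.sleep(0)  # yield so other agents can make progress
        # Only the short conclusion is returned to the coordinator.
        return f"{task}: done after {len(self.transcript)} steps"


async def coordinate(tasks: list[str]) -> list[str]:
    """Fan tasks out to fresh sub-agents and collect only their conclusions."""
    agents = [SubAgent(name=f"agent-{i}") for i, _ in enumerate(tasks)]
    # gather() runs the sub-agents concurrently and preserves input order.
    return await asyncio.gather(*(a.run(t) for a, t in zip(agents, tasks)))


if __name__ == "__main__":
    for line in asyncio.run(coordinate(["auth module", "billing module"])):
        print(line)
```

The coordinator's own context only ever sees the returned strings, which is the isolation property described above.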

The patterns that actually work

A handful of compositions cover most real value. They're worth knowing by name because choosing the right one is most of the skill.

  • Fan-out / map. The same operation across many independent items — review 40 changed files, summarize 200 documents, migrate every call site of a deprecated API. Each runs in its own agent, in parallel.
  • Pipeline. Each item flows through stages (find → fix → test) independently, with no barrier between stages, so item A can be in stage 3 while item B is still in stage 1. Given enough parallel workers, wall-clock time is set by the slowest single item's chain, not by the sum of each stage's slowest item.
  • Adversarial verification. The highest-value pattern. After one agent produces a finding, spawn independent agents whose only job is to try to refute it — and keep it only if it survives. This is what separates "plausible" from "true," and it's how you stop a fleet of agents from amplifying a confident mistake.
  • Loop-until-done. For unknown-size discovery (find all the bugs, all the edge cases), keep spawning finders until several consecutive rounds turn up nothing new — rather than guessing a fixed count up front.
  • Judge panel. Generate several independent attempts at a hard design from different angles, score them with separate judges, then synthesize the winner while grafting the best ideas from the runners-up.
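
The pipeline's timing claim is easy to check with a little arithmetic. The sketch below compares a barrier schedule with a no-barrier pipeline. The function names and the per-stage durations are made up for illustration, and it assumes one worker per item so nothing ever queues.

```python
def barrier_time(durations: dict[str, list[float]]) -> float:
    """Every item must finish stage k before any item starts stage k+1."""
    stage_count = len(next(iter(durations.values())))
    return sum(
        max(item[stage] for item in durations.values())
        for stage in range(stage_count)
    )


def pipeline_time(durations: dict[str, list[float]]) -> float:
    """Each item advances on its own; the slowest chain sets the finish time."""
    return max(sum(item) for item in durations.values())


# Hypothetical minutes per stage (find, fix, test) for two items.
durations = {"A": [1, 5, 1], "B": [5, 1, 1]}

print(barrier_time(durations))   # 5 + 5 + 1 = 11
print(pipeline_time(durations))  # max(7, 7) = 7
```

With a barrier, each stage waits for its slowest item; without one, the two slow steps overlap and the whole batch finishes when the longest single chain does.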

Notice the theme: the wins come less from a single super-smart agent and more from structure — fanning out for coverage, pipelining for speed, and verifying adversarially for confidence.
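
Adversarial verification and loop-until-done from the list above combine naturally, and their control flow fits in a few lines. This is a sketch under stated assumptions: `Finder` and `Refuter` are hypothetical callables standing in for agent calls, the stub data is invented, and `max_rounds` is a safety cap we added rather than part of the pattern itself.

```python
from typing import Callable, Iterable

Finder = Callable[[int], Iterable[str]]   # round number -> candidate findings
Refuter = Callable[[str], bool]           # True means "I refuted this finding"


def survives(finding: str, refuters: list[Refuter]) -> bool:
    """Keep a finding only if no independent refuter can knock it down."""
    return not any(refute(finding) for refute in refuters)


def loop_until_done(find: Finder, refuters: list[Refuter],
                    quiet_rounds: int = 2, max_rounds: int = 20) -> list[str]:
    """Run finder rounds until `quiet_rounds` in a row add nothing new."""
    seen: set[str] = set()
    confirmed: list[str] = []
    quiet = 0
    for round_no in range(max_rounds):  # hard cap so the loop always ends
        # dict.fromkeys drops duplicates within a round while keeping order.
        new = [f for f in dict.fromkeys(find(round_no)) if f not in seen]
        seen.update(new)
        confirmed += [f for f in new if survives(f, refuters)]
        quiet = 0 if new else quiet + 1
        if quiet >= quiet_rounds:
            break
    return confirmed


# Stub agents standing in for model calls.
ROUNDS = [["null deref in parse()", "race in cache"], ["race in cache"], ["typo-only nit"]]


def stub_finder(round_no: int) -> list[str]:
    return ROUNDS[round_no] if round_no < len(ROUNDS) else []


def stub_refuter(finding: str) -> bool:
    return "nit" in finding  # pretend the skeptic disproves the nit


print(loop_until_done(stub_finder, [stub_refuter]))
```

Here the loop stops after two consecutive rounds with nothing new, and the refuted nit never reaches the final list.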

Where they pay off

  • Large-scale code changes — framework migrations, dependency bumps, codemods across hundreds of files, where the work is repetitive but must be applied carefully each time.
  • Research and codebase understanding — many readers sweeping a system in parallel, each from a different angle, synthesized into one map.
  • Review and auditing — independent passes for bugs, security, and performance, each finding then adversarially verified before it reaches you.
  • Content and data work — exactly the kind of multi-source research-and-synthesis that went into the posts on this site, run as a fan-out of researchers plus a fact-checking pass.

The guardrails are not optional

More autonomy multiplies both output and blast radius. An agent that can run shell commands can also delete the wrong thing — we catalogued exactly that in Security in the Age of AI. The discipline that makes agentic workflows safe is the same discipline that makes any automation safe:

  • Least privilege. Give each agent only the tools and scopes its task needs; default sub-agents to read-only and let a human or a narrowly scoped parent agent apply the writes.
  • A human gate on irreversible actions. Deletes, deploys, force-pushes, infrastructure changes — require approval. Speed on the reversible stuff; a checkpoint on the rest.
  • Isolation. Run risky work in a sandbox or a throwaway branch/worktree so a bad step is contained, not catastrophic.
  • Verification over trust. Bake the adversarial check into the workflow; never merge a fleet's output unreviewed just because there's a lot of it.
  • Audit. Keep the trail — what ran, what it touched, what it decided — so you can answer "why did this change?" later.
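
Three of these guardrails (least privilege, the human gate, and the audit trail) can live in one small check that runs before every tool call. The sketch below is illustrative only: the `Guard` class, the tool names, and the `IRREVERSIBLE` set are assumptions, and the `approve` callback is where a real setup would prompt a person instead of returning a fixed answer.

```python
from dataclasses import dataclass, field
from typing import Callable

# Assumed classification; a real deployment would define its own list.
IRREVERSIBLE = {"delete", "deploy", "force_push", "infra_change"}


@dataclass
class Guard:
    allowed_tools: frozenset[str]                 # least privilege
    approve: Callable[[str, str], bool]           # human gate: (tool, target) -> ok?
    audit_log: list[tuple[str, str, str]] = field(default_factory=list)

    def check(self, tool: str, target: str) -> bool:
        """Return True if the action may proceed; record every decision."""
        if tool not in self.allowed_tools:
            decision = "denied: tool not granted"
        elif tool in IRREVERSIBLE and not self.approve(tool, target):
            decision = "denied: approval refused"
        else:
            decision = "allowed"
        self.audit_log.append((tool, target, decision))
        return decision == "allowed"


# A read-only sub-agent, and a parent that may write but never auto-approves.
reader = Guard(frozenset({"read"}), approve=lambda tool, target: False)
parent = Guard(frozenset({"read", "write", "delete"}), approve=lambda tool, target: False)

print(reader.check("write", "src/app.py"))   # False
print(parent.check("write", "src/app.py"))   # True
print(parent.check("delete", "build/"))      # False
```

Every decision, allowed or denied, lands in `audit_log`, so the "why did this change?" question has somewhere to start.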

Mind the cost

Parallel agents burn tokens in parallel. A workflow that spawns dozens of sub-agents can cost real money, so treat it like compute: scale the fleet to the task, use a cheaper/faster model for the worker agents and reserve the strongest model for planning and synthesis, and don't reach for a 30-agent swarm when a single well-aimed prompt would do. The goal is leverage, not theater.
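
A back-of-the-envelope estimate makes the trade-off visible before you launch anything. The prices and token counts below are placeholders, not any vendor's real pricing, and `ModelPrice` and `fleet_cost` are hypothetical helpers; swap in your own numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Hypothetical price in dollars per million tokens (not real pricing)."""
    input_per_m: float
    output_per_m: float

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        return (input_tokens * self.input_per_m
                + output_tokens * self.output_per_m) / 1_000_000


# Placeholder numbers chosen only to make the arithmetic easy to follow.
STRONG = ModelPrice(input_per_m=10.0, output_per_m=40.0)
CHEAP = ModelPrice(input_per_m=1.0, output_per_m=4.0)


def fleet_cost(workers: int, worker_model: ModelPrice,
               tokens_in: int = 50_000, tokens_out: int = 5_000) -> float:
    """One strong planner/synthesizer plus `workers` identical worker agents."""
    planner = STRONG.cost(tokens_in, tokens_out)
    return planner + workers * worker_model.cost(tokens_in, tokens_out)


print(f"30 strong workers: ${fleet_cost(30, STRONG):.2f}")
print(f"30 cheap workers:  ${fleet_cost(30, CHEAP):.2f}")
print(f"no workers:        ${fleet_cost(0, CHEAP):.2f}")
```

Under these made-up numbers, moving the workers to the cheaper model cuts the bill by most of an order of magnitude, and skipping the fleet entirely is cheaper still.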

When not to bother

Agentic workflows are overkill for small, well-scoped tasks — a quick fix, a single-file edit, a clear one-shot question. They shine when the work is large (many items), uncertain (you don't know how many issues exist), or needs independent confidence (high-stakes findings worth verifying). For everything else, the soloist is faster and cheaper. As with any tool: the skill is knowing when to pick it up.

The takeaway

Agentic AI is most powerful not as a smarter chatbot but as an orchestrator — fanning work out for coverage, pipelining it for speed, and verifying it adversarially for trust, all behind real guardrails. Teams that internalize the patterns (and the discipline) get a genuine multiplier on the repetitive, large-scale, and research-heavy work that used to eat days. Teams that skip the guardrails just make mistakes faster.
