Four components turn a model into an agent. Each one is conceptually simple. Each one also introduces a failure class your existing controls were not designed for.
A persistent objective that survives across many tool calls. The agent keeps working until the goal is satisfied — or it gives up.
New failure class
Goal mis-specification. The agent satisfies the literal goal in a way no human would have endorsed. (“Reduce ticket backlog” → mass-close everything.)
Decomposes the goal into sub-tasks, sequences them, and decides what to do next based on intermediate results.
New failure class
Plan drift. Each step seems locally rational; the trajectory ends somewhere the original plan never contemplated.
Short-term scratchpad for the current task; long-term store for facts, prior runs, and learned patterns.
New failure class
Context poisoning. Hostile input written into memory by an earlier interaction influences a later, higher-stakes decision.
The model evaluates its own output, catches mistakes, retries with adjustments. Closes the loop on its own work.
New failure class
Confident wrongness. The reflection step rationalizes a bad decision rather than rejecting it. The agent argues itself into the wrong answer.
What changes architecturally
The system is no longer stateless. Decisions made in step 3 depend on memory written in step 1. Replay, audit, and rollback all become first-class concerns.
What changes operationally
You are no longer reviewing answers. You are reviewing trajectories — the sequence of decisions an agent made on its way to an outcome.