The New KPI Set: Eight Metrics That Matter

2026-05-07

Section 3 · Managerial Pivot

A metric panel you can adopt this quarter

Eight KPIs every agentic workflow should report.

Four families, two metrics each. Quality and cost are obvious; risk and health are the ones most teams skip — and the ones that decide whether an agent ages well.

Quality

Goal Completion Rate

What: Share of runs where the agent reached the goal without escalation

Why: The headline number. If this is below baseline, nothing else matters.

Healthy looks like

Above the human baseline for the same workflow, measured on the same evaluation set.
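As a minimal sketch of how this number could be computed, assume each run is logged with two flags. The `Run` record, its field names, and the `human_baseline` value below are hypothetical and only illustrate the definition above.

```python
from dataclasses import dataclass


@dataclass
class Run:
    reached_goal: bool  # hypothetical field: the agent reached the goal
    escalated: bool     # hypothetical field: the run was escalated to a human


def goal_completion_rate(runs):
    """Share of runs that reached the goal without escalation."""
    if not runs:
        raise ValueError("no runs to score")
    completed = sum(1 for r in runs if r.reached_goal and not r.escalated)
    return completed / len(runs)


# Illustrative data only: four runs from one evaluation set.
runs = [Run(True, False), Run(True, True), Run(False, False), Run(True, False)]
rate = goal_completion_rate(runs)

human_baseline = 0.45  # assumed value, measured on the same evaluation set
above_baseline = rate > human_baseline
```

The comparison only means something when the agent and the human baseline are scored on the same evaluation set, as the card states.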

Quality

Outcome-vs-Intent Score

What: Independent grader scoring whether the outcome served the user's actual intent

Why: Catches the agent that satisfies the literal goal in the wrong way.

Healthy looks like

Steady or improving. A drop here that completion rate doesn't show is a serious signal.

Cost

Cost per Successful Outcome

What: Total spend (tokens, tool calls, infra, evaluators) divided by successful outcomes

Why: The economic unit. The only number that says whether the agent is worth running.

Healthy looks like

Trending down over time as prompts, models, and tools improve.
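The division above can be sketched as follows. The spend categories mirror the ones named in the card; the function name, the dictionary layout, and the figures are assumptions for illustration.

```python
import math


def cost_per_successful_outcome(spend_by_category, successful_outcomes):
    """Total spend across all categories divided by successful outcomes."""
    total_spend = sum(spend_by_category.values())
    if successful_outcomes <= 0:
        # No successes: the unit cost is unbounded, so flag it instead of dividing by zero.
        return math.inf
    return total_spend / successful_outcomes


# Illustrative weekly figures in one currency unit (made-up numbers).
weekly_spend = {"tokens": 120.0, "tool_calls": 30.0, "infra": 40.0, "evaluators": 10.0}
cpso = cost_per_successful_outcome(weekly_spend, successful_outcomes=80)
```

Dividing by successful outcomes rather than by all runs is the point of the metric: failed runs still cost money, and that cost is carried by the successes.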

Cost

Trajectory Length

What: Median number of model calls, tool invocations, and tokens per run

Why: A leading indicator for cost. Spikes here typically precede cost spikes by days.

Healthy looks like

Stable or compressing. Sudden growth means something changed — investigate.
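One way to track this, sketched under assumptions: take the median of each counter per run, then compare the current median with a trailing baseline. The record keys and the 1.25 growth threshold are hypothetical and should be tuned per workflow.

```python
from statistics import median


def trajectory_length(runs):
    """Median model calls, tool invocations, and tokens per run."""
    return {
        "model_calls": median(r["model_calls"] for r in runs),
        "tool_invocations": median(r["tool_invocations"] for r in runs),
        "tokens": median(r["tokens"] for r in runs),
    }


def grew_suddenly(current, baseline, threshold=1.25):
    """True when the current median exceeds the baseline by more than the threshold ratio."""
    return baseline > 0 and current / baseline > threshold


# Illustrative run records (made-up numbers).
runs = [
    {"model_calls": 4, "tool_invocations": 2, "tokens": 1000},
    {"model_calls": 6, "tool_invocations": 3, "tokens": 1200},
    {"model_calls": 5, "tool_invocations": 10, "tokens": 900},
]
lengths = trajectory_length(runs)
```

The median is used rather than the mean so that a single runaway run (the 10 tool invocations above) does not move the headline number on its own.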

Risk

Intervention Rate

What: Share of runs that required a human override, escalation, or rejection

Why: A workflow with rising interventions is regressing — even if completion rate looks fine.

Healthy looks like

Low and stable. A sudden drop is suspicious, since it may mean reviewers have stopped intervening rather than that the agent improved; a slow upward creep is a problem.
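A rough sketch of how a sudden drop (a cliff) and a slow rise (a creep) might be told apart in a weekly series of intervention rates. The thresholds `cliff_drop` and `creep_weeks` are assumptions, not values from this section.

```python
def intervention_rate(interventions, total_runs):
    """Share of runs that required a human override, escalation, or rejection."""
    if total_runs <= 0:
        raise ValueError("total_runs must be positive")
    return interventions / total_runs


def classify_trend(rates, cliff_drop=0.5, creep_weeks=3):
    """Label a series of rates (oldest first) as 'cliff', 'creep', or 'stable'."""
    # Cliff: the latest rate fell by at least `cliff_drop` relative to the previous one.
    if len(rates) >= 2 and rates[-2] > 0 and rates[-1] <= rates[-2] * (1 - cliff_drop):
        return "cliff"
    # Creep: the rate rose in each of the last `creep_weeks` periods.
    recent = rates[-(creep_weeks + 1):]
    if len(recent) == creep_weeks + 1 and all(b > a for a, b in zip(recent, recent[1:])):
        return "creep"
    return "stable"
```

For example, `classify_trend([0.05, 0.06, 0.07, 0.08])` is labeled a creep, while a series whose last value halves week over week is labeled a cliff and deserves a look at whether reviewers are still reviewing.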

Risk

Blast-Radius Incidents

What: Count of agent actions that required rollback, customer communication, or remediation

Why: The number a regulator or a board will ask for. Track it from day one.

Healthy looks like

Tracked by zone. Zone 2–3 incidents reviewed individually; Zone 0–1 reported in aggregate.
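The zone split above could be implemented roughly like this. The incident record shape (`id`, `zone`) is hypothetical; the zone numbers follow the 0 to 3 scale used in the card.

```python
from collections import Counter


def triage_incidents(incidents):
    """Split incidents into Zone 2-3 (individual review) and Zone 0-1 (aggregate counts)."""
    individual_review = [i for i in incidents if i["zone"] >= 2]
    aggregate_counts = Counter(i["zone"] for i in incidents if i["zone"] <= 1)
    return individual_review, dict(aggregate_counts)


# Illustrative incident log (made-up entries).
incidents = [
    {"id": "inc-1", "zone": 0},
    {"id": "inc-2", "zone": 3},
    {"id": "inc-3", "zone": 1},
    {"id": "inc-4", "zone": 0},
    {"id": "inc-5", "zone": 2},
]
to_review, aggregate = triage_incidents(incidents)
```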

Health

Drift Index

What: Statistical distance between today's outputs and a fixed historical reference set

Why: Silent quality degradation is a commonly cited cause of agent failures in production.

Healthy looks like

Within a defined band. Crossing the band auto-creates a review ticket.
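The card does not name a specific distance. As one possible choice, the sketch below uses Jensen-Shannon divergence (base 2, so it ranges from 0 to 1) over the distribution of output categories. Reducing outputs to category labels, the label names, and the 0.1 band are all assumptions.

```python
import math
from collections import Counter


def distribution(labels, categories):
    """Relative frequency of each category in a non-empty list of output labels."""
    counts = Counter(labels)
    return [counts[c] / len(labels) for c in categories]


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions, in bits."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def needs_review(drift, band=0.1):
    """True when drift leaves the band; a real system would open a review ticket here."""
    return drift > band


categories = ["approve", "reject", "escalate"]  # hypothetical output categories
reference = distribution(["approve"] * 8 + ["reject"] * 2, categories)
today = distribution(["approve"] * 3 + ["reject"] * 2 + ["escalate"] * 5, categories)
drift = js_divergence(reference, today)
```

Keeping the reference set fixed matters: if the reference is refreshed with recent outputs, slow drift gets absorbed into the baseline and never crosses the band.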

Health

Approval Latency

What: Time from agent proposal to human decision in human-in-the-loop workflows

Why: If approvals take too long, reviewers rubber-stamp. If they happen too fast, reviewers aren't reading.

Healthy looks like

Within a defined band: fast enough not to bottleneck, slow enough that reviewers actually read.
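A two-sided check like the one described above might look as follows. The band limits (30 seconds and 4 hours) are placeholder assumptions; the right values depend on how long a careful review of a proposal takes in a given workflow.

```python
from statistics import median


def approval_latency_status(latencies_s, too_fast_s=30, too_slow_s=4 * 3600):
    """Classify the median proposal-to-decision time against a two-sided band."""
    typical = median(latencies_s)
    if typical < too_fast_s:
        return "too_fast"  # reviewers may not be reading
    if typical > too_slow_s:
        return "too_slow"  # approvals are a bottleneck and invite rubber-stamping
    return "healthy"
```

Unlike most operational latency metrics, this one alerts in both directions: `approval_latency_status([5, 8, 12])` is as much a warning as a median measured in days.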

Reporting cadence: report quality and cost weekly, risk monthly to the executive team, and health continuously with alerting. Aggregate the panel into a single one-page dashboard that the agent owner reviews daily.