Kill Switches & Circuit Breakers: The Technical Pattern

2026-05-07
Section 2 · Trust Architecture
If you only build one thing, build this

Four levels of stop-the-bleeding.

A kill switch is not a button on a wiki page. It is a designed control with a defined scope, a defined trigger, a defined operator, and a defined recovery path. You need all four levels.

| Level | Control | Scope | Trigger | Who can pull it | Recovery path |
|---|---|---|---|---|---|
| L1 | Action Veto | A single proposed action | Pre-flight validator says no, or human reviewer rejects in the loop | Validator (auto), reviewer (manual) | Agent receives the rejection as feedback, can re-plan |
| L2 | Trajectory Halt | A single agent run / one goal | In-flight anomaly: token budget blown, repeated tool failures, plan drift detected | Runtime auto-trip, on-call operator | Run is terminated; partial state is rolled back where possible, escalated otherwise |
| L3 | Capability Disable | A specific tool or class of action, across all agents | Downstream system unhealthy, exploit suspected, regulatory event | Engineering on-call with documented runbook | Tool returns a deterministic “unavailable” to all agents until cleared |
| L4 | Global Pause | All agentic activity in the organization | Material incident, suspected systemic prompt injection, board-level pause | Defined emergency authority, typically dual control: CISO + a designated executive | Coordinated restart with documented post-incident review before re-enable |
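
One way to read the four levels is as a layered check that runs before every tool call, from the widest scope down to the narrowest. The sketch below is a minimal illustration, not a reference implementation: `SwitchState`, `RunBudget`, `check_action`, and the budget numbers are all hypothetical names and values chosen for the example, and the state is held in memory only to keep it short.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class SwitchState:
    """Hypothetical switch state; a real system would keep this out-of-band."""
    global_pause: bool = False                        # L4
    disabled_tools: set = field(default_factory=set)  # L3
    halted_runs: set = field(default_factory=set)     # L2


@dataclass
class RunBudget:
    """Illustrative L2 limits for one agent run (values are assumptions)."""
    max_tokens: int = 50_000
    max_tool_failures: int = 3
    tokens_used: int = 0
    tool_failures: int = 0

    def exceeded(self) -> bool:
        return (self.tokens_used > self.max_tokens
                or self.tool_failures >= self.max_tool_failures)


def check_action(state: SwitchState, run_id: str, tool: str,
                 budget: RunBudget,
                 validator: Callable[[str], bool]) -> Optional[str]:
    """Return None if the action may proceed, else the reason it was stopped."""
    if state.global_pause:
        return "L4: global pause is active"
    if tool in state.disabled_tools:
        return f"L3: tool '{tool}' is unavailable"
    if run_id in state.halted_runs or budget.exceeded():
        state.halted_runs.add(run_id)  # an auto-trip stays tripped for this run
        return f"L2: run '{run_id}' halted"
    if not validator(tool):
        return f"L1: action on '{tool}' vetoed by validator"
    return None
```

Checking the widest scope first means a Global Pause or Capability Disable stops the action without running the validator or touching any per-run state.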

Properties every kill switch must have

Out-of-band

Cannot be disabled by the agent it controls. Different code path. Different credentials.
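
A minimal sketch of the separation, assuming a hypothetical `KillSwitch` whose write path requires an operator credential and an `AgentView` that exposes only the read path. Inside a single Python process this is a convention rather than real isolation; in practice the switch would sit in a separate service with its own credentials.

```python
import hmac


class KillSwitch:
    """Hypothetical switch: anyone may read it, only an operator may write it."""

    def __init__(self, operator_token: str):
        self._operator_token = operator_token.encode()
        self._tripped = False

    def is_tripped(self) -> bool:
        return self._tripped

    def set(self, tripped: bool, token: str) -> None:
        # Constant-time comparison of the presented credential.
        if not hmac.compare_digest(token.encode(), self._operator_token):
            raise PermissionError("operator credential required")
        self._tripped = tripped


class AgentView:
    """Read-only handle given to the agent runtime; it has no write method."""

    def __init__(self, switch: KillSwitch):
        self._is_tripped = switch.is_tripped

    def is_tripped(self) -> bool:
        return self._is_tripped()
```

The agent runtime is handed only an `AgentView`, so the code path it runs on has no way to call `set`, and it never holds the operator token.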

Tested

Pulled in a game day at least quarterly. An untested kill switch is decoration.
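
The quarterly cadence can be checked mechanically. The sketch below assumes a hypothetical record of the last drill date per switch and treats "quarterly" as 92 days; both the function name `overdue_drills` and the threshold are illustrative choices.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional


def overdue_drills(last_drill: Dict[str, Optional[date]], today: date,
                   max_age: timedelta = timedelta(days=92)) -> List[str]:
    """Return the switches never drilled or not drilled within max_age."""
    return sorted(
        name for name, drilled in last_drill.items()
        if drilled is None or today - drilled > max_age
    )
```

A switch with no recorded drill is reported as overdue, which matches the point that an untested switch should not be counted as a control.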

Observable

When tripped, every dependent system and every operator knows immediately.
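
As a rough illustration, tripping the switch can publish an event to every registered listener (pager, dashboard, dependent service). `ObservableSwitch` and the event fields are hypothetical; a production system would more likely publish to a message bus or alerting system than call in-process callbacks.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List


class ObservableSwitch:
    """Hypothetical switch that notifies every subscriber when tripped."""

    def __init__(self, name: str):
        self.name = name
        self.tripped = False
        self.failed_notifications: List[str] = []
        self._listeners: List[Callable[[Dict], None]] = []

    def subscribe(self, listener: Callable[[Dict], None]) -> None:
        self._listeners.append(listener)

    def trip(self, reason: str, operator: str) -> Dict:
        self.tripped = True  # flip the switch before notifying anyone
        event = {
            "switch": self.name,
            "reason": reason,
            "operator": operator,
            "at": datetime.now(timezone.utc).isoformat(),
        }
        for listener in self._listeners:
            try:
                listener(event)
            except Exception as exc:
                # One broken listener must not block the others; record it.
                self.failed_notifications.append(repr(exc))
        return event
```

The switch flips before any notification is sent, so a slow or failing listener delays awareness but never delays the stop itself.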

Reversible

A defined re-enable procedure with sign-off. Pulling the switch must not require a re-deploy to undo.
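
One way to make re-enable a runtime operation with sign-off is sketched below. It assumes a hypothetical `ReversibleSwitch` that clears only after every required approver has signed, echoing the dual control suggested for L4; the approver names are placeholders.

```python
from typing import FrozenSet, Set


class ReversibleSwitch:
    """Hypothetical switch that is cleared at runtime, not by a re-deploy."""

    def __init__(self, required_approvers: FrozenSet[str]):
        self.required_approvers = required_approvers
        self.tripped = False
        self._signoffs: Set[str] = set()

    def trip(self) -> None:
        self.tripped = True
        self._signoffs.clear()  # sign-offs from a previous incident do not carry over

    def sign_off(self, approver: str) -> None:
        if approver not in self.required_approvers:
            raise PermissionError(f"{approver} is not an authorized approver")
        self._signoffs.add(approver)

    def re_enable(self) -> bool:
        """Clear the switch only if every required approver has signed off."""
        if self.tripped and self._signoffs >= self.required_approvers:
            self.tripped = False
            self._signoffs.clear()
            return True
        return False
```

Because the tripped flag is plain runtime state, both pulling and clearing the switch are operations on a running system, and the sign-off set records who approved the restart.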