Section 2 · Trust Architecture
If you only build one thing, build this
Four levels of stop-the-bleeding.
A kill switch is not a button on a wiki page. It is a designed control with a defined scope, a defined trigger, a defined operator, and a defined recovery path. You need all four levels.
| Level | Control | Scope | Trigger | Who can pull it | Recovery path |
| --- | --- | --- | --- | --- | --- |
| L1 | Action Veto | A single proposed action | Pre-flight validator says no, or human reviewer rejects in the loop | Validator (auto), reviewer (manual) | Agent receives the rejection as feedback, can re-plan |
| L2 | Trajectory Halt | A single agent run / one goal | In-flight anomaly: token budget blown, repeated tool failures, plan drift detected | Runtime auto-trip, on-call operator | Run is terminated; partial state is rolled back where possible, escalated otherwise |
| L3 | Capability Disable | A specific tool or class of action, across all agents | Downstream system unhealthy, exploit suspected, regulatory event | Engineering on-call with documented runbook | Tool returns a deterministic “unavailable” to all agents until cleared |
| L4 | Global Pause | All agentic activity in the organization | Material incident, suspected systemic prompt injection, board-level pause | Defined emergency authority, typically dual control: CISO + a designated executive | Coordinated restart with documented post-incident review before re-enable |
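One way to picture how the four levels nest is a single gate that every proposed action passes through, checked from the widest scope down to the narrowest. The sketch below is illustrative only: `KillSwitches`, `Action`, `check`, and the example tool name and spending limit are hypothetical names chosen for this example, not part of any particular agent framework.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Callable, List, Optional, Set


class Level(IntEnum):
    ACTION_VETO = 1         # L1: a single proposed action
    TRAJECTORY_HALT = 2     # L2: a single agent run
    CAPABILITY_DISABLE = 3  # L3: one tool, across all agents
    GLOBAL_PAUSE = 4        # L4: all agentic activity


@dataclass
class Action:
    run_id: str
    tool: str
    payload: dict


@dataclass
class KillSwitches:
    """Hypothetical in-memory switch state; a real deployment would keep
    this outside the agent's own process and credentials."""
    global_paused: bool = False
    disabled_tools: Set[str] = field(default_factory=set)
    halted_runs: Set[str] = field(default_factory=set)
    # Pre-flight validators return True when the action is acceptable.
    validators: List[Callable[[Action], bool]] = field(default_factory=list)

    def check(self, action: Action) -> Optional[Level]:
        """Return the widest level blocking this action, or None if it may proceed."""
        if self.global_paused:
            return Level.GLOBAL_PAUSE
        if action.tool in self.disabled_tools:
            return Level.CAPABILITY_DISABLE
        if action.run_id in self.halted_runs:
            return Level.TRAJECTORY_HALT
        if any(not ok(action) for ok in self.validators):
            return Level.ACTION_VETO
        return None


# Usage: an assumed spending-limit validator acting as the L1 pre-flight check.
switches = KillSwitches(validators=[lambda a: a.payload.get("amount", 0) <= 100])
small = Action("run-1", "payments.send", {"amount": 50})
large = Action("run-1", "payments.send", {"amount": 500})

print(switches.check(small))  # no switch applies
print(switches.check(large))  # vetoed at L1
switches.disabled_tools.add("payments.send")
print(switches.check(small))  # now blocked at L3
```

Checking the widest scope first means a Global Pause or Capability Disable takes effect without depending on any per-run or per-action logic still working correctly.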
Properties every kill switch must have
Out-of-band
Cannot be disabled by the agent it controls. Different code path. Different credentials.
Tested
Pulled in a game day at least quarterly. An untested kill switch is decoration.
Observable
When tripped, every dependent system and every operator knows immediately.
Reversible
A defined re-enable procedure with sign-off. Undoing a pulled switch must not require a re-deploy.
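The four properties above can be made concrete in a small sketch. Everything here is an assumption for illustration: the `KillSwitch` class, the operator names, and the 92-day reading of "quarterly" are hypothetical. An in-process object also cannot truly be out-of-band; the operator check only stands in for the separate code path and credentials a real control would use.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, FrozenSet, List, Optional, Set

DRILL_INTERVAL = timedelta(days=92)  # assumption: one reading of "at least quarterly"


@dataclass
class KillSwitch:
    name: str
    operators: FrozenSet[str]   # principals allowed to operate it; never the agent
    required_signoffs: int = 1  # distinct operators needed to re-enable
    tripped: bool = False
    last_drill: Optional[datetime] = None
    listeners: List[Callable[[str, str, str], None]] = field(default_factory=list)

    def _notify(self, event: str, actor: str) -> None:
        # Observable: every registered listener hears about each state change.
        for listener in self.listeners:
            listener(self.name, event, actor)

    def trip(self, actor: str, drill: bool = False, now: Optional[datetime] = None) -> None:
        # Out-of-band (stand-in): only named operators may pull the switch.
        if actor not in self.operators:
            raise PermissionError(f"{actor} may not operate {self.name}")
        self.tripped = True
        if drill:
            # Tested: record game-day pulls so staleness can be checked.
            self.last_drill = now or datetime.now()
        self._notify("tripped", actor)

    def re_enable(self, signoffs: Set[str]) -> None:
        # Reversible: a runtime state change with sign-off, no re-deploy.
        approved = set(signoffs) & self.operators
        if len(approved) < self.required_signoffs:
            raise PermissionError("not enough authorized sign-offs")
        self.tripped = False
        self._notify("re-enabled", ",".join(sorted(approved)))

    def drill_overdue(self, now: datetime) -> bool:
        return self.last_drill is None or now - self.last_drill > DRILL_INTERVAL


# Usage with hypothetical operator names and a listener that just records events.
events = []
switch = KillSwitch("payments.send", frozenset({"oncall-alice", "oncall-bob"}),
                    required_signoffs=2)
switch.listeners.append(lambda name, event, actor: events.append((name, event, actor)))
switch.trip("oncall-alice", drill=True)
switch.re_enable({"oncall-alice", "oncall-bob"})
```

Requiring two distinct sign-offs to re-enable mirrors the dual-control idea described for the Global Pause level; lower levels could reasonably use a single sign-off.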