Kill Switches & Circuit Breakers: The Technical Pattern

2026-05-07

Section 2 · Trust Architecture

If you only build one thing, build this

Four levels of stop-the-bleeding.

A kill switch is not a button on a wiki page. It is a designed control with a defined scope, a defined trigger, a defined operator, and a defined recovery path. You need all four levels.

| Level | Control | Scope | Trigger | Who can pull it | Recovery path |
|---|---|---|---|---|---|
| 1 | Action Veto | A single proposed action | Pre-flight validator says no, or human reviewer rejects in the loop | Validator (auto), reviewer (manual) | Agent receives the rejection as feedback, can re-plan |
| 2 | Trajectory Halt | A single agent run / one goal | In-flight anomaly: token budget blown, repeated tool failures, plan drift detected | Runtime auto-trip, on-call operator | Run is terminated; partial state is rolled back where possible, escalated otherwise |
| 3 | Capability Disable | A specific tool or class of action, across all agents | Downstream system unhealthy, exploit suspected, regulatory event | Engineering on-call with documented runbook | Tool returns a deterministic “unavailable” to all agents until cleared |
| 4 | Global Pause | All agentic activity in the organization | Material incident, suspected systemic prompt injection, board-level pause | Defined emergency authority, typically dual control: CISO + a designated executive | Coordinated restart with documented post-incident review before re-enable |
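The four controls can be read as one gate that every proposed action passes through. The sketch below is illustrative only: `SwitchState`, `RunBudget`, `check_action`, and the budget numbers are hypothetical names and placeholders, not part of any particular agent framework. It assumes the broadest control is checked first, so that a wider stop always overrides a narrower one.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ACTION_VETOED = "action_vetoed"              # Action Veto
    TRAJECTORY_HALTED = "trajectory_halted"      # Trajectory Halt
    CAPABILITY_DISABLED = "capability_disabled"  # Capability Disable
    GLOBAL_PAUSE = "global_pause"                # Global Pause


@dataclass
class SwitchState:
    """Hypothetical control-plane state: written by operators, read by the gate."""
    global_pause: bool = False
    disabled_tools: set = field(default_factory=set)


@dataclass
class RunBudget:
    """Per-run limits for the trajectory halt; the numbers are placeholders."""
    max_tokens: int = 50_000
    max_consecutive_tool_failures: int = 3
    tokens_used: int = 0
    consecutive_tool_failures: int = 0


def check_action(state: SwitchState, budget: RunBudget,
                 tool: str, validator_ok: bool) -> Decision:
    """Evaluate the broadest control first so a wider stop always wins."""
    if state.global_pause:
        return Decision.GLOBAL_PAUSE
    if tool in state.disabled_tools:
        # Deterministic "unavailable" for every agent until cleared.
        return Decision.CAPABILITY_DISABLED
    if (budget.tokens_used > budget.max_tokens
            or budget.consecutive_tool_failures >= budget.max_consecutive_tool_failures):
        return Decision.TRAJECTORY_HALTED
    if not validator_ok:
        # The rejection goes back to the agent as feedback so it can re-plan.
        return Decision.ACTION_VETOED
    return Decision.ALLOW


if __name__ == "__main__":
    state, budget = SwitchState(), RunBudget()
    print(check_action(state, budget, "send_email", validator_ok=True))
    state.disabled_tools.add("send_email")
    print(check_action(state, budget, "send_email", validator_ok=True))
```

Plan-drift detection and rollback are left out; in this sketch the caller is assumed to act on the returned `Decision`: terminate the run, surface the rejection to the agent, or escalate.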

Properties every kill switch must have

Out-of-band

Cannot be disabled by the agent it controls. Different code path. Different credentials.

Tested

Pulled in a game day at least quarterly. An untested kill switch is decoration.

Observable

When tripped, every dependent system and every operator knows immediately.

Reversible

A defined re-enable procedure with sign-off. Pulling the switch must not require a re-deploy to undo.
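Two of these properties, observable and reversible, can be sketched in a few lines. The `KillSwitch` class, its listener signature, and the approver role names below are assumptions made for illustration. The out-of-band property is not demonstrated: it depends on where this state lives and which credentials can write it, which an in-process example cannot show.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set

# Hypothetical listener signature: (switch name, tripped?, detail message).
Listener = Callable[[str, bool, str], None]


@dataclass
class KillSwitch:
    name: str
    required_approvers: Set[str]          # roles that must sign off on re-enable
    tripped: bool = False
    reason: Optional[str] = None
    _listeners: List[Listener] = field(default_factory=list)

    def subscribe(self, listener: Listener) -> None:
        """Observable: dependents register to hear about every state change."""
        self._listeners.append(listener)

    def _notify(self, detail: str) -> None:
        for listener in self._listeners:
            listener(self.name, self.tripped, detail)

    def trip(self, operator: str, reason: str) -> None:
        self.tripped = True
        self.reason = reason
        self._notify(f"tripped by {operator}: {reason}")

    def re_enable(self, approvers: Set[str]) -> None:
        """Reversible: a runtime state change gated on sign-off, no re-deploy."""
        missing = self.required_approvers - approvers
        if missing:
            raise PermissionError(f"missing sign-off from: {sorted(missing)}")
        self.tripped = False
        self.reason = None
        self._notify(f"re-enabled with sign-off from {sorted(approvers)}")


if __name__ == "__main__":
    switch = KillSwitch("global_pause", {"ciso", "designated_exec"})
    switch.subscribe(lambda name, tripped, detail: print(name, tripped, detail))
    switch.trip("on-call", "quarterly game day")
    switch.re_enable({"ciso", "designated_exec"})
```

Running the trip and re-enable path on a schedule, as in the `__main__` block, is one way to approximate the game-day test. In a real deployment the switch state would sit behind a separate code path and separate credentials from the agent it controls.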