Privileged access “break-glass” design
Design a last-resort path to restore control without creating a standing backdoor.
What “break-glass” means
Break-glass is emergency-only elevation, used when normal privileged access paths are unavailable or unsafe. It restores administrative control during outages, ransomware incidents, or identity provider failure.
Design goals
- No standing trust or always-on accounts
- Out-of-band credential custody
- Two-person integrity for activation
- Shortest possible duration and scope
- Works when SSO, MFA, or PAM is down
- Offline-verifiable procedures
- Documented runbooks, practiced on a fixed schedule
- Clear ownership and paging
Reference architecture
- Accounts: Pre-provision disabled emergency admin roles in each critical domain (cloud roots, IdP break-glass, hypervisor mgmt, backups, CI/CD).
- Credentials: Store split-knowledge secret shares in two independent HSM-backed vaults. Require a quorum of custodians to reconstruct the secret.
- Devices: Keep two clean-room laptops with full-disk encryption, no EDR dependency, and offline configuration media.
- Paths: Separate “resilience plane” network with bastion, dedicated DNS, and restricted egress. No shared creds with production.
- Controls: Activation via out-of-band comms, change ticket, and time-bound policy that auto-expires.
- Evidence: Append every step and decision to an immutable log. Anchor a daily hash of that log externally.
Activation flow
| Step | Who | Control | Output |
|---|---|---|---|
| Declare emergency | Incident Commander | SEV-1 playbook, ticket, paging | Incident ID |
| Authorize use | Two executives | Two-person approval, callback verification | Signed approval record |
| Reconstruct secret | Custodians A+B | Separate vaults, audit cameras | Time-limited credential |
| Enable account | Responder | Just-in-time role, max TTL 60–120 min | Active session |
| Perform action | Responder | Screen capture, command logging | Change set applied |
| Revoke and rotate | Responder | Auto-expire, rotate secrets, disable account | Access removed |
| Review | IR + GRC | Immutable log, lessons, improvements | Postmortem |
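The authorize and enable steps in the table can be sketched as a small gate that refuses to activate without two distinct approvers and a bounded TTL. This is a minimal illustration in Python, not a reference implementation: the names `Activation`, `activate`, and `MAX_TTL_MINUTES` are hypothetical, and the rule that the responder may not approve their own access is an added assumption the table does not state.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

MAX_TTL_MINUTES = 120  # upper bound taken from the "max TTL 60–120 min" row


@dataclass(frozen=True)
class Activation:
    incident_id: str
    approvers: frozenset
    responder: str
    ttl_minutes: int
    activated_at: datetime

    def expires_at(self) -> datetime:
        return self.activated_at + timedelta(minutes=self.ttl_minutes)

    def is_active(self, now: datetime) -> bool:
        # Access exists only inside the window; no separate "off" switch is needed.
        return self.activated_at <= now < self.expires_at()


def activate(incident_id, approvers, responder, ttl_minutes, now):
    """Return an Activation only if every precondition holds."""
    distinct = frozenset(approvers)
    if not incident_id:
        raise ValueError("an incident must be declared before activation")
    if len(distinct) < 2:
        raise PermissionError("two distinct approvers are required")
    if responder in distinct:
        # Assumption: separation of duties between approver and responder.
        raise PermissionError("the responder cannot approve their own access")
    if not 0 < ttl_minutes <= MAX_TTL_MINUTES:
        raise ValueError(f"ttl_minutes must be between 1 and {MAX_TTL_MINUTES}")
    return Activation(incident_id, distinct, responder, ttl_minutes, now)


if __name__ == "__main__":
    start = datetime(2025, 10, 21, 17, 5, 12, tzinfo=timezone.utc)
    session = activate("INC-2025-10-21-042", ["CISO", "VP-Engineering"],
                       "responder-1", 90, start)
    print(session.is_active(start + timedelta(minutes=30)))
    print(session.is_active(start + timedelta(minutes=90)))
```

Deriving "active" from the clock rather than from a flag means a forgotten revoke step still fails closed when the TTL elapses; the explicit revoke and rotate step remains necessary for the underlying secrets.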
Technical controls checklist
- Break-glass roles pre-scoped to least privilege
- Default disabled, activation via policy toggle
- No federation dependency for login path
- Out-of-band MFA tokens stored sealed
- Quorum-based retrieval (e.g., 2-of-3 shares)
- Short TTL passwords or ephemeral keys
- Immediate rotation after use
- No reuse across planes or tenants
- Dedicated bastion with allowlist rules
- Command logging on bastion and target
- Break-glass security group with time lock
- Emergency DNS and IdP fallback documented
- Immutable log of approvals, tokens, and commands
- Out-of-band alert to executives on activation
- Video capture on custodian stations
- Daily log hash anchored to a public blockchain or a timestamping authority (TSA)
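The "2-of-3 shares" item in the checklist describes threshold secret sharing. One common way to get that property is Shamir's scheme, sketched below in Python for illustration only; the checklist does not name a specific scheme, so this choice is an assumption. The function names and the choice of prime field are likewise hypothetical, and a real deployment would rely on the vault's own quorum mechanism or a vetted library rather than hand-written code.

```python
import secrets

# Field modulus: the Mersenne prime 2**127 - 1. Secrets must be smaller than this.
PRIME = 2**127 - 1


def split_secret(secret: int, threshold: int, shares: int):
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range for the field")
    if not 2 <= threshold <= shares:
        raise ValueError("need 2 <= threshold <= shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points):
    """Lagrange-interpolate the polynomial at x = 0 from the given shares."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    original = secrets.randbelow(PRIME)
    a, b, c = split_secret(original, threshold=2, shares=3)
    # Any two custodians can reconstruct; the third share is a spare.
    print(recover_secret([a, c]) == original)
```

With a threshold of two, a single share on its own is consistent with every possible secret, which is the property that lets each custodian hold one share without holding the credential.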
Testing cadence
- Monthly tabletop: IdP outage and ransomware scenarios
- Quarterly live test: full path from approval to revoke
- Credential fire drill: recover from vaults without production SSO
- Evidence check: verify the immutable log and its external anchors
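The evidence check can be sketched as recomputing a daily digest over that day's entry hashes and comparing it with the value that was published externally. The Python below is a minimal sketch under stated assumptions: the function names are hypothetical, entry hashes are assumed to be hex-encoded SHA3-256 digests, and the anchor is assumed to be a plain SHA3-256 over their concatenation in log order. How the anchor is actually submitted to a timestamping service or chain is out of scope here.

```python
import hashlib


def daily_anchor(entry_hashes):
    """Fold the day's entry hashes, in log order, into one hex digest."""
    h = hashlib.sha3_256()
    for digest in entry_hashes:
        h.update(bytes.fromhex(digest))
    return h.hexdigest()


def anchor_matches(entry_hashes, published_anchor: str) -> bool:
    """True if the locally recomputed anchor equals the externally held one."""
    return daily_anchor(entry_hashes) == published_anchor.lower()


if __name__ == "__main__":
    day = [hashlib.sha3_256(f"entry-{i}".encode()).hexdigest() for i in range(3)]
    published = daily_anchor(day)          # value that would be anchored externally
    print(anchor_matches(day, published))  # unchanged log
    print(anchor_matches(day[:-1], published))  # an entry was dropped
```

Because the digest depends on order as well as content, reordering, removing, or altering any entry after the anchor was published causes the comparison to fail.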
Plain words
Keep one safe door for bad days. Two people must agree to open it. It closes itself fast. Everything is written down where no one can quietly change it. Practice until it is boring.
Example activation log entry:
{
  "event": "break_glass_activation",
  "incident_id": "INC-2025-10-21-042",
  "scope": ["aws:org:prod-landing", "idp:admin-fallback"],
  "reason": "idp-outage",
  "approved_by": ["CISO", "VP-Engineering"],
  "credential_source": ["vault-a:share1", "vault-b:share2"],
  "ttl_minutes": 90,
  "bastion": "resilience-bastion-01",
  "timestamp": "2025-10-21T17:05:12Z",
  "prev_hash": "b5f1...0a",
  "sha3_256": "7a9c...55",
  "sig": "dilithium3:base64..."
}
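The `prev_hash` and `sha3_256` fields in the sample entry suggest a hash-chained log, where each entry commits to the one before it. The Python sketch below shows one way such a chain could be built and verified. It rests on assumptions the sample does not confirm: the digest is taken over canonical JSON (sorted keys, compact separators) of every field except `sha3_256` and `sig`, the first entry points at an all-zero genesis value, and signing is left out entirely because the standard library has no Dilithium implementation. The function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed prev_hash for the first entry


def entry_digest(body: dict) -> str:
    """SHA3-256 over a canonical JSON encoding of the entry body."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha3_256(canonical.encode("utf-8")).hexdigest()


def append_entry(log: list, fields: dict) -> dict:
    """Link a new entry to the current tail of the log and append it."""
    prev = log[-1]["sha3_256"] if log else GENESIS
    body = {**fields, "prev_hash": prev}
    entry = {**body, "sha3_256": entry_digest(body)}
    log.append(entry)
    return entry


def verify_log(log: list) -> bool:
    """Recompute every digest and check each link back to the genesis value."""
    prev = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k not in ("sha3_256", "sig")}
        if entry.get("prev_hash") != prev:
            return False
        if entry.get("sha3_256") != entry_digest(body):
            return False
        prev = entry["sha3_256"]
    return True


if __name__ == "__main__":
    log = []
    append_entry(log, {"event": "break_glass_activation", "ttl_minutes": 90})
    append_entry(log, {"event": "break_glass_revocation"})
    print(verify_log(log))
    log[0]["ttl_minutes"] = 600  # quiet edit after the fact
    print(verify_log(log))
```

Changing any field in an earlier entry changes its digest, which breaks the `prev_hash` link of every later entry, so a quiet edit is detectable as long as the latest hash has been anchored somewhere the editor cannot reach.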
FAQ
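Why not rely only on PAM and JIT?
Because the break-glass path has to work when SSO, MFA, or PAM is itself down, so its login path cannot depend on them.
How many break-glass accounts should exist?
Only the pre-provisioned, disabled emergency roles for each critical domain, each scoped to least privilege and never reused across planes or tenants.
What if attackers force activation?
Activation needs two-person approval with callback verification and quorum-based secret retrieval, and it triggers an out-of-band alert to executives, so a single coerced or compromised person is not enough and the attempt is visible.
How long should access last?
As briefly as possible: the activation flow caps a session at a 60–120 minute TTL, followed by automatic expiry and immediate rotation.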
Takeaway: one narrow, testable emergency path. Two-person control. Short life. Full evidence.