Privileged Access - “Break-Glass” Design | Cyber.Irish | Secure Web Design, Cybersecurity & Networking

Privileged access “break-glass” design

Design a last-resort path to restore control without creating a standing backdoor.

TL;DR: Keep one narrow, offline-verifiable path for emergency admin access. Make it opt-in, short-lived, multi-party, and fully logged. Store credentials out of band. Rehearse it every month and run a live test every quarter. Revoke it and rotate its secrets immediately after use.

What “break-glass” means

Break-glass is emergency-only privilege elevation, used when normal privileged access paths are unavailable or unsafe. It restores administrative control during outages, ransomware incidents, or identity provider failure.

Design goals

Security

No standing trust or always-on accounts
Out-of-band credential custody
Two-person integrity for activation
Shortest possible duration and scope

Reliability

Works when SSO, MFA, or PAM is down
Offline-verifiable procedures
Documented runbooks, practiced often
Clear ownership and paging

Reference architecture

Minimal viable pattern

Accounts: Pre-provision disabled emergency admin roles in each critical domain (cloud roots, IdP break-glass, hypervisor mgmt, backups, CI/CD).
Credentials: Store split-knowledge secret shares in two independent HSM-backed vaults. Require a quorum of shares to reconstruct the credential.
Devices: Keep two clean-room laptops with full-disk encryption, no EDR dependency, and offline configuration media.
Paths: Separate “resilience plane” network with bastion, dedicated DNS, and restricted egress. No shared creds with production.
Controls: Activation via out-of-band comms, change ticket, and time-bound policy that auto-expires.
Evidence: Append all steps and decisions to an immutable log. Anchor daily hash externally.

Activation flow

Step	Who	Control	Output
Declare emergency	Incident Commander	SEV-1 playbook, ticket, paging	Incident ID
Authorize use	Two executives	Two-person approval, callback verification	Signed approval record
Reconstruct secret	Custodians A+B	Separate vaults, audit cameras	Time-limited credential
Enable account	Responder	Just-in-time role, max TTL 60–120 min	Active session
Perform action	Responder	Screen capture, command logging	Change set applied
Revoke and rotate	Responder	Auto-expire, rotate secrets, disable account	Access removed
Review	IR + GRC	Immutable log, lessons, improvements	Postmortem
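The "Authorize use" and "Enable account" steps can be sketched as a small gate that refuses activation unless two distinct, callback-verified approvers are present and the requested TTL is within the cap. This is a minimal illustration, not a production control: the `Approval` type, the `authorize_activation` function, and the 120-minute constant are assumptions drawn from the table above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

MAX_TTL_MINUTES = 120  # assumed cap, taken from the 60-120 min range in the table


@dataclass(frozen=True)
class Approval:
    approver: str            # hypothetical identifier, e.g. "CISO"
    callback_verified: bool  # True once the out-of-band callback succeeded


def authorize_activation(incident_id, approvals, requested_ttl_minutes, now=None):
    """Return the session expiry time, or raise ValueError if the gate fails."""
    if not incident_id:
        raise ValueError("an incident must be declared first")
    # Deduplicate by approver so one person cannot approve twice.
    verified = {a.approver for a in approvals if a.callback_verified}
    if len(verified) < 2:
        raise ValueError("two distinct, callback-verified approvers are required")
    if not 0 < requested_ttl_minutes <= MAX_TTL_MINUTES:
        raise ValueError(f"TTL must be between 1 and {MAX_TTL_MINUTES} minutes")
    now = now or datetime.now(timezone.utc)
    return now + timedelta(minutes=requested_ttl_minutes)
```

In practice the returned expiry would feed whatever mechanism enforces auto-expiry on the target platform; the sketch only shows the decision logic.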

Technical controls checklist

Identity

Break-glass roles pre-scoped to least privilege
Default disabled, activation via policy toggle
No federation dependency for login path
Out-of-band MFA tokens stored sealed

Credentials

Quorum-based retrieval (e.g., 2-of-3 shares)
Short TTL passwords or ephemeral keys
Immediate rotation after use
No reuse across planes or tenants
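One common way to get quorum-based retrieval is Shamir secret sharing. The sketch below splits an integer secret into three shares, any two of which recover it. It is illustrative only: the article does not name a scheme, the function names and the choice of prime field are assumptions, and a real deployment would rely on a vetted library or the vault's built-in quorum feature.

```python
import secrets

# Assumed field: the Mersenne prime 2**521 - 1, large enough for a 64-byte secret.
PRIME = 2**521 - 1


def split_secret(secret: int, threshold: int = 2, shares: int = 3):
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must fit in the field")
    # Random polynomial of degree threshold-1 with the secret as constant term.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    points = []
    for x in range(1, shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule, highest degree first
            y = (y * x + c) % PRIME
        points.append((x, y))
    return points


def recover_secret(points):
    """Lagrange-interpolate the polynomial at x = 0 to recover the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

A single share on its own reveals nothing about the secret, which is what allows each custodian to hold one share independently.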

Access path

Dedicated bastion with allowlist rules
Command logging on bastion and target
Break-glass security group with time lock
Emergency DNS and IdP fallback documented
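The allowlist and time-lock items can be combined into one check: a connection is accepted only if it arrives inside the activation window and from an allowlisted network. The following is a rough sketch of that logic; `is_access_allowed` and its parameters are hypothetical, and real enforcement would live in the bastion's firewall or the cloud security group rather than in application code.

```python
import ipaddress
from datetime import datetime, timezone


def is_access_allowed(source_ip, allowlist, window_start, window_end, now=None):
    """True only inside the time window and from an allowlisted CIDR range."""
    now = now or datetime.now(timezone.utc)
    # Time lock: outside the activation window nothing is accepted.
    if not (window_start <= now < window_end):
        return False
    addr = ipaddress.ip_address(source_ip)
    # Allowlist: the source must fall inside at least one permitted network.
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowlist)
```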

Monitoring & evidence

Immutable log of approvals, tokens, and commands
Out-of-band alert to executives on activation
Video capture on custodian stations
Daily hash anchored to a public blockchain or a timestamping authority (TSA)
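The daily anchor can be as simple as one digest over that day's entry hashes, taken in order. The sketch below assumes each log entry already carries a 32-byte SHA3-256 hash in hex; the `daily_anchor` name and the concatenate-then-hash construction are assumptions, and a Merkle tree would be a reasonable alternative if per-entry inclusion proofs are needed.

```python
import hashlib


def daily_anchor(entry_hashes_hex):
    """Return one SHA3-256 digest over the day's entry hashes, in order."""
    h = hashlib.sha3_256()
    for hx in entry_hashes_hex:
        raw = bytes.fromhex(hx)
        if len(raw) != 32:
            raise ValueError("each entry hash must be 32 bytes (64 hex chars)")
        # Fixed-length inputs make plain concatenation unambiguous.
        h.update(raw)
    return h.hexdigest()
```

The resulting hex string is the only value that needs to leave the environment; submitting it to the external anchor is outside the scope of this sketch.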

Testing cadence

Drills that keep it real

Monthly tabletop: IdP outage and ransomware scenarios
Quarterly live test: full path from approval to revoke
Credential fire drill: recover from vaults without production SSO
Evidence check: verify the ledger and external anchors

Plain words

Keep one safe door for bad days. Two people must agree to open it. It closes itself fast. Everything is written down where no one can quietly change it. Practice until it is boring.

Minimal ledger entry — included for trust

{
  "event": "break_glass_activation",
  "incident_id": "INC-2025-10-21-042",
  "scope": ["aws:org:prod-landing", "idp:admin-fallback"],
  "reason": "idp-outage",
  "approved_by": ["CISO", "VP-Engineering"],
  "credential_source": ["vault-a:share1", "vault-b:share2"],
  "ttl_minutes": 90,
  "bastion": "resilience-bastion-01",
  "timestamp": "2025-10-21T17:05:12Z",
  "prev_hash": "b5f1...0a",
  "sha3_256": "7a9c...55",
  "sig": "dilithium3:base64..."
}
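The `prev_hash` and `sha3_256` fields suggest a hash-chained ledger. One way such a chain could be built and checked is sketched below. The canonical JSON form, the all-zero genesis value, and the choice to exclude `sha3_256` and `sig` from the hashed body are assumptions, not details given above; signature verification is left out because it needs a post-quantum library that the standard library does not provide.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed prev_hash for the first entry


def entry_digest(entry):
    """SHA3-256 over canonical JSON of the entry, minus its own hash and signature."""
    body = {k: v for k, v in entry.items() if k not in ("sha3_256", "sig")}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha3_256(canonical.encode("utf-8")).hexdigest()


def append_entry(ledger, fields):
    """Link a new entry to the previous one and append it to the ledger."""
    prev = ledger[-1]["sha3_256"] if ledger else GENESIS
    entry = dict(fields, prev_hash=prev)
    entry["sha3_256"] = entry_digest(entry)
    ledger.append(entry)
    return entry


def verify_ledger(ledger):
    """Recompute every digest and check each link back to the genesis value."""
    prev = GENESIS
    for entry in ledger:
        if entry.get("prev_hash") != prev:
            return False
        if entry.get("sha3_256") != entry_digest(entry):
            return False
        prev = entry["sha3_256"]
    return True
```

Because each digest covers the previous entry's hash, editing any earlier entry breaks every link after it, which is the check the evidence drill would run.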

FAQ

Why not rely only on PAM and JIT?

PAM and JIT are the first choice. Break-glass is for when those controls are down or compromised.

How many break-glass accounts should exist?

One per critical control plane. Keep them disabled and scoped tightly.

What if attackers force activation?

Two-person approval, out-of-band verification, and real-time executive alerts reduce that risk. Evidence trails enable fast response.

How long should access last?

As short as possible. Target 60–120 minutes with auto-expiry and forced rotation.

Takeaway: one narrow, testable emergency path. Two-person control. Short life. Full evidence.
