Signals
Operational Debt

Rising manual work and slowdowns signal mounting supervisory and compliance exposure.

Operational debt is the accumulation of manual workarounds, undocumented procedures, and deferred maintenance. Unlike technical debt in code, operational debt lives in process. It is the gap between how a system is supposed to run and how it actually runs.

How it starts

A manual step is added to a deployment because automating it would take a sprint that no one can prioritize. A reconciliation script is run by one person who knows the correct sequence of steps. An alerting threshold is set so high that it never fires, because the team is overwhelmed by false positives. A runbook describes a procedure that worked two versions ago. Each compromise is small. The total is an operating environment that depends on specific people remembering specific things.

What it looks like

Symptoms that indicate operational debt is active:

  • Key operational procedures depend on one or two individuals who carry critical context.
  • Release cycles are slowing because each deployment requires manual coordination across teams.
  • Incident frequency is increasing while the systems themselves have not materially changed.
  • On-call rotations are unsustainable because troubleshooting requires deep institutional knowledge.
  • The team spends more time maintaining existing systems than building new capability.

Why it matters

Operational debt creates two forms of exposure. The first is people risk: when critical operations depend on individuals rather than systems, the departure of one person can destabilize production. The second is velocity risk: the organization cannot respond to market changes, regulatory requirements, or business opportunities because its infrastructure team is consumed by maintenance. In a supervisory context, operational debt is often the root cause of control failures that appear to be technology problems.

How we address it

We begin by identifying which manual processes carry the highest risk: not the ones that consume the most time, but the ones whose failure would be most consequential. These are refactored into standardized, observable components with explicit inputs, outputs, and failure modes. Runbooks are replaced with automation that generates evidence of execution. The goal is not to eliminate all manual work but to ensure that no critical process depends on undocumented human judgment.
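
As an illustration only, a component of this kind might look like the Python sketch below. It is not taken from any engagement. The names run_step, StepFailure, Evidence, and reconcile are hypothetical, and the reconciliation rule (two totals compared against a tolerance) is an assumption chosen to keep the example small. The point is the shape: inputs and outputs are explicit, the expected failure mode is named, and every run returns a record that can be stored as evidence.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Optional


class StepFailure(Exception):
    """A named, expected failure mode of an operational step (hypothetical)."""


@dataclass(frozen=True)
class Evidence:
    """Record of one execution, suitable for an audit trail."""
    step: str
    started_at: str
    finished_at: str
    inputs: Dict[str, Any]
    outputs: Optional[Dict[str, Any]]
    status: str              # "ok" or "failed"
    failure: Optional[str]   # message of the StepFailure, if any
    digest: str              # SHA-256 over all fields above


def run_step(name: str,
             action: Callable[[Dict[str, Any]], Dict[str, Any]],
             inputs: Dict[str, Any]) -> Evidence:
    """Run one step and return evidence of what happened.

    Assumes inputs and outputs are JSON-serializable. Only StepFailure is
    caught; unexpected exceptions propagate to the caller.
    """
    started = datetime.now(timezone.utc).isoformat()
    outputs, status, failure = None, "ok", None
    try:
        outputs = action(inputs)
    except StepFailure as exc:
        status, failure = "failed", str(exc)
    record = {
        "step": name,
        "started_at": started,
        "finished_at": datetime.now(timezone.utc).isoformat(),
        "inputs": inputs,
        "outputs": outputs,
        "status": status,
        "failure": failure,
    }
    # The digest lets a reviewer detect later edits to a stored record.
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return Evidence(**record, digest=hashlib.sha256(payload).hexdigest())


def reconcile(inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical reconciliation: compare two totals within a tolerance."""
    diff = abs(inputs["ledger_total"] - inputs["source_total"])
    if diff > inputs["tolerance"]:
        raise StepFailure(f"totals differ by {diff}")
    return {"difference": diff}


if __name__ == "__main__":
    evidence = run_step(
        "reconcile",
        reconcile,
        {"ledger_total": 100, "source_total": 90, "tolerance": 5},
    )
    print(evidence.status, evidence.failure)
```

In this sketch a failed reconciliation still produces a complete record, so the evidence trail covers the runs that went wrong as well as the ones that succeeded. Where the records are stored and who reviews them is left open here.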

Where we've seen this

We encountered operational debt in its most acute form in our ETRM infrastructure engagement, where critical daily processes depended on specific individuals executing undocumented sequences of manual steps. The departure of a single person could destabilize production operations. Our Infrastructure, Governance, and Aggregation mandates all include explicit remediation of operational debt.