Mason Gray

The Fire You Stopped Noticing (And the Structure That Slows You Down)

This week: why slow-burn operational friction is more expensive than the crises you respond to, and why adding process reactively during scale almost always makes alignment worse before it helps.

April 26, 2026 | 5 min read

The Fire You Stopped Noticing

If the same problem shows up every week and nobody calls it a crisis anymore, that is the most expensive thing in your operation.

Not the outage. Not the client escalation. Not the quarter you missed. The thing that quietly absorbs hours every week, gets a workaround, and gets filed as "just how it works here."

That is the fire you stopped noticing. And it costs more than the fires you actually fight.


The Cost of Letting Small Problems Linger

The pattern most damaging to small and mid-size operations is not catastrophic failure. It is ignored slow-burn friction. Problems that never quite become a crisis, so they never get a fix.

If your team is spending 25% of their week on friction instead of output, you are paying the equivalent of full-time employees who do nothing but keep broken systems running. On a 10-person team, that is 2.5 people. Every week.
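To run the same arithmetic on your own numbers, here is a minimal Python sketch. The function name `friction_fte` and its parameters are illustrative, not part of any tool; it assumes friction is spread evenly across the team and expressed as a share of the working week.

```python
def friction_fte(team_size: int, friction_share: float) -> float:
    """Headcount equivalent absorbed by friction.

    team_size: number of people on the team.
    friction_share: fraction of the week lost to friction (0.25 means 25%).
    Assumes the share applies evenly to everyone on the team.
    """
    if team_size < 0:
        raise ValueError("team_size must not be negative")
    if not 0.0 <= friction_share <= 1.0:
        raise ValueError("friction_share must be between 0 and 1")
    return team_size * friction_share


# The example from the text: a 10-person team losing 25% of the week.
print(friction_fte(10, 0.25))  # 2.5
```

Swap in your own team size and an honest estimate of the share. The number that comes back is headcount you are already paying for.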

The insidious part is normalization. When a fire burns long enough, it becomes "how we do things." A workaround becomes SOP. Two people become the informal connective tissue for every decision because the actual process was never defined. When one of those people leaves, you find out what the process actually was.

Here is how I separate a governance gap from a one-time problem: I ask whether the fix required a specific person or a repeatable step. If it always requires the same person, that is a governance gap. The fix lives in someone's head, not in the system.

If you want to find the most expensive problem in your operation right now, don't look at your dashboards. Ask your team what they work around. Every workaround is a flag.


Adding Process Is Not the Same as Adding Structure

The instinct when things break at scale is to add more: approvals, documentation, layers. That instinct is right that something needs to change. It is usually wrong about what.

Reactive process-adding is how scaling companies make alignment worse. More documentation and more approval steps layered on top of chaos do not fix coordination. They slow execution and push the original problem down one level.

The tell: a region has consistency problems, so leadership installs a new reporting layer. The region is still inconsistent. Now it is also slower. The layer documented the inconsistency; it did not fix the root cause.

The real question is whether you are designing for the next phase or reacting to the current one. Structure built for the next phase gets installed before the pressure arrives. It anticipates the coordination failures that happen when headcount doubles or a new service line launches. Structure added reactively almost always targets the symptom. An approval gate added because one bad decision got through does not fix the decision-making process. It just slows everything down, and the decisions that needed stopping still find a way through.

Add the smallest structure that solves the specific problem. Not a system designed to solve every possible future problem.


What 800 Agents Actually Tell You

This week GE Appliances announced 800-plus AI agents in production across manufacturing, logistics, and supply chain. The headline is the agent count. The real story is what had to be true before a single one went live.

Eight hundred agents running across interconnected workflows means every process those agents touch was documented, including the exceptions. Roles and handoffs were defined clearly enough to hand off to something that will not pause and ask for clarification. Inputs and outputs were consistent enough that an agent could act on them without producing noise.

Most services and field-ops companies are nowhere near that threshold. Not because the technology is out of reach, but because the process documentation is not there. The exceptions live in someone's head. The handoffs are informal. The inputs vary based on who submitted them.

Goldman Sachs Alternatives made an investment this week in a company specifically for its ability to handle exception-heavy workflows. The investment thesis was exception-handling capability. That is the hard part. The edge cases and informal paths that keep operations running. Agents break on those unless they have been mapped in advance.

The diagnostic question is not whether you should deploy agents. It is whether your processes are defined clearly enough that an agent could execute a step without human intervention and get it right.

Run that through your five highest-volume workflows. If the answer requires naming a specific person who "knows how it works," you have more documentation to do before you have an AI problem to solve.


The Takeaway

This week, pick the one recurring problem your team works around most often. Write down the last three times it happened, who resolved it, and what they did. If the answer is the same person doing the same informal step every time, that is your governance gap. Fixing the ownership and documenting the step costs you an afternoon. Not fixing it costs you that afternoon every week indefinitely.
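If you keep that log in a spreadsheet or a script, the check can be sketched in a few lines of Python. Everything here is hypothetical: the `Occurrence` record, the `is_governance_gap` function, and the sample entries are made up for illustration, and the comparison assumes you describe the same step with the same words each time.

```python
from dataclasses import dataclass


@dataclass
class Occurrence:
    resolved_by: str  # who fixed it
    action: str       # what they did, in a few words


def is_governance_gap(occurrences: list[Occurrence]) -> bool:
    """True when every logged occurrence was resolved by the same
    person doing the same step. Needs at least two entries to say
    anything about a pattern."""
    if len(occurrences) < 2:
        return False
    resolvers = {o.resolved_by.strip().lower() for o in occurrences}
    actions = {o.action.strip().lower() for o in occurrences}
    return len(resolvers) == 1 and len(actions) == 1


# Made-up log of the last three times the problem showed up.
log = [
    Occurrence("Dana", "re-keyed the order by hand"),
    Occurrence("Dana", "re-keyed the order by hand"),
    Occurrence("dana", "Re-keyed the order by hand"),
]
print(is_governance_gap(log))  # True
```

A True here does not fix anything. It only tells you the fix lives in one person's head, which is the thing to document and assign.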

Hit reply if you are working through something like this. It is usually simpler to fix than it looks from inside it.
