Your Documentation Strategy Cannot Be "Ask Ben"

Posted on Apr 2, 2026

From my new series, IT Things We Pretend Are Fine. A series about the everyday IT problems organisations quietly accept until they become outages, audit findings, or incidents.

In IT, some problems are obvious. Others become so familiar that we stop seeing them. This series is about the second kind.

There are few phrases in IT more unsettling than, “Ben knows how that works.”

Not because Ben is incompetent. Usually it is the exact opposite. Ben is usually brilliant. The problem is that Ben is not a documentation platform, not a configuration database, and ideally should be allowed to take annual leave without the organisation entering a state of mild operational panic.

Most teams do not decide to rely on tribal knowledge. It is something that happens gradually. A system gets built quickly, a workaround is remembered instead of written down, someone becomes “the person” for a service, and before you know it an important part of the environment is being supported purely by memory, vibes, and the faint hope that nobody leaves unexpectedly.

For a while, this feels brilliantly efficient. It even feels normal.

But hidden inside that normality is risk.

When knowledge lives in people rather than systems

When knowledge lives in people rather than systems, a few things start to happen.

Incidents take longer to resolve

Because the first challenge is not fixing the issue. The first challenge is working out how the thing is supposed to work in the first place.

That is always a bad sign.

An outage should not begin with digital archaeology. Nobody should be squinting at an old Teams message from 2022 trying to work out why a scheduled task points at a server with a name like APP-OLD-02. And yet, here we are.

When documentation is weak, every incident contains an additional tax:

  • What does this actually do?
  • Who owns it?
  • What depends on it?
  • Has it always behaved like this?
  • Is this broken, or is it just badly understood?

That extra layer of confusion is where time disappears. And in IT, time is rarely alone. It usually brings disruption, frustration, and a deeply unnecessary meeting.

Changes become riskier

This one compounds.

If nobody can clearly describe dependencies, ownership, or expected behaviour, then every change carries far more uncertainty than it should. A firewall rule change is no longer just a firewall rule change. It becomes a small act of faith. A server restart becomes a séance. A permissions tidy-up becomes an invitation to discover five undocumented processes and at least one finance spreadsheet that apparently talks to a line-of-business system using hope.

Good change management depends on visibility.

You do not need a 14-tab spreadsheet and a change advisory board that looks like the planning committee for a moon landing. But you do need enough documentation to answer a very basic question:

What is this likely to break?

If the honest answer is “not entirely sure”, that is not agility. That is just risk with a lanyard on.

Recovery gets weaker

Backups are wonderful things. Accidentally nuke a system? No problem, hit the snapshot restore and enjoy a brief moment of competence.

Except recovery is not just restoring data. Recovery is restoring service.

And if nobody knows:

  • what should be restored first
  • what settings matter
  • which credentials are required
  • which permissions need to exist
  • which services depend on which other services

then recovery can end up introducing a logic bomb into your infrastructure.

A restored system that nobody fully understands is just a fresh mystery with an uptime graph.

The uncomfortable truth is this: if your recovery plan depends on a person being available to explain how the recovered environment should actually behave, you do not have a recovery plan. You have a restoration ritual.

Business IT begins to run on vibes

In small organisations, this is normal and often manageable. Until it is not.

Until Ben is off sick. Until Ben is on leave. Until Ben has left the business. Until something important breaks while Ben is busy dealing with the other thing only Ben knows.

That is the point where everyone suddenly discovers that “Ben knows” was not a support model after all.

It was a single point of failure wearing a helpful smile.

And this is why organisations often pretend the issue is fine. The pain arrives gradually rather than all at once. There is rarely a dramatic moment where someone stands up and shouts, “WE DO NOT HAVE ANY DOCUMENTATION.” There is just a slow, almost elegant shift from “it’s written down somewhere” to “best ask Ben”.

That shift is quiet. But it is also dangerous.

Why this happens

It is worth saying this plainly: most IT teams do not end up here because they are lazy.

They end up here because:

  • they are understaffed
  • they are overloaded
  • delivery always feels more urgent than documentation
  • there is always another incident, another request, another migration, another “quick question”

And because IT is largely invisible until something goes wrong, documentation is one of the first things to be pushed into the fog. It does not shout. It does not usually break immediately. It just quietly gets deferred until it becomes part of the problem.

This is why “we really should document that” is one of the most common recurring themes in IT. It sits somewhere between aspiration and apology.

What not to do

Before we begin the journey of fixing the lack of documentation, let’s get one thing clear.

There is absolutely no point scheduling a “documentation day” in the calendar.

It will not work.

It will get pushed back, then pushed back again, and eventually it will disappear into the same administrative graveyard as:

  • “review shared mailboxes”
  • “tidy old groups”
  • “rationalise file shares”
  • “look at printer naming convention”

Likewise, there is no point writing a 300-page wiki nobody will ever read. That is not documentation. That is an archive of good intentions, much like how your SharePoint configuration looks. Amirite?

And no, the answer is not “let’s get a chatbot to do it”.

That is how you fail twice.

First, by not documenting things properly. Second, by automating the chaos.

Start here instead

If you want to fix this, start by identifying where the risk actually lives.

Ask these questions:

1. Which systems does only one person really understand?

Be honest here. Not “who is best placed”. Not “who originally built it”. Which systems would cause real concern if only one person could explain how they work?

2. Which IT services would be difficult to recover without speaking to a specific individual?

If the answer is “quite a few”, you do not have a documentation problem. You have a resilience problem.

3. Which key processes rely on memory rather than written procedure?

This probably extends beyond the realm of IT, but that is not my horse nor my rodeo. Still, if your technical operations depend on unwritten habits, that is worth noticing.

4. Which parts of the IT estate are documented, but where the documentation is not maintained?

Out-of-date documentation is its own special flavour of treachery. At least missing documentation is honest.

If you are honestly answering “more than a few”, the issue is not quality of documentation.

It is operational resilience.

And yes, that does matter from a security and assurance perspective too. Cyber controls have a habit of expecting you to know what you own, how it is configured, who has access to it, and how you would recover it when things go sideways. Funny that.

What good documentation actually looks like

Good documentation is not comprehensive for the sake of it.

It is useful.

That means it helps someone answer the following without needing to summon Ben from annual leave:

  • What is this?
  • What does it do?
  • Who owns it?
  • What does it depend on?
  • What depends on it?
  • How is access controlled?
  • How is it backed up?
  • How is it restored?
  • What are the known sharp edges?

That is enough to turn mystery into manageable risk.

You do not need literature. You need clarity.

How I plan to fix this

And now for the admission.

My IT documentation is not up to date.

Shock horror.

But guess what. Almost none of our documentation is. That is not unusual. It is just inconveniently true.

So the fix cannot rely on a grand burst of discipline that vanishes the moment real work turns up. It has to be easy enough to happen as part of the work itself.

That means I am not trying to create a giant documentation project. I am trying to make documentation feel as normal and lightweight as writing a status update.

My plan is simple:

1. Start with high-risk systems

Not everything at once. Start with the systems and services that create the biggest operational dependency. You probably already know these in your head: it will be your DC, your gateway devices, and a splash of weird internal IT services that were there when you arrived and that nobody understands. This is not unique to you. Every IT department older than a decade has spooky systems that nobody touches, probably built on in-house code that is not in a repo.

Make a list of three of these and then move to step two.

2. Use a minimum viable format

Each service gets the basics:

  • purpose
  • owner
  • dependencies
  • access model
  • backup / recovery notes
  • known risks
  • support notes

I use VS Code for everything and commit my notes (written in Markdown because it is the only format that works everywhere) into a repo accessible by everyone in IT. This step is where you bring the rest of your team into the mix.

Make a template, explain the method behind the madness to your team, and then set the goal of starting (not completing) a one-page document on a system that only they understand.
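A minimum viable template might look something like this. The service name and every detail below are placeholders, not a prescription; fill in your own:

```markdown
# Service: Staff Print Server   <!-- placeholder name -->

- **Purpose:** What it is and what it does
- **Owner:** A team or named role, not just "Ben"
- **Dependencies:** What it needs to run (DNS, auth, licensing, other services)
- **Depended on by:** What breaks if this breaks
- **Access model:** Who can administer it, and how access is granted
- **Backup / recovery:** Where backups live, restore order, where credentials are kept
- **Known risks:** The sharp edges the next person should know about
- **Support notes:** Quirks, past incidents, useful commands
```

One page, answerable in ten minutes, and already enough to turn mystery into manageable risk.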

Then get them to commit it to your company's infra repo. If you don't have one, you now have a prerequisite on your hands, because chances are you are in an organisation where IT has never used, or even heard of, Git.

Time to add Git and GitHub (or GitLab, for the IT connoisseur) to your team's training plan.
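For teams new to Git, the whole workflow is small enough to show in a few lines. This is a minimal sketch; the repo name, file path, and identity details are all hypothetical, so substitute your own:

```shell
# Create a place for service one-pagers ("infra-docs" is a made-up name).
mkdir -p infra-docs/services

# Start a one-page doc using the minimum viable format.
cat > infra-docs/services/print-server.md <<'EOF'
# Service: Print Server
- Purpose: ...
- Owner: ...
- Dependencies: ...
- Access model: ...
- Backup / recovery notes: ...
- Known risks: ...
- Support notes: ...
EOF

# Turn the folder into a repo (skip if your infra repo already exists),
# then commit the doc so the whole team can see and update it.
git -C infra-docs init -q
git -C infra-docs add services/print-server.md
git -C infra-docs -c user.name="IT Team" -c user.email="it@example.com" \
    commit -q -m "docs: add print-server one-pager"
```

From here, pushing to GitHub or GitLab is one `git remote add` and one `git push`; the point is that each doc is a small, reviewable commit rather than a big project.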

3. Tie documentation to real events

Documentation gets updated when:

  • a change is made
  • an incident happens
  • a service is onboarded
  • ownership changes

If documentation is separate from work, it will lose every time. This is more about forming habits. If you had a battle on your hands to even introduce the concept of a functional helpdesk into your org, then the next battle was getting your team to leave notes on tickets.

The final boss is getting them to turn those notes into documentation.

4. Make it part of “done”

No change is fully complete if the environment is harder to understand afterwards than it was before.

5. Review what gets used

The best documentation is the documentation people actually use during change, support, and recovery. If nobody references it, it is probably decorative.

Final thought

There is nothing wrong with having brilliant people in IT.

The problem begins when brilliance becomes infrastructure.

If your documentation strategy is “Ask Ben”, the question is not whether that will become a risk.

The question is when.

And if that feels familiar, congratulations. You are not alone. You are just standing in one of the oldest traps in IT: mistaking familiarity for control.

If you’re on LinkedIn, you can join the conversation on this article.