In the IT field, heroics are all too common. Staff routinely work long hours, completing upgrades after hours, pulling all-nighters, and often just barely finishing in time. A critical system goes down, and people work continuously until it’s fixed, missing sleep, meals, and time with family. While outages can never be fully prevented, and things will always break, if this happens often in your business it means one thing:

The processes in your business are broken.

The reality is, you don’t want heroic employees. The stress is higher, burnout is more common, and it means that everything is running less efficiently than it should. This issue doesn’t strictly apply to IT, either. Does your bookkeeper routinely work through lunch and after hours to get invoices and paychecks out? Does your office manager have so many tasks that they take work home every night? The good news is that it can be fixed.

Whenever you have an event that requires heroic actions on the part of your employees, try to figure out why. After the dust has settled, sit down and do a postmortem. You may have to go through a couple of iterations of asking “why” to get to the real root cause. For example: Why did the staff have to work through the night? Because a change resulted in a system outage. Why was the change not fully tested and vetted? Because there are no change control procedures in place. Why are there no change control procedures? Because there isn’t buy-in from senior management, who don’t want to “waste the time”.

Often, what you’ll find is that the root cause of the issue exists a few levels up from where the crisis occurred. Once the root cause is identified, try to figure out how to prevent the issue from happening again. Do this every time, and through iterative improvements, everything will start to run more smoothly. After a while, you’ll find that heroics become less and less common, and that “unexpected” issues are often predictable and easily preventable.