Monday, June 4, 2007

Mitigation solves problems

Technologists love solving problems; that's how we are wired. But not all problems can be resolved, or at least not in a timely manner. In technology terms, mitigation means having alternatives ready for when a problem occurs, reducing or completely removing the business impact. I find mitigation to be an under-used tool.

By my own definition, an outage is a system incident that negatively impacts my business, either staff or customers. If an incident occurs that has no impact, then it was never an outage, and will be reclassified as just an incident (significantly less severe than an outage). A clear example of this is our RAID systems, where a disk fails but the failure has no impact. In this case, the "R" in RAID stands for "redundant". Unfortunately, redundancy across all systems is extremely expensive and complex.

I've often asked my teams not to focus on problem-solving, but instead to focus on mitigation. If system X fails, how can we get services back up without fixing the problem? Answering this in advance creates a level of preparedness for when the inevitable system issues occur.
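To make the idea concrete, here is a minimal Python sketch of a mitigation path that has been planned in advance: try the primary system, and if it fails, fall back to an alternative without fixing the underlying problem. Every name in it (`call_with_fallback`, `primary_lookup`, `cached_lookup`) is hypothetical and invented for illustration; it is one possible shape for the idea, not a description of any particular system.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def call_with_fallback(providers: Sequence[Callable[[], T]]) -> T:
    """Try each provider in order and return the first result that succeeds."""
    errors: List[Exception] = []
    for provider in providers:
        try:
            return provider()
        except Exception as exc:  # a real system would catch narrower error types
            errors.append(exc)
    # Every alternative failed: this is the case with no recourse left.
    raise RuntimeError(f"all {len(providers)} providers failed: {errors}")


def primary_lookup() -> str:
    # Hypothetical stand-in for "system X" being down.
    raise ConnectionError("system X is unavailable")


def cached_lookup() -> str:
    # Hypothetical mitigation: serve the last known good answer.
    return "last known good value"


# The service stays up even though system X is still broken.
result = call_with_fallback([primary_lookup, cached_lookup])
```

The ordering of the list is the mitigation plan: the root cause of the primary failure can be investigated later, while staff and customers keep getting an answer.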

There's another side effect of mitigation that's less obvious: it takes the pressure off a situation. An outage that has no recourse is high stakes and high pressure, which are not the best conditions for optimal performance from you and your team. Mitigation then becomes the safety net as the team performs its high-wire act. For me, mitigation solves problems.
