ITIL in Practice from ITILnews.com
By Hank Marquis, www.itsmsolutions.com
 
The IT Infrastructure Library (ITIL) refers to Service or Systems Outage Analysis (SOA) as a method to improve availability. Unfortunately, ITIL does not explain how to actually perform SOA! This article explains the benefits of SOA and gives you a 7-step guide to performing it.
 
The objective of SOA is to reduce both the frequency and the duration of outages; shorter outages show up as an improved (lower) Mean Time To Repair (MTTR).
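To make that objective measurable, here is a minimal Python sketch that computes outage frequency and MTTR for a set of resolved outages, taking MTTR as total outage time divided by the number of outages. The `Outage` record and the sample timestamps are illustrative assumptions, not something ITIL prescribes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Outage:
    """One resolved outage (hypothetical record layout)."""
    start: datetime
    end: datetime

    @property
    def duration(self) -> timedelta:
        return self.end - self.start


def mttr(outages) -> timedelta:
    """Mean Time To Repair: total outage time divided by the number of outages."""
    if not outages:
        return timedelta(0)
    total = sum((o.duration for o in outages), timedelta(0))
    return total / len(outages)


outages = [  # illustrative sample data for one reporting period
    Outage(datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 1, 11, 0)),   # 2 h
    Outage(datetime(2024, 3, 8, 14, 0), datetime(2024, 3, 8, 15, 0)),  # 1 h
    Outage(datetime(2024, 3, 20, 1, 0), datetime(2024, 3, 20, 4, 0)),  # 3 h
]
print(len(outages), mttr(outages))  # frequency and MTTR: 3 2:00:00
```

Tracking both numbers period over period shows whether SOA-driven changes are reducing how often outages happen, how long they last, or both.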
 
The result of SOA is a clear view of the risk of future outages, along with recommendations for improvement.
 
SOA is a powerful technique that requires no major investment in tools or training, and the process is straightforward. Working with Problem Management and Customers, you examine past outages and identify the related Configuration Items (products, people or processes). Then you review how the organization and the infrastructure affect availability.
 
To get started, collect outage data and assemble a team of people familiar with the outages. Then guide them through the following seven steps.
 
1. Group related outages together; create groupings by vendor, product, product family, application, customer and so on. Categorize each outage as "significant" or "less significant." Focus only on those labeled "significant"; keep monitoring the "less significant" ones for recurrence. For each "significant" outage, review the root cause of the unavailability, such as faulty hardware or software. The cause is probably already known, since the outage has been resolved.
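Step 1 can be sketched in Python as follows. The article does not define "significant," so the duration threshold in `is_significant` is an assumption to replace with your own criterion; the `Outage` fields and sample data are likewise hypothetical. The sketch categorizes first and then groups, which gives the same result as grouping first.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Outage:
    """Hypothetical outage record; the field names are illustrative."""
    outage_id: str
    product: str      # grouping key; vendor, application or customer work the same way
    root_cause: str
    minutes: int      # duration of the unavailability


def is_significant(outage, threshold_minutes=60):
    """Assumed criterion: an outage is 'significant' if it lasted at least
    `threshold_minutes`. Substitute your own definition."""
    return outage.minutes >= threshold_minutes


def group_outages(outages, key=lambda o: o.product):
    """Group related outages by the attribute returned by `key`."""
    groups = defaultdict(list)
    for o in outages:
        groups[key(o)].append(o)
    return dict(groups)


outages = [  # illustrative sample data
    Outage("INC1", "mail", "faulty disk", 90),
    Outage("INC2", "mail", "software bug", 60),
    Outage("INC3", "intranet", "software bug", 20),
]
significant = [o for o in outages if is_significant(o)]
monitored = [o for o in outages if not is_significant(o)]  # watch for recurrence

for name, members in group_outages(significant).items():
    # Review the (usually already known) root cause of each significant outage.
    print(name, [o.root_cause for o in members])
```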
 
2. Using a Pareto analysis (80/20 rule), rank the related outages by cause. You will typically find that a few causes account for the majority of the outages. Focus on the "80%" of the outages produced by the "20%" of the causes.
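A minimal Python sketch of the Pareto step: rank causes by how many outages they produced, then keep the smallest set of top causes that together explain at least 80% of the outages. The `pareto` function name, the 0.8 cutoff parameter and the sample causes are illustrative assumptions.

```python
from collections import Counter


def pareto(causes, cutoff=0.8):
    """Return the 'vital few' causes: the top-ranked causes whose outage
    counts together reach at least `cutoff` of all outages."""
    counts = Counter(causes).most_common()  # most frequent cause first
    total = sum(n for _, n in counts)
    vital_few, cumulative = [], 0
    for cause, n in counts:
        vital_few.append((cause, n))
        cumulative += n
        if cumulative / total >= cutoff:
            break
    return vital_few


# One entry per significant outage (illustrative data).
causes = ["faulty disk"] * 5 + ["software bug"] * 3 + ["power"] + ["network"]
print(pareto(causes))  # [('faulty disk', 5), ('software bug', 3)]
```

Ranking by outage count is one choice; ranking by total downtime minutes per cause is an equally reasonable variant if long outages matter more than frequent ones.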
 
3. For each grouping of similar outages, examine why the unavailability lasted as long as it did. For example, an outage may have occurred because of faulty hardware or software, but its duration might have been extended by a lack of tools, training, spares and so on.
 
4. Consider the three "P's" (People, Product and Process) and review:

a. Existing procedures and support policies that were invoked or used during the outage.
 
b. The actions (or inactions) of staff members, customers and anyone else involved in the outage or restoration.
 
5. Try to determine whether anything might have shortened the outage or avoided it altogether. The examination should reveal a trend, or at least something the outage has in common with similar outages. This is what you are looking for: the "smoking gun." An example might be the lack of a tool, a process or a similar item.
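One way to hunt for the "smoking gun" is to count how many outages in a group share each contributing factor recorded during the Step 4 review. The Python sketch below assumes a hypothetical format (outage ID mapped to a set of factor descriptions) and an assumed 50% share threshold; both are illustrations, not part of the method as published.

```python
from collections import Counter


def common_factors(outage_factors, min_share=0.5):
    """Return (factor, count) pairs for factors present in at least
    `min_share` of the outages, most widespread first.

    `outage_factors` maps an outage ID to the set of contributing factors
    recorded for it (hypothetical format).
    """
    n = len(outage_factors)
    if n == 0:
        return []
    counts = Counter(f for factors in outage_factors.values() for f in set(factors))
    return sorted(
        ((f, c) for f, c in counts.items() if c / n >= min_share),
        key=lambda item: item[1],
        reverse=True,
    )


reviews = {  # illustrative Step 4 findings for one outage group
    "INC1": {"no spare disk on site", "runbook out of date"},
    "INC2": {"no spare disk on site"},
    "INC3": {"no spare disk on site", "on-call engineer unreachable"},
}
print(common_factors(reviews))  # [('no spare disk on site', 3)]
```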
 
6. Quantify the avoidable outage time. For example, if one hour of each outage was spent trying to locate the proper tool, the avoidable outage time is 1 hour × the number of outages so affected. Your goal is to identify the largest block of preventable downtime.
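The arithmetic in Step 6 generalizes to summing, per contributing factor, the minutes it added across all affected outages, then ranking factors by the total. In this Python sketch the `(outage_id, factor, minutes)` tuple format and the sample figures (four outages each losing an hour to locating a tool) are assumptions for illustration.

```python
from collections import defaultdict


def avoidable_time(delays):
    """Sum avoidable outage minutes per contributing factor.

    `delays` is a list of (outage_id, factor, minutes) tuples, where
    `minutes` is the part of that outage's duration attributed to the
    factor (hypothetical format). Returns factors ranked by total minutes.
    """
    totals = defaultdict(int)
    for _outage_id, factor, minutes in delays:
        totals[factor] += minutes
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Illustrative: one hour spent locating the proper tool in each of four outages.
delays = [(f"INC{i}", "locating the proper tool", 60) for i in range(1, 5)]
delays.append(("INC5", "waiting for vendor spare", 90))
ranking = avoidable_time(delays)
print(ranking)  # [('locating the proper tool', 240), ('waiting for vendor spare', 90)]
```

The first entry of the ranking is the candidate for the RFC in Step 7: here, 1 hour × 4 outages = 240 avoidable minutes.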
 
7. Prepare a Request for Change (RFC) to address the largest source of preventable downtime!

The SOA concludes with a report that summarizes the number of outages analyzed and the timeframe covered, lists the avoidable outage time, and offers suggestions for shortening or avoiding the outages.
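The closing report can be assembled from the earlier results. This Python sketch assumes a hypothetical `soa_report` helper, plain-text output, and illustrative inputs (outage count, period, ranked avoidable minutes, and a suggestion per factor); adapt the layout to your organization's reporting template.

```python
from datetime import date


def soa_report(num_outages, period_start, period_end, avoidable, suggestions):
    """Assemble a plain-text SOA summary (hypothetical helper).

    avoidable:   list of (factor, minutes) pairs, largest first
    suggestions: mapping of factor -> improvement suggestion
    """
    lines = [
        f"Outages analyzed: {num_outages}",
        f"Timeframe: {period_start.isoformat()} to {period_end.isoformat()}",
        "Avoidable outage time:",
    ]
    for factor, minutes in avoidable:
        lines.append(f"  - {factor}: {minutes / 60:.1f} h")
    lines.append("Suggestions:")
    for factor, _ in avoidable:
        lines.append(f"  - {factor}: {suggestions.get(factor, 'to be determined')}")
    return "\n".join(lines)


report = soa_report(  # illustrative inputs
    num_outages=12,
    period_start=date(2024, 1, 1),
    period_end=date(2024, 6, 30),
    avoidable=[("locating the proper tool", 240), ("waiting for vendor spare", 90)],
    suggestions={"locating the proper tool": "standard toolkit at each site (raise RFC)"},
)
print(report)
```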
 
Summary
 
When you are done, you will have a documented business case justifying a Change! Most importantly, SOA gives you a clear roadmap showing exactly how to remove a significant source of downtime from your infrastructure.
 
 
 

4 VISITOR COMMENTS

2011-07-13 by "sumanmanoharan"

Am new to ITIL. This article is very useful.

2012-09-23 by "manjusha.jagtap"

Detailed Steps to look at managing outages. This article provides a simple but effective approach to Outage Management.

2016-10-12 by "ruth.quaile"

Is a maintenance window excluded from the calculation of an outage? If I have a planned change taking place in the maintenance window that will render the service unavailable is this an outage or not?
Reply on 2016-10-14
Yes, the maintenance window should be excluded from outage calculations as this window should have been agreed with the Customer.

If the change is a planned change then I would not count it as an outage as it is scheduled maintenance, again having been agreed with the Customer.

2020-02-05 by "surbhimehrotra16"

One of the best articles I have read and understood so far.

In depth, simple clear language

NB: This page is © Copyright ITILnews.com and/or the relevant publishing author. You may copy this article only in its entirety, including any author bio and/or credits, and you must link back to www.itilnews.com.