Ah, availability management. It sounds very comforting because we all want availability – especially when something is needed. But how easy is the availability management practice in IT service management (ITSM) and the new ITIL 4 best practice?
Availability management, at least in ITIL terms, has been with us since the 90s and when the ITIL authors write about availability management they make it look clever – because nothing looks smarter than a mathematical formula (OK, perhaps me in a tux). So, they gave us this formula to calculate an availability percentage:
Availability (%) = ((Agreed Service Time – Down Time) / Agreed Service Time) x 100
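To make the arithmetic concrete, here is a minimal Python sketch of that formula. The function name and the example month (22 working days of 8 hours, with 4 hours of downtime) are my own assumptions for illustration, not figures from ITIL.

```python
def availability_percent(agreed_service_minutes: float, downtime_minutes: float) -> float:
    """Availability as a percentage of the agreed service time."""
    if agreed_service_minutes <= 0:
        raise ValueError("agreed service time must be positive")
    if not 0 <= downtime_minutes <= agreed_service_minutes:
        raise ValueError("downtime must be between 0 and the agreed service time")
    return (agreed_service_minutes - downtime_minutes) / agreed_service_minutes * 100


# Hypothetical month: 22 working days x 8 hours, with 4 hours of downtime.
agreed = 22 * 8 * 60   # 10,560 minutes of agreed service time
down = 4 * 60          # 240 minutes of downtime
print(round(availability_percent(agreed, down), 2))  # 97.73
```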
On the face of it, it looks nice and easy. Keep your IT services working, and you’ve delivered 100% availability, and everyone is happy. So, how could anything that looks so simple to use (and obvious) be so hard? If you keep reading, I’ll let you know!
Understand that availability, like beauty, is in the eye of the beholder
In most organizations, customers and IT service providers will sit down together to decide on the “agreed service hours” for the service desk – because delivering a service 24×7 can be expensive and suboptimal if not fully used around the clock.
Those sitting around the table will likely be tempted by the financial savings that a “9 to 5 on working days” IT service desk package can offer them. However, the sales team will definitely think differently when certain staff can’t access the CRM system at just gone 5pm on the last day of the sales period, with multiple high-value (and high-commission) deals near to closure.
Here the reported availability figures published for the IT service desk at the end of the month might show 100% availability, but the affected salespeople definitely won’t feel that way. Hence, if the opinions and productivity of key business staff are important to the organization, then IT needs to ensure that:
- These key staff are part of the discussions when “agreed service times” are being set.
- What was agreed is publicized clearly, not just discovered when assistance is actually needed.
Failing to do this will mean that the availability you’re judged on will not be the same as the figure you calculate and report.
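The gap between the reported figure and the felt one can be sketched in a few lines of Python. This is only an illustration under my own assumptions: agreed service hours of 09:00 to 17:00 on Monday to Friday, no public holidays, and a made-up outage date. The function name counted_downtime is hypothetical too.

```python
from datetime import datetime, time, timedelta

# Assumed agreed service hours: 09:00-17:00, Monday to Friday, no holidays.
SERVICE_START = time(9, 0)
SERVICE_END = time(17, 0)


def counted_downtime(outage_start: datetime, outage_end: datetime) -> timedelta:
    """Return only the part of an outage that falls inside agreed service hours."""
    total = timedelta()
    day = outage_start.date()
    while day <= outage_end.date():
        if day.weekday() < 5:  # 0-4 are Monday to Friday
            window_start = datetime.combine(day, SERVICE_START)
            window_end = datetime.combine(day, SERVICE_END)
            overlap_start = max(outage_start, window_start)
            overlap_end = min(outage_end, window_end)
            if overlap_end > overlap_start:
                total += overlap_end - overlap_start
        day += timedelta(days=1)
    return total


# A made-up CRM outage from 17:05 to 19:00 on a Friday.
after_hours = counted_downtime(datetime(2024, 6, 28, 17, 5), datetime(2024, 6, 28, 19, 0))
print(after_hours)  # 0:00:00
```

Nearly two hours of pain for the sales team, and not a single minute of it reaches the availability report.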
Recognize that not all downtime is equal
The simple formula above tacitly assigns the same value to every minute the service is not available. However, real life in organizations is rarely like that.
For the finance department in a company that does a monthly invoice run on the last day of the month, losing their finance system on that day is way more disruptive than losing it on the first day of the next month. Yet both scenarios would account for, say, eight hours of downtime in the availability figures.
For availability calculations and figures to reflect the real pain of downtime, a significant investment in research, discussion, consultation, and agreement will be needed. Maybe a weighting factor can be worked out? However, the more you talk about and tweak your availability calculation, the more complicated it becomes, and that complexity then needs to be documented and explained to everyone who might be affected.
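As a rough idea of what such a weighting factor might look like, here is a small Python sketch. Everything in it is an assumption for illustration: the weight of 3 for the invoice-run day, the 176 agreed hours in the month, and the choice to weight only the downtime rather than the service time as well. Your own agreement with the business might look quite different.

```python
def weighted_availability(agreed_hours: float, outages: list) -> float:
    """Availability where each outage is a (hours, weight) pair.

    A weight of 1 means ordinary downtime; a higher weight means the
    business has agreed that downtime in that period hurts more.
    """
    weighted_down = sum(hours * weight for hours, weight in outages)
    # Cap at the agreed hours so the figure never drops below 0%.
    weighted_down = min(weighted_down, agreed_hours)
    return (agreed_hours - weighted_down) / agreed_hours * 100


AGREED_HOURS = 22 * 8  # hypothetical month: 176 agreed service hours

# The same eight-hour finance system outage, on two different days.
first_of_month = weighted_availability(AGREED_HOURS, [(8, 1)])
invoice_run_day = weighted_availability(AGREED_HOURS, [(8, 3)])

print(round(first_of_month, 2))   # 95.45
print(round(invoice_run_day, 2))  # 86.36
```

The two outages now produce different figures, which is the point, but notice that somebody had to pick that 3 and get everyone to sign up to it.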
And what does “available” really mean?
Before deciding whether a service is available or not, it’s important to agree on what “available” actually means. For a technology component like a disk drive or a monitor, I think this is obvious – it either works or it doesn’t.
However, for service elements such as the network, it’s less clear. For instance, does the provision of a degraded service count as available? After all, it’s probably still usable – it unfortunately just takes longer than usual and probably tries the end-users’ patience. And things become even less clear when we consider “service availability.”
IT services are now potentially very complex, with a range of features and components delivering facilities to a range of end users. For example – and I’m deliberately keeping things simple – let’s consider an organization’s employee expenses submission, approval, and payment service. This provides employees and human resources (HR)/finance with a range of facilities:
- The ability to enter expenses claims
- The routing of claims and their processing including approvals, queries, etc.
- The processing of expense payments
- Reporting on expenses and progress against budgets, etc.
- The archiving of old records
Clearly, not all of these facilities are equal. In fact, for many employees, the third one – the processing of expense payments – is the most important and only the first three matter at all.
So, do all parts of the service need to be there for it to be officially “available”? Someone needs to decide beforehand, or otherwise the reported availability figure means very little. For example, if archiving is the only part of the service that works, it should be clear that the service isn’t really available. And, conversely, if it’s only the archiving that doesn’t work, then hopefully most customers would consider the service to be available.
However, the crossover point between “available” and “down” can be very hard to set, because this will be seen to be in different places by different groups of customers and users.
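One way to force that decision into the open is to write the rule down explicitly. The Python sketch below does this for the expenses example, assuming (and it is only my assumption) that the first three facilities are the essential ones and that the service counts as available only when all of them work. The facility names are made up for illustration.

```python
# Assumed split of the expenses service's facilities; yours needs agreeing.
ESSENTIAL = {"enter_claims", "route_and_approve", "pay_expenses"}
OPTIONAL = {"reporting", "archiving"}


def service_available(working_facilities: set) -> bool:
    """The service is 'available' only if every essential facility works."""
    return ESSENTIAL <= working_facilities


# Only archiving works: clearly not available.
print(service_available({"archiving"}))              # False

# Everything except archiving works: available under this rule.
print(service_available(ESSENTIAL | {"reporting"}))  # True
```

A different group of users would probably draw the ESSENTIAL line somewhere else, which is exactly the conversation to have before the figures are published.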
Good communication will help with availability confusion
There are many potential confusion points related to availability’s definition, measurement, and reporting. Hopefully, my simple examples have given you an idea of how something that looks simple at first sight can become more and more complicated the deeper you go in using it to gauge performance.
However, what should be clear is that to get meaningful results, and to remove (or at least minimize) availability-based arguments, the time spent in discussion with customers and users is time well spent!
You might also find this article by ITSM legend Stuart Rance useful when considering availability management: How to Define, Measure, and Report IT Service Availability. I’m told it’s still a highly read piece on ITSM.tools after over two years.
How do you approach availability? Do you have any top tips to share? Please let me know in the comments.