Incidents and problems: why do we still struggle?
When I look at public IT service management (ITSM) discussions, for example on LinkedIn, I find it disappointing how much time is still wasted arguing over basic concepts, such as whether a password reset is an incident or a service request. If these things are still unclear, then we owe our customers and colleagues a simple clarification.
One of the things I’m most disappointed about is the seemingly constant need some people have to debate whether ITIL is good or bad on every point under discussion. My view is simple: if the guidance in ITIL helps you in a particular situation, take it into account and “adopt and adapt” it to suit your organization and your customers. If the guidance doesn’t help or make sense, ignore it and move on.
The incidents and problems debate
This particular debate about incidents, problems, faults, and so on seems to drag on forever, and I have neither the time nor the patience to read everything that’s been written about it. Much of it seems to be about whether ITIL should have used a different, better word than “incident,” rather than whether the incident management process it suggests is a good starting point.
My view is that you can change as many terms as you like. If it helps your organization and customers to rename “incident” as “fault” or even “problem,” then go ahead. Just keep a record somewhere that makes clear, for those familiar with ITIL and its terminology, that when we use this term we mean what ITIL would call an incident.
Using “incident” as an everyday term
Personally, I have a little bit of sympathy for those who find that the term “incident” is not natural customer language.
Few customers will contact the IT service desk to report that they have an incident. They are more likely to use words such as problem, issue, fault, or error, which is perfectly fine, and we certainly shouldn’t tell customers that they ought to say “incident” just to fit in with ITIL.
Incidents and problems (whatever we choose to call them locally)
Many people still don’t understand the difference, and some “experts” can’t agree on it. I find both very worrying, particularly the latter, as these people provide consultancy, presentations, advice, and support.
When explaining the difference, I find the ISO/IEC 20000 definitions useful:
- “Incident – an unplanned interruption to a service, a reduction in the quality of a service, or an event that has not yet impacted the service to the customer.”
- “Problem – the root cause of one or more incidents. NOTE: The root cause is not usually known at the time a problem record is created and the problem management process is responsible for further investigation.”
- “Known error – a problem that has an identified root cause or a method of reducing or eliminating its impact on a service by working around it.”
Entity types versus records versus process
In my opinion, some of the confusion comes from not distinguishing between a “problem,” a “problem ticket,” and the “problem management process,” and the same confusion exists for “incident.”
So, rather than just moaning about the ongoing confusion, here is my attempt to explain it:
- A problem is the root cause of an incident (or of multiple incidents); the two are not the same thing
- The incident is about the impact on the service and the customer
- The incident is resolved when there’s no longer a service impact
- The focus of the incident management process is to end the adverse service impact, either through a workaround or by resolving the underlying problem
- The focus of the problem management process, in relation to the incident, is to identify the root cause and determine the action required to fix it
- When the root cause (or a workaround) has been identified, the problem becomes a known error
- If a change is required to resolve the problem, a request for change is raised and managed by the change management process
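The relationships in the list above can be sketched as a small data model. This is a minimal Python illustration only: the class names (`Problem`, `Incident`), status values, helper functions and reference formats such as `RFC-42` are hypothetical and come neither from ITIL, ISO/IEC 20000, nor any particular ITSM tool. It encodes three points from the list: an incident is resolved once the service impact ends, a problem becomes a known error when a root cause or workaround is found, and a fix that needs a change is handed over via a request for change.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ProblemStatus(Enum):
    OPEN = "open"                # root cause not yet known
    KNOWN_ERROR = "known error"  # root cause or workaround identified
    RESOLVED = "resolved"        # root cause removed, e.g. by a change


@dataclass
class Problem:
    """The (possibly still unknown) root cause of one or more incidents."""
    ref: str
    root_cause: Optional[str] = None
    workaround: Optional[str] = None
    rfc_ref: Optional[str] = None  # hypothetical link to a request for change
    status: ProblemStatus = ProblemStatus.OPEN

    def identify(self, root_cause: Optional[str] = None,
                 workaround: Optional[str] = None) -> None:
        """Record investigation findings; either one makes this a known error."""
        if root_cause:
            self.root_cause = root_cause
        if workaround:
            self.workaround = workaround
        if (self.root_cause or self.workaround) and self.status is ProblemStatus.OPEN:
            self.status = ProblemStatus.KNOWN_ERROR


@dataclass
class Incident:
    """The impact on the service and the customer."""
    ref: str
    description: str
    problem: Optional[Problem] = None
    service_impacted: bool = True

    @property
    def resolved(self) -> bool:
        # Resolved as soon as there is no service impact, whether or not
        # the underlying problem has been fixed yet.
        return not self.service_impacted


def apply_workaround(incident: Incident) -> None:
    """Incident management: restore service using a known workaround."""
    if incident.problem is None or incident.problem.workaround is None:
        raise ValueError(f"{incident.ref}: no workaround available")
    incident.service_impacted = False


def raise_rfc(problem: Problem, rfc_ref: str) -> None:
    """Problem management: a fix that needs a change goes to change management."""
    problem.rfc_ref = rfc_ref


def resolve_problem(problem: Problem, incidents: List[Incident]) -> None:
    """Removing the root cause also ends the impact for any linked incident."""
    problem.status = ProblemStatus.RESOLVED
    for incident in incidents:
        if incident.problem is problem:
            incident.service_impacted = False


# Usage: two incidents caused by the same (hypothetical) problem.
payroll_fault = Problem("PRB-1")
inc1 = Incident("INC-1", "Cannot log in to payroll", problem=payroll_fault)
inc2 = Incident("INC-2", "Payroll page times out", problem=payroll_fault)

payroll_fault.identify(workaround="Use the backup login server")
apply_workaround(inc1)
print(inc1.resolved, inc2.resolved, payroll_fault.status.value)
# True False known error

payroll_fault.identify(root_cause="Expired certificate on login server")
raise_rfc(payroll_fault, "RFC-42")
resolve_problem(payroll_fault, [inc1, inc2])
print(inc2.resolved, payroll_fault.status.value)
# True resolved
```

`INC-1` is resolved by the workaround while the problem stays open as a known error: the incident and the problem have separate lifecycles, which is the distinction the list is drawing.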
Records should be kept of all activity and, to a certain extent, it doesn’t matter how this is done.
Typically, an incident record is used to track the activity of the incident management process, a problem record is used to track the activity of the problem management process, and all records are updated with relevant data – including data from other processes. For example, the incident record is updated with known error information provided by the problem management process.
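The record-keeping point can be sketched in the same hedged way: one record per process, with the problem record passing known error information on to the linked incident records. The record classes, their fields and the `publish_known_error` helper are hypothetical illustrations; real ITSM tools structure and link their records in their own ways.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class IncidentRecord:
    """Tracks incident management activity (hypothetical fields)."""
    ref: str
    problem_ref: Optional[str] = None
    known_error_info: Optional[str] = None
    log: List[str] = field(default_factory=list)


@dataclass
class ProblemRecord:
    """Tracks problem management activity (hypothetical fields)."""
    ref: str
    incident_refs: List[str] = field(default_factory=list)
    known_error_info: Optional[str] = None
    log: List[str] = field(default_factory=list)


def publish_known_error(problem: ProblemRecord, info: str,
                        incidents: Dict[str, IncidentRecord]) -> int:
    """Record known error info on the problem record, then copy it onto every
    linked incident record. Returns the number of incident records updated."""
    problem.known_error_info = info
    problem.log.append(f"Known error recorded: {info}")
    updated = 0
    for ref in problem.incident_refs:
        incident = incidents.get(ref)
        if incident is None:
            continue  # linked record not held here; skip rather than fail
        incident.known_error_info = info
        incident.log.append(f"Known error info from {problem.ref}: {info}")
        updated += 1
    return updated


incidents = {
    "INC-1": IncidentRecord("INC-1", problem_ref="PRB-1"),
    "INC-2": IncidentRecord("INC-2", problem_ref="PRB-1"),
}
problem = ProblemRecord("PRB-1", incident_refs=["INC-1", "INC-2"])
count = publish_known_error(problem, "Restart the print spooler", incidents)
print(count)                     # 2
print(incidents["INC-2"].log[-1])  # Known error info from PRB-1: Restart the print spooler
```

Keeping the problem record as the single place where known error information is created, and copying it outward, means the service desk sees the workaround on the incident it is working on without having to open the problem record.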
The processes, of course, do other things too. For example, problem management also works proactively to identify root causes and possible preventive actions, so that incidents are prevented before they occur.
Is this really so complicated and confusing? Maybe I’ve completely misunderstood incidents and problems and it’s me getting it wrong.
Matthew Burrows currently serves on the SFIA Council, is Chair of the itSMF International Ethics Review Board, and contributes to the Service Management industry and the development of best practice – including as Design Authority for SFIA.
He is Director and Principal Consultant of BSMimpact, an ISO/IEC 20000 qualified consultant and auditor, and an SFIA Accredited Consultant, specialising in implementing pragmatic business service management solutions rather than purely theoretical consulting. He has considerable practical and operational experience of Service Management.
Matthew’s authoring credits include SFIA (Skills Framework for the Information Age), Service Management, Portfolio and Programme Management methodologies, white papers, books, articles and publications.