The combination of incident and problem management was one of the great ideas in the original ITIL publication. It represented a major step forward in IT management thinking. Today, however, I believe it is an obsolete model that leads to a mistaken view of service management.
These days, it’s a very limited world where solving root causes makes incidents vanish. I’m not saying that it wouldn’t work for you; it worked for me back in the 1980s when I was a manager in IT operations. My point is that the world has moved on in many ways.
Dissecting the incident-problem processes
When considering problem management, the traditional:
- Incident to
- Problem to
- Root cause identification to
- Service restoration
thinking is mechanistic and technology-centric. It concentrates on the service production processes and systems. It assumes that incidents are caused by service system malfunctions and that they can always be fixed. It also ignores the customer service aspects. A service failure can hurt customers in different ways, and the customer service function should look after the customers and solve their problems. I will leave the customer service view to a later article.
The distinction between incident and problem has been difficult for many practitioners. The reason for the confusion is that the two processes are actually the same process. Both try to solve failures in service systems. In many cases the problem management process is just second-level incident management. It’s a far better idea to unite the two into a single process. A united incident/problem management process offers a more mature approach to failures in services.
The key benefit of a united process is that a ticket cannot be closed merely because a workaround has been applied. The faults need to be solved permanently, if that is possible and practical. If not, then the situation is a risk to the service and one then needs to apply risk management. While the incident-problem model is used only in IT service management (ITSM), risk management is a mature framework that is used widely in all business areas.
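To make the closing rule concrete, here is a minimal sketch of a united ticket as a small state machine. The class, state, and method names are hypothetical and chosen only for illustration; they do not come from ITIL or from any ticketing tool.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    OPEN = "open"
    WORKAROUND = "workaround applied"
    FIXED = "permanently fixed"
    RISK = "handed to risk management"
    CLOSED = "closed"


@dataclass
class Ticket:
    """Hypothetical united incident/problem ticket."""
    summary: str
    state: State = State.OPEN

    def apply_workaround(self) -> None:
        # Service is restored, but the underlying fault is still there.
        self.state = State.WORKAROUND

    def fix_permanently(self) -> None:
        self.state = State.FIXED

    def hand_to_risk_management(self) -> None:
        # Used when a permanent fix is not possible or not practical.
        self.state = State.RISK

    def close(self) -> None:
        # Only a permanent fix or an explicit risk hand-over allows closure.
        if self.state not in (State.FIXED, State.RISK):
            raise ValueError(f"cannot close a ticket in state: {self.state.value}")
        self.state = State.CLOSED
```

In this sketch, calling `close()` on a ticket that has only had `apply_workaround()` called raises an error, which is the behavior a separate incident process typically does not enforce.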
Risk management 101
Risk is the effect of uncertainty on objectives. The objectives of an organization are valuable to it, which means that risk management is about value.
Risk management enables an organization in many ways, for example to:
- Increase the likelihood of achieving objectives
- Encourage proactive management
- Improve governance
- Improve loss prevention and incident management
- Improve organizational resilience
Risks need to be assessed, which includes risk identification, analysis, and evaluation. The assessed risks can then be treated. There are many ways to treat risks; for example, risks can be:
- Removed by fixing the risk source
- Made less likely
- Mitigated by changing the consequences
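The three treatment options above can be illustrated with a small sketch. It assumes a common scoring convention, exposure = likelihood x impact, which is an assumption made here for illustration and not something the article or any standard prescribes. All names and numbers are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Risk:
    name: str
    likelihood: float  # assumed probability over the planning period, 0.0 to 1.0
    impact: float      # assumed cost of the consequence, in any consistent unit

    @property
    def exposure(self) -> float:
        # Assumed scoring convention: exposure = likelihood x impact.
        return self.likelihood * self.impact


def needs_treatment(risk: Risk, threshold: float) -> bool:
    """Evaluation step: compare the analyzed exposure with an agreed threshold."""
    return risk.exposure > threshold


def remove(risk: Risk) -> Optional[Risk]:
    """Fix the risk source; the risk no longer exists."""
    return None


def make_less_likely(risk: Risk, factor: float) -> Risk:
    """Scale the likelihood by a factor between 0 and 1."""
    return replace(risk, likelihood=risk.likelihood * factor)


def mitigate(risk: Risk, factor: float) -> Risk:
    """Scale the consequence (impact) by a factor between 0 and 1."""
    return replace(risk, impact=risk.impact * factor)


# Hypothetical example values.
risk = Risk("unsupported database version", likelihood=0.25, impact=40000)
less_likely = make_less_likely(risk, 0.5)
mitigated = mitigate(risk, 0.5)
```

Note that the threshold in `needs_treatment` is a management decision about how much exposure is acceptable, not a technical parameter.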
Comparing risk management and incident/problem management
An important distinction between incident/problem management and risk management is the organizational position. A problem manager is typically a technical expert who is not in a position to handle complex problems that require management decisions. Proper risk management starts from board level and covers the entire organization. IT risks can be very harmful to the whole enterprise, and this is missed when they are viewed through a purely technical lens.
If you think that problem management is a valid process, please study risk management and the Cynefin framework.
In Cynefin terms, incident and problem management works only in the Simple and Complicated domains. Simple problems can be categorized and there is a known solution. Complicated problems cannot be categorized directly, but they can be solved through analysis. Unfortunately, many important problems exist in the Complex and Chaotic domains, where the solutions are political or business decisions.
The ISO 31000 standard is a good source for risk management guidance. I’ve described here only a small part of the whole risk-management picture and I recommend that all ITSM practitioners study it.
Having read this, what do you think about the separation of incident and problem management? Please let me know in the comments.
Aale Roos was an expert in statistical computing and data analysis before he moved on to IT service management. He was Head of Customer Service at an IT service provider when he took the next step and became an ITSM consultant in 1989. He has since been selected as the ITSM Person of the Year by itSMF Finland.