So here’s the thing. ITIL has been around since the late 1980s and is now on its fourth version (ITIL 4). Yet while there are books, courses, and blog posts galore about ITIL, there’s still real confusion about the difference between incident management and problem management, and about where one stops and the other begins. If it were just a terminology issue I wouldn’t be so worried about it, but the reality is that confusion about incident and problem management hurts us all.
If there’s confusion about the practices/processes, then we need to sit down and explain the difference between incidents and problems. Otherwise, the same incidents will continue to recur, we’ll stay reliant on individual heroes to fix things, no root cause analysis will get done so nothing will get fixed permanently, and opportunities for continual improvement will be missed.
We’ve all heard of Batman vs. Superman (the comics and the film), so to help end the incident vs. problem confusion once and for all, I like to talk about Batman vs. Columbo.
Batman vs. Columbo Explained
Let’s get back to basics. Incident management is the process that restores service as quickly as possible, with as little adverse impact as possible. In other words, incident management personnel are the superheroes of the IT service management (ITSM) world, swooping in like Batman to save the day, i.e. to get operations and the business back up and running.
By contrast, the primary objectives of problem management are to eliminate recurring incidents by tackling the problems that cause them, and to minimize the impact of incidents that cannot be prevented. In other words, problem management people pop up after normal service has been restored and, like Columbo, act as detectives to figure out what happened, what caused things to go wrong, how it was fixed, and how to stop it recurring. And, hopefully, no dead bodies are discovered.
You can read more about my Batman vs. Columbo thinking in an article I helped create on ITIL mythbusting for EasyVista.
Top Tips for Batman vs. Columbo in the Real World
So now that I’ve explained the difference between incident and problem management, here are my top tips for getting them right.
1. Capture the Right Information for Problem Management
Incident records are about service restoration, or break-fix as it’s commonly known. The following information is typically needed for an incident record (and hopefully much of it will already be held in your help desk or ITSM tool):
- Contact details
- Employee ID
- Asset tag
- VIP/critical user status
- Service affected
- Assigned teams
- Resolution details
- Fix details
- Related problem record
Problem records and problem management, on the other hand, are all about root cause analysis. Problem records will typically contain the following information:
- Description of issue
- Service affected and business impact
- Remedial actions to date
- Support team details
- Root cause analysis
- Meeting minutes
- Next steps
- Related incidents
- Related changes
So don’t muddle the two.
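To make the split between the two record types concrete, here is a minimal sketch of how they might be modeled. The class names, field names, and the `link_incident_to_problem` helper are all assumptions made for illustration. They mirror the two lists above rather than the schema of any particular ITSM tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IncidentRecord:
    """Break-fix record: who is affected, what is down, how it was restored."""
    incident_id: str              # hypothetical identifier, e.g. "INC-1"
    contact_details: str
    employee_id: str
    asset_tag: str
    is_vip: bool                  # VIP/critical user status
    service_affected: str
    assigned_teams: List[str] = field(default_factory=list)
    resolution_details: str = ""
    fix_details: str = ""
    related_problem_id: Optional[str] = None


@dataclass
class ProblemRecord:
    """Root cause record: what keeps going wrong and what is being done about it."""
    problem_id: str               # hypothetical identifier, e.g. "PRB-1"
    description: str
    service_affected: str
    business_impact: str
    remedial_actions: List[str] = field(default_factory=list)
    support_team: str = ""
    root_cause_analysis: str = ""
    meeting_minutes: List[str] = field(default_factory=list)
    next_steps: List[str] = field(default_factory=list)
    related_incident_ids: List[str] = field(default_factory=list)
    related_change_ids: List[str] = field(default_factory=list)


def link_incident_to_problem(incident: IncidentRecord, problem: ProblemRecord) -> None:
    """Cross-reference the two records without copying data between them."""
    incident.related_problem_id = problem.problem_id
    if incident.incident_id not in problem.related_incident_ids:
        problem.related_incident_ids.append(incident.incident_id)
```

The two records only share a cross-reference. Restoration details stay on the incident and root cause analysis stays on the problem, which is one way to avoid muddling them.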
2. Have Separate Roles for Incident and Problem Management
Organize the two roles so that there’s no duplication or wasted effort. In short, the incident manager is concerned with speed, whereas the problem manager is concerned with investigation and diagnosis to improve the quality of the end-to-end service.
Key priorities for the incident manager will include coordinating the incident, managing communications with both technical support teams and business customers, and ensuring that the issue is fixed ASAP. The problem manager, by contrast, will focus on root cause investigation, trending (has this issue appeared before?), finding a fix (interim workarounds and permanent resolution), and ensuring that any lessons learned are documented and acted on.
3. Have Defined Handover Points Between Incident and Problem Management
It’s really important to keep an eye on business-as-usual operations, as seemingly small incidents can spiral out of control and hurt availability levels and customer satisfaction.
Simple things can make a big difference here. For example, place a whiteboard near the service desk listing the top ten problems, so that it’s easy for service desk analysts to link incidents to problems and trends can be identified later on. Or, if the service desk has a team meeting, ask the problem manager (or equivalent) to attend and brief the team on any new problems, as well as on updates and workarounds for existing ones.
Finally, don’t forget to close the problem management loop and let the service desk know when a problem has been fixed and its record closed. There’s nothing worse for a service desk than to have to call a list of customers about an issue that was sorted out months ago.
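If incidents are linked to problem records in your tool, the top ten list for that whiteboard can be derived from the links themselves. The sketch below is one possible approach and makes some assumptions: the function name `top_problems` is made up, and each incident is assumed to carry either the ID of its linked problem or `None` when it has not been linked.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def top_problems(
    linked_problem_ids: Iterable[Optional[str]], n: int = 10
) -> List[Tuple[str, int]]:
    """Return the n problems with the most linked incidents, busiest first.

    Each item in linked_problem_ids is the problem ID recorded on one
    incident, or None if that incident was never linked to a problem.
    """
    counts = Counter(pid for pid in linked_problem_ids if pid is not None)
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical links taken from one week of incident records.
    links = ["PRB-2", "PRB-1", "PRB-2", None, "PRB-2", "PRB-1", "PRB-3"]
    for problem_id, incident_count in top_problems(links, n=3):
        print(problem_id, incident_count)
```

A problem that climbs this list week after week is a strong candidate for the problem manager’s next update to the service desk.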
4. Don’t Forget Continual Improvement
Get proactive! Work as a team to view service performance throughout the month. Have a process to automatically raise a new proactive problem record if availability targets are threatened so that things can be done to prevent further issues. Don’t just sit there waiting to fail the SLA. And keep moving forward – small incremental improvements can really build up over time; it’s called the marginal gains effect.
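As a rough illustration of raising a proactive problem record before the SLA is breached, the sketch below turns an availability target into a downtime budget and flags when a chosen share of that budget has been used. The function names and the 75% warning threshold are assumptions made for this example, not ITIL guidance or a feature of any specific tool.

```python
def downtime_budget_minutes(target_pct: float, period_minutes: float) -> float:
    """Downtime allowed in the period before the availability target is missed."""
    return period_minutes * (1 - target_pct / 100)


def should_raise_proactive_problem(
    target_pct: float,
    period_minutes: float,
    downtime_so_far_minutes: float,
    warn_fraction: float = 0.75,  # assumed threshold; tune to your own SLA
) -> bool:
    """True once the downtime used reaches the warning share of the budget."""
    budget = downtime_budget_minutes(target_pct, period_minutes)
    return downtime_so_far_minutes >= warn_fraction * budget


if __name__ == "__main__":
    month_minutes = 30 * 24 * 60  # a 30-day month
    # A 99.9% target over 30 days allows roughly 43.2 minutes of downtime.
    print(downtime_budget_minutes(99.9, month_minutes))
    # 35 minutes used is past 75% of that budget, so this flags a problem.
    print(should_raise_proactive_problem(99.9, month_minutes, 35))
```

In practice the check would run on a schedule against monitoring data, and a positive result would open the proactive problem record rather than just print a value.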
Build continual improvement into your process by looking at “own goals” when reviewing major incidents, and at lessons learned when resolving problems and known errors. Add them to your improvement register so that these improvement ideas are documented, tracked, and, most importantly, acted on. This matters especially for problem management.
In Summary: Incident vs. Problem Management
Done well, incident management fixes things quickly, effectively, and safely. Problem management complements that fix effort in the aftermath by confirming the root cause and preventing a recurrence.
The value of both processes grows considerably when they work together rather than in silos. Combining incident management with problem management takes your IT support offering to the next level: instead of just focusing on break-fix, you’re moving to a model that looks at the underlying cause(s) and how to make things better for the customer.
Improved first-time fix rates, improvement in overall service quality, and increased customer satisfaction? Deal us in!