So here’s the thing. ITIL has been around since the late 1980s. We’re currently on version three (v3), with a cheeky refresh in 2011. There are books, courses, and blog posts galore about ITIL, but there’s still real confusion about where incident management stops and problem management begins. If it were just a terminology issue I wouldn’t be so worried about it, but the reality is that confusion about incident and problem management hurts us all.

If there’s confusion about the processes, then we need to sit down and explain the difference between incidents and problems. Otherwise the same incidents will keep recurring, we’ll stay reliant on individual heroes to fix things, no root cause analysis will get done so nothing will get fixed permanently, and opportunities for continual service improvement (CSI) will be missed.

We’ve all heard of Batman vs. Superman (the comics and the film), but to help end the incident vs. problem confusion once and for all, I like to talk about Batman vs. Columbo.

Batman vs. Columbo Explained

Let’s get back to basics. Incident management is the process that restores service as quickly as possible, with as little adverse impact as possible. In other words, incident management personnel are the superheroes of the IT service management (ITSM) world, swooping in like Batman to save the day, i.e. to get operations and the business back up and running.

By contrast, the primary objectives of problem management are to eliminate recurring incidents by dealing with their underlying causes (the problems) and to minimize the impact of incidents that cannot be prevented. In other words, problem management people are the guys who pop up after normal service has been restored and, like Columbo, act as detectives to figure out what happened, what caused things to go wrong, how it was fixed, and how to stop it happening again. And hopefully no dead bodies are discovered.

You can read more about my Batman vs. Columbo thinking in an article I helped create on ITIL mythbusting for EasyVista.

Top Tips for Batman vs. Columbo In the Real World

So now that we’ve explained the difference between incident and problem management, here are our top tips for getting them right.

  1. Capture the Right Information

Incident records are about service restoration, or break-fix as it’s commonly known. The following information is typically needed for an incident record (and hopefully much of it will already be held in your help desk or ITSM tool):

  • Name
  • Contact details
  • Employee ID
  • Asset tag
  • VIP/critical user status
  • Location
  • Status
  • Type
  • Priority
  • Category
  • Title
  • Description
  • Service affected
  • Assigned teams
  • Resolution details
  • Fix details
  • Related problem record

Problem records on the other hand are all about root cause analysis. Problem records will typically contain the following information:

  • Description of issue
  • Service affected and business impact
  • Downtime
  • Priority
  • Remedial actions to date
  • Support team details
  • Root cause analysis
  • Meeting minutes
  • Next steps
  • Related incidents
  • Related changes

So don’t muddle the two.
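To make the split concrete, here is a minimal sketch of the two record types as separate data structures that only cross-reference each other. The class names, the trimmed-down field selection, and the `link_incident_to_problem` helper are illustrative assumptions, not the schema of any particular help desk or ITSM tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IncidentRecord:
    """Break-fix record: who is affected, what is broken, how service was restored."""
    incident_id: str
    title: str
    description: str
    service_affected: str
    priority: str
    category: str
    status: str = "open"
    resolution_details: str = ""
    related_problem_id: Optional[str] = None  # link to a problem record, if any


@dataclass
class ProblemRecord:
    """Root cause record: why it broke and how to stop it recurring."""
    problem_id: str
    description: str
    service_affected: str
    business_impact: str
    priority: str
    root_cause_analysis: str = ""
    next_steps: List[str] = field(default_factory=list)
    related_incident_ids: List[str] = field(default_factory=list)


def link_incident_to_problem(incident: IncidentRecord, problem: ProblemRecord) -> None:
    """Cross-reference the two records without merging their contents."""
    incident.related_problem_id = problem.problem_id
    if incident.incident_id not in problem.related_incident_ids:
        problem.related_incident_ids.append(incident.incident_id)
```

Keeping the link as a pair of IDs, rather than copying root cause notes into the incident, means the incident stays focused on restoration while the problem record collects the investigation.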

  2. Have Separate Roles

Organize the roles so that there’s no duplication or wasted effort. In short, the incident manager is concerned with speed, whereas the problem manager is concerned with investigation and diagnosis to improve the quality of the end-to-end service.

Key priorities for the incident manager will include coordinating the incident, managing communications with both technical support teams and business customers, and ensuring that the issue is fixed ASAP. The problem manager, by contrast, will focus on root cause investigation, trending (has this issue appeared before?), finding a fix (interim workarounds and permanent resolution), and ensuring that any lessons learned are documented and acted on.
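The trending question (has this issue appeared before?) can be approximated with a simple count. The sketch below assumes each incident has been reduced to a (service, category) pair; the function name `find_recurring_issues` and the default threshold of three are made up for illustration, so pick whatever grouping and cut-off suit your own data.

```python
from collections import Counter
from typing import Iterable, List, Tuple

IssueKey = Tuple[str, str]  # (service affected, category)


def find_recurring_issues(
    incidents: Iterable[IssueKey], threshold: int = 3
) -> List[Tuple[IssueKey, int]]:
    """Return the (service, category) pairs seen at least `threshold` times,
    most frequent first, as candidates for a problem record."""
    counts = Counter(incidents)
    return [(key, n) for key, n in counts.most_common() if n >= threshold]


closed_incidents = [
    ("Email", "Outage"),
    ("VPN", "Login failure"),
    ("Email", "Outage"),
    ("Email", "Outage"),
]
candidates = find_recurring_issues(closed_incidents)
```

With the sample data above, only the repeated email outage reaches the threshold, which is exactly the kind of pattern the problem manager would pick up.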

  3. Have Defined Handover Points

It’s really important to keep an eye on business-as-usual operations, as seemingly small incidents can spiral out of control to have a negative effect on availability levels and customer satisfaction.

Simple things can make a big difference here. For example, place a whiteboard near the service desk with a list of the top ten problems, so that it’s easy for service desk analysts to link incidents to problems and trends can be identified later on. Or, if the service desk has a team meeting, ask the problem manager (or equivalent) to attend and brief the team on any new problems, as well as on updates and workarounds for existing ones.

Finally, don’t forget to close the loop and let the service desk know when a problem has been fixed and its record closed. There’s nothing worse for a service desk than having to call a list of customers about an issue that was sorted out months ago.

  4. Don’t Forget CSI

Get proactive! Work as a team to view service performance throughout the month. Have a process to automatically raise a new proactive problem record if availability targets are threatened so that things can be done to prevent further issues. Don’t just sit there waiting to fail the SLA. And keep moving forward – small incremental improvements can really build up over time; it’s called the marginal gains effect.
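As a rough sketch of the automatic trigger described above, the check below suggests a proactive problem record as soon as availability gets close to the target, rather than after the SLA has been missed. The 99.5% target, the 0.2 point warning margin, and both function names are assumptions for illustration only; use the figures from your own SLA.

```python
from typing import Optional


def availability_percent(total_minutes: int, downtime_minutes: int) -> float:
    """Availability so far in the reporting period, as a percentage."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def proactive_problem_needed(
    total_minutes: int,
    downtime_minutes: int,
    target_percent: float = 99.5,   # assumed SLA target
    warning_margin: float = 0.2,    # assumed early-warning buffer, in points
) -> Optional[str]:
    """Return a summary for a new proactive problem record, or None.

    A record is suggested once availability falls to within
    `warning_margin` percentage points of the target.
    """
    current = availability_percent(total_minutes, downtime_minutes)
    if current <= target_percent + warning_margin:
        return (
            f"Proactive problem: availability {current:.2f}% is within "
            f"{warning_margin} points of the {target_percent}% target"
        )
    return None
```

For example, 40 minutes of downtime in a 10,000 minute period gives 99.6% availability, which is still above the assumed target but inside the warning margin, so a summary is returned. With only 10 minutes of downtime the function returns `None`.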

Build CSI into your process for looking at “own goals” when reviewing major incidents and lessons learned when resolving problems and known errors. Add them to your CSI register so that these improvement ideas are documented, tracked, and – most importantly – acted on.

In Summary

Done well, incident management fixes things quickly, effectively, and safely. Problem management then complements the fix effort in the aftermath by confirming the root cause and preventing a recurrence.

Both processes deliver far more value when they work together than when they work in silos. Combining incident management with problem management takes your IT support offering to the next level: instead of just focusing on break-fix, you’re moving to a model that looks at the underlying cause and how to make things better for the customer.

Improved first time fix rates, improvement in overall service quality, and increased customer satisfaction? Deal us in!