So here’s the thing about major incidents. As a Major Incident Manager or Problem Manager, you can do all the ITIL or IT service management (ITSM) training in the world, but nothing can really prepare you for your first ever major incident or crisis. It’s definitely something that becomes easier with experience.
I’ve been a Problem Manager, on and off, for years, so here are my thirteen top tips for dealing with a major incident management crisis without hiding under your desk, chain-smoking, or mainlining vodka.
Tip 1: Keep calm (even though it’s a major incident)
Easier said than done with a major incident, I know. It’s hard being the calm, sensible one when everyone else is losing it, so I’ve mastered the art of pretending. The idea is to have an outward appearance of calm during a major incident; because if you’re calm, the people around you will start to calm down, and you effectively take the sense of panic out of the situation.
It might sound obvious, but when you’re dealing with a major incident or crisis, panicking isn’t going to help anyone.
Tip 2: Look after your people in a major incident
Is anyone in immediate danger because of the major incident?
If so, invoke the right safety protocols; be it an emergency power off (EPO) button or removing people from the site and getting them to safety. Once the immediate danger from the major incident has been contained, you can look at who does what and the lessons learned. But take care of your people first.
Tip 3: Seek help with the major incident
How serious is this major incident? Could it be time to invoke disaster recovery (DR) plans? Make change management aware that there may well be the need for an emergency change to fix things. And if your service desk is noticeably struggling, can you get more people in to take the pressure off your existing shift?
Tip 4: Communicate throughout the major incident
The major incident management tone you’re looking for is calm but brisk efficiency.
As a Crisis Manager, ensure that everyone involved with the major incident knows to update you so that you can then send out appropriate updates to those who need to be informed. Not only does this mean you have everything captured for the report, but it also saves your support teams from being asked the same question by ten different people, freeing them up to help fix the issue.
Be as proactive as you can in getting the major incident management message out. For instance, as a senior manager, there’s nothing worse than being told about an issue by an irate customer. So ensure your senior management team is kept in the loop with everything they need to know about the issue and its impact.
If you’re really unlucky, you may have to deal with the press or regulatory bodies. In the past year or so we’ve seen many big firms experience IT outages, be it Eircom in Ireland, NatWest in the UK, or Target in the US.
My heart always goes out to the major incident, problem, and service desk managers involved because, let’s face it – what always makes managing a crisis that much easier? That’s right, folks, being the main headline on news websites, making the national press, or trending on Twitter for the worst possible reasons.
If you have a CMDB or a service catalog, try to see if the major incident’s impact extends to other customers or service towers and warn them accordingly. Also, make sure the service desk has updated the welcome message on their Automatic Call Distribution (ACD) system to try and stop the avalanche of calls.
Why is this so important, I hear you ask? Well, firstly, nothing is more stressful to a service desk analyst than having multiple calls in the queue waiting to be answered. The second reason is something that happened on my watch a long time ago in a galaxy far, far away.
We got a major incident message onto the ACD system, but by the time we managed to deploy it, the system was unable to cope with the number of calls and crashed. Now the issue was so much worse: not only was the business service down, but so was the service desk, so no one could get through to report other issues, and our support teams had two major incidents to fix rather than one. Not good.
Tip 5: Have a major incident fix? Test and verify
Brilliant. Bob from the server team has a fix for the major incident. But has it been tested and checked?
Remember when we pre-warned change management that an emergency change might be needed as part of the major incident fix effort? Go talk to them and raise a change with all the available details while Bob is testing. The change record doesn’t have to be perfect, but it will need to have the key activities, who will be involved in doing the work, and rough timings.
Tip 6: Manage the major incident fix after the major incident
Make sure that Bob has everything he needs to get the major incident fix in successfully. Make sure there are enough people on hand (for example, other support teams or third-party support, if needed) to ensure that there are no hiccups.
Tip 7: Check everyone is up and running
“Hurrah, we’re back in business!” I know the temptation is to shout this from the rooftops but do a quick major incident sanity check first.
If your DNS server was down, check to make sure you can access the outside world.
Telephony down? See if you can make a call.
Website down? See if you can access it and click on some content links to make sure that the whole thing is back up, not just the landing page.
You get the idea – check to make sure that everything is as it should be before you break out your major incident victory dance.
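The checks above lend themselves to a small script that can be run before anyone announces that service is restored. What follows is only a minimal sketch using the Python standard library. The function names (`dns_resolves`, `page_loads`, `run_checks`, `all_clear`) are made up for illustration, and the hostnames and URLs you pass in are whatever matters in your own environment.

```python
import http.client
import socket
import urllib.request
from typing import Callable, Dict, List, Tuple

# A check is a label plus a zero-argument function that returns True on success.
Check = Tuple[str, Callable[[], bool]]


def dns_resolves(hostname: str) -> bool:
    """Return True if the hostname resolves to an address."""
    try:
        socket.gethostbyname(hostname)
        return True
    except OSError:
        return False


def page_loads(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError, http.client.HTTPException):
        # Covers network errors, HTTP error statuses, and malformed URLs.
        return False


def run_checks(checks: List[Check]) -> Dict[str, bool]:
    """Run every check and record the outcome; a check that raises counts as failed."""
    results: Dict[str, bool] = {}
    for name, check in checks:
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return results


def all_clear(results: Dict[str, bool]) -> bool:
    """True only if at least one check ran and every check passed."""
    return bool(results) and all(results.values())
```

For a website outage, you might pass `run_checks` one `page_loads` check for the landing page and one for each of a handful of content links, then only announce the fix once `all_clear` returns `True`. A script like this supplements a human trying the service; it doesn’t replace it.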
Tip 8: Have a plan for managing challenging behavior
So a major incident is stressful. No one enjoys dealing with a massive system failure and unhappy users, and sometimes stress makes people behave in a way that isn’t ideal.
Here are some major incident behavior examples:
The situation | How to respond |
---|---|
A major incident bridge has been raised, and there’s uncertainty over what to do next; people are starting to flap. | “OK, let’s start the process. We’ll get a comms out and pull a team together. We’ve got this; we’ve just got to work the incident and figure out the next steps.” |
A senior manager joins the major incident call and starts getting a bit stressy, interrupting the fix effort. | “Thanks for your feedback, but we’re trying to keep the focus on the fix effort for now. We’ll focus on X later.” |
You’re a couple of hours in, no one knows what’s causing the issue, and people are starting to panic. | “Let’s quickly recap what we’ve figured out so far and go from there. Are there any other support teams we need to loop in at this point?” |
Someone in one of the support teams is kicking off - blaming another team, a supplier, or the business. | “I hear you, but that’s not something we’ll discuss on this call. We’ll pick it up later in the major incident review.” |
You’re on a major incident bridge, and someone is kicking off at a member of your team. | Intervene quickly and go in hard. Disconnect or mute them if necessary; no one deserves to be treated aggressively or shouted at. You could try the “we’re trying to keep the focus on the fix effort” script as above, but if someone is kicking off that much, they shouldn’t be on the call. |
Tip 9: Practice makes perfect
Build major incident management into your training material. Have run books so that every person on your service desk knows how to handle a major incident and can respond correctly if they’re the person who takes the call. Have checklists and lots of example scenarios so that dealing with a major incident becomes second nature for your people, which in turn will take some of the stress and anxiety out of the situation.
Tip 10: Change and major incidents
Like a major incident, emergency changes aren’t fun, but they are often necessary to resolve a major fault. Make sure your major incident and emergency change processes are closely aligned, and consider having a member of the change management/enablement team on the major incident bridge, or at least on the comms list, so that if an emergency change is needed, the support team has everything they need to get the fix in effectively, efficiently, and safely.
Consider designing your major incident form so that a “needs emergency change to resolve” workflow can be automated when needed.
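To make that idea a bit more concrete, here is a rough sketch of what such a workflow might do behind the scenes. It doesn’t use any real ITSM tool’s API. `MajorIncident`, `EmergencyChangeDraft`, and `draft_emergency_change` are hypothetical names, and the fields simply mirror the details mentioned under Tip 5 (key activities, who is doing the work, and rough timings).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MajorIncident:
    """Hypothetical major incident form; field names are illustrative only."""
    reference: str
    summary: str
    needs_emergency_change: bool = False
    fix_activities: List[str] = field(default_factory=list)
    fix_owners: List[str] = field(default_factory=list)
    estimated_minutes: Optional[int] = None


@dataclass
class EmergencyChangeDraft:
    """Hypothetical draft change record pre-filled from the incident form."""
    incident_reference: str
    title: str
    activities: List[str]
    implementers: List[str]
    estimated_minutes: Optional[int]
    missing_details: List[str]


def draft_emergency_change(incident: MajorIncident) -> Optional[EmergencyChangeDraft]:
    """Return a draft change if the form's flag is ticked, otherwise None."""
    if not incident.needs_emergency_change:
        return None

    # The draft doesn't have to be perfect, so gaps are listed rather than rejected.
    missing: List[str] = []
    if not incident.fix_activities:
        missing.append("key activities")
    if not incident.fix_owners:
        missing.append("who will do the work")
    if incident.estimated_minutes is None:
        missing.append("rough timings")

    return EmergencyChangeDraft(
        incident_reference=incident.reference,
        title=f"Emergency change for {incident.reference}: {incident.summary}",
        activities=list(incident.fix_activities),
        implementers=list(incident.fix_owners),
        estimated_minutes=incident.estimated_minutes,
        missing_details=missing,
    )
```

The design choice worth noting is that missing details are flagged rather than blocking the draft. In the middle of a major incident, it’s usually more useful for change management to get an imperfect record early than a perfect one late.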
Tip 11: Deal with the immediate aftermath of the major incident
Capture as much information about the major incident as you possibly can as you’re going along, because once this issue is fixed, people tend to be so focused on the next issue that they forget things. So ensure that you capture everything while it’s still fresh in people’s minds.
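If you don’t already have a tool that does this for you, even a very simple timeline log helps. The sketch below is one possible shape for it, not a recommendation of any particular product. `TimelineEntry` and `IncidentTimeline` are names invented for this example, and the timestamps are assumed to be timezone-aware.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class TimelineEntry:
    at: datetime  # when it happened (timezone-aware)
    who: str      # person or team giving the update
    what: str     # what was seen, decided, or done


@dataclass
class IncidentTimeline:
    entries: List[TimelineEntry] = field(default_factory=list)

    def record(self, who: str, what: str, at: Optional[datetime] = None) -> TimelineEntry:
        """Add an entry, stamped with the current UTC time unless one is given."""
        entry = TimelineEntry(at=at or datetime.now(timezone.utc), who=who, what=what)
        self.entries.append(entry)
        return entry

    def as_report(self) -> str:
        """Return the entries in time order, one line each, ready for the report."""
        ordered = sorted(self.entries, key=lambda e: e.at)
        return "\n".join(f"{e.at:%Y-%m-%d %H:%M} {e.who}: {e.what}" for e in ordered)
```

Calling `record("Bob", "Fix deployed to the server")` as each update comes in means the review meeting starts from a written timeline rather than from what people happen to remember.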
Tip 12: Major incident review meeting
AKA the major incident management post-mortem or drains-up meeting, not a witch-hunt.
Set ground rules and reassure everyone in the room that the major incident management meeting is to look at what happened and how it can be prevented from recurring, not to assign blame. If people think that they’re going to get blamed, then they’ll clam up, and you’re not going to get very far. By making people relax and feel comfortable, you’ll get to the root cause much quicker, as well as any actions to prevent a recurrence.
When you capture your lessons learned, make sure they’re documented, shared, and acted on. The easiest way to do this is to add them to a CSI register if your organization has one. Whatever happens, make sure they’re not forgotten – if the same major incident happens six months down the line, people tend to be much less forgiving if it could have been prevented.
Tip 13: Look after your people in a major incident
So important, it’s worth mentioning twice in the context of major incident management. Okay, so you have restored service, told your stakeholders, dealt with the fallout, and captured lessons learned.
The chances are that you and the team are stressed out and shattered. So now is the time for motivation in the form of time off in lieu, caffeine, or team building in the form of a quick trip to the pub after work. Not something you’ll necessarily find in any major incident book or training course, but it will do wonders for morale.
That’s me done. What are your top major incident management tips for dealing with a crisis? Please let me know in the comments.
Vawns Murphy
Vawns Murphy holds qualifications in ITIL V2 Manager (red badge) and ITIL V3 Expert (purple badge), and also has an SDI Managers certificate. Plus she holds further qualifications in COBIT, ISO 20000, SAM, PRINCE2, and Microsoft. In addition, she is an author of itSMF UK collateral on Service Transition, Software Asset Management, Problem Management and the "How to do CCRM" book. She was also a reviewer for the Service Transition ITIL 3 2011 publication.
In addition to her day job as a Senior ITSM Consultant at i3Works, she is also an Associate Analyst at ITSM.tools.
2 Responses
There’s some great advice here Vawns, but I would add one thing. Don’t make the mistake of thinking you have to diagnose and fix the root cause as part of managing the incident. It’s much better to focus on restoring the business process as the key incident management activity, and allow problem management to take whatever time it needs to do root cause analysis later.
I completely agree Stuart – that’s why when I’m running training courses I describe Incident Managers as the emergency services (“quick, let’s get everyone patched up and back on their feet again”) and Problem Managers as detectives (“ok, let’s figure out what happened and how to stop it from happening again”). Maybe my next blog should be about the handover between the two 🙂