In IT and IT service management (ITSM), we’re often aware of the issues that cause us pain, or at least the symptoms of the issues. For instance, that:
- IT engineers are working long hours and overtime, causing stress and burnout.
- The business still suffers from change “big bangs” even with a change management team and a weekly change advisory board (CAB) in place.
- IT support is lurching from crisis to crisis – Priority 1 incidents have reduced, but complexity has increased, causing more stress and downtime for the business.
- IT suppliers are still not integrated into the corporate ITSM system and it adversely impacts service delivery and support.
- There’s still no management dashboard showing the state of the IT estate – be it critical events, or governance, risk, and compliance (GRC).
A little like continuing to suffer with incidents because of a lack of incident management investment, IT departments often can’t see that what they need is to take the time to get to the root of their issues, not to apply Band-Aids that involve support staff working 25 hours of overtime each weekend.
Isolating the Common Themes
Of course, there are probably more, but here are five of the most common root causes of IT issues:
1. Service ignorance. No-one in the IT organization actually knows what services they provide to the business.
2. Increasing complexity from multiple and ageing systems. Where less than a handful of people truly understand how the services, systems, applications, and infrastructure hang together.
3. Insufficient technical know-how in service management. There’s a whole department of IT people dedicated to service management and ITIL processes who are not “technical” enough to truly understand certain business-impacting issues.
4. Outsourcing with multiple suppliers – service integration and management (SIAM). This theme deserves its own blog but here are a few common issues:
a. Poor vendor contracts, constructed by sales and commercial people with little understanding of the mechanics of IT.
b. Transitions are done at such speed, to make a sale, that there’s little to no due diligence being performed. The “bottom dollar” contract might win without said vendor being allowed to look “under the hood.” And many enterprise IT organizations are undocumented, let alone structured enough to meet the needs of the enterprise. The transition is thus largely a disaster.
c. Tacit technical knowledge walks out of the door – as much as we dislike the typical grumpy engineer, they actually know how the undocumented system hangs together and they cannot be replaced as easily as one might think.
d. Low-skilled, highly leveraged supplier staff – typically it’s the use of unskilled staff with the few assigned supplier “experts” leveraged to the max, meaning that your organization struggles to get the expertise and “bang for the buck” required.
5. Reporting for reporting’s sake. Where report after report talks of services that are green and meeting SLAs, but the reality is that they’re bleeding red.
Let’s attack each of these themes in turn.
Understanding What IT Services You Provide as an Organization
This is the basic tenet of your existence in the business.
The issue here is that IT people will say that the business should know what business functions and processes they undertake daily. But the reality is often that the business doesn’t know its processes and almost all of these processes rely on IT. So, you can go blue in the face telling them that it’s “their” problem or you can work with them to get that understanding in place.
You also need to understand that a chasm exists between IT and the business. It always has and always will, and it’s your mission to start to bridge that gap. Employ service owners that own, and are accountable for, the service end-to-end, from IT user to the back-end storage. Ensure that they understand this and, if they regularly blame someone else, find someone else for the role.
A healthy exercise to achieve this is a series of critical process workshops, which can also form part of the business impact analysis for business continuity. Get the critical functions of the business mapped out, and only then can you see how IT services meet those business functions.
Increasing Complexity from Multiple and Ageing Systems
Money is often the root of all evil in IT, with almost all enterprises suffering from massively complex, interwoven legacy systems, built on top of each other, whose replacement projects never came to fruition.
There are two core basics that you must perform well to deal with this – configuration management and event management:
- Configuration management – understand what you’re dealing with. How does it all hang together? What talks to what? Where does it live? How do you drive it? How do you put it back together again? Document it, record it. You should have a configuration management database (CMDB) that’s at least 95% correct. If you don’t know what you’re managing, then you’re asking for trouble. It also encompasses knowledge management such as builds, operations manuals and standard operating procedures, solution architecture documents, and disaster recovery documents. It can be automated discovery or manual, and it can be in Word documents and spreadsheets or ITSM tools. It doesn’t matter, just get it documented and the relationships mapped.
- Event management – what is your event management process? How do your event detection and logging systems support the process? What monitoring systems do you have in place? There should be a level of information running through your IT organization for analysis which helps you to detect, respond, and get proactive around events. If you’re not aware of an event in a system, then you cannot respond to it. It snowballs and typically ends up as a Priority 1 event. If you have the correct event detection systems in place, then you can get on top of it more easily, reducing risk. Once the configuration activities are in place, you should ensure that logs and monitoring are in place for every system that supports a critical process. Someone (in your IT teams) should then be using that data daily to understand the health of those systems.
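To make the “what talks to what?” question concrete, here is a minimal sketch of relationship mapping in Python. It is an illustration only: the configuration item names and the `impacted_by` helper are hypothetical, and a real CMDB or ITSM tool will have its own data model. The idea is simply that once “depends on” relationships are recorded, you can ask which services are at risk when one component fails.

```python
from collections import defaultdict, deque

# Hypothetical configuration items (CIs). Each pair reads "left depends on right".
DEPENDS_ON = [
    ("payroll-service", "payroll-app"),
    ("payroll-app", "db-cluster-01"),
    ("payroll-app", "auth-gateway"),
    ("intranet", "auth-gateway"),
    ("db-cluster-01", "san-storage-a"),
]


def build_reverse_index(relationships):
    """Map each CI to the set of CIs that depend on it directly."""
    dependents = defaultdict(set)
    for consumer, provider in relationships:
        dependents[provider].add(consumer)
    return dependents


def impacted_by(ci, relationships):
    """Return every CI that depends on `ci`, directly or indirectly, sorted by name."""
    dependents = build_reverse_index(relationships)
    seen = set()
    queue = deque([ci])
    while queue:  # breadth-first walk up the dependency chain
        current = queue.popleft()
        for consumer in dependents.get(current, ()):
            if consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
    return sorted(seen)


print(impacted_by("san-storage-a", DEPENDS_ON))
# ['db-cluster-01', 'payroll-app', 'payroll-service']
```

The same relationships could just as easily live in a spreadsheet with two columns. What matters is that they are recorded somewhere you can query when an event hits.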
A Whole Department of People Dedicated to Service Management Who Aren’t “Technical”
We’ve created a “cottage industry” in IT for people to tell us that they aren’t technical at every opportunity. These are potentially the same people chairing your CAB and approving change – is it any wonder that IT changes still manage to regularly interrupt service? And if service management folk cannot understand what technical people are saying to them, then what chance does the business have?
We must ensure that everyone in our IT organization has basic competency in what IT provides. So, have your technical leads run workshops to walk service management and process people through how your IT systems work. Even better, show the whole organization.
Allow the time for these activities – learning and understanding your systems is more important than fighting fires. And instead of people just saying that they’re not technical, teach and empower them to ask a question if they don’t understand something. It could be important to know.
Outsourcing with Multiple Suppliers
This can be a rather emotive topic for people in IT because IT-outsource suppliers have come in and taken people’s jobs – these were past colleagues and friends. It’s up for debate whether they’re better or worse, but remember that it’s often all down to money, that the “bean counters” like cheap, and that you get what you pay for.
In the suppliers’ defense, enterprise systems are often not in a fantastic state and are undocumented, unwieldy, complex beasts. And the transition to an outsourced or SIAM way of working provides only a short period of time to fully understand the systems.
If suppliers are running an “IT bodyshop” – where they fulfil the contract with low-skilled, highly leveraged staff and there are few technical experts that know what they’re doing – then recognize this and factor it into how you manage the service you’re receiving. Demand that you have technical expert presence on site and find space for them – otherwise conference calls where you’re driving outsourced engineers will drive you nuts.
It’s too easy for the IT organization to blame the supplier when something goes wrong. Don’t do this. You’re the IT organization and you employ the supplier on behalf of the business, and thus it’s your duty to employ frameworks and methods to ensure that the supplier is governed, is compliant, and manages risk in line with your business’ tolerance. You are the ultimate service integrator – if the supplier is poor, then you have allowed them to be poor.
With regard to the contract, you can blame the salespeople that have sailed off into the sunset or you can fix the situation. Dig the contract out and refine it to reflect today’s business needs, or you can just carry on moaning about it.
Reporting for Reporting’s Sake
If no one reads a report and acts on it, then bin it. It is not okay for a report to tell the reader that everything is okay when the reality is anything but. You need trusted, competent people to report on the reality, and quality over quantity is key here. There’s often a mad notion that because you have lots of pretty reports, you’re running a sound IT organization. However, the more irrelevant the reporting, the bigger the mess underneath is likely to be.
So is service level agreement (SLA) reporting a waste of time? There’s the daily discussion on how long an IT system was down or degraded for, but why? The business knows exactly how long it was down for. It’s become something of an obsession for IT managers to penalize suppliers rather than actually fix the root causes of the events destroying SLAs. So, set up automated reporting to produce your true SLA statistics and stop wasting time just talking about them. Strive to make SLAs irrelevant through improvement.
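As a rough illustration of what automated SLA statistics can look like, the Python sketch below computes availability for a reporting period from a list of outage windows. Everything here is an assumption for the sake of the example: the function name `availability_percent`, the sample outages, and the 99.5% target are all hypothetical, and the sketch assumes outage windows do not overlap. In practice the windows would come from your monitoring or incident records rather than being typed in.

```python
from datetime import datetime


def availability_percent(outages, period_start, period_end):
    """Percentage of the period the service was up, given (start, end) outage windows.

    Assumes the outage windows do not overlap one another.
    """
    period = (period_end - period_start).total_seconds()
    if period <= 0:
        raise ValueError("period_end must be after period_start")
    downtime = 0.0
    for start, end in outages:
        # Count only the part of each outage that falls inside the reporting period.
        clipped_start = max(start, period_start)
        clipped_end = min(end, period_end)
        if clipped_end > clipped_start:
            downtime += (clipped_end - clipped_start).total_seconds()
    return 100.0 * (period - downtime) / period


# Hypothetical outage records for April 2024 (a 720-hour month).
april_outages = [
    (datetime(2024, 4, 3, 9, 0), datetime(2024, 4, 3, 12, 0)),     # 3 hours
    (datetime(2024, 4, 20, 22, 0), datetime(2024, 4, 21, 2, 12)),  # 4 hours 12 minutes
]
SLA_TARGET = 99.5  # hypothetical contractual target, in percent

achieved = availability_percent(april_outages, datetime(2024, 4, 1), datetime(2024, 5, 1))
print(f"Availability: {achieved:.2f}% (target {SLA_TARGET}%), met: {achieved >= SLA_TARGET}")
```

With the number produced automatically from recorded outages, the conversation can move on from arguing about how long the system was down to why it went down.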
For Governance, Risk, and Compliance (GRC) reporting, who is reporting? What are they reporting? Why are they reporting? How are they reporting? If you have 20 risk registers with stuff that hasn’t been actioned for months or even years, ask yourself if they really are risk registers or just to-do lists. You need to be actioning real risk today – don’t leave it. Remember that there’s a difference between operational risk and security risks – many enterprises muddle the two.
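One simple way to test whether a risk register has turned into a to-do list is to flag entries that nobody has touched for a while. The Python sketch below does that under stated assumptions: the `RiskEntry` fields, the sample entries, and the 90-day threshold are all hypothetical and should be replaced by whatever your own register and risk appetite dictate.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RiskEntry:
    risk_id: str
    description: str
    last_actioned: date  # date someone last did something about this risk


def stale_entries(entries, today, max_age_days=90):
    """Return entries not actioned within `max_age_days` (threshold is an assumption)."""
    return [e for e in entries if (today - e.last_actioned).days > max_age_days]


# Hypothetical register contents.
REGISTER = [
    RiskEntry("R1", "Single point of failure in core switch", date(2024, 6, 1)),
    RiskEntry("R2", "Unsupported operating system on finance server", date(2023, 11, 15)),
    RiskEntry("R3", "No tested restore for the HR database", date(2024, 3, 1)),
]

for entry in stale_entries(REGISTER, today=date(2024, 6, 30)):
    print(f"{entry.risk_id}: not actioned since {entry.last_actioned} - {entry.description}")
```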
Ideally you can serve all of this up through your ITSM tool, with dashboards showing perceived risk so that everyone in the organization has a single pane of glass to view the current state. If you don’t currently have it, creating it should be on the IT organization’s improvement roadmap.
How Do You Achieve All This?
Especially while everyone in IT already has their daily workload to contend with.
Here’s my recommendation:
- Find your star players, who are competent and have the drive to achieve operational excellence.
- Sell your vision: show your people that they don’t have to be doing silly hours, on call, getting burnt out, and living in constant fear of your IT operation.
- Form a task force to hit your pain points against the above. IT teams need to deliver what the task force requires. This could take 6-12 months of work just to get you to an acceptable level of risk.
- Remember this is a journey that never ends. So, never stop improving and never stop challenging people or processes that do things because “that’s the way we have always done it.”
- Good luck.