How AI Reduces Risk in High-Impact IT Change Management


The IT service management (ITSM) domain has become a fertile ground for the application of artificial intelligence (AI). Top use cases, such as AI-driven incident response, generative knowledge base solutions, and conversational support for request fulfillment, have demonstrated clear efficiency gains. Now, let’s shift our attention to an area where AI can make a lasting impact: IT change management. With the current AI technology, ITSM practitioners can begin moving away from a risk-prone, manual IT change management (or change enablement) process toward a highly controlled, predictive, and agentic ITSM framework.

Why IT Change Management Still Fails

While ITSM frameworks have incrementally improved IT change management over many years, incorporating lessons learned from countless incidents and outages, human error remains the Achilles’ heel of the process. This vulnerability was recently demonstrated during the AWS outage in the Northern Virginia region, triggered by a technical change that inadvertently created an empty DNS record.

The Persistent Role of Human Error

Even best-practice-driven IT change management processes are vulnerable to human error. AI models are well positioned to help solve this, offering a path to both accelerate the entire change life cycle and substantially reduce the risk posed by human actions.

Let’s break down a major change process to understand how AI can enhance it from start to end.


Breaking Down the Risks in a Traditional Change Process

To illustrate these hidden risks, consider Zylker Inc., an organization with operations across multiple countries. Zylker was undertaking a massive security infrastructure project: upgrading and migrating to a next-gen firewall system. This major change was compounded by dozens of routine changes and an emergency patch roll-out to the organization’s ERP software.

Zylker’s IT change management process followed best practices, yet the change ultimately led to a catastrophic availability incident due to unmitigated risk.

Using Zylker’s IT change management framework, a model adopted by most organizations, let’s analyze the key phases and break down the high-level execution of the major change within each stage.

  • Change initiation – The Security Operations Center (SOC) team initiated a major change request for the firewall migration. A major change, in this scenario, has its own workflow with defined checks and balances.
  • Change planning – The ITOps team manually worked on roll-out and back-out plans. It also performed the impact analysis manually, using the infrastructure dependencies shown in the configuration management database (CMDB), spanning over 750 IT services and 650 servers.
  • Change approvals – The change advisory board (CAB) convened to approve the change from all aspects. However, the CAB lacked organization-wide representation.
  • Change execution – The next-gen firewall system was implemented, and deployment was handled in a single go.
  • A major outage – Everything seemed to go according to plan. Then an availability incident occurred, impacting all critical services, including the CRM system, ERP systems, and databases.

Despite following the agreed protocol, Zylker faced an availability incident due to hidden risks accumulated at each step of the IT change management process.

The Hidden Risks Inherent in Many IT Change Management Processes

The Zylker incident highlights several distinct risks – inaccuracy, unrepresentative governance, configuration anomaly, and deployment risk – embedded in the conventional, manual change process:

  • Change planning (inaccuracy) – A manual analysis of a sprawling IT infrastructure is prone to human error and produces inaccurate impact analyses. This is the first point where Zylker’s outage could have originated.
  • Change approvals (unrepresentative governance) – Because the impact analysis was inaccurate, the CAB did not include representatives of every affected service. Consequently, the flawed impact analysis, itself a product of human error, went undetected during approvals as well.
  • Change execution (configuration anomaly) – Issues in the firewall filtering rules may have gone unidentified, which could have blocked traffic and led to the availability incident.
  • Change execution (deployment risk) – Deploying the change in one go, without first testing it in a representative environment, could have masked issues in the firewall deployment.

How AI Enhances Each Stage of IT Change Management

By leveraging AI, organizations can mitigate inherent risks and accelerate change management. Here’s how Zylker’s IT change management would look with AI capabilities enhancing it at each point of the process.

1. Smarter Change Detection and Initiation

LLM-powered AI agents can recognize that a change is required when the SOC team is finalizing firewall upgrade details via collaboration channels. An IT change management AI agent can then offer to create a change request and begin the process. While this doesn’t directly mitigate risk, it enables process automation.

2. AI-Powered Impact Analysis and Planning

Machine learning models trained on historical ITSM data can understand the success rate of historical changes and the people involved in past changes. Based on the impact and criticality of the configuration items (CIs), the model can suggest the right people to drive the change, reducing the risk of human bias.
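As a rough illustration of the kind of signal such a model would draw on, the sketch below ranks people by their historical change success rate. It is plain counting rather than a trained model, and the names, outcome labels, and `min_changes` threshold are all hypothetical.

```python
from collections import defaultdict

# Hypothetical change history: (implementer, outcome) pairs exported from an ITSM tool.
HISTORY = [
    ("asha", "success"), ("asha", "success"), ("asha", "failed"),
    ("ben", "success"), ("ben", "success"),
    ("chen", "success"),
]

def rank_implementers(history, min_changes=2):
    """Rank people by success rate, ignoring anyone with too little history."""
    totals = defaultdict(int)
    wins = defaultdict(int)
    for person, outcome in history:
        totals[person] += 1
        if outcome == "success":
            wins[person] += 1
    rates = {p: wins[p] / totals[p] for p in totals if totals[p] >= min_changes}
    # Highest success rate first; ties broken alphabetically for stable output.
    return sorted(rates, key=lambda p: (-rates[p], p))

print(rank_implementers(HISTORY))
```

A real model would also weigh the criticality of the CIs involved and the type of change, which a single success rate cannot capture.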

Generative AI (GenAI) models can draft elaborate roll-out and back-out plans based on prompts and subsequent inputs, saving significant time.

Deep learning models for impact analysis, leveraging real-time CMDB data feeds, can help identify the true dependencies and paint an accurate picture of the services impacted by the change.
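The core of any impact analysis is walking the dependency relationships recorded in the CMDB. The sketch below shows that step as a simple breadth-first traversal. It is a deterministic stand-in, not a deep learning model, and the CI names and the `DEPENDENTS` structure are invented for illustration.

```python
from collections import deque

# Hypothetical CMDB extract: each CI maps to the CIs that depend on it.
DEPENDENTS = {
    "firewall-edge": ["erp-app", "crm-app"],
    "erp-app": ["erp-db-proxy"],
    "crm-app": [],
    "erp-db-proxy": [],
    "hr-portal": [],
}

def impacted_cis(changed_ci, dependents):
    """Return every CI reachable downstream of the changed CI (breadth-first)."""
    seen = set()
    queue = deque([changed_ci])
    while queue:
        ci = queue.popleft()
        for dep in dependents.get(ci, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return sorted(seen)

print(impacted_cis("firewall-edge", DEPENDENTS))
```

The traversal is only as accurate as the relationships it is given, which is why stale or incomplete CMDB data leads directly to an incomplete impact list.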

3. Intelligent Approvals with Agentic AI

Based on accurate impact analyses and input from the CMDB, AI agents can bring together the right CAB, representative of the organization and the affected services, with the right service owners.
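A minimal sketch of the underlying lookup, assuming the impacted services are already known: map each service to its owner and flag any service that has no owner on record. The owner names, the `SERVICE_OWNERS` mapping, and the standing `change-manager` seat are hypothetical.

```python
# Hypothetical ownership records, as they might be exported from a CMDB.
SERVICE_OWNERS = {
    "erp-app": "finance-it",
    "crm-app": "sales-ops",
    "erp-db-proxy": "dba-team",
}

def assemble_cab(impacted_services, owners, standing_members=("change-manager",)):
    """Return (CAB members, services with no recorded owner)."""
    members = set(standing_members)
    unowned = []
    for service in impacted_services:
        owner = owners.get(service)
        if owner is None:
            unowned.append(service)  # needs manual follow-up before approval
        else:
            members.add(owner)
    return sorted(members), sorted(unowned)

cab, gaps = assemble_cab(["erp-app", "crm-app", "legacy-vpn"], SERVICE_OWNERS)
print(cab, gaps)
```

Surfacing the unowned services explicitly matters as much as the member list, since a missing owner is exactly the kind of gap that leaves a CAB unrepresentative.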

4. Predictive Execution and Controlled Deployment

Zylker could have avoided the firewall filtering rule anomaly if it had run the new rule set through a domain-specific model trained on IT operations data. Such a model would have flagged the filtering rule issue in the next-gen system and explained the effect the rule would have had on traffic.
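Even without a trained model, the basic pre-deployment check can be sketched as a rule simulation: evaluate the traffic flows that critical services need against the proposed rule set and report any that would be blocked. The rule format, first-match semantics, addresses, and flow names below are assumptions made for illustration and do not reflect any particular firewall product.

```python
import ipaddress

# Hypothetical rule set, evaluated top-down; the first matching rule wins.
RULES = [
    {"action": "deny", "src": "10.0.0.0/8", "port": 1433},
    {"action": "allow", "src": "10.20.0.0/16", "port": 1433},
    {"action": "allow", "src": "10.0.0.0/8", "port": 443},
]

# Hypothetical flows that critical services rely on.
REQUIRED_FLOWS = [
    {"name": "erp-to-db", "src": "10.20.5.10", "port": 1433},
    {"name": "crm-https", "src": "10.30.1.4", "port": 443},
]

def decide(rules, src, port, default="deny"):
    """Return the action of the first rule matching the source address and port."""
    addr = ipaddress.ip_address(src)
    for rule in rules:
        if addr in ipaddress.ip_network(rule["src"]) and port == rule["port"]:
            return rule["action"]
    return default

def blocked_flows(rules, flows):
    """List the required flows that the rule set would not allow."""
    return [f["name"] for f in flows if decide(rules, f["src"], f["port"]) != "allow"]

print(blocked_flows(RULES, REQUIRED_FLOWS))
```

In this example the broad deny rule shadows the narrower allow rule beneath it, so the ERP database flow is reported as blocked before anything is deployed.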

Rather than a full deployment in one go, machine learning models can identify the right cohort that would be representative of the organization’s environment while remaining low-impact. This allows the firewall system to be tested on a small cohort of low-criticality, isolated servers to identify issues.
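The selection logic can be approximated with a simple filter, shown below as a sketch rather than a machine learning model: keep only isolated, low-criticality servers, then pick the least critical one per platform so the pilot still covers each type of environment. The inventory fields (`os`, `criticality`, `isolated`) and the threshold are hypothetical.

```python
# Hypothetical server inventory; criticality runs from 1 (low) to 5 (high).
SERVERS = [
    {"name": "web-01", "os": "linux", "criticality": 1, "isolated": True},
    {"name": "web-02", "os": "linux", "criticality": 3, "isolated": True},
    {"name": "erp-01", "os": "windows", "criticality": 5, "isolated": False},
    {"name": "file-01", "os": "windows", "criticality": 2, "isolated": True},
    {"name": "db-01", "os": "linux", "criticality": 5, "isolated": False},
]

def pilot_cohort(servers, max_criticality=2):
    """Pick the least critical isolated server for each platform."""
    best = {}
    for server in servers:
        if not server["isolated"] or server["criticality"] > max_criticality:
            continue
        current = best.get(server["os"])
        if current is None or server["criticality"] < current["criticality"]:
            best[server["os"]] = server
    return sorted(s["name"] for s in best.values())

print(pilot_cohort(SERVERS))
```

If a platform has no server that meets the threshold, it is simply left out of the cohort, which is worth reporting separately because the pilot would then not be fully representative.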

5. Automated and Actionable Post-Change Review

GenAI agents can create a summary of all the operations of the change, which would then be appended to the change request before the release.

Tangible Benefits of AI-Enabled IT Change Management

Throughout this process, Zylker still executes changes with its conventional IT change management process, yet with the help of AI and machine learning it can mitigate the risks and speed things up significantly:

  • Automation of processes – Natural language case recognition, along with LLM-powered AI agents, automates the initiation of change requests and significantly speeds up the planning phase by generating detailed roll-out/back-out plans and summarizing change operations.
  • Risk mitigation and analysis accuracy – Deep learning models trained on real-time CMDB data provide highly accurate impact analyses, while domain-specific LLMs, such as those trained on IT operations, proactively flag critical issues (e.g., firewall filtering anomalies) before deployment.
  • Intelligent suggestions – Predictive machine learning reduces human bias by suggesting the most qualified personnel for planning and identifying low-impact server cohorts for safe, phased execution (reducing large-scale deployment risk). Agentic AI ensures the right service owners are included in the CAB for informed approvals.

Beyond Hype: What AI-Driven IT Change Management Means for the Future of ITSM

This use case demonstrates how various types of AI systems can converge at different points of the IT change management process to mitigate the inherent risks of a change management process built on traditional best practices.

Now that the initial surge of AI hype has passed, the focus has shifted to practical application and measurable results. For enterprises seeking to future-proof their ITOps, incorporating AI-powered capabilities into the IT change management life cycle will lay a robust foundation for more advanced initiatives.

Zephaniah B
Product Marketer at ManageEngine
Zephaniah is a product marketer for ManageEngine's ESM suite of products. He loves creating resources to educate IT service desk folks on the best practices for making the most of ITSM. Some of these resources include self-assessment toolkits, a security-first service management framework, detailed guides, blogs, and more.
He is an active participant in various industry conferences across media, such as SupportWorld Live (USA) and events from both SDI and SITS (UK). Zephaniah is also a frequent presenter at the ManageEngine IT conferences held worldwide.
