A Holistic View of AI-driven Network Incident Management

Pouya Hamadanian; Behnaz Arzani; Sadjad Fouladi; Siva Kesava Reddy Kakarla; Rodrigo Fonseca; Denzican Billor; Ahmad Cheema; Edet Nkposong; Ranveer Chandra

Hot Topics in Networks (HotNets) | October 2023

Organized by ACM

We discuss the potential improvements that large language models (LLMs) can bring to incident management and how they can overhaul the way operators handle incidents today.
Despite promising results, initial work in this space has only scratched the surface of what we can achieve with LLMs.
We instead propose a holistic framework for building an AI helper for incident management and discuss the several avenues of future research needed to achieve it.

We support our design by thoroughly analyzing the fundamental requirements the community should think about when it designs such helpers.
Our work is based on discussions with operators of a large public cloud provider and on their prior experience, both in incident management and with earlier attempts to improve the incident management experience through various forms of automation.