A Holistic View of AI-driven Network Incident Management

Hot topics in networking (HotNets)

Organized by ACM

We discuss the potential improvements large language models (LLMs) can bring to incident management and how they can overhaul the way operators conduct it today.
Despite promising results, initial work in this space has only scratched the surface of what we can achieve with LLMs.
We instead propose a holistic framework for building an AI helper for incident management and discuss several avenues of future research needed to achieve it.

We support our design by thoroughly analyzing the fundamental requirements the community should consider when designing such helpers.
Our work is based on discussions with operators of a large public cloud provider and on their prior experience, both with incident management itself and with earlier attempts to improve it through various forms of automation.