NetHealth

Established: January 6, 2007

Overview

Networks are being deployed extensively in large corporations, small offices, and homes. However, a significant number of “pain points” remain for end-users and network administrators. To resolve complaints quickly and efficiently, network administrators need tools that can assist them in detecting, isolating, diagnosing, and correcting faults. Such tools should also detect network security breaches, including those caused inadvertently by well-meaning employees. The NetHealth project focuses on detecting, inferring, diagnosing, and recovering from user-perceived performance problems in enterprise networks.

Existing products do a reasonable job of presenting statistical data from the network. However, they do not do a comprehensive job of gathering and analyzing the data to establish the root cause of the problem. Furthermore, on the wireless side, most products gather data from the Access Points (APs) only and neglect the client-side view of the network. Some products that monitor the network from the client’s perspective require hardware sensors, which can be expensive to deploy and maintain. Also, current solutions do not provide any support for disconnected clients even though these are the ones that need the most help. On the wired side, a number of researchers have come up with solutions for diagnosing problems over WANs; however, most of those approaches are not integrated to perform end-to-end inference and diagnostics.

Under the NetHealth umbrella, we are building algorithms and tools that

  • allow generalist operators to diagnose end-to-end performance as “seen” by users
  • produce near real-time and historical-analysis reports of end-to-end performance problems with networked services and components
  • prioritize and raise alerts based on an analysis of how performance problems affect users
  • automatically resolve the problem or offer meaningful resolution strategies
  • provide detailed analysis of wireless failures for mobile devices
  • provide snapshots of the “health” of network elements and services
  • complement existing detailed network diagnosis technologies

In contrast to traditional network-based and bolt-on approaches, NetHealth leverages clients and servers. NetHealth agents on the end systems are positioned to harvest available application data and infer application-level dependencies directly, rather than reverse-engineering this information from the network or from summarized logs and alerts produced by computing and network elements and their associated management systems. As a result, the NetHealth approach is well-suited to locating and resolving problems effectively, and to bringing together the intelligence needed to support meaningful resilience, self-healing, and other self-* capabilities.