Detecting Application-Level Failures in Component-based Internet Services

IEEE Transactions on Neural Networks: Special Issue on Adaptive Learning Systems in Communication Networks |

Most Internet services (e-commerce, search engines, etc.) suffer faults. Quickly detecting these faults can be the largest bottleneck in improving availability of the system. We present Pinpoint, a methodology for automatic fault detection in Internet services by (1) observing low-level, internal structural behaviors of the service; (2) modeling the majority behavior of the system as correct; and (3) detecting anomalies in these behaviors as possible symptoms of failures. Without requiring any a priori application-specific information, Pinpoint correctly detected 89-96% of major failures in our experiments, as compared to 20-70% detected by current application-generic techniques.