Precision-Integrated Scalable Monitoring


May 21, 2008


Navendu Jain


UT Austin


As networked systems grow in scale and complexity, system introspection becomes an increasingly important and challenging problem. Introspection is the ability to characterize system behavior, from identifying normal conditions to detecting any unexpected or undesirable events—attacks, configuration mistakes, security vulnerabilities, overload, or memory leaks due to buggy applications—before serious harm is done. However, to provide system introspection, monitoring services face two challenges: they must (1) scale to large systems and (2) safeguard accuracy in the face of node and network failures.

In this talk, I will define precision as a new unified abstraction to realize the goal of system introspection. I will first present our work on designing and building PRISM, a scalable monitoring service that makes precision a first-class abstraction. PRISM quantifies (im)precision along a three-dimensional vector: arithmetic imprecision (AI) and temporal imprecision (TI) balance precision against monitoring overhead, while network imprecision (NI) addresses the challenge of providing consistency guarantees despite failures. Then, I will describe how our implementation provides these metrics while scaling to a large number of nodes and attributes. Finally, I will demonstrate how the unified precision abstraction enables new monitoring applications by presenting our experiences with three applications we have built.


Navendu Jain

Navendu Jain is a Ph.D. candidate in Computer Science at UT Austin. He received his Bachelor of Technology and Master of Technology degrees from IIT Delhi. His research interests span data management, networked systems, operating systems, and security. He is a recipient of the IBM Ph.D. Fellowship and the Microsoft Graduate Fellowship.