Specification-Based Annotation Inference

September 10, 2004
Brian Hackett | Stanford University

A great wealth of information about a program may be implicit in the source code itself; for example, in C and C++ this includes parameter usage and NULL-ness, buffer extents, potential taint, and resource management obligations. When this information is added to existing programs in the form of annotations, aspects of the program’s correctness can be verified by checking the source code on a function-by-function basis. However, it is time-consuming to manually annotate millions of lines of legacy code.

In this talk we present a specification language and engine to be used for automatically inferring annotations in very large programs. A specification consists of a collection of rules for deriving possible program states and annotations in a flow-sensitive manner, and the engine performs a graph search to exhaustively apply those rules. Specifications are independent from the actual program semantics, so the engine may both miss annotations and infer spurious ones; we discuss the usage of soundness (accuracy) and completeness (coverage) as metrics to evaluate specifications.

We present the results of inferring annotations about parameter usage, NULL-ness, and buffer sizes in the complete Windows code base. As part of the Windows SAL effort, these annotations are currently being reviewed and checked in by developers.

Speaker Details

Brian Hackett is a PhD candidate in the Computer Science Department at Stanford, and is a Microsoft Research PhD Fellow. He is interning with the Software Reliability Research group at MSR this summer.