Abstract

While Microsoft product teams have adopted defect prediction models, they have not adopted vulnerability prediction models (VPMs). Seeking to understand this discrepancy, we replicated a VPM for two releases of the Windows Operating System, varying model granularity and statistical learners. We reproduced binary-level prediction precision (~0.75) and recall (~0.2). However, binaries often exceed 1 million lines of code, too large to practically inspect, and engineers expressed preference for source file level predictions. Our source file level models yield precision below 0.5 and recall below 0.2. We suggest that VPMs must be refined to achieve actionable performance, possibly through security-specific metrics.