String-related defects are among the most prevalent in modern software development. For example, in terms of frequency, cross-site scripting vulnerabilities have surpassed traditional exploits like buffer overruns. The state of this problem is particularly disconcerting because it does not just affect legacy code: developing web applications today – even when adhering to best practices and using modern library support – remains error-prone.
In this presentation, I will discuss work that aims to address some of the challenges that arise when trying to prevent or mitigate the effects of string-related defects. My dissertation work focuses on providing a constraint solving interface for a wide range of client applications. A client program analysis can use that interface to reason about strings in the same way it might use a Boolean satisfiability (SAT) solver to reason about binary state. The work identifies a set of string constraints that (a) captures common programming language constructs, and (b) permits effective solving algorithms. I will also talk about the BEK project, which is joint work with researchers at MSR and UC Berkeley. The BEK project provides a domain-specific programming language that directly captures a commonly-used class of string sanitization functions, and allows for deep semantic checks on functions written in the language. Taken together, the string constraint solving work and the BEK project represent a compelling first step toward the precise modeling of code that manipulates string values.
Additional presentation materials are available at: