Symbolic Boolean Derivatives for Efficiently Solving Extended Regular Expression Constraints
- Caleb Stanford
- Margus Veanes
- Nikolaj Bjørner
MSR-TR-2020-25
Published by Microsoft
Updated November 2020. Extended version of the paper in PLDI 2021.
The manipulation of raw string data is ubiquitous in security-critical software, and verification of such software relies on efficiently solving string and regular expression constraints via SMT. However, the typical case of Boolean combinations of regular expression constraints exposes blowup in existing techniques. To address solvability of such constraints, we propose a new theory of derivatives of symbolic extended regular expressions (extended meaning that complement and intersection are incorporated), and show how to apply this theory to obtain more efficient decision procedures. Our implementation of these ideas, built on top of Z3, matches or outperforms state-of-the-art solvers on standard and handwritten benchmarks, showing particular benefits on examples with Boolean combinations.
Our work is the first formalization of derivatives of regular expressions that both handles intersection and complement and works symbolically over an arbitrary character theory. It unifies existing approaches involving derivatives of extended regular expressions, alternating automata, and Boolean automata by lifting them to a common symbolic platform. It relies on a parsimonious augmentation of regular expressions: an if-then-else construct is shown to be sufficient to obtain the relevant closure properties for derivatives over extended regular expressions.
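For readers unfamiliar with derivatives, the following is a minimal sketch of the classical (Brzozowski-style) derivative for regular expressions extended with intersection and complement, written over concrete characters. It is not the symbolic formulation of this report: there is no character theory and no if-then-else construct here, and all class and function names (`Empty`, `Eps`, `Char`, `Cat`, `Or`, `And`, `Not`, `Star`, `nullable`, `deriv`, `matches`) are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass


class Regex:
    """Base class for the illustrative regex AST (hypothetical names)."""


@dataclass(frozen=True)
class Empty(Regex):      # matches no string
    pass

@dataclass(frozen=True)
class Eps(Regex):        # matches only the empty string
    pass

@dataclass(frozen=True)
class Char(Regex):       # matches the single character c
    c: str

@dataclass(frozen=True)
class Cat(Regex):        # concatenation
    left: Regex
    right: Regex

@dataclass(frozen=True)
class Or(Regex):         # union
    left: Regex
    right: Regex

@dataclass(frozen=True)
class And(Regex):        # intersection
    left: Regex
    right: Regex

@dataclass(frozen=True)
class Not(Regex):        # complement
    inner: Regex

@dataclass(frozen=True)
class Star(Regex):       # Kleene star
    inner: Regex


def nullable(r: Regex) -> bool:
    """Return True if r accepts the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Char)):
        return False
    if isinstance(r, (Cat, And)):
        return nullable(r.left) and nullable(r.right)
    if isinstance(r, Or):
        return nullable(r.left) or nullable(r.right)
    if isinstance(r, Not):
        return not nullable(r.inner)
    raise TypeError(f"unknown regex node: {r!r}")


def deriv(r: Regex, a: str) -> Regex:
    """Derivative of r with respect to the concrete character a."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Char):
        return Eps() if r.c == a else Empty()
    if isinstance(r, Cat):
        first = Cat(deriv(r.left, a), r.right)
        # If the left part can be empty, a may also be consumed by the right part.
        return Or(first, deriv(r.right, a)) if nullable(r.left) else first
    if isinstance(r, Or):
        return Or(deriv(r.left, a), deriv(r.right, a))
    if isinstance(r, And):
        # Intersection distributes over the derivative.
        return And(deriv(r.left, a), deriv(r.right, a))
    if isinstance(r, Not):
        # Complement commutes with the derivative.
        return Not(deriv(r.inner, a))
    if isinstance(r, Star):
        return Cat(deriv(r.inner, a), r)
    raise TypeError(f"unknown regex node: {r!r}")


def matches(r: Regex, s: str) -> bool:
    """Match s by taking one derivative per character, then testing nullability."""
    for ch in s:
        r = deriv(r, ch)
    return nullable(r)


# Example: a* intersected with the complement of the empty string, i.e. a+.
a_plus = And(Star(Char("a")), Not(Eps()))
```

With this sketch, `matches(a_plus, "aaa")` holds while `matches(a_plus, "")` and `matches(a_plus, "ab")` do not. The point of the example is only that intersection and complement need no automaton product or determinization: the derivative is pushed through each Boolean operator directly. The sketch performs no simplification, so derivative terms grow with input length; a practical implementation would normalize them.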