{"id":683847,"date":"2020-08-10T00:20:37","date_gmt":"2020-08-10T07:20:37","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&#038;p=683847"},"modified":"2021-06-25T13:25:09","modified_gmt":"2021-06-25T20:25:09","slug":"symbolic-boolean-derivatives-for-efficiently-solving-extended-regular-expression-constraints","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/symbolic-boolean-derivatives-for-efficiently-solving-extended-regular-expression-constraints\/","title":{"rendered":"Symbolic Boolean Derivatives for Efficiently Solving Extended Regular Expression Constraints"},"content":{"rendered":"<p>The manipulation of raw string data is ubiquitous in security-critical software, and verification of such software relies on efficiently solving <strong><em>string and regular expression constraints<\/em><\/strong> via <em>SMT<\/em>. However, the typical case of Boolean combinations of regular expression constraints exposes blowup in existing techniques. To address solvability of such constraints, we propose a <strong><em>new theory of derivatives of symbolic extended regular expressions<\/em><\/strong> (extended meaning that complement and intersection are incorporated), and show how to apply this theory to obtain more efficient decision procedures. Our implementation of these ideas, <strong><em>built on top of Z3<\/em><\/strong>, matches or <strong><em>outperforms state-of-the-art<\/em><\/strong> solvers on standard and handwritten benchmarks, showing particular benefits on examples with Boolean combinations.<\/p>\n<p>Our work is the first formalization of derivatives of regular expressions which both handles intersection and complement and works symbolically over an arbitrary character theory. It <strong><em>unifies all prior existing approaches<\/em><\/strong> involving derivatives of extended regular expressions, alternating automata and Boolean automata by lifting them to a common symbolic platform. It relies on a parsimonious augmentation of regular expressions: a construct for <strong><em>if-then-else<\/em><\/strong> is shown to be sufficient to obtain relevant closure properties for derivatives over extended regular expressions.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The manipulation of raw string data is ubiquitous in security-critical software, and verification of such software relies on efficiently solving string and regular expression constraints via SMT. However, the typical case of Boolean combinations of regular expression constraints exposes blowup in existing techniques. To address solvability of such constraints, we propose a new theory of [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_publishername":"","msr_publisher_other":"","msr_booktitle":"","msr_chapter":"","msr_edition":"","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"","msr_number":"MSR-TR-2020-25","msr_organization":"Microsoft","msr_pages_string":"","msr_page_range_start":"","msr_page_range_end":"","msr_series":"","msr_volume":"","msr_copyright":"","msr_conference_name":"","msr_doi":"","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2020-8-1","msr_highlight_text":"","msr_notes":"Updated November 2020. Extended version of paper in PLDI'2021.","msr_longbiography":"","msr_publicationurl":"","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":0,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":true,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image_in_river":0,"footnotes":""},"msr-research-highlight":[],"research-area":[13560],"msr-publication-type":[193718],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-683847","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-programming-languages-software-engineering","msr-locale-en_us"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2020-8-1","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"MSR-TR-2020-25","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"Microsoft","msr_how_published":"","msr_notes":"Updated November 2020. Extended version of paper in PLDI'2021.","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"file","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/MSR-TR-2020-25.pdf","id":"707590","title":"msr-tr-2020-25-2","label_id":"243109","label":0},{"type":"file","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/pldi21-SBFA-final.pdf","id":"756940","title":"pldi21-sbfa-final","label_id":"243103","label":0}],"msr_related_uploader":"","msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[{"id":756940,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/06\/pldi21-SBFA-final.pdf"},{"id":707590,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/11\/MSR-TR-2020-25.pdf"},{"id":683853,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2020\/08\/MSR-TR-2020-25.pdf"}],"msr-author-ordering":[{"type":"text","value":"Caleb Stanford","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Margus Veanes","user_id":32806,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Margus Veanes"},{"type":"user_nicename","value":"Nikolaj Bj\u00f8rner","user_id":33067,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Nikolaj Bj\u00f8rner"}],"msr_impact_theme":[],"msr_research_lab":[199565],"msr_event":[],"msr_group":[144812],"msr_project":[259698],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"techreport","related_content":{"projects":[{"ID":259698,"post_title":"Automata","post_name":"automata","post_type":"msr-project","post_date":"2016-07-20 18:35:07","post_modified":"2018-12-04 17:02:22","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/automata\/","post_excerpt":"Automata is a .NET library that provides algorithms for composing and analyzing regular expressions, automata, and transducers. In addition to classical word automata, it also includes algorithms for analysis of tree automata and tree transducers. The library covers algorithms over finite alphabets as well as their symbolic counterparts. In symbolic automata concrete characters have been replaced by character predicates. Such predicates can range over very large or even infinite alphabets, like integers. Predicates can be&hellip;","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/259698"}]}}]},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/683847","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":3,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/683847\/revisions"}],"predecessor-version":[{"id":707593,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/683847\/revisions\/707593"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=683847"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=683847"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=683847"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=683847"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=683847"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=683847"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=683847"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=683847"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=683847"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=683847"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=683847"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=683847"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=683847"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}