{"id":734818,"date":"2021-05-03T10:09:05","date_gmt":"2021-05-03T17:09:05","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=734818"},"modified":"2022-05-19T11:28:32","modified_gmt":"2022-05-19T18:28:32","slug":"everparse-hardening-critical-attack-surfaces-with-formally-proven-message-parsers","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/everparse-hardening-critical-attack-surfaces-with-formally-proven-message-parsers\/","title":{"rendered":"EverParse: Hardening critical attack surfaces with formally proven message parsers"},"content":{"rendered":"\n<figure class=\"wp-block-image alignwide size-large\"><img decoding=\"async\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/04\/1400x788_Everparse_no_logoV2-1.gif\" alt=\"An animation showing an example of a high-level language message format specified by EverParse. From the message, two arrows labeled \u201cEverParse\u201d point to a rectangle labeled \u201cformal specification\u201d and a rectangle labeled \u201clow-level implementation,\u201d respectively, inside a larger rectangle labeled \u201cF* code.\u201d The figure represents EverParse\u2019s ability to automatically generate safe, correct, and fast F* parsing code. \u201cCorrectness\u201d is defined as \u201cCorrectness:\u202f\u200bparse (serialize msg) = msg\u200b\u201d and \u201cvalid msg ==> serialize (parse msg) = msg\u200b.\u201d The F* logo appears with the description that F* is a type theory\u2013based programming language and proof assistant that can prove theorems about programs\u200b. From the \u201cF* code\u201d rectangle, arrows point from the \u201clow-level implementation\u201d rectangle and a rectangle labeled \u201cverified libraries for combinators\u201d to a rectangle labeled \u201cSafe high-performanc\"\/><\/figure>\n\n\n\n<p><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/project-everest.github.io\/everparse\/\"><em>EverParse<\/em><span class=\"sr-only\"> (opens in new tab)<\/span><\/a><em> is a framework for generating provably secure parsers and formatters used to improve the security of critical code bases at Microsoft. EverParse is developed as part of <\/em><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/project-everest.github.io\/\"><em>Project Everest<\/em><span class=\"sr-only\"> (opens in new tab)<\/span><\/a><em>, a collaboration between Microsoft Research labs in <\/em><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/lab\/microsoft-research-redmond\/\"><em>Redmond, Washington<\/em><\/a><em>; <\/em><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/lab\/microsoft-research-india\/\"><em>India<\/em><\/a><em>; and <\/em><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/lab\/microsoft-research-cambridge\/\"><em>Cambridge, United Kingdom<\/em><\/a><em>; <\/em><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/collaboration\/inria-joint-centre\/\"><em>the Microsoft Research-Inria Joint Centre<\/em><\/a><em>; <\/em><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/prosecco.gforge.inria.fr\/\"><em>Inria<\/em><span class=\"sr-only\"> (opens in new tab)<\/span><\/a><em>; <\/em><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/www.andrew.cmu.edu\/user\/bparno\/\"><em>Carnegie Mellon University<\/em><span class=\"sr-only\"> (opens in new tab)<\/span><\/a><em>; and several other open-source contributors. The Everest team has produced several formally proven software components, including the <\/em><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/evercrypt-cryptographic-provider-offers-developers-greater-security-assurances\/\"><em>EverCrypt<\/em><\/a><em> cryptographic provider and verified implementations of the <\/em><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/implementing-proving-tls-1-3-record-layer\/\"><em>TLS 1.3<\/em><\/a><em> record layer, the <\/em><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/security-model-verified-implementation-quic-record-layer\/\"><em>QUIC<\/em><\/a><em> record layer, the <\/em><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/signalstar.gforge.inria.fr\/\"><em>Signal<\/em><span class=\"sr-only\"> (opens in new tab)<\/span><\/a><em> messaging protocol, and the <\/em><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/dice-a-formally-verified-implementation-of-dice-measured-boot\/\"><em>DICE<\/em><\/a><em> measured boot protocol. This is the fourth blog post in a series about Project Everest.<\/em><\/p>\n\n\n\n\n\n<ul class=\"wp-block-list\"><li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/nam06.safelinks.protection.outlook.com\/?url=https%3A%2F%2Fwww.microsoft.com%2Fen-us%2Fresearch%2Fblog%2Fproject-everest-reaching-greater-heights-in-internet-communication-security%2F&data=04%7C01%7Cv-krdod%40microsoft.com%7Ca76b6c5ca88e48bd7a4508d8efd3f96a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637523040986812024%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=%2BJ65KO8I0tyvFwDtFm6Z4VgLLlmJTVv%2BaR1sUkPnKX4%3D&reserved=0\" target=\"_blank\" rel=\"noopener noreferrer\">\u201cProject Everest: Reaching greater heights in internet communication security\u201d<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>\u00a0<\/li><li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/nam06.safelinks.protection.outlook.com\/?url=https%3A%2F%2Fwww.microsoft.com%2Fen-us%2Fresearch%2Fblog%2Fevercrypt-cryptographic-provider-offers-developers-greater-security-assurances%2F&data=04%7C01%7Cv-krdod%40microsoft.com%7Ca76b6c5ca88e48bd7a4508d8efd3f96a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637523040986812024%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=Yx4WE9odRtq4zz9gwwqqqtgSUjzA9cDTFw1hbqnU%2BrI%3D&reserved=0\" target=\"_blank\" rel=\"noopener noreferrer\">\u201cEverCrypt cryptographic provider offers developers greater security assurances\u201d<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>\u00a0<\/li><li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/nam06.safelinks.protection.outlook.com\/?url=https%3A%2F%2Fwww.microsoft.com%2Fen-us%2Fresearch%2Fblog%2Fproject-everest-advancing-the-science-of-program-proof%2F&data=04%7C01%7Cv-krdod%40microsoft.com%7Ca76b6c5ca88e48bd7a4508d8efd3f96a%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637523040986822009%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=mjwZOScjcFjr1s%2FK%2FVcxxiZQS1VN5ZENBSH8afNdqvw%3D&reserved=0\" target=\"_blank\" rel=\"noopener noreferrer\">\u201cProject Everest: Advancing the science of program proof\u201d<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>\u00a0<\/li><\/ul>\n\n\n\n\n\n<p>Software security exploits often begin with an attacker providing an unexpected input to a program, confusing the program and causing it to misbehave in a way that allows the attacker to gain access to a critical system, leading to crashes and service disruption, information disclosure, or even granting the attacker full control of the system.<\/p>\n\n\n\n<div class=\"annotations \" data-bi-aN=\"margin-callout\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 annotations__list--right\">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Framework<\/span>\n\t\t\t<a href=\"https:\/\/project-everest.github.io\/everparse\/\" data-bi-cN=\"EverParse \" data-external-link=\"false\" data-bi-aN=\"margin-callout\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>EverParse <\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n\n\n\n<p>Most programs are designed to validate their inputs before further processing to avoid such scenarios, but input validation is hard to get right, particularly as both the message formats and the validation code are often optimized for performance. Improper input validation is the third most \u201ccommon and impactful\u201d vulnerability, according to <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/cwe.mitre.org\/top25\/archive\/2020\/2020_cwe_top25.html\">MITRE\u2019s 2020 CWE Top 25 Most Dangerous Software Weaknesses list<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, with input validation bugs lurking at <em>all<\/em> levels of the software stack. Modern software is built on a tower of abstractions, spanning several trust boundaries. Each time more privileged code receives inputs from less privileged layers, a missing input validation check can cause the tower of abstractions to collapse. Input validation\u2014and, more generally, <em>parsing\u2014<\/em>is especially difficult when input formats are complex, such as when dealing with variable length data and other forms of data dependency. For efficiency reasons, formats can be densely encoded in binary and are expected to be parsed by low-level code in systems programming languages like C or C++, which are hard to use securely. A single missing check can lead to a buffer overrun and system compromise.<\/p>\n\n\n\n<p>One fruitful and widely adopted line of research is to discover flaws in parsers using <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/cacm.acm.org\/magazines\/2020\/2\/242350-fuzzing\/fulltext\">fuzzing<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, a technique to automatically generate tests for programs. Fuzzing is indeed an effective strategy for identifying broad classes of vulnerabilities, but it\u2019s challenging to obtain strong guarantees of correctness or security from fuzzing (though not impossible as <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/proving-memory-safety-of-the-ani-windows-image-parser-using-compositional-exhaustive-testing\/\">some research from Microsoft Research shows<\/a>). For the most critical systems, we advocate instead automatically generating input parsers and validators from high-level declarative specifications of binary message formats. Importantly, using the program proof methodology and toolchain of Project Everest, we can produce efficient C code backed by mathematical proofs of correctness and security.<\/p>\n\n\n\n<p>In a&nbsp;<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/everparse\/\" target=\"_blank\" rel=\"noreferrer noopener\">2019 paper<\/a>, we&nbsp;introduced&nbsp;<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/project-everest.github.io\/everparse\/\" target=\"_blank\" rel=\"noopener noreferrer\">EverParse<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, a framework that automatically produces high-performance, formally proven C code for parsing binary messages.&nbsp;EverParse&nbsp;is&nbsp;now being used to ensure that certain network virtualization components&nbsp;in the Windows kernel correctly parse attacker-controlled inputs. Put another way,&nbsp;network packets are handled first by&nbsp;EverParse&nbsp;code&nbsp;that is proven to correctly parse and accept only well-formed messages, rejecting potentially malicious inputs before they reach the rest of the system.&nbsp;Virtualized networking hardened with&nbsp;EverParse&nbsp;is available today for use via the&nbsp;<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/insider.windows.com\/en-us\/\" target=\"_blank\" rel=\"noopener noreferrer\">Windows Insider Program<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. Future versions of Windows Server will contain the same hardening, improving the security of virtual machines and containers on Microsoft Azure.&nbsp;<\/p>\n\n\n\n<h2 id=\"everparse-a-mathematically-proven-parser-generator\">EverParse: A mathematically proven parser generator<\/h2>\n\n\n\n<p>With EverParse, programmers no longer need to write error-prone binary message parsers. Instead, from a declarative message format, EverParse generates code in the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/fstar-lang.org\">F* programming language<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> together with automatically checkable formal proofs that the code is memory safe and correct\u2014for example, that it doesn\u2019t read outside the bounds of arrays and that it successfully parses exactly those messages that are formatted according to the input message specification. Once F* checks these proofs, it emits secure, high-performance C code for the generated parsers, as described in this <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/verified-low-level-programming-embedded-f\/\">paper about verified low-level programming in F*<\/a>. What\u2019s left is for programmers to focus on architecting their software to interpose EverParse\u2019s formally proven parsing code prior to their main application-specific processing.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><a data-bi-bhvr=\"14\"  data-bi-cn=\"A message format specified in EverParse\u2019s input language with the language\u2019s semantic constraints on C\u2019s syntax of type definitions in blue text: \n\n \n\ncasetype _MessageUnion left parenthesis UINT32 tag right parenthesis \n\nLeft brace\u200b \n\n\u202f switch left parenthesis tag right parenthesis Left brace \u200b \n\n\u202f case INIT_MSG colon\u202f\u200b\u202f \n\n\u202f\u202f\u202f Init\u202f init semicolon\u200b \n\n\u202f case QUERY_MSG colon \u200b \n\n\u202f\u202f\u202f Query query semicolon\u200b\u200b \n\n\u202f case HALT_MSG colon\u200b  \n\n\u202f\u202f\u202f Halt\u202f halt semicolon\u202f\u202f\u202f\u202f\u202f\u202f\u202f\u202f\u202f\u202f\u202f\u202f\u202f\u202f\u202f\u202f\u200b \n\n\u202f Right brace  \n\nRight brace MessageUnion semicolon \n\ntypedef struct _Message\u202f\u200b \n\nLeft brace\u200b \n\n\u202f\u202f UINT32 tag semicolon\u200b \n\n\u202f\u202f MessageUnion left parenthesis tag right parenthesis message semicolon\u200b  \n\nRight brace Message semicolon\u200b \n\n\u200btypedef struct _Messages\u202f\u200b \n\nLeft brace \n\n\u202f\u202f UINT32 size left brace size less than or equal to 1024 Right brace semicolon\u200b  \n\n\u202f\u202f Message messages left brace colon byte-size size right brace semicolon\u200b  \n\nRight brace Messages semicolon\u200b \n \n\ncasetype _MessageUnion(UINT32 tag) \u200b \n\n{\u200b \n\n\u202f switch(tag) {\u200b \n\n\u202f case INIT_MSG:\u202f\u200b\u202f \n\n\u202f\u202f\u202f Init\u202f init;\u200b \n\n\u202f case QUERY_MSG: \u200b \n\n\u202f\u202f\u202f Query query;\u200b \n\n\u202f case HALT_MSG:\u200b  \n\n\u202f\u202f\u202f Halt\u202f halt;\u202f\u202f\u202f\u202f\u202f\u202f\u202f\u202f\u202f\u202f\u202f\u202f\u202f\u202f\u202f\u202f\u200b \n\n\u202f }\u200b  \n\n} MessageUnion;\u200b \n\ntypedef struct _Message\u202f\u200b \n\n{\u200b \n\n\u202f\u202f UINT32 tag;\u200b \n\n\u202f\u202f MessageUnion(tag) message;\u200b  \n\n} Message;\u200b \n\n\u200btypedef struct _Messages\u202f\u200b \n\n{\u200b \n\n\u202f\u202f UINT32 size { size <= 1024 };\u200b  \n\n\u202f\u202f Message messages[:byte-size size];\u200b  \n\n} Messages;\u200b \" href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/03\/EverParseFig1.png\"><img loading=\"lazy\" decoding=\"async\" width=\"975\" height=\"374\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/03\/EverParseFig1.png\" alt=\"A message format specified in EverParse\u2019s input language with the language\u2019s semantic constraints on C\u2019s syntax of type definitions in blue text: \n\n \n\ncasetype _MessageUnion left parenthesis UINT32 tag right parenthesis \n\nLeft brace\u200b \n\n\u202f switch left parenthesis tag right parenthesis Left brace \u200b \n\n\u202f case INIT_MSG colon\u202f\u200b\u202f \n\n\u202f\u202f\u202f Init\u202f init semicolon\u200b \n\n\u202f case QUERY_MSG colon \u200b \n\n\u202f\u202f\u202f Query query semicolon\u200b\u200b \n\n\u202f case HALT_MSG colon\u200b  \n\n\u202f\u202f\u202f Halt\u202f halt semicolon\u202f\u202f\u202f\u202f\u202f\u202f\u202f\u202f\u202f\u202f\u202f\u202f\u202f\u202f\u202f\u202f\u200b \n\n\u202f Right brace  \n\nRight brace MessageUnion semicolon \n\ntypedef struct _Message\u202f\u200b \n\nLeft brace\u200b \n\n\u202f\u202f UINT32 tag semicolon\u200b \n\n\u202f\u202f MessageUnion left parenthesis tag right parenthesis message semicolon\u200b  \n\nRight brace Message semicolon\u200b \n\n\u200btypedef struct _Messages\u202f\u200b \n\nLeft brace \n\n\u202f\u202f UINT32 size left brace size less than or equal to 1024 Right brace semicolon\u200b  \n\n\u202f\u202f Message messages left brace colon byte-size size right brace semicolon\u200b  \n\nRight brace Messages semicolon\u200b \n \n\ncasetype _MessageUnion(UINT32 tag) \u200b \n\n{\u200b \n\n\u202f switch(tag) {\u200b \n\n\u202f case INIT_MSG:\u202f\u200b\u202f \n\n\u202f\u202f\u202f Init\u202f init;\u200b \n\n\u202f case QUERY_MSG: \u200b \n\n\u202f\u202f\u202f Query query;\u200b \n\n\u202f case HALT_MSG:\u200b  \n\n\u202f\u202f\u202f Halt\u202f halt;\u202f\u202f\u202f\u202f\u202f\u202f\u202f\u202f\u202f\u202f\u202f\u202f\u202f\u202f\u202f\u202f\u200b \n\n\u202f }\u200b  \n\n} MessageUnion;\u200b \n\ntypedef struct _Message\u202f\u200b \n\n{\u200b \n\n\u202f\u202f UINT32 tag;\u200b \n\n\u202f\u202f MessageUnion(tag) message;\u200b  \n\n} Message;\u200b \n\n\u200btypedef struct _Messages\u202f\u200b \n\n{\u200b \n\n\u202f\u202f UINT32 size { size <= 1024 };\u200b  \n\n\u202f\u202f Message messages[:byte-size size];\u200b  \n\n} Messages;\u200b \" class=\"wp-image-736558\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/03\/EverParseFig1.png 975w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/03\/EverParseFig1-300x115.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/03\/EverParseFig1-768x295.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/03\/EverParseFig1-16x6.png 16w\" sizes=\"auto, (max-width: 975px) 100vw, 975px\" \/><\/a><figcaption>Starting from a high-level language of message formats, EverParse automatically generates parsing code that is safe, correct, and fast (zero-copy). The example here shows a variable-length, data-dependent message format specified in EverParse\u2019s input language. The purple syntax highlighting represents the language\u2019s keywords, its types are in green, and the blue highlighting denotes its semantic constraints.<\/figcaption><\/figure><\/div>\n\n\n\n<p>The code sample above shows a simple example of an EverParse specification. The example highlights EverParse\u2019s support for variable-length, data-dependent message formats; EverParse uses a syntax that augments C\u2019s syntax of type definitions with various kinds of semantic constraints (highlighted in blue). The variable-length <code>Messages<\/code> structure pairs a <code>size<\/code> field, constrained to be at most 1024, with an array of <code>messages<\/code> whose size in bytes is exactly <code>size<\/code>. The <code>Message<\/code> structure itself pairs a tag with a <code>message<\/code> field, a data-dependent union, or a <code>casetype<\/code>, whose case is determined by the tag.<\/p>\n\n\n\n<p>From this specification, EverParse produces a formally proven C procedure to check that an untrusted input contains correctly formatted messages\u2014its signature in C is simply <code>BOOLEAN CheckMessages(uint8_t *bytes, uint32_t len)<\/code><em>. <\/em>That is, given <code>bytes <\/code>a pointer to an array of at least <code>len <\/code>bytes, <code>CheckMessages<\/code> decides whether those bytes correctly represent a <code>Messages <\/code>array. A proof in F* ensures that <code>CheckMessages<\/code> is correct, returning <code>true <\/code>whenever <code>bytes <\/code>contains a valid <code>Messages <\/code>and <code>false <\/code>otherwise, while reading each byte in the <code>bytes <\/code>array at most once and not modifying any existing memory or performing any heap allocation.<\/p>\n\n\n\n<p>What remains is to insert a call to <code>CheckMessages<\/code> in a program just as soon as untrusted input is received, using it to filter out only all ill-formed messages. In addition to simply validating untrusted input, EverParse also offers several other constructs to safely read relevant parts of the input for further application-specific processing and to report parsing errors. The parsers produced by EverParse can be used within a larger F* project, enabling proofs of correctness and security of complete software components. For example, we use EverParse in our verified implementations of the <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/implementing-proving-tls-1-3-record-layer\/\">TLS 1.3<\/a> record layer, the <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/security-model-verified-implementation-quic-record-layer\/\">QUIC<\/a> record layer, and the <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/dice-a-formally-verified-implementation-of-dice-measured-boot\/\">DICE<\/a> measured boot protocol, which are programmed and verified in F*. However, the C code produced from EverParse can also be directly integrated within an existing C or C++ project, hardening the attack surface of a critical piece of software by ensuring that messages are validated and parsed properly, as we describe next.<\/p>\n\n\n\n<h2 id=\"everparse-in-network-virtualization\">EverParse in network virtualization<\/h2>\n\n\n\n<p>We now turn to the use of&nbsp;EverParse&nbsp;in the Windows kernel to secure the virtual networking stack, specifically its use in <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/docs.microsoft.com\/en-us\/windows-server\/virtualization\/hyper-v-virtual-switch\/hyper-v-virtual-switch\">Hyper-V Virtual Switch<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. Virtual Switch provides virtualized access to the network to virtual machines (VM). Virtual Switch runs in the privileged <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/docs.microsoft.com\/en-us\/virtualization\/hyper-v-on-windows\/reference\/hyper-v-architecture\">root partition<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> of Hyper-V and dispatches network packets to and from the VM guest and hardware network interface card. Efficient and secure handling of these network packets is key to the proper functioning of several critical systems, including Microsoft Azure.<\/p>\n\n\n\n<p>Together with the Network Virtualization team, we\u2019ve specified the format of more than a hundred different kinds of messages spanning four different protocol layers handled by Virtual Switch. EverParse produces about 30,000 lines of verified C code for validating and parsing those messages. While describing those message formats in EverParse requires careful specification engineering and discovery (some of these protocols involve proprietary formats with a long history of evolution), the generated C code can be deployed with confidence in its correctness and security. In the particular context of Virtual Switch, in addition to memory safety and correctness, EverParse proves the code free from <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/www.usenix.org\/conference\/usenixsecurity17\/technical-sessions\/presentation\/wang-pengfei\">double fetches<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, protecting it from time-of-check\/time-of-use bugs.<\/p>\n\n\n\n\n\t<div class=\"border-bottom border-top border-gray-300 mt-5 mb-5 msr-promo text-center text-md-left alignwide\" data-bi-aN=\"promo\" data-bi-id=\"1002645\">\n\t\t\n\n\t\t<p class=\"msr-promo__label text-gray-800 text-center text-uppercase\">\n\t\t<span class=\"px-4 bg-white display-inline-block font-weight-semibold small\">Spotlight: AI-POWERED EXPERIENCE<\/span>\n\t<\/p>\n\t\n\t<div class=\"row pt-3 pb-4 align-items-center\">\n\t\t\t\t\t\t<div class=\"msr-promo__media col-12 col-md-5\">\n\t\t\t\t<a class=\"bg-gray-300 display-block\" href=\"https:\/\/aka.ms\/research-copilot\/?OCID=msr_researchforum_Copilot_MCR_Blog_Promo\" aria-label=\"Microsoft research copilot experience\" data-bi-cN=\"Microsoft research copilot experience\" target=\"_blank\">\n\t\t\t\t\t<img decoding=\"async\" class=\"w-100 display-block\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2024\/01\/MSR-Chat-Promo.png\" alt=\"\" \/>\n\t\t\t\t<\/a>\n\t\t\t<\/div>\n\t\t\t\n\t\t\t<div class=\"msr-promo__content p-3 px-5 col-12 col-md\">\n\n\t\t\t\t\t\t\t\t\t<h2 class=\"h4\">Microsoft research copilot experience<\/h2>\n\t\t\t\t\n\t\t\t\t\t\t\t\t<p id=\"microsoft-research-copilot-experience\" class=\"large\">Discover more about research at Microsoft through our AI-powered experience<\/p>\n\t\t\t\t\n\t\t\t\t\t\t\t\t<div class=\"wp-block-buttons justify-content-center justify-content-md-start\">\n\t\t\t\t\t<div class=\"wp-block-button\">\n\t\t\t\t\t\t<a href=\"https:\/\/aka.ms\/research-copilot\/?OCID=msr_researchforum_Copilot_MCR_Blog_Promo\" aria-describedby=\"microsoft-research-copilot-experience\" class=\"btn btn-brand glyph-append glyph-append-chevron-right\" data-bi-cN=\"Microsoft research copilot experience\" target=\"_blank\">\n\t\t\t\t\t\t\tStart now\t\t\t\t\t\t<\/a>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\t\t\t\t<\/div><!--\/.msr-promo__content-->\n\t<\/div><!--\/.msr-promo__inner-wrap-->\n\t<\/div><!--\/.msr-promo-->\n\t\n\n\n\n<p>Of course, Virtual Switch is also a performance-critical component. A major concern was whether adding additional input validation checks would significantly reduce its performance. As such, we spent a substantial amount of effort in ensuring the code produced by EverParse is fast. By design, EverParse validators operate in place without copying any data and, because of their double-fetch freedom, guarantee to never read a memory location more than once. Further, we designed our specifications and input validation strategy in a layered manner, to support incremental input validation rather than incurring the upfront cost of validating a packet in its entirety before processing. All these measures ensured that the use of verified parsers didn\u2019t introduce any noticeable performance overhead, and in a few cases, we were able to remove redundant copies and checks in existing code since these checks were already performed by EverParse.<\/p>\n\n\n\n<p>Aside from the technical improvements to security, the use of EverParse also increased the productivity of the team.<\/p>\n\n\n\n<p>\u201cEverParse was <em>instrumental<\/em> in our ability to implement the product,\u201d says <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/www.linkedin.com\/in\/omarcardona\/\">Omar Cardona<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, the engineering lead of the Network Virtualization team, adding that \u201cthe scope and complexity of protocols\/messages involved could not have been addressed manually to meet the project\u2019s timelines.\u201d <\/p>\n\n\n\n<p>Additionally, while maintaining our code in the Windows code base over the past 18 months, EverParse\u2019s formal proofs of correctness and security have provided confidence that code changes don\u2019t introduce unnecessary risk. For instance, when refactoring EverParse specifications, we prove in F* that no semantic changes were inadvertently introduced.<\/p>\n\n\n\n<h2 id=\"scrap-your-handwritten-parsers\">Scrap your handwritten parsers<\/h2>\n\n\n\n<p>We\u2019ve replaced handwritten input validation code with automatically generated, provably safe parsers in Virtual Switch, but this is just the beginning. There are attack surfaces exposed to input validation bugs throughout the software ecosystem, and hardening them all is a long road. We believe EverParse can help.<\/p>\n\n\n\n<p>To conclude, we offer some perspective on the landscape of input validation and parsing, particularly in the context of security-critical low-level code.<\/p>\n\n\n\n<p>Programmers in high-level languages rarely write message parsers by hand; they use parser generators or language-integrated message formats such as JSON. Yet, for performance-critical low-level code, handwritten parsers for custom binary formats are still the norm, with parsing bugs leading to unexpected attacker-controlled <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/www.usenix.org\/publications\/login\/december-2011-volume-36-number-6\/exploit-programming-buffer-overflows-weird\">\u201cweird machines.\u201d<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> Language-integrated message formats for low-level code are beginning to see some uptake, notably with frameworks like <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/protocolbuffers\/protobuf\">protobuf<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/google\/flatbuffers\">FlatBuffers<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, and <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/github.com\/capnproto\/capnproto\">Cap\u2019n Proto<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. However, in situations where the wire format is dictated by external constraints\u2014as prescribed by a standard or by legacy constraints, for example\u2014a more flexible way of specifying and parsing custom binary formats is still needed. Besides, existing frameworks don\u2019t provide mathematical assurances about the correctness and security of the generated parsers.<\/p>\n\n\n\n<p>EverParse seeks to make handwritten low-level parsers a thing of the past. With the state of the art in program proofs, we can generate highly efficient, provably correct, and secure low-level parsers from high-level message formats and integrate them within both fully verified components and legacy applications.<\/p>\n\n\n\n<p>EverParse is open source on <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/project-everest.github.io\/everparse\/\">GitHub<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. Give it a spin and get in touch!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>EverParse (opens in new tab) is a framework for generating provably secure parsers and formatters used to improve the security of critical code bases at Microsoft. EverParse is developed as part of Project Everest (opens in new tab), a collaboration between Microsoft Research labs in Redmond, Washington; India; and Cambridge, United Kingdom; the Microsoft Research-Inria [&hellip;]<\/p>\n","protected":false},"author":38838,"featured_media":743242,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[{"type":"user_nicename","value":"Tahina Ramananandro","user_id":"36293"},{"type":"user_nicename","value":"Aseem Rastogi","user_id":"36021"},{"type":"user_nicename","value":"Nikhil Swamy","user_id":"33138"}],"msr_hide_image_in_river":0,"footnotes":""},"categories":[1],"tags":[],"research-area":[13558],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-734818","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research-blog","msr-research-area-security-privacy-cryptography","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[199561,199562,199565],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[144812,663087],"related-projects":[235367],"related-events":[],"related-researchers":[{"type":"user_nicename","value":"Tahina Ramananandro","user_id":36293,"display_name":"Tahina Ramananandro","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/taramana\/\" aria-label=\"Visit the profile page for Tahina Ramananandro\">Tahina Ramananandro<\/a>","is_active":false,"last_first":"Ramananandro, Tahina","people_section":0,"alias":"taramana"},{"type":"user_nicename","value":"Aseem Rastogi","user_id":36021,"display_name":"Aseem Rastogi","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/aseemr\/\" aria-label=\"Visit the profile page for Aseem Rastogi\">Aseem Rastogi<\/a>","is_active":false,"last_first":"Rastogi, Aseem","people_section":0,"alias":"aseemr"},{"type":"user_nicename","value":"Nikhil Swamy","user_id":33138,"display_name":"Nikhil Swamy","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/nswamy\/\" aria-label=\"Visit the profile page for Nikhil Swamy\">Nikhil Swamy<\/a>","is_active":false,"last_first":"Swamy, Nikhil","people_section":0,"alias":"nswamy"}],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/05\/1400x788_everparse_no_logo_still-1-960x540.jpg\" class=\"img-object-cover\" alt=\"An animation showing an example of a high-level language message format specified by EverParse. From the message, two arrows labeled \u201cEverParse\u201d point to a rectangle labeled \u201cformal specification\u201d and a rectangle labeled \u201clow-level implementation,\u201d respectively, inside a larger rectangle labeled \u201cF* code.\u201d The figure represents EverParse\u2019s ability to automatically generate safe, correct, and fast F* parsing code. \u201cCorrectness\u201d is defined as \u201cCorrectness:\u202f\u200bparse (serialize msg) = msg\u200b\u201d and \u201cvalid msg ==&gt; serialize (parse msg) = msg\u200b.\u201d The F* logo appears with the description that F* is a type theory\u2013based programming language and proof assistant that can prove theorems about programs\u200b. From the \u201cF* code\u201d rectangle, arrows point from the \u201clow-level implementation\u201d rectangle and a rectangle labeled \u201cverified libraries for combinators\u201d to a rectangle labeled \u201cSafe high-performance C code.\u201d\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/05\/1400x788_everparse_no_logo_still-1-960x540.jpg 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/05\/1400x788_everparse_no_logo_still-1-300x169.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/05\/1400x788_everparse_no_logo_still-1-1024x577.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/05\/1400x788_everparse_no_logo_still-1-768x432.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/05\/1400x788_everparse_no_logo_still-1-1536x865.jpg 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/05\/1400x788_everparse_no_logo_still-1-2048x1153.jpg 2048w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/05\/1400x788_everparse_no_logo_still-1-16x9.jpg 16w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/05\/1400x788_everparse_no_logo_still-1-1066x600.jpg 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/05\/1400x788_everparse_no_logo_still-1-655x368.jpg 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/05\/1400x788_everparse_no_logo_still-1-343x193.jpg 343w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/05\/1400x788_everparse_no_logo_still-1-640x360.jpg 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/05\/1400x788_everparse_no_logo_still-1-1280x720.jpg 1280w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/05\/1400x788_everparse_no_logo_still-1-1920x1080.jpg 1920w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/>","byline":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/taramana\/\" title=\"Go to researcher profile for Tahina Ramananandro\" aria-label=\"Go to researcher profile for Tahina Ramananandro\" data-bi-type=\"byline author\" data-bi-cN=\"Tahina Ramananandro\">Tahina Ramananandro<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/aseemr\/\" title=\"Go to researcher profile for Aseem Rastogi\" aria-label=\"Go to researcher profile for Aseem Rastogi\" data-bi-type=\"byline author\" data-bi-cN=\"Aseem Rastogi\">Aseem Rastogi<\/a>, and <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/nswamy\/\" title=\"Go to researcher profile for Nikhil Swamy\" aria-label=\"Go to researcher profile for Nikhil Swamy\" data-bi-type=\"byline author\" data-bi-cN=\"Nikhil Swamy\">Nikhil Swamy<\/a>","formattedDate":"May 3, 2021","formattedExcerpt":"EverParse (opens in new tab) is a framework for generating provably secure parsers and formatters used to improve the security of critical code bases at Microsoft. EverParse is developed as part of Project Everest (opens in new tab), a collaboration between Microsoft Research labs in&hellip;","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/734818","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/38838"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=734818"}],"version-history":[{"count":15,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/734818\/revisions"}],"predecessor-version":[{"id":845764,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/734818\/revisions\/845764"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/743242"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=734818"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=734818"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=734818"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=734818"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=734818"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=734818"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=734818"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=734818"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=734818"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=734818"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=734818"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}