A Design and Verification Methodology for Secure Isolated Regions

Rohit Sinha1 Manuel Costa2 Akash Lal3 Nuno P. Lopes2
Sriram Rajamani3 Sanjit A. Seshia1 Kapil Vaswani2

1University of California at Berkeley, USA 2Microsoft Research, UK 3Microsoft Research, India
{rsinha,sseshia}@eecs.berkeley.edu {manuelc,akashl,nlopes,sriram,kapilv}@microsoft.com

Abstract
Hardware support for isolated execution (such as Intel SGX) enables development of applications that keep their code and data confidential even while running on a hostile or compromised host. However, automatically verifying that such applications satisfy confidentiality remains challenging. We present a methodology for designing such applications in a way that enables certifying their confidentiality. Our methodology consists of forcing the application to communicate with the external world through a narrow interface, compiling it with runtime checks that aid verification, and linking it with a small runtime library that implements the interface. The runtime library includes core services such as secure communication channels and memory management. We formalize this restriction on the application as Information Release Confinement (IRC), and we show that it allows us to decompose the task of proving confidentiality into (a) one-time, human-assisted functional verification of the runtime to ensure that it does not leak secrets, (b) automatic verification of the application’s machine code to ensure that it satisfies IRC and does not directly read or corrupt the runtime’s internal state. We present /CONFIDENTIAL: a verifier for IRC that is modular, automatic, and keeps our compiler out of the trusted computing base. Our evaluation suggests that the methodology scales to real-world applications.

Categories and Subject Descriptors D.4.6 [Operating Systems]: Security and Protection — Information flow controls; D.2.4 [Software Engineering]: Software/Program Verification; F.3.1 [Programming Languages]: Specifying and Verifying and Reasoning about Programs

Keywords Enclave Programs; Secure Computation; Confidentiality; Formal Verification

1. Introduction
Building applications that preserve confidentiality of sensitive code and data is a challenging task. An application’s secrets can be compromised due to a host of reasons including vulnerabilities in the operating system and hypervisor, and insufficient access control. Recognizing this problem, processor vendors have started to support hardware-based containers (such as Intel SGX enclaves [21] and ARM TrustZone trustlets [3]) for isolating sensitive code and data from hostile or compromised hosts. We refer to containers that provide isolation as Secure Isolated Regions (SIR). We assume a powerful adversary who controls the host OS, hypervisor, network, storage and other datacenter nodes as illustrated in Figure 1. Since applications must still rely on the compromised host OS for basic services such as storage and communication, the burden of programming SIRs correctly and ensuring confidentiality remains with the programmer. In this paper, we propose a design methodology and tool support for building code to run in SIRs that can be automatically certified to preserve confidentiality, even in the presence of such a powerful adversary.

Confidentiality can be expressed as an information-flow policy that tracks the flow of secrets through the program, and checks whether secrets may explicitly or implicitly flow to some state that the adversary can observe [7 8 16 31]. Some approaches to certifying confidentiality use programming languages that can express information-flow policies [17 36 43]. However, these approaches require use of particular programming languages and incur a heavy annotation burden. More importantly, they place the compiler and the language runtime in the Trusted Computing Base (TCB) [37]. In this paper, we instead explore certifying the machine code loaded into SIRs, to avoid these trust dependencies.

Recent work [40] can provide assurance that machine code satisfies general confidentiality policies, but it does not
scale to large programs. In this paper, we propose a new methodology for certifying confidentiality that addresses this limitation. We propose to verify a specific confidentiality policy: the code inside the SIR can perform arbitrary computations within the region, but it can only generate output data through an encrypted channel. We refer to this property as Information Release Confinement or IRC. This is a meaningful confidentiality policy because it guarantees that attackers can only observe encrypted data. We exclude covert channels and side channels (e.g., timing, power) from our threat model. Previous experience from building sensitive data analytics services using SIRs suggests that IRC is not unduly restrictive. For example, in VC3 [38], only map and reduce functions are hosted in SIRs; the rest of the Hadoop stack is untrusted. Both mappers and reducers follow a stylized idiom where they receive encrypted input from Hadoop’s untrusted communication layer, decrypt the input, process it, and send encrypted output back to Hadoop. No other interaction between these components and the untrusted Hadoop stack is required.

Scalably verifying IRC is challenging, because we aim to have a verification procedure that can automatically certify machine code programs, without assuming the code to be type safe or memory safe; for example, programs may have exploitable bugs or they may unintentionally write information out of the SIR through corrupted pointers. Our approach is based on decomposing programs into a user application \( U \) that contains the application logic and a small runtime library \( L \) that provides core primitives such as memory management and encrypted channels for communication. We restrict the interaction between the user application and the untrusted platform to the narrow interface implemented by \( U \). A key contribution of this paper is how this methodology enables decomposing the confidentiality verification in two parts. For \( U \), we need to verify that it implements a secure encrypted channel and memory management correctly; \( L \) is a small library that can be written and verified once and for all. For \( U \), we need to verify a series of constraints on memory loads, stores, and control-flow transitions. Specifically, we need to check that stores do not write data outside the SIR, stores do not corrupt control information (e.g., return addresses and jump tables) inside the SIR, stores do not corrupt the state of \( L \), loads do not read cryptographic state from \( L \) (since using cryptographic state in \( U \) could break the safety of the encrypted channel [9]), calls go to the start of functions in \( U \) or to the entry points of API functions exported by \( L \), and jumps target legal (i.e. not in the middle of) instructions in the code. These checks on \( U \) are a weak form of control-flow integrity, and along with restrictions on reads and writes, enforce a property which we call WCFI-RW. We show that the functional correctness of \( L \) combined with WCFI-RW of \( U \) implies our desired confidentiality policy (IRC).

In this paper, we formalize WCFI-RW and propose a two-step process to automatically verify that an application satisfies WCFI-RW. We first use a compiler that instruments \( U \) with runtime checks [38]. Next, we automatically verify that the instrumentation in the compiled binary is sufficient for guaranteeing WCFI-RW. Our verifier generates verification conditions from the machine code of the application, and automatically discharges them using an SMT solver. This step effectively removes the compiler as well as third-party libraries from the TCB. By verifying these libraries, users can be certain that they do not leak information either accidentally, through exploitable bugs, or by intentionally writing data out of the SIRs.

Our runtime checks are related to previous work on software fault isolation (SFI [27, 39, 44, 47]), but we note that a simple SFI policy of sandboxing all the code in the SIR would not be enough to guarantee IRC, because \( U \) and \( L \) share the same memory region. Our checks implement a form of fine-grained memory protection inside the SIR, which we use to separate the state of \( U \) and \( L \) and guarantee the integrity of control data. Moreover, our checks work well for SGX enclaves on x64 machines (our target environment), whereas previous work would require significant modifications to work efficiently in this environment. One of our key contributions is showing that the instrumentation of \( U \), together with the properties of \( L \), guarantees IRC; this work is the first that verifies and guarantees IRC using such instrumentation.

Our approach is significantly different from verifying arbitrary user code with unconstrained interfaces for communication with the untrusted platform. We require no annotations from the programmer — all of \( U \)’s memory is considered secret. The TCB is small — it includes the verifier but...
does not include $U$ or the compiler. Furthermore, the assertions required to prove WCFL-RW are simple enough to be discharged using an off-the-shelf Satisfiability Modulo Theories (SMT) solver. The verification procedure is modular, and can be done one procedure at a time, which enables the technique to scale to large programs. Our experiments suggest that our verifier can scale to real-world applications, including map-reduce tasks from VC3 [53].

In summary, the main contributions of this paper are: (1) a design methodology to program SIRs using a narrow interface to a library, (2) a notion called Information Release Confinement (IRC), which allows separation of concerns while proving confidentiality in the presence of a privileged adversary (that controls the OS, hypervisor, etc.), and (3) a modular and scalable verification method for automatically checking IRC directly on application binaries.

2. Overview

Secure Isolated Regions. Protecting confidentiality of applications using trusted processors is a topic of wide interest [3, 24, 28, 34]. These processors provide primitives to create memory regions that are isolated from all the other code in the system, including operating system and hypervisor. We refer to such an abstraction as a Secure Isolated Region or SIR. The processor monitors all accesses to the SIR: only code running in the SIR can access data in the SIR. As an example, Intel’s SGX instructions enable the creation of SIRs (called “enclaves”) in the hosting application’s address space. The SIR can access the entire address space of the hosting application, which enables efficient interaction with the untrusted platform. External code can only invoke code inside the region at statically-defined entry-points (using a call-gate like mechanism). The processor saves and restores register context when threads exit and resume execution inside the SIR. To protect against physical attacks on the RAM, the processor also encrypts cache lines within the SIR on eviction; the cache lines are decrypted and checked for integrity and freshness when they are fetched from physical memory. SGX also supports attestation and sealing. Code inside an SIR can get messages signed using a per processor private key along with a cryptographic digest of the SIR. This enables other trusted entities to verify that messages originated from the SIR. Primitives to create SIRs can also be provided by hypervisors [11, 20, 45], with the caveat that the hypervisor becomes part of the TCB, but in this paper we assume SIRs are provided directly by the processor. Regardless of the primitives and infrastructure used to implement SIRs, application developers who write the code that runs inside SIRs are responsible for maintaining confidentiality of secrets managed by the SIR. The goal of this paper is to provide a methodology and tool support for helping application developers ensure confidentiality.

Threat Model We assume a user that wishes to protect the confidential data processed by an application $U$. The application runs inside an SIR in a hostile or compromised host. We assume that $U$ is not designed to write confidential data outside the SIR. However, $U$ may have bugs such as accidental writes of confidential data to non-SIR memory, as well as exploitable bugs, such as buffer overflows, use-after-free, and dereferences of uninitialized or corrupted pointers, which could result in confidential data leaking out of the SIR. $U$ may also include third-party libraries that intentionally try to write confidential data outside the SIR. Therefore, we treat $U$’s code as untrusted and verify that $U$ cannot leak secrets even if it has exploitable bugs.

The illustration of our threat model in Figure 1 lists all the system components that are under the attacker’s control. The adversary may fully control all of the hardware in the host computer, including disks, network cards, and all the chips (including RAM chips) in the system, except the processor that runs the SIR. The adversary may record, replay, and modify network packets or files, as well as read or modify data after it leaves the processor chip using physical probing, direct memory access (DMA), or similar techniques. We assume that the adversary cannot physically attack the processor to extract secrets. The adversary may also control all of the system software in the host computer, including the operating system and the hypervisor. This adversary is general enough to model privileged malware running in the kernel, as well as a malicious system administrator who may try to access the data by logging into the host and inspecting disks and memory. Denial-of-service attacks, side-channel attacks, and covert-channel attacks are outside the scope of this paper.

Verifying confidentiality. We illustrate the challenges in verifying that code running inside an SIR satisfies confidentiality. Consider the Reduce method in Figure 2, which acts as a reducer in a map-reduce job. The reducer receives encrypted a key and list of values from different mappers. The method first provisions an encryption key (of type KeyAesGcm, not to be confused with the data key) by setting up a secure channel with a component that manages keys (not shown for brevity). It decrypts the key and values, computes the sum of all values, and writes the output to a buffer. The buffer is encrypted and written to a location outside the region specified by the map-reduce framework.

Proving that this Reduce method preserves confidentiality is challenging. The code writes the result of the computation to a stack-allocated buffer without checking the size of the inputs. This may result in a vulnerability that can be exploited to overwrite the return address, execute arbitrary code and leak secrets. Therefore, the proof must show that the cumulative size of key and result does not exceed the buffer size. Such a proof may require non-modular reasoning since the sizes may not be defined locally. Furthermore, Reduce writes to a location outside the SIR. The proof must show that the data written is either encrypted or independent of secrets. The latter requires precise, fine-grained tracking of secrets in SIR’s memory. Also, unrelated to the application logic, we note that the Reduce method manually pro-
Restricting the application to this interface serves an important purpose — it allows us to decompose the task of verifying confidentiality into two sub-tasks: (1) checking that the user application $U$ communicates with the external world only through this interface, and (2) checking that the implementation of the interface in $L$ does not leak secrets. This paper focuses on Information Release Confinement (IRC): $U$ can only write to non-SIR memory by invoking the send API. Proving information leakage properties of the implementation of send (e.g., strong encryption property, resistance to various side channels, etc.) would require one-time human-assisted verification of the library code $L$, and is left for future work. We feel that IRC is an important stepping stone for achieving confidentiality guarantees of SIRs against various adversaries.

We do not find this narrow interface to be restrictive for our target applications: trusted cloud services, which are typically implemented as a collection of distributed and trusted entities. In this setting, the application ($U$) only sends and receives encrypted messages from remote parties. We use this approach in practice to build sensitive data analytics, database services, key manager, etc.

Checking IRC. For scalable and automatic verification, we further decompose IRC into a set of checks on $U$ and contracts on each of the APIs in $L$. First, we verify that $U$ sat-
sifies WCFI-RW: \( U \) transfers control to \( L \) only through the API entry points, and it does not read \( L \)'s memory or write outside \( U \)'s memory. Apart from the constraints on reads and writes, WCFI-RW requires a weak form of control-flow integrity in \( U \). This is needed because if an attacker hijacks the control flow of \( U \) (e.g., by corrupting a function pointer or return address), then \( U \) can execute arbitrary instructions, and we cannot give any guarantees statically. Next, we identify a small contract on \( L \) that, in conjunction with WCFI-RW property of \( U \), is sufficient for proving IRC. The contracts on \( L \) (defined in section 4.2) ensure that only send gets to write to non-SIR memory, and that \( L \) does not modify \( U \)'s state to an extent that WCFI-RW is compromised.

From here on, we describe how we check WCFI-RW directly on the machine code of \( U \), which enables us to keep the compiler and any third-party application libraries out of the TCB. Third-party libraries need to be compiled with our compiler, but they can be shipped in binary form. Furthermore, to help the verification of WCFI-RW, our (untrusted) compiler adds runtime checks to \( U \)'s code, and we use the processor's paging hardware to protect \( L \)'s memory — although the attacker controls the page tables, the processor protects page table entries that map to SIR memory. We use the same runtime checks on memory writes and indirect calls as VC3 [38], but we implement efficient checks on memory reads using the paging hardware.

Next, we describe the checks done by the VC3 instrumentation. We first note that \( U \) and \( L \) share the same address space (Figure 3). The code segments of both \( U \) and \( L \) are placed in executable, non-writable pages. The region also contains a stack shared by both \( U \) and \( L \). We isolate \( L \) from \( U \) by using a separate heap for storing \( L \)'s internal state. The compiler enforces WCFI-RW by instrumenting \( U \) with the following checks:

- **Protecting return addresses**: To enforce that writes through pointers do not corrupt return addresses on the stack, the compiler maintains a bitmap (write_bitmap in Figure 3) to record which areas in \( U \)'s memory are writable. The write_bitmap is updated at runtime, while maintaining the invariant that a return address is never marked writable. Address-taken stack variables are marked writable by inlined code sequences in function prologues, and heap allocations and address-taken globals are marked writable by the runtime library. Store instructions are instrumented with an instruction sequence (instructions from label L1 to L2 in Figure 3) that reads the write_bitmap and terminates the SIR program if the corresponding bit is not set. Note that the bitmap protects itself: the bits corresponding to the bitmap are never set.

- **Protecting indirect control flow transfers**: The compiler maintains a separate bitmap (call_bitmap in Figure 3) that records the entry points of procedures in \( U \) and APIs of \( L \). The compiler instruments indirect calls with an instruction sequence that reads the bit corresponding to the target address, and terminates the SIR program if that bit is unset. Indirect jumps within the procedure are also checked to prevent jumps to the middle of instructions; the checks consist of a range check on indices into jump tables (which are stored in read-only memory).

- **Preventing writes outside SIR**: The compiler adds range checks to prevent writes to non-SIR memory (instructions from label L0 to L1 in Figure 3).

We note that the VC3 instrumentation offers a stronger guarantee than IRC, and prevents several forms of memory corruption inside the SIR. Specifically, the checks guarantee: 1) integrity of all data that is not address-taken, 2) detection of all sequential overflows and underflows on heap and stack, 3) integrity of return addresses, and 4) forward edge CFI. Future work may be able to verify all these properties, but for this work we focus on verifying that the VC3 instrumentation is sufficient to guarantee WCFI-RW, because WCFI-RW together with the correctness of \( L \) implies IRC (as we show in Theorem 1), and IRC is a strong property that provides meaningful security guarantees.

Preventing \( U \) from accessing \( L \)'s memory is also important because \( L \) keeps cryptographic secrets; allowing \( U \) to read such secrets could break the typical assumptions of encryption algorithms (e.g., a key should not be encrypted with itself [9]). We achieve this efficiently by requiring \( L \) to store any such secrets in its own separate heap, and using SGX memory protection instructions [21] to set page permissions to disable read/write access before transferring control to \( U \). Note that we cannot use page permissions to prevent writes to non-SIR memory, because those page tables are controlled by a privileged adversary, e.g., kernel-level malware.

Let’s reconsider the Reduce method in Figure 2(b). Compiling this method with the preceding runtime checks and necessary page permissions ensures that any vulnerability (e.g., line 17) cannot be used to violate WCFI-RW. Any such violation causes the program to halt.

To achieve good performance, the compiler omits runtime checks that it can statically prove to be redundant (e.g., writes to local stack-allocated variables). However, compilers are large code bases and have been shown to have bugs and produce wrong code at times [6, 23, 26, 29, 46]. The rest of the paper describes our approach for automatically verifying that the machine code of \( U \) satisfies WCFI-RW, thereby removing the compiler from the TCB. We also show that such a \( U \) satisfies IRC when linked with an \( L \) that satisfies our contract (defined in section 4.2).

3. Formal Model of User Code and Adversary

In this section we define a language, as shown in Figure 4, to model the user code \( U \). Each machine code instruction in \( U \) is translated to a sequence of statements (Stmt) in the presented language. This translation is intended to reduce the
The syntax of U code.

Variables in \( Vars \) consist of \( \text{regs}, \text{flags}, \) and \( \text{mem} \). \( \text{regs} \) are CPU registers (e.g., \( \text{rax}, \text{r8}, \text{rsp}, \text{rip} \), etc.) that are 64-bit values in the case of x64. CPU flags (e.g., \( \text{CF}, \text{ZF} \), etc.) are 1-bit values. The instruction pointer (\( \text{rip} \)) stores the address of the next instruction to be executed and is incremented automatically after every instruction except in those that change the control flow. Memory (\( \text{mem} \)) is modeled as a map from 64-bit bit-vectors to 8-bit bit-vectors.

Memory accesses are encoded using \( \text{load}_n \) and \( \text{store}_n \) functions, where \( n \) denotes the size of the data in bytes. \( \text{load}_n \) and \( \text{store}_n \) are axiomatized using SMT’s theory of arrays. Bit-vector operations (\( \text{add}, \text{sub}, \) etc) are axiomatized using SMT’s bit-vector theory.

Assignment statements can be one of following two forms: (1) \( v := e \) sets \( v \in Vars \) to the value of expression \( e \), (2) \( \text{reg} := \text{load}_n(e) \) sets \( \text{reg} \in \text{regs} \) to the value of the memory at address \( e \). Statement \( \text{assume} \ \phi \) blocks when executed in a state that does not satisfy \( \phi \), and is equivalent to a no-op when executed in a state that satisfies \( \phi \). Executing \( \text{assert} \ \phi \) in a state that does not satisfy \( \phi \) leads to an error.

Control flow is changed with \( \text{call}, \text{ret}, \) and \( \text{jmp} \) statements, which override the value of \( \text{rip} \). A procedure call returns to the following instruction (we verify a form of control flow integrity to guarantee this). The \( \text{ret} \) statement terminates execution in the procedure and returns back to the caller. Both \( \text{call} \) and \( \text{ret} \) are semantically equivalent to the x64 call and return instructions, respectively — \( \text{call} \) pushes the return address on the stack, and the \( \text{ret} \) instruction pops the stack before returning. \( \text{jmp} \ e \) encodes a jump to an arbitrary location, either in the current procedure or the beginning of a procedure (as an artifact of tail call optimizations).

The choice statement \( s \circ t \) non-deterministically executes either \( s \) or \( t \). Choice statements together with assume statements are used to model conditional statements. A typical conditional statement if \( (\phi) \ {\{s\}} \) else \( \{t\} \) is modeled as \( \{\text{assume} \ \phi; \ s\} \circ \{\text{assume} \ \neg \phi; \ t\} \).

The havoc \( v \) statement assigns a fresh, symbolic value to the variable \( v \in Vars \). When havocing memory variables, a havoc statement may optionally specify a predicate to characterize which memory locations are modified; for instance, \( \text{havoc}_{\phi} \text{mem} \) statement scrambles all memory locations specified by predicate \( \phi \). Intuitively, this creates a new memory \( \text{mem}' \) whose content is the same as in memory \( \text{mem} \) for all addresses that do not satisfy \( \phi \), as follows:

\[
\text{assume} \ \forall a. \ \neg \phi(a) \Rightarrow \text{mem}'[a] = \text{mem}[a];
\]

\[
\text{mem} := \text{mem}'
\]

We note that the user code \( U \) does not contain havoc statements. However, we use the havoc statement to model actions of the adversary, as we show later.

Software interrupts (e.g., \text{int} 3 instruction) terminate the execution of a SIR; we model them using \( \text{assume false} \). Other exceptions (such as division by zero, general protection faults, etc.) also cause the SIR to terminate, which simplifies both the modeling and verification. We do not find this to be a limitation in our experience writing SIR programs.

We define state \( \sigma \) to be a valuation of all variables in \( Vars \). Let \( \sigma(v) \) be the value of a variable \( v \in Vars \) in state \( \sigma \) and similarly let \( \sigma(e) \) be the valuation of expression \( e \) in state \( \sigma \). Let \( \text{stmt}(\sigma) \) be the statement executed in state \( \sigma \) (computed from the instruction pointer and the code in memory). The semantics of a statement \( s \in Stmt \) is given by relation \( R \) over pairs of pre and post states, where \( (\sigma, \sigma') \in R \) if and only if \( s = \text{stmt}(\sigma) \) and there is an execution of \( s \) starting at \( \sigma \) and ending in \( \sigma' \). We define operational semantics for a subset of \( Stmt \) in Figure 4 and use standard semantics for the remaining statements. A sequence \( \pi = [\sigma_0, \ldots, \sigma_n] \) is called an execution trace if \( (\sigma_i, \sigma_{i+1}) \in R \) for each \( i \in \{0, \ldots, n-1\} \). We also use \( \text{stmt}(\pi) \) to denote the sequence of statements executed in \( \pi \). Furthermore, in order to decouple the verification of \( U \) from the verification of \( L \), we only let \( \pi \) to “step into” procedures of \( U \), i.e., procedures in \( L \)’s code are executed atomically and therefore only contribute with one state transition to execution traces.

The initial state \( \sigma_{\text{entry}} \) is the result of a jump from \( L \)’s code into \( U \)’s entry point. Prior to the jump, \( L \) performs the necessary initialization to support the memory management (\text{malloc} and \text{free}) and communication (\text{send} and \text{recv}) services. For instance, \( L \) may engage in a key exchange protocol with remote entities to establish the key used by \text{send} and \text{recv}. \( \sigma_{\text{entry}} \) assigns an arbitrary value to addresses of \text{mem} that include the non-SIR memory, \( U \)’s stack, and \( U \)’s heap. \( \sigma_{\text{entry}} \) also assigns an arbitrary value to all \text{regs}, \text{flags}, except for the stack pointer \text{rsp} which points to \( U \)’s stack.

**Modeling the adversary** Our formal model of the adversary is similar to Moot [40]. The adversary may force the host to transfer control from the SIR to the adversary’s code at any time during the execution of the SIR (by generating
This composed model, hereby called $U_H$, encodes all possible behaviors of $U$ in the presence of $H$.

### 4. Formalizing Confidentiality

Confidentiality is typically defined as a hyper-property [12], where we require that the adversary cannot infer secret state based on observations. Automatically checking if a program satisfies confidentiality usually requires precise tracking of the memory locations that may contain secrets during execution. This is accomplished with either a whole-program analysis of $U_H$, which is hard to scale for machine code, or fine-grained annotations which specify locations with secrets, which is cumbersome for machine code [40]. We follow a different approach in order to scale the verification to large programs without requiring any annotation.

We expect $U_H$ to communicate with non-SIR entities, but follow a methodology where we mandate all communication to occur via the send and recv APIs in $L$. We require (and verify) that $U_H$ does not write to non-SIR memory. Instead, $U_H$ invokes send, which takes as an argument a pointer to the input buffer, and encrypts and integrity-protects the buffer before copying out to non-SIR memory. This methodology leads to a notion called information release confinement (IRC), which mandates that the only statements in $\pi$ that update non-SIR memory (apart from $H$’s havoc operations) are the call send statements.

**DEFINITION 2. Information Release Confinement.** An execution trace $\pi = [\sigma_0, \ldots, \sigma_n]$ of $U_H$ satisfies information release confinement or IRC, if all updates to the adversary observable state (i.e., $\text{mem}_{-\text{SIR}}$ in $\pi$) are caused by either call send statements or havoc-$\text{SIR}$ mem operations (from $H$), i.e., IRC($\pi$) iff:

$$\forall i \in \{0, \ldots, n-1\}. \quad (\text{stmt}(\sigma_i) \neq \text{call send} \land \text{stmt}(\sigma_i) \neq \text{havoc}_{-\text{SIR}} \text{ mem}) \Rightarrow \left( \bigvee a. \neg \text{SIR}(a) \Rightarrow \sigma_i(\text{mem}[a]) = \sigma_{i+1}(\text{mem}[a]) \right)$$

(2)

$U_H$ satisfies IRC iff all traces of $U_H$ satisfy IRC.

**IRC and Confidentiality.** IRC is an important building block in guaranteeing confidentiality. It ensures that, in any execution, the only outbound communication with the environment is via send. Hence, we can arrange for send to encrypt all its received data before transmitting it, to prevent explicit information leaks. In order for the encrypted data to be confidential, we additionally need to ensure that the encryption key in $L$ does not leak or gets overwritten. The definition of IRC enables us to separate properties we require from the application code $U_H$, and properties we require from $L$, in order to guarantee confidentiality.

It is important to note that IRC is not sufficient for protecting secrets from all side channels. Observations of the number and timing of send invocations, memory access patterns, electromagnetic radiation, etc. potentially reveal secrets. Nevertheless, if an application $U_H$ satisfies IRC, then we can eliminate certain channels and obtain various

---

**Figure 5.** Operational semantics of $s \in \text{Stmt}$: $(s, \sigma) \Downarrow \sigma'$ and $\text{stmt}(\sigma) = s, \sigma[x \leftarrow y]$ denotes a state that is identical to $\sigma$, except variable $x$ evaluates to $y$. The memory update expression $\text{mem}[x := y]$ returns a new memory that is equivalent to $\text{mem}$, except for index $x$ — multibyte-sized accesses follow the processor’s endianness semantics. $\text{next}(e)$ is the address of the subsequent instruction in $U$ after decoding the instruction starting at address $e$.

an interrupt, for example). Once the CPU transfers control from the SIR to the adversary, the adversary may execute an arbitrary sequence of instructions before transferring control back to the SIR. The adversarial operations include arbitrary updates to non-SIR memory, privileged state accessible to the OS and hypervisor layers (e.g. page tables), registers, and devices. Most defines the following active adversary $H$, which only havoc non-SIR memory, and proves a theorem that $H$ models all adversarial operations in our threat model.

**DEFINITION 1. Havocing Active Adversary $H$.**

Between any consecutive statements in an execution of $U$, $H$ may observe and perform a single havoc on mem$_{-\text{SIR}}$, where mem$_{-\text{SIR}}$ is the set of locations outside the SIR boundary (¬SIR).

**Composing $U$ with active adversary $H$.** We transform each statement $s$ within $U$ to:

$$\text{havoc}_{-\text{SIR}} \text{ mem}: s$$

(1)
degrees of confidentiality by imposing additional constraints on the implementation of send. We can arrange for send to output messages with constant-sized buffers (using appropriate padding) to prevent the adversary from making any inference based on message sizes. In addition, we can arrange for send to do internal buffering and produce sequences of output messages that are separated by an interval that is independent of secrets, to prevent the adversary from making any inference based on timing. These defenses impose additional constraints only on the implementation of send. We plan to explore guaranteeing such properties of our send implementation in future work.

In the remainder of this section, we formalize the properties on both $U_H$ and $L$, such that the SIR satisfies the IRC property. To decouple the verification, we decompose IRC into 1) checking WCFI-RW of $U_H$, and 2) checking correctness properties of $L$’s API implementation.

4.1 WCFI-RW Property of $U_H$

WCFI-RW further decomposes into the following two properties:

(a) A weak form of control flow integrity (CFI) of $U_H$. A trace $\pi$ of $U_H$ satisfies weak CFI if 1) each call statement in $\pi$ targets the starting address of a procedure in $U$ or API entry point of $L$, 2) each ret statement in $\pi$ uses the return address saved by the matching call statement in $\pi$, 3) each jmp statement in $\pi$ targets a legal instruction within the procedure or the starting address of a procedure in $U$.

(b) $U_H$ does not read from or write to $L$’s memory, and does not write to non-SIR memory.

WCFI-RW guarantees that $U_H$ only calls into $L$ at allowed API entry points, which allows us to soundly decouple the verification of $U_H$ from $L$. WCFI-RW prevents a large class of CFI attacks (e.g., ROP attacks): backward control edges (returns) are fully protected, and forward edges (calls and jumps) are significantly constrained. Furthermore, observe that by preventing jumps into the middle of instructions, we guarantee that the code of $U_H$ that statically verify is same that will execute at runtime. However, this form of CFI is weaker than standard CFI [2] because it allows a procedure in $U_H$ to call any other procedure in $U_H$. In other words, a program that satisfies WCFI-RW may exhibit control transfers that are not present in the source program, and this can bootstrap certain control-flow attacks. Nevertheless, for any such attack that is not blocked by WCFI-RW, we prove that they still cannot break IRC (soundness theorem in section 4.3); in the end, the attacker only observes encrypted values.

To formalize WCFI-RW, we construct a monitor automaton $M$ (defined next) from $U_H$ to check whether $U_H$ satisfies WCFI-RW, similar in spirit to prior works on CFI [2] [9]. $M$ is synchronously composed with $U_H$, such that the statements executed by $U_H$ form the input sequence of $M$. We say that WCFI-RW is violated whenever $M$ reaches a stuck state during the execution of $U_H$. The formalization of WCFI-RW requires the following predicates over addresses in the region (illustrated in Figure 3). For any SIR address $a$, AddrInHeap($a$) is true if $a$ belongs to $U$’s heap. AddrInStack($a$) is true if $a$ belongs to the SIR’s stack (which is shared by both $U$ and $L$). AddrInU($a$) is true if $a$ belongs to $U$’s memory (stack, globals, or heap), and AddrInL($a$) is true if $a$ belongs to $L$’s memory (globals or heap). AddrInCode($a$) is true if $a$ belongs to SIR’s code (either $U$ or $L$’s code). Finally, writable($mem, a$) is true iff the bit corresponding to address $a$ is set in the write bitmap.

**Definition 3. WCFI-RW Monitor Automaton**

$M = (Q, \Sigma, \Gamma, \delta, q_0, Z_0, F)$ is a generalized pushdown automaton where $Q = \{\sigma\}$ is its set of states, $\Sigma = \{call, ret, jmp, v := e, v := load_n(e_a), store_n(e_a, e_d), havoc_sir mem\}$ is its set of inputs, $\Gamma = \{a | AddrInCode(a)\}$ is its finite set of stack symbols, $q_0 = \sigma_{entry}$ is its initial state ($\sigma_{entry}$ being the machine state following the jump from $L$ into $U$’s entry point), $Z_0 = a_{entry}$ is its initial stack, $F = Q$ is its set of accepting states, $\delta: (Q \times \Sigma \times \Gamma^*) \rightarrow 2^P(Q \times \Gamma^*)$ is its transition function. Furthermore, let next($rip$) be the address of the subsequent instruction in $U$ after decoding the instruction starting at address $rip$, and $\sigma'$ be the state resulting from executing a statement $s \in Stmt$ starting in $\sigma$, i.e., $(\sigma, \sigma') \in R$ (as per the operational semantics in Figure 3).

The transition function $\delta$ is as follows:

$\delta(\sigma, call e, \gamma) = \begin{cases} \{\sigma', next(\sigma(rip)) \cdot \gamma\} & \text{iff } \psi_{call} \\ \emptyset & \text{otherwise} \end{cases}$

where $\psi_{call} \equiv (e) is the address of a procedure entry in U

$\delta(\sigma, call x, \gamma) = \{\sigma', \gamma\}$

where $x \in \{malloc, free, send, recv\}

$\delta(\sigma, ret, a \cdot \gamma) = \begin{cases} \{\sigma', \gamma\} & \text{iff } \sigma(mem)|\sigma(rsp) = a \\ \emptyset & \text{otherwise} \end{cases}$

$\delta(\sigma, jmp e, \gamma) = \begin{cases} \{\sigma', \gamma\} & \text{iff } \psi_{jmp} \\ \emptyset & \text{otherwise} \end{cases}$

where $\psi_{jmp} \equiv \psi_{call} \lor \sigma(e) is an instr. in current procedure

$\delta(\sigma, flag := e, \gamma) = \{\sigma', \gamma\}$ where flag $\in flags$

$\delta(\sigma, rsp := e, \gamma) = \begin{cases} \{\sigma', \gamma\} & \text{iff } \psi_{rsp} \\ \emptyset & \text{otherwise} \end{cases}$

where $\psi_{rsp} \equiv AddrInU(\sigma(e))$

$\delta(\sigma, reg := e, \gamma) = \{\sigma', \gamma\}$ where reg $\in regs \setminus \{rsp\}$
\[ \delta(\sigma, \text{reg} := \text{load}_n(e_a), \gamma) = \begin{cases} \{(\sigma', \gamma)\} & \text{iff } \psi_{\text{load}} \\
0 & \text{otherwise} \end{cases} \]

where \( \psi_{\text{load}} \equiv \neg \text{AddrInL}(\sigma(e_a)) \land \neg \text{AddrInL}(\sigma(e_a) + n - 1) \)

\[ \delta(\sigma, \text{store}_n(e_a, e_d), \gamma) = \begin{cases} \{(\sigma', \gamma)\} & \text{iff } \psi_{\text{store}} \\
0 & \text{otherwise} \end{cases} \]

where \( \psi_{\text{store}} \equiv \text{AddrInU}(\sigma(e_a)) \land \text{AddrInU}(\sigma(e_a) + n - 1) \)

\[ \delta(\sigma, \text{havoc-\text{SIR mem}}, \gamma) = \{(\sigma', \gamma)\} \]

**Definition 4. WCFI-RW**

WCFI-RW is violated in an execution trace \( \pi = [\sigma_0, \ldots, \sigma_n] \) when no transition exists in \( M \) for a statement in \( \text{stmt}(\pi) \) i.e. \( M \) gets stuck. Formally, WCFI-RW(\( \pi \)) iff (with starting state \( \sigma_0 = \sigma_{\text{entry}} \) and initial stack \( \gamma_0 = \gamma_{\text{entry}} \)):

\[ \exists \gamma_0, \ldots, \gamma_n \in \Gamma^*, \forall k \in \{0, \ldots, n-1\}, (\sigma_{k+1}, \gamma_{k+1}) \in \delta(\sigma_k, \text{stmt}(\sigma_k), \gamma_k) \]

\( U_\pi \) satisfies WCFI-RW if all traces of \( U_\pi \) satisfy WCFI-RW.

The role of the pushdown stack in Definition 4 is to match the calls and returns. \( M \) only modifies the pushdown stack on \text{call} and \text{ret} statements, and updates the state as per the operational semantics defined in Figure 5. We now describe each case in the definition of the transition function \( \delta \) of \( M \). On a \text{call} to a procedure in \( U \), \( M \) pushes the return address (i.e., the address of the subsequent instruction) onto the pushdown stack, for use by the \text{ret} statement. On a \text{call} to \( L \)'s API, since \( L \) only contributes one step to the trace, and since correctness of \( L \)'s APIs (section 4.2) guarantees that the \text{call} returns to the call site, \( M \) does not push the return address onto the pushdown stack. A \text{ret} produces a valid transition only when the topmost symbol on the pushdown stack matches the return address on the machine’s stack; this transition pops the topmost element off the pushdown stack.

A \text{jmp} produces a valid transition if it targets a legal instruction in the current procedure, or the beginning of a procedure in \( U \). Assignment to \( \text{rsp} \) succeeds if the new value is an address in \( U \)'s memory — this constraint is needed because \text{call} and \text{ret} access \text{mem} at address \text{rsp}, and WCFI-RW requires stores to be contained within \( U \)'s memory. Other registers and flags can be assigned to arbitrary values. Finally, to satisfy WCFI-RW’s constraints on reads and writes, a \text{load} proceeds iff the address is not within \( L \)'s memory, and a \text{store} proceeds iff the address is within \( U \)'s memory.

**4.2 Correctness of \( L \)'s API Implementation**

While we strive for full functional correctness of \( L \), the following contract (in conjunction with WCFI-RW of \( U_\pi \)) is sufficient for proving IRC of the SIR.

(a) \text{malloc(size)} (where the return value \text{ptr} is the starting address of the allocated region) must not 1) modify non-SIR memory or stack frames belonging to \( U \), 2) make any stack location writable, or 3) return a region outside \( U \)'s heap. Formally, when \text{stmt}(\sigma) = \text{call malloc}, we write \((\sigma, \sigma') \in \mathcal{R} \text{ iff } \psi_{\text{malloc}} \) holds, where \( \psi_{\text{malloc}} \) is the conjunction of:

- \( \forall a. \neg (\text{SIR}(a) \lor (\text{AddrInStack}(a) \land a \geq \sigma(\text{rsp}))) \Rightarrow \sigma(\text{mem})[a] = \sigma'(\text{mem})[a] \)
- \( \forall a. \text{AddrInStack}(a) \Rightarrow (\text{writable}(\sigma(\text{mem}), a) \iff \text{writable}(\sigma'(\text{mem}), a)) \)
- \( \sigma'(\text{ptr}) = 0 \lor ((\sigma'(\text{ptr}) \leq \sigma'(\text{ptr}) + \text{size}) \land \text{AddrInHeap}(\sigma'(\text{ptr})) \land \text{AddrInHeap}(\sigma'(\text{ptr}) + \text{size})) \)

First, by forbidding \text{malloc} from modifying \( U \)'s stack above \text{rsp}, we prevent \text{malloc} from overwriting return addresses in \( U \)'s stack frames. Second, by forbidding \text{malloc} from making a stack location writable, we prevent a return address from being corrupted later by code in \( U \) — \text{malloc} should only modify the write bitmap to mark the allocated region writable. Both restrictions are paramount for preventing WCFI-RW exploits. Finally, we require \text{malloc} to return a region from \( U \)'s heap (or the null pointer), and ensure that a machine integer overflow is not exploited to violate IRC.

(b) \text{free(ptr)} must not 1) modify non-SIR memory or stack frames belonging to \( U \), or 2) make any stack location writable. Formally, when \text{stmt}(\sigma) = \text{call free}, we write \((\sigma, \sigma') \in \mathcal{R} \text{ iff } \psi_{\text{free}} \) holds, where \( \psi_{\text{free}} \) is the conjunction of:

- \( \forall a. \neg (\text{SIR}(a) \lor (\text{AddrInStack}(a) \land a \geq \sigma(\text{rsp}))) \Rightarrow \sigma(\text{mem})[a] = \sigma'(\text{mem})[a] \)
- \( \forall a. \text{AddrInStack}(a) \Rightarrow (\text{writable}(\sigma(\text{mem}), a) \iff \text{writable}(\sigma'(\text{mem}), a)) \)

These constraints are equivalent to the constraints on \text{malloc}, and are likewise paramount for preventing WCFI-RW exploits. Note that we do not require \text{malloc} to return a previously unallocated region, nor do we require \text{free} to mark the freed region as invalid; full functional correctness would require such properties. WCFI-RW does not assume any invariants on the heap values, and therefore vulnerabilities such as use-after-free do not compromise WCFI-RW.

(c) \text{send(ptr, size)} must not make any address writable or modify the stack frames belonging to \( U \). Formally, when \text{stmt}(\sigma) = \text{call send}, we write \((\sigma, \sigma') \in \mathcal{R} \text{ iff } \psi_{\text{send}} \) holds, where \( \psi_{\text{send}} \) is:

- \( \forall a. (\text{AddrInBitmap}(a) \lor (\text{AddrInStack}(a) \land a \geq \sigma(\text{rp})) \Rightarrow \sigma(\text{mem})[a] = \sigma'(\text{mem})[a] \)

\text{send} is used to encrypt and sign the message buffer before writing to non-SIR memory, and it is the only API call that is allowed to modify non-SIR memory. However, we forbid \text{send} from modifying a caller’s stack frame or the bitmap. By preventing \text{send} from modifying \( U \)'s stack above \text{rsp}, we prevent \text{send} from overwriting a
return address in any of \( U \)'s stack frames. Furthermore, \( \text{send} \) cannot modify the bitmap and make any location writable, thereby preventing a return address from being modified later by some code in \( U \).

(d) \( \text{recv}(\text{ptr}, \text{size}) \) must 1) check that the destination buffer is a writable region in \( U \)'s memory, and 2) not modify any memory location outside the input buffer. Formally, when \( \text{stat}(\sigma) = \text{call recv} \), we write \( (\sigma, \sigma') \in \mathcal{R} \) iff \( \psi_{\text{recv}} \) holds, where \( \psi_{\text{recv}} \) is the conjunction of:

\[
\forall a. (\sigma(\text{ptr}) \leq a < \sigma(\text{ptr}) + \sigma(\text{size})) \Rightarrow \\
(\text{writable}(\sigma(\text{mem}), a) \land \text{AddPtrU}(a))
\]

\[
\forall a. (\neg (\sigma(\text{ptr}) \leq a < \sigma(\text{ptr}) + \sigma(\text{size})) \Rightarrow \\
\sigma(\text{mem})[a] = \sigma'(\text{mem})[a]
\]

\[
\sigma(\text{ptr}) \leq \sigma(\text{ptr}) + \sigma(\text{size})
\]

\( \text{recv} \) is used to copy an encrypted, signed message from non-SIR memory, decrypt it, verify its signature, and copy the cleartext message buffer to \( U \)'s memory. The first two constraints ensure that the message is written to a writable memory region within \( U \) (which guarantees that a return address is not modified by \( \text{recv} \)), and that the cleartext message is not written out to non-SIR memory. The final constraint ensures that an integer overflow is not exploited to violate IRC.

In addition to the contracts above, we check the following contracts for each of \( \text{malloc} \), \( \text{free} \), \( \text{send} \), and \( \text{recv} \):

- page permissions, following the API call, are set to prevent read and write access to \( L \)'s memory. Write access is disabled to prevent \( U \) from corrupting \( L \)'s state, whereas read access is disabled to prevent reading \( L \)'s secrets.
- stack pointer \( \text{rsp} \) is restored to its original value.
- the API call satisfies the application binary interface (ABI) calling convention (Windows x64 in our case)

For the purposes of this paper, we assume that the implementation of \( L \) satisfies the above contracts. Since \( L \) is written once, and used inside all SIRs, we could potentially verify the implementation of \( L \) once and for all manually.

### 4.3 Soundness

**Theorem 1.** If \( U_H \) satisfies WCRI-RW and the implementation of \( L \)'s API satisfies the correctness properties given in section 4.2 then \( U_H \) satisfies IRC.

A proof of Theorem 1 is given in supplement material [1].

5. Verifying WCRI-RW

In the remainder of this paper, we describe an automatic, static verifier for proving that a developer-provided \( U_H \) satisfies the WCRI-RW property. Verifying such a property at machine code level brings up scalability concerns. Our benchmarks consist of SIR programs that are upwards of 100 KBs in binary size, and therefore whole-program analy-

ses would be challenging to scale. Intra-procedural analysis, on the other hand, can produce too many false alarms due to missing assumptions on the caller-produced inputs and state. For instance, the caller may pass to its callee a pointer to some heap allocated structure, which the callee is expected to modify. Without any preconditions on the pointer, a modular verifier might claim that the callee writes to non-SIR memory, or corrupts a return address, etc.

Instead of verifying WCRI-RW of arbitrary machine code, our solution is to generate machine code using a compiler that emits runtime checks to enforce WCRI-RW, and automatically verify that the compiler has not missed any check. Our compiler emits runtime checks that enforce that unconstrained pointers (e.g., from inputs) are not used to corrupt critical regions (e.g., return addresses on the stack), write to non-SIR memory, or jump to arbitrary code, etc. As our experiments show, the presence of these checks eliminates the unconstrained verification environments described above. Consequently, most verification conditions (VCs) that we generate can be discharged easily. Even in cases where the compiler eliminates checks for efficiency, the compiler does not perform any inter-procedural optimization, and we demonstrate that a modular verifier can prove that eliminating the check is safe.

5.1 Runtime Checks

We use the compiler to 1) prepend checks on \( \text{store} \) instructions to protect return addresses in the stack, 2) prepend checks on \( \text{store} \) instructions to prevent writes to non-SIR memory, and 3) prepend checks on indirect \( \text{call} \) and \( \text{jmp} \) instructions to enforce valid jump targets. We also use the processor’s page-level access checks for efficiently preventing reads and writes to \( L \)'s memory by code in \( U_H \).

**Runtime Checks on Stores:** To enforce that writes through pointers do not corrupt return addresses on the stack, the compiler maintains a bitmap (see write_bitmap in Figure 3) to record which areas in \( U \)'s memory are writable, while maintaining the invariant that a return address is never marked writable. The write_bitmap maps every 8-byte slot of \( U \)'s memory to one bit, which is set to one when those 8 bytes are writable. The bitmap is updated at runtime, typically to mark address-taken local variables and malloc-returned regions as writable, and to reset the bitmap at procedure exits or calls to \( \text{free} \). For instance, if a caller expects a callee to populate the results in a stack-allocated local variable, the caller must mark the addresses of that local variable as writable before invoking the callee. A \( \text{store} \) instruction is prepended with an instruction sequence that reads the write_bitmap and terminates the SIR program if the corresponding bit is zero (see instructions from L1 to L2 in Figure 3). This check on \( \text{store} \) can enable stronger properties than backward edge CFI in that it also prevents many forms of memory corruptions. While this gives us stronger security guarantees at runtime, we only require a weak form of CFI for proving WCRI-RW.
In addition, the compiler prepends store with range checks that prevent writes outside the SIR region (see instructions from L0 to L1 in Figure 3).

Finally, we use the processor’s paging instructions to re-zone write permissions on L’s memory while \( U_H \) executes, and to rezone write permissions following an API call to L — we also use page permissions to make code pages and the call_bitmap non-writable at all times. The SIR provider guarantees that the processor respects the page permissions of SIR memory. In the case of Intel SGX 2.0, the processor provides the emodpr instruction to change page permissions of enclave (SIR) memory. The processor hardware performs the page-level access checks without any noticeable performance overhead. Note that we cannot use the page permissions to prevent writes to non-SIR memory because the adversary \( \mathcal{H} \) controls the page tables for all non-SIR memory.

Runtime Checks on Loads: WCRI-RW mandates that \( U_H \) does not load from L’s memory. This ensures that \( U_H \) never reads secrets such as the secure channel’s cryptographic keys, which is necessary because strong encryption properties no longer hold if the key itself is used as plain text. To that end, L disables read access to its memory by setting the appropriate bits in the page tables. On each API call, L first sets the page permissions to allow access to its own memory, and resets it before returning back to \( U_H \).

On a side note, although we would like \( U_H \) to only read SIR memory (and use recv to fetch inputs from non-SIR memory), we avoid introducing range checks for two reasons: 1) WCRI-RW does not require this guarantee, and 2) loads are frequent, and the range checks incur significant additional performance penalty.

Runtime Checks on Indirect Control Transfers: The compiler maintains a separate bitmap (see call_bitmap in Figure 3) that records the entry points of procedures in \( U \) and APIs of L. The call_bitmap maps every 16-byte slot of \( U \)’s memory to one bit, and the compiler accordingly places each procedure’s entry point in code at a 16-byte aligned address. The compiler prepends indirect calls with an instruction sequence that reads the bit within call_bitmap corresponding to the target address, and terminates the SIR program if that bit is zero. Indirect jumps to within the procedure are also checked to prevent jumps to the middle of x64 instructions, which can lead to control-flow hijacks.

The reader may question our choice of runtime checks, as one could simply instrument instructions implementing the validity checks on the corresponding transitions in the WCRI-RW monitor \( \mathcal{M} \) (from Definition 3). However, this would require us to build a provably correct implementation of a shadow stack within SIR memory, and use the shadow stack during call and ret instructions to check that the processor uses the same return address as the one pushed by the matching call instruction. However, it is non-trivial to protect the shadow stack from code running at the same privilege level — doing so might require the very techniques that we use in our runtime checks.

5.2 Static Verifier for WCRI-RW

We present a modular and fully automatic program verifier, called /CONFIDENTIAL, for checking that the compiler-generated machine code satisfies the WCRI-RW property. Since runtime checks incur a performance penalty, the compiler omits those checks that it considers to be redundant. For instance, the compiler eliminates checks on writes to local scalar variables (and therefore does not need to make them writable in the write_bitmap) and to variables whose addresses are statically known. The compiler also tries to hoist checks out of loops whenever possible. These optimizations do not compromise WCRI-RW, and /CONFIDENTIAL proves them safe so as to avoid trusting the compiler. Since we verify WCRI-RW at the machine code level, we do not need to trust the implementation of these optimizations.

/CONFIDENTIAL is based on a set of proof obligations for verifying that the output machine code (modeled as \( U_H \)) satisfies WCRI-RW. It modularly verifies each procedure in isolation and is still able to prove WCRI-RW for the entire program — this soundness guarantee is formalized as a theorem that we present later in this section. It is important to note that modular verification is possible because the compiler does not perform any global analysis to optimize away the runtime checks. /CONFIDENTIAL generates proof obligations for each store, load, call, ret, jmp, and rsp update in the procedure. While generating proof obligations, /CONFIDENTIAL does not distinguish statements based on whether they originated from \( U \)’s source code or the runtime checks, since the compiler is untrusted. These proof obligations are implemented by instrumenting \( U_H \) with static assertions, which are discharged automatically using an SMT solver by the process of VC generation. We present the instrumentation rules in Table 1 and describe them below.

The instrumentation rules use the following functions, which are defined for a given \( U \):

- `policy(e)` is true iff address \( e \) is the starting address of a procedure in \( U \) or an API entrypoint of L. This predicate is consistent with the state of call_bitmap, which remains constant throughout the SIR’s execution.
- `writable(mem, e)` is true iff the bit corresponding to address \( e \) is set to one in the write_bitmap region of mem.
- `b(mem, e_a, e_d)` is a partial function (only defined for values of \( e_a \) for which AddrInBitmap(e_a) holds) that returns the largest address that is marked writable as a result of executing `store(e_a, e_d)`
- `start(p)` returns the starting address of procedure \( p \)
- `end(p)` returns the ending address of procedure \( p \)
- `legal(e)` is true for any \( e \) that is the starting address of an instruction in \( U \) — we need this predicate because x64 instructions have variable lengths.
old(rsp) is the value of rsp at procedure entry, and is modeled as a symbolic variable because the procedure may be called at an arbitrary depth in the call stack.

Functions policy, start, and end are defined by parsing the executable (DLL format) produced by the compiler. legal is defined by disassembling the executable code, which is a precursor to the formal modeling step that produces $U_H$. Since the memory layout and code pages remain constant throughout execution, these functions are defined once for a given $U$ and are independent of the current state. Functions writable and b are evaluated on the contents of write_bitmap within mem, and their definition involves a load from mem and several bitwise operations. We also recall predicates AddrInStack, AddrInBitmap, AddrInL, and SIR from section 4 which are used to define various regions in the SIR’s memory (see Figure 3).

### Static Assertions on Calls:
On each statement of the type call $e$, we assert that 1) the target address $e$ is either a procedure in $U$ or an API entrypoint in $L$, 2) all addresses in the callee’s stack frame are initially unwritable, and 3) the caller follows the Windows x64 calling convention by allocating 32 bytes of scratch space for use by the callee.

### Static Assertions on Stores: $U_H$ may invoke $\text{store}_n(e_a, e_d)$ on an arbitrary virtual address $e_a$ with arbitrary data $e_d$. Hence, we must argue that the proof obligations prevent all store instructions that violate WCRI-RW. We case split this safety argument for each memory region in the virtual address space (Figure 3). The call_bitmap, the code pages, and $L$’s memory are marked non-writable in the page tables — store to these areas results in an exception, followed by termination. Within $U$’s memory, /CONFIDENTIAL treats all writes to the heap and globals as safe because WCRI-RW does not require any invariants on their state — while the heap and global area may store code and data pointers, /CONFIDENTIAL instruments the necessary assertions on indirect control transfers and dereferences, respectively. We are left with potential stores to $U$’s stack, write_bitmap, and non-SIR memory, and their proof obligations are:

- **AddrInStack($e_a$):** if $e_a$ is an address in a caller’s stack frame but not in the 32-byte scratch space (which is addressed from old(rsp) + 8 to old(rsp) + 40), then $e_a$ must be marked by the write_bitmap as writable. On the other hand, $U_H$ is allowed to write arbitrary values to the current stack frame or the 32-byte scratch space.

- **AddrInBitmap($e_a$):** only addresses in the current stack frame can be made writable. It suffices to check that the largest address whose write permission is being toggled is below old(rsp). Note that unaligned stores may change two words at once. Since the instrumentation code only checks the write permissions of the first word (for performance reasons), we further restrict the largest address to be below old(rsp) − 8 to account for unaligned stores of up to 8 bytes.

- **SIR($e_a$):** WCRI-RW mandates that $U_H$ does not store to non-SIR memory, for which /CONFIDENTIAL generates a proof obligation: assert SIR($e_a$).

### Static Assertions on Assignments to rsp: For each statement of the type $\text{rpm} := e$, we check that the new stack pointer $e$ is 8-byte aligned. 2) does not point to a caller’s stack frame (i.e., must not be greater than the old rsp). The constraint that the rsp never points to a caller’s stack frame is necessary for modular verification. We use a guard page (i.e., a page without read or write page permissions) to protect against stack overflows — in the case where the procedure needs stack space larger than a page, we check that the compiler introduces a dummy load that is guaranteed to hit the guard page and cause an exception, thus preventing the procedure from writing past the guard page.

### Static Assertions on Returns: For each ret statement, we check that 1) rsp has been restored to its original value, and 2) the procedure has reset the write_bitmap so that all addresses in the current stack frame are unwritable.

### Static Assertions on Jumps: A jmp is safe if it either targets a legal address within the current procedure $p$ (i.e., not in the middle of an instruction), or the start of a procedure (often used for performing tail calls). In the case of jmp to a

<table>
<thead>
<tr>
<th>Smt $s$</th>
<th>Instrumented Smt $I(s)$</th>
</tr>
</thead>
<tbody>
<tr>
<td>call $e$</td>
<td>assert policy($e$) $\land$ (forall i. (AddrInStack(i) $\land$ $i &lt; \text{rsp}$) $\Rightarrow$ ¬writable(mem, i)) $\land$ ($\text{rsp} \leq \text{old(rsp)} - 32$);</td>
</tr>
<tr>
<td>store$_n(e_a, e_d)$</td>
<td>assert ($\forall i \leq e_a + e_d + n$ 1 \ (($\text{AddrInStack}(i)$ $\land$ $i \geq \text{old(rsp)}$ $\land$ ¬(old(rsp) + 8 $\leq i &lt; \text{old(rsp)} + 40$)) $\Rightarrow$ writable(mem, $e_a$);</td>
</tr>
<tr>
<td></td>
<td>assert ($\forall i \leq e_a + e_d + n$ 1 \ (($\text{AddrInBitmap}(i)$ $\Rightarrow$ $\text{b}(m_{\text{mem}}, i, e_d)[$$8 * (i + 1 - e_a) + 8 * (i - e_a)] &lt; \text{old(rsp)} - 8$));</td>
</tr>
<tr>
<td></td>
<td>assert SIR($e_a$);</td>
</tr>
<tr>
<td></td>
<td>store$_n(e_a, e_d)$</td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
</tbody>
</table>
procedure, we check the same properties as a call instruction, except that rsp is restored to its original value.

**Syntactic Check for SIR instructions**: Code in $U_H$ runs at the same privilege level as $L$, and hence may invoke instructions to override page permissions (such as emodpr in SGX 2.0). We guard against this vulnerability by simply checking for the presence of special SIR instructions (all SGX instructions) in $U_H$, which is captured by a regular expression on the disassembled machine code. Though not strictly required for WCFL-RW, /CONFIDENTIAL forbids all SGX instructions because of certain instructions (such as etgetkey for accessing the sealing key) which return cryptographic secrets to the user code.

### 5.3 Optimization to the Proof Obligations

If we can estimate the stack size needed for a procedure, then we can optimize the proof obligations for store, ret, and call statements (see Table 2). The modifications are:

- **store**: further assert that updating the write bitmap does not mark any address below the estimated stack space to be writable.
- **call**: further assert that the current rsp is not within the estimated stack space (which would otherwise falsify the callee’s precondition that the stack space is non-writable). The modified assertion on store also allows us to omit the proof obligation that all addresses below the current rsp are non-writable (prior to the call).
- **ret**: now asserts non-writability of only the addresses in the estimated stack space, instead of all addresses below the old(rsp). Since the range of addresses is bounded, we instantiate the $\forall$ quantifier, and help the SMT solver to eliminate hundreds of timeouts in our experiments.

Although the optimization is sound for any positive value of estimate, we are able to compute a precise estimate for all of our benchmarks. The estimate is computed by aggregating all the stack subtractions and checking that there is no assignment to rsp within a loop. If rsp is assigned within a loop body, then the optimization is disabled. Furthermore, this optimization may lead to false positives in rare cases of safe programs, in which case we fall back to the unoptimized implementation. For instance, this may happen if a procedure decrements rsp after making a procedure call, but we have not encountered such cases in our evaluation.

### 5.4 Soundness

The following theorem states that our proof obligations imply WCFL-RW.

**Theorem 2. (Soundness of $I$)** Let $p$ be a procedure, and $I(p)$ be procedure $p$ instrumented with the assertions given in Table 1 and with optimizations in Table 2. If for each $p$ in $U_H$, $I(p)$ is safe (i.e., no trace of $I(p)$ violates such an assertion), then $U_H$ satisfies WCFL-RW.

A proof of Theorem 2 is given in supplement material [1].

### 6. Implementation

We develop a toolchain for building IRC-preserving SIRs. The developer first compiles $U$’s code (written in C/C++) using the VC3 compiler [33], to insert checks on indirect control-flow transfers, and checks on stores to prevent tampering of return addresses and to prevent stores from exceeding the SIR boundary — the instrumentation also protects against several classes of memory corruption errors, but we do not leverage these guarantees for proving IRC.

/CONFIDENTIAL takes as input a DLL with $U$’s code, and an implementation of $L$ that provides the required guarantees. First, /CONFIDENTIAL parses the DLL to extract all the procedures. Next, for each procedure, /CONFIDENTIAL invokes the Binary Analysis Platform (BAP [10]) to lift the x64 instructions to statements in our language (Figure 4), which are then converted to BoogiePL [15]. Indirect jmp and call instructions require some preprocessing before modeling them in BoogiePL, and we discuss this detail later in this section.

Next, for proving WCFL-RW, /CONFIDENTIAL instruments each procedure in BoogiePL with assert statements as given in Table 1 and Table 2. The Boogie verifier [4] generates VCs and automatically discharges them using the Z3 SMT solver [14]. If all assert statements in all procedures are valid, then by Theorem 1 and Theorem 2, $U_H$ satisfies IRC. /CONFIDENTIAL checks the validity of each assert in parallel, which in combination with the modular analysis, allows /CONFIDENTIAL to scale to realistic SIR programs.

**Modeling Procedure Calls** Since the analysis is modular, /CONFIDENTIAL replaces each procedure call by a havoc to the machine state in lieu of specific procedure summaries. The havoc is performed by assigning fresh, symbolic values to volatile registers and CPU flags, and assigning a fresh, symbolic memory (called new_mem below) which is subject to certain constraints as shown below. We encode the constrained havoc to machine state using the following statements in order, which are instrumented after the call statement in $U_H$.

- **assum** $\forall i. (\text{AddrInStack}(i) \land i < \text{rsp} \land \neg \text{writable}(\text{mem}, i)) 
\Rightarrow \text{load}_{i}(\text{mem}, i) = \text{load}_{i}(\text{new_mem}, i)$

A callee procedure may have an unbounded number of store instructions that can modify any memory location marked writable in the write bitmap — the havoc must preserve the non-writable locations in the current stack frame because our instrumentation guarantees that a callee cannot change their writability (as opposed to the heap which may be made writable by calling malloc, and then modified).

- **assum** $\forall i. \text{AddrInStack}(b(\text{mem}, i, 11111111)) 
\Rightarrow \text{load}_{i}(\text{mem}, i) = \text{load}_{i}(\text{new_mem}, i)$

Our instrumentation guarantees that a portion of the write bitmap (specifically the part that controls stack addresses) is restored on ret, which validates this assumption.
assert (AddrInBitmap(i) ∧ i ≥ old(rsp) ∧ ¬(old(rsp) + 8 ≤ i < old(rsp) + 40)) ⇒ writable(mem, ea);
assert (AddrInStack(i) ∧ i ≥ old(rsp) ∧ ¬(old(rsp) + 32 ≤ i < old(rsp) + 64)) ⇒ writable(mem, ea);

Table 2. Optimized instrumentation rules for store, call, and ret statements

<table>
<thead>
<tr>
<th>Stmt s</th>
<th>Instrumented Stmt l(s)</th>
</tr>
</thead>
<tbody>
<tr>
<td>storeₕ  (eₐ, eₕ)</td>
<td>assert (AddrInStack(i) ∧ i ≥ old(rsp) ∧ ¬(old(rsp) + 8 ≤ i &lt; old(rsp) + 40)) ⇒ writable(mem, eₕ);</td>
</tr>
<tr>
<td>call e</td>
<td>assert policy(e) ∧ (rsp ≤ old(rsp) − 32) ∧ (rsect ≤ old(rsp) − estimate);</td>
</tr>
<tr>
<td>ret</td>
<td>assert (rsp = old(rsp)) ∧ (∀i. (i &lt; old(rsp) ∧ i ≥ old(rsp) − estimate) ⇒ ¬writable(mem, i));</td>
</tr>
</tbody>
</table>

▷ mem := new_mem
This step assigns a new memory that is related to the old memory by the above assume statements.

▷ havoc rax, rcx, rdx, r8, r9, r10, r11;
This step havoc all volatile registers (as defined by the Windows x64 calling convention) with fresh, symbolic values.

▷ havoc ZF, AF, OF, SF, CF, PF;
The callee may cause arbitrary updates to the CPU flags, which is modeled by this havoc.

The constrained havoc above models an arbitrary U procedure that has an unbounded number of statements in any order — we prove this lemma within the proof of the soundness theorem [2]. The constrained havoc (in the statements above) is followed by a jump to the next instruction, as computed during disassembly. This is sound because WCFI-RW guarantees that the callee uses the return address placed by the caller. There is a caveat that a call to L’s API is replaced by its contract (defined in section 4.2) in lieu of the constrained havoc defined above.

We also assume the following preconditions at the beginning of each procedure:

▷ ∀i. (AddrInStack(i) ∧ i ≥ old(rsp)) ⇒ ¬writable(mem, i)
This assumption treats all addresses in the local stack frame as non-writable upon procedure entry. It is upon the procedure to explicitly update the write_bitmap to make parts of its stack frame writable. This precondition is sound since we enforce it at all call sites.

▷ AddrInStack(old(rsp)) ∧ old(rsp)[32:0] = 000
We assume the initial stack pointer must be within the stack region and that it is 8-byte aligned. This precondition is sound because we enforce this property on every assignment to rsp.

Modeling Indirect Control Transfers. In order to use VC Generation and SMT solving, we need to statically approximate the set of jump targets for each register-based indirect jump. While we can use standard off-the-shelf value analysis, we observed that the compiler idiomatically places a jump table in memory, which is indexed dynamically to compute the target. Correspondingly, our verifier determines the base address of the jump table and reads its contents to compute the set of potential jump targets; this step is not trusted. An indirect jump is then modeled as a “switch” over direct jump statements to the potential targets, with the default case being assert false. The presence of assert false allows the approximation step to be untrusted. Indirect calls are handled using a similar method as direct calls; we introduce a constrained havoc on writable memory, volatile registers, and all CPU flags.

Modeling Havocs from ℳ. While our adversary model requires inserting havoc-SIR mem before each statement in U, it is efficient and sound to do so only before load statements [40]. We havoc the result of a load statement if the address is a location in non-SIR memory; reg := loadₙ(e) is transformed to if (SIR(e)) {reg := loadₙ(e)} else {havoc reg}.

Verifying Procedures with Loops. /CONFIDENTIAL uses a candidate loop invariant that a portion of the writeBitmap (specifically the part that controls stack addresses) is preserved across loop iterations — we expect this invariant to hold because 1) the VC3 compiler tends to set the write_bitmap only in the procedure’s prologue, which occurs before loop bodies, and 2) our proof obligations guarantee that callees preserve this portion of the write_bitmap. Empirically, we find that this loop invariant is sufficient for proving our assertions within loop bodies.

7. Evaluation

We evaluate /CONFIDENTIAL on several SIR programs that process sensitive data, and we summarize the results in Table 3 and Figure 6. We choose the three largest MapReduce examples from VC3 [38]: Revenue, IoVolumes, and UserUsage. UserUsage and IoVolumes processes sensitive resource usage data from a cloud platform. IoVolumes processes storage I/O statistics, and UserUsage aggregates the total execution time per user. Revenue reads a log file from a website and calculates the total ad revenue per IP address. Each of these benchmarks implement the mappers and reducers within the SIRs, and place large parts of the untrusted Hadoop stack outside the SIR boundary. Performance evaluation of these applications is described in [38], which reports that the average cost of the run-time checks is 15%. We also
experiment with three SPEC CPU2006 benchmarks: bzip2, astar, and lbm.

As Table 3 and Figure 6 show, /CONFIDENTIAL is able to prove almost all assertions needed to check WCFL-RW in less than 20 seconds each, which demonstrates the potential in scaling our approach. We performed full verification of the lbm benchmark (i.e., no timeouts or false positives), which took roughly 3 hours of wall clock time. We also discovered few procedures across many benchmarks that have instructions that BAP [10] could not process, and we plan to experiment with alternative tools in future.

All benchmarks were compiled with the optimization level at -O2. All experiments were performed on a machine with 160GB RAM and 12 Intel Xeon E5-2440 cores running at 2.40GHz. As mentioned previously, /CONFIDENTIAL parallelizes the verification by spawning several instances of the Z3 SMT solver, where each instance is responsible for proving one of the instrumented static assertions.

<table>
<thead>
<tr>
<th>Benchmark</th>
<th>Code Size</th>
<th>Verified asserts</th>
<th>Timed out asserts</th>
<th>False Positives</th>
</tr>
</thead>
<tbody>
<tr>
<td>UserUsage</td>
<td>14 KB</td>
<td>2125</td>
<td>2</td>
<td>4</td>
</tr>
<tr>
<td>IoVolumes</td>
<td>17 KB</td>
<td>2391</td>
<td>2</td>
<td>0</td>
</tr>
<tr>
<td>Revenue</td>
<td>18 KB</td>
<td>1534</td>
<td>3</td>
<td>0</td>
</tr>
<tr>
<td>lbm</td>
<td>38 KB</td>
<td>1192</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>astar</td>
<td>115 KB</td>
<td>6468</td>
<td>2</td>
<td>0</td>
</tr>
<tr>
<td>bzip2</td>
<td>155 KB</td>
<td>10287</td>
<td>36</td>
<td>0</td>
</tr>
</tbody>
</table>

Table 3. Summary of results.

Timeouts. Prior to implementing the optimizations in section 5.3, we had experienced several hundred timeouts. After the optimizations, only roughly 0.2% of all assertions (across all benchmarks) do not verify within the 30 minute timeout, as shown in Table 3. The main source of complexity in the generated VC comes from the combination of quantifiers and multiple theories such as arrays and bitvectors that are typically hard for SMT solvers. Another reason is the presence of a few large procedures in the SPEC benchmarks — largest procedure has above 700 x64 instructions and 5200 BoogiePL statements — which generated large SMT formulas. These large procedures (considering those above 420 x64 instructions and 3500 BoogiePL statements) accounted for 31 (69%) timeouts. We found that all of these assertions are associated with store instructions, and they enforce that the store targets a writable region in memory — one feature of these large procedures is heavy use of the stack space in memory, potentially causing the SMT solver to reason heavily about aliasing in order to prove that the target address is marked non-writable. Ten (22%) of the timeouts are on assertions associated with ret instructions, where the solver struggled to prove that all locations in the current stack frame are made non-writable (even with the optimization in section 5.3) — unless the procedure explicitly resets the write_bitmap prior to the ret, the SMT solver must prove that none of the stores in the procedure are unsafe. The remainder of the timeouts, roughly 9%, were neither on the return instructions, nor in large procedures — we find that they are associated with store instructions, where the solver is able to prove that the store targets the write_bitmap but not whether the written value is safe.

We manually investigated the assertions that time out (by experimenting at the level of the BoogiePL program) and were able to prove some of them using additional invariants and abstractions, without requiring any specific knowledge of the benchmark or its source code. Although we machine check these proofs (using Boogie and Z3), we continue to count them as timeouts in Table 3, since our goal is to have a fully automatic verifier of WCFL-RW. For roughly half of the timeouts observed on ret instructions, we hypothesized intermediate lemmas (a few instructions prior to the ret) and simplified the VC by introducing a havoc to mem followed by an assume that is strong enough needed to prove the final assertion — we also prove that the intermediate lemma holds prior to the havoc, making this transformation sound. Specifically, for a stack address a, we either 1) havoc the write_bitmap if the procedure contains instructions to reset the write_bitmap corresponding to address a, and these instructions are sufficient to prove the final assertion ¬writable(mem,a), or 2) we introduce assert ¬writable(mem,a) at earlier points in the program, if the procedure does not make a writable. This approach eliminates 6 of the 10 timeouts on ret instructions.

We also experimented with the 31 timeouts on store instructions within the large procedures. With the exception of 3 of these 31 timeouts, we were not able to get Z3 to prove

False Positives. We found four assertions that produced spurious counterexample traces, all within a single procedure of UserUsage. The violating procedure is a C++ constructor method, which writes the vtable pointer in the newly allocated object. Since the memory allocator terminates the SIR if it fails to allocate memory, the compiler optimizes away the range checks on the pointer returned by the memory allocator — this is the only observed instance of the compiler performing global optimization. Since /CONFIDENTIAL does not do any global analysis, it flags this method as unsafe. We could fix the issue by disabling this optimization.
the assertions, even after simplifying the VC with intermediate lemmas and havoc statements. These 3 assertions were at relatively shallow depths in the control flow graph of the procedure, where there are fewer loads and stores leading to the assertion. Finally, we tried the CVC4 [5] solver, but we did not succeed in eliminating any more timeouts.

Having performed this investigation, we are hopeful that with improving SMT solvers and better syntactic heuristics for simplifying the VCs, we will eliminate all timeouts.

8. Related Work

Confinement of programs to prevent information release has been studied for many years [22]. We are interested in achieving confinement in user programs, even with buggy or malicious privileged code. We use trusted processors [5] [24] [28] [29] to create isolated memory regions where we keep confidential application code and data, but our techniques are also applicable if isolation is provided by the hypervisor [11] [20] [45], or runtime checks [13]. Independently of the isolation provider, we need to reason about the application code to provide formal guarantees of confinement, which is the goal of our work.

Our method to check for WCFT-RW draws inspiration from prior work to perform instrumentation in order to satisfy control-flow integrity (CFI [2] [33]) and software fault isolation (SFI [27] [39] [44]). Like SFI, we introduce run-time checks to constrain memory references. Our run-time checks are similar to the ones used in VC3 [38], but importantly we use the paging hardware to check reads, which is more efficient than relying on compiler instrumentation. Unlike VC3, we verify that our checks guarantee IRC. Native Client [47] also enforces a form of SFI, but its run-time checks for 64-bit Intel processors would require us to create SIRs with a minimum size of 100GB [39], which is not practical for our target environment (x86-64 CPUs with SGX extensions). This is because the SIR’s address space must be statically configured and physically mapped by the CPU upon the SIR’s creation, whereas the 64-bit Native Client scheme was implemented in a setting where the virtual address space can be large. Our run-time checks also enforce stronger security properties; for example, Native Client does not guarantee that calls return to the instruction immediately after the call. Native Client ultimately enforces a different policy: it aims to sandbox browser extensions and trusts the host OS, while we aim to isolate an application from a hostile host. This requires us to model a powerful, privileged adversary (H) while reasoning about the application’s execution.

There have also been efforts to perform SFI with formally verified correctness guarantees. RockSalt [30] uses Coq to reason about an x86 processor model and guarantee SFI; it works for 32-bit x86 code, while our system works for the x86-64 ISA. ARMor [49] uses HOL to reason about ARM processor model and guarantee SFI. Native Client, XFI [18] and [48] include verifiers that work on machine code. Our verification scheme is different from these works since it uses different runtime checks (which provide stronger guarantees) and it supports aggressive compiler optimizations that remove redundant checks. We require more complex reasoning and thus use an SMT solver to build our verifier.

Unlike all of the above works, our ultimate goal is preserving confidentiality of a trusted application running in an untrusted and hostile host. Our specific definition of WCFT-RW, together with contracts we assume on the library methods guarantees IRC, which is the novel aspect of our work. We also prove that all the pieces (the compiler checks, the static verification, and the contracts on library methods) all combine together and guarantee IRC. Moat [40] has the same goal as our work, and the main difference is that Moat works for any code, and our work requires the application to perform all communications through a narrowly constrained interface. On the flip-side, Moat performs global analysis, tracks secrets in a fine-grained manner, and is not scalable beyond programs containing few hundred x86 instructions. In contrast, our approach is modular, avoids fine-grained tracking of secrets, and hence scales to larger programs. As mentioned before, our notion of confidentiality does not prevent information leaks via side channels such as memory access patterns. This channel has been addressed in GhostRider [25], which presents a co-designed compiler and hardware (containing Oblivious RAM) for guaranteeing memory trace oblivious computation.

Translation validation (e.g., [32] [35] [41] [42]) is a set of techniques that attempt to prove that compiler optimizations did not change the semantics of the program given as input (after the optimizer run). Our work is similar in spirit to translation validation since we use an off-the-shelf, untrusted compiler and validate whether the code it produced satisfies the security properties we are interested in.

9. Conclusion

We presented a methodology for designing Secure Isolated Regions, which enables certification of applications that need their code and data to remain confidential. Our methodology comprises enforcing the user code to communicate with the external world through a narrow interface, compiling the user code with a compiler that inserts run-time checks that aid verification, and linking it with a verified runtime that implements secure communication channels. We formalized the constraints on user code as Information Release Confinement (IRC), and presented a modular automatic verifier to check IRC. We believe that IRC, together with additional requirements on the implementation of the runtime, can guarantee a strong notion of confidentiality.

Acknowledgments

The authors from the University of California at Berkeley are supported in part by the NSF STARSS 1528108 and SRC 2460.001 grants and by Microsoft Research. We gratefully acknowledge our shepherd Michael Hicks and the anonymous reviewers for their insightful feedback.
References

[1] https://slashconfidential.github.io


