The Checked C research project is investigating how to extend the C programming language so that programmers can write more secure and reliable C programs. The project is developing an extension to C called Checked C that adds checking to C to detect or prevent common programming errors such as buffer overruns, out-of-bounds memory accesses, and incorrect type casts. The extension is designed to be used for existing system software written in C.
Finding out more
Checked C is an open, collaborative research project. Researchers and developers can contribute to the design and the compiler implementation on Github. We are working with Michael Hicks and Andrew Ruef at the University of Maryland on Checked C, as well as researchers at Samsung. Sam Elliott from the University of Washington PLSE group will be joining us for an internship working on Checked C soon.
David Tarditi gave a research talk on Checked C at the University of Washington in October, 2016. The talk is available on YouTube. The latest version of the Checked C language extension design is here. The Checked C repository has the LaTex files for the design and compiler tests for the language extension. We are implementing Checked C in LLVM/clang. The Checked C clang repository and the Checked C LLVM repository have the source code for the compiler implementation.
Most system software is written in C or C++, which is based on C. System software includes operating systems, browsers, databases, and programming language interpreters. System software is the “infrastructure” software that the world runs on.
There are certain kinds of programming errors such as buffer overruns and incorrect type casts that programmers can make when writing C or C++ programs. These errors can lead to security vulnerabilities or software reliability problems. The Checked C extension will let programmers add checking to their programs to detect these kinds of errors when a program runs or while it is being written. Existing system software can be modified incrementally in a backwards-compatible fashion to have this checking.
In C, programmers use pointers to access data. A pointer is the address of a memory cell. It is easy for programmers to make mistakes when working with pointers, such that a program reads or writes the wrong data. These mistakes can cause programs to crash, misbehave, or allow the program to be taken over by a malicious adversary. Checked C allows programmers to better describe how they intend to use pointers and the range of memory occupied by data that a pointer points to. This information is then used to add checking at runtime to detect mistakes where the wrong data is accessed, instead of the error occurring silently and without detection. This information also can be used detect programming errors while the program is being written. The checking is called “bounds-checking” because it checks that data is being accessed within its intended bounds. The name Checked C reflects the fact that static and dynamic checking is being added to C.
Many programming languages already have bounds checking. C# and Java are examples of such languages. However, those languages automatically add the information needed for bounds checking to data structures. This is a problem for system software, where the programmer needs precise control over what a program is doing. In Checked C, the programmer controls the placement of information needed for bounds-checking and how the information flows through the program, so the programmer retains precise control over what a program is doing.