The Checked C research project is investigating how to extend the C programming language so that programmers can write more secure and reliable C programs. The project is developing an extension to C called Checked C that adds checking to C to detect or prevent common programming errors such as buffer overruns, out-of-bounds memory accesses, and incorrect type casts. The extension is designed to be used for existing system software written in C.
Finding out more
Checked C is an open, collaborative research project. The latest version of the Checked C language extension design is available on GitHub here You can see our on-going work on the language design and compiler implementation on GitHub. Developers and researchers are welcome to try it out, provide feedback, or contribute to the efforts.
We are working with Michael Hicks and Andrew Ruef at the University of Maryland on Checked C. We have also worked in the past with researchers at Samsung, who contributed to the implementation and provided feedback on the design.
David Tarditi gave a research talk on Checked C at the University of Washington in October, 2016. The talk is available on YouTube. The Checked C repository has the LaTex files for the design and compiler tests for the language extension. We are implementing Checked C in LLVM/clang. The Checked C clang repository and the Checked C LLVM repository have the source code for the compiler implementation.
- Sam Elliott (University of Washington): Sam worked an implementing the dynamic checking for Checked C. He wrote a technical report describing his work in detail.
- Jay Lim (Rutgers University): Jay extended the Checked C implementation of clang to support polymorphically-typed functions. This provides a type-safe replacement for many uses of void pointers in C
- Andrew Ruef (University of Maryland): Andrew wrote a tool for rewriting C programs to use Checked C extensions, specifically the ptr type. Andrew continues to work on this tool.
Most system software is written in C or C++, which is based on C. System software includes operating systems, browsers, databases, and programming language interpreters. System software is the “infrastructure” software that the world runs on.
There are certain kinds of programming errors such as buffer overruns and incorrect type casts that programmers can make when writing C or C++ programs. These errors can lead to security vulnerabilities or software reliability problems. The Checked C extension will let programmers add checking to their programs to detect these kinds of errors when a program runs or while it is being written. Existing system software can be modified incrementally in a backwards-compatible fashion to have this checking.
In C, programmers use pointers to access data. A pointer is the address of a memory cell. It is easy for programmers to make mistakes when working with pointers, such that a program reads or writes the wrong data. These mistakes can cause programs to crash, misbehave, or allow the program to be taken over by a malicious adversary. Checked C allows programmers to better describe how they intend to use pointers and the range of memory occupied by data that a pointer points to. This information is then used to add checking at runtime to detect mistakes where the wrong data is accessed, instead of the error occurring silently and without detection. This information also can be used detect programming errors while the program is being written. The checking is called “bounds-checking” because it checks that data is being accessed within its intended bounds. The name Checked C reflects the fact that static and dynamic checking is being added to C.
Many programming languages already have bounds checking. C# and Java are examples of such languages. However, those languages automatically add the information needed for bounds checking to data structures. This is a problem for system software, where the programmer needs precise control over what a program is doing. In Checked C, the programmer controls the placement of information needed for bounds-checking and how the information flows through the program, so the programmer retains precise control over what a program is doing.