By Janie Chang, Writer, Microsoft Research
Computer security has been described as a game of one-upmanship, an ongoing escalation of techniques as attackers look for new ways to exploit system vulnerabilities and defenders look for new ways to protect against them. The most prevalent forms of incursion over the last decade have been aimed at computer memory, and of these, the newest and most popular weapon for attackers is a technique known as heap spraying.
Heap spraying works by allocating multiple objects containing the attacker’s exploit code in the program’s heap, the area of memory used for dynamic memory allocation. Many recent high-profile attacks, such as an Internet Explorer exploit in December 2008 and one targeting Adobe Reader in February 2009, were examples of heap spraying.
Heap-spray attacks are difficult to detect reliably, but Ben Livshits and Ben Zorn, researcher and principal researcher at Microsoft Research Redmond, respectively, have been studying this problem. They are confident that, in Nozzle, they have a tool for identifying heap-spray attacks that is reliable, general, and practical. During the 18th Usenix Security Symposium, held in August in Montreal, not only did they present their paper, Nozzle: A Defense Against Heap-spraying Code Injection Attacks, co-authored with Paruj Ratanaworabhan of Cornell University, but they also showed a live demo of their solution while on stage.
A Brief History of Memory Exploits
The goal of any attack is to get the targeted computer to run exploit code supplied by the attacker. To achieve this, two things must happen: The code must end up on the computer, and the computer must run that code.
The earliest type of memory exploit took advantage of stack buffer overflows. Attackers found ways to overwrite a buffer on the stack and used that vulnerability to change or insert program code to make the program jump to instructions provided by the attacker. Stack-overflow attacks diminished in effectiveness as programming languages evolved to prevent buffer overflows.
Memory exploits then focused on heap-based overflows, in which, instead of placing instructions on the stack, attackers found ways to insert them into the program’s heap. Nowadays, heap-based exploits are more difficult to achieve. Operating systems such as Windows Vista use a technique called address space layout randomization (ASLR), in which the base addresses of the code, the heap, and the stack change each time the program runs. This prevents attackers from reliably predicting target addresses for code locations, and if there is only one copy of the exploit code in a large heap, hitting it is akin to finding the proverbial needle in a haystack.
Heap spraying circumvents this challenge by allocating, or “spraying,” multiple copies of exploit code to increase the odds of finding a copy in the heap. The attacker can allocate hundreds of thousands of copies of exploit code into the heap. All that’s needed is for one random program jump to land on one copy of such code, and a successful attack begins.
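To put rough numbers on that intuition, here is a minimal Python sketch. It assumes a simplified model that the article does not state: the jump target is uniformly random over the heap, and the jump counts as a hit if it lands anywhere inside a sprayed copy. The function name, heap size, object size, and copy counts are all illustrative assumptions.

```python
def landing_probability(heap_bytes: int, object_bytes: int, copies: int) -> float:
    """Chance that a uniformly random jump into the heap lands inside a sprayed copy.

    Simplified model (an assumption): copies do not overlap and any byte of a
    copy is an acceptable landing point.
    """
    sprayed_bytes = min(copies * object_bytes, heap_bytes)
    return sprayed_bytes / heap_bytes


HEAP_BYTES = 1 << 30    # hypothetical 1 GiB heap
OBJECT_BYTES = 1 << 10  # hypothetical 1 KiB per sprayed copy

for copies in (1, 1_000, 100_000, 500_000):
    p = landing_probability(HEAP_BYTES, OBJECT_BYTES, copies)
    print(f"{copies:>7} copies -> {p:.4%} chance per random jump")
```

Under these assumed sizes, a single copy is hit about once in a million random jumps, while 100,000 copies cover roughly a tenth of the heap. That is the shift in odds the spray is designed to buy.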
But how does an outside entity manage to allocate thousands of copies of exploit code onto a remote computer?
Such an approach may not work every time; there is always an element of luck involved. But if the attacker lures enough users to the page, enough of the attempts will succeed, and enough machines will be compromised to cause nuisance or damage.
Kittens of Doom
Just about any form of data can be used for exploitation, Zorn says. To drive this point home during the Usenix Security Symposium, the researchers displayed a slide titled “Kittens of Doom: Is No Data Sacred?”
“We wanted to convey that the most innocent of files can be used for exploitation,” Livshits says. “This is an apparently harmless image of a kitten, but there is a malicious payload in the comment field of the image that initiates a heap-spraying attack on the browser.
“Not every heap-spraying attack works, so it’s possible the data you receive has passed harmlessly through other users, because the spray worked but the exploit failed. What’s benign to another user could be a problem for you.”
All Roads Lead to Shell Code
Given that any data can be used for exploitation, the researchers took the perspective that they should examine all objects on the heap. In some cases, data can look like code and vice versa, making it even more difficult to reliably identify harmful objects.
The first breakthrough for the team came when they decided that, instead of looking at individual instructions in an object, they would analyze its control flow.
“The ultimate goal of these objects is to get to the shell code,” Zorn says. “That’s what we call the code that causes actual damage. If the object can’t direct control to the shell code, the attack fails.
“If there is an object and, no matter where we jump into it, we almost always end up going to the same place, then it qualifies as suspicious. Now, there could be non-malicious objects in the heap that contain what look like instructions—but it’s very unlikely that they will also try to make you go to the same place. So control flow is a semantic property that helped us zero in on malicious objects.”
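The convergence idea Zorn describes can be illustrated with a toy model in Python. This is a sketch under stated assumptions, not Nozzle’s actual analysis, which works on real machine code: the four-instruction set, the helper names `destination` and `dominant_fraction`, and both sample objects are hypothetical.

```python
from collections import Counter

# Toy instruction set (hypothetical, not x86):
#   "nop"        fall through to the next slot
#   ("jmp", i)   transfer control to slot i
#   "end"        a block where control stops
#   "bad"        an invalid instruction; execution faults


def destination(obj, start):
    """Follow control flow from `start`; return the slot where it stops, or None on a fault."""
    seen = set()
    pc = start
    while 0 <= pc < len(obj) and pc not in seen:
        seen.add(pc)
        ins = obj[pc]
        if ins == "end":
            return pc
        if ins == "bad":
            return None
        pc = ins[1] if isinstance(ins, tuple) else pc + 1
    return None  # ran off the object or entered an infinite loop


def dominant_fraction(obj):
    """Fraction of entry offsets whose control flow converges on the most common stop."""
    if not obj:
        return 0.0
    stops = Counter(destination(obj, s) for s in range(len(obj)))
    stops.pop(None, None)  # faults are not a destination
    return max(stops.values(), default=0) / len(obj)


# Every entry point slides to the same final slot.
sled_like = ["nop"] * 8 + ["end"]
# Mixed data that happens to decode as instructions: entry points scatter or fault.
data_like = ["nop", "bad", "end", "bad", ("jmp", 0), "bad", "nop", "end"]

print(dominant_fraction(sled_like))  # 1.0
print(dominant_fraction(data_like))  # 0.25
```

In this toy model, an object scoring near 1.0 behaves the way Zorn describes a suspicious object: almost any entry point leads to the same place. Where to set the cutoff between benign and suspicious is left open here.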
This approach proved more reliable than other detection schemes, with only a 10 percent false-positive rate. The researchers, though, were aiming for zero false positives, if possible.
“We are talking about stopping the program each time we detect a suspicious object,” Livshits says. “If objects are actually harmless 10 percent of the time, it’s an unacceptable amount of disruption to the user.”
Profile of an Exploit
Fortunately, there is another characteristic of heap spraying the researchers could leverage: To be successful, attackers have to allocate thousands of objects into the heap. This understanding led to the researchers’ second breakthrough: the notion of the global heap metric index, an aggregate of measurements across all heap objects.
“In a spray attack, we don’t have just a few suspicious objects,” Zorn says. “There are thousands, representing a large percentage of the heap. So we came up with an index that would indicate the health of the entire heap, essentially a measure of the fraction of the heap that contains suspicious objects.”
A few suspicious objects won’t raise an alarm. But a high density of suspicious objects is a reliable indication of a heap-spraying attack. The global heap metric index dramatically reduced the false-positive rate.
“We take advantage of the very scheme attackers depend on for exploitation,” Zorn says. “In order for such attacks to work, they must allocate many, many objects; so we monitor whether a significant percentage of the heap contains suspicious objects.”
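A heap-wide index of this kind might be sketched in Python as follows. This is an illustration under assumptions rather than Nozzle’s implementation: the `HeapObject` type, the choice to weight objects by size in bytes, and the 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HeapObject:
    size: int         # object size in bytes
    suspicious: bool  # verdict of some per-object check (assumed to exist)


def heap_health_index(objects):
    """Fraction of heap bytes held by objects flagged as suspicious."""
    total = sum(o.size for o in objects)
    if total == 0:
        return 0.0
    flagged = sum(o.size for o in objects if o.suspicious)
    return flagged / total


def looks_like_spray(objects, threshold=0.5):
    """Raise an alarm only when suspicious objects dominate the heap.

    The threshold value is a placeholder, not a figure from the researchers.
    """
    return heap_health_index(objects) >= threshold


# A mostly clean heap with a few per-object false positives.
normal_heap = [HeapObject(100, False)] * 95 + [HeapObject(100, True)] * 5
# A heap in which sprayed objects dominate.
sprayed_heap = [HeapObject(100, False)] * 100 + [HeapObject(100, True)] * 900

print(looks_like_spray(normal_heap))   # False
print(looks_like_spray(sprayed_heap))  # True
```

The point of aggregating is visible in the first heap: five individually suspicious objects would each have stopped the program under a per-object policy, but together they make up only 5 percent of the heap and stay well below the alarm level.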
Putting Results to Work
Using their findings, the team built Nozzle, a run-time tool that takes a two-level approach to detecting heap-spraying attacks.
Nozzle’s lightweight emulator scans heap-allocated objects and treats them as though they are code: It disassembles the code, follows the flow, and builds a control-flow graph. This analysis identifies potentially unsafe code within a safe environment. Nozzle also maintains the global heap metric index.
During testing, Nozzle proved its effectiveness by detecting 100 percent of 12 published and 2,000 synthetically constructed heap-spraying exploits. Even with a detection threshold set six times lower than what is required to detect published malicious attacks, Nozzle reported zero false positives when run against 150 popular Internet sites.
But how does Nozzle affect application performance?
“When we designed the algorithms,” Livshits says, “given that our primary target for protection is the highly competitive browser market, we had to minimize overhead. If browsers slow down to a crawl when Nozzle is running, the technology just wouldn’t be interesting to manufacturers.
“If we examine every single heap object, Nozzle slows down execution by 2 to 14 times. However, by using sampling, we can achieve effective detection and reduce overhead significantly. A sampling rate of 1 in 20 worked for us and incurred only a 5 to 10 percent performance overhead.”
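The trade-off Livshits describes can be sketched in Python. The article does not say how Nozzle chooses which objects to sample, so the independent random selection below is an assumption, as are the function name `sampled_fraction`, the fixed seed, and the synthetic heap. For simplicity the estimate counts objects rather than bytes.

```python
import random


def sampled_fraction(objects, is_suspicious, rate=1 / 20, seed=0):
    """Estimate the suspicious fraction by checking only a random subset of objects.

    Returns (estimate, number_of_objects_checked). Each object is selected
    independently with probability `rate`, so the expensive check runs on
    roughly 1 in 20 objects at the default setting.
    """
    rng = random.Random(seed)
    checked = flagged = 0
    for obj in objects:
        if rng.random() >= rate:
            continue  # skip the expensive per-object analysis
        checked += 1
        flagged += bool(is_suspicious(obj))
    estimate = flagged / checked if checked else 0.0
    return estimate, checked


# Synthetic heap: True marks an object a full scan would flag (90 percent here).
heap = [True] * 9_000 + [False] * 1_000

estimate, checked = sampled_fraction(heap, is_suspicious=lambda obj: obj)
print(f"checked {checked} of {len(heap)} objects, estimated fraction {estimate:.2f}")
```

Because a spray fills a large share of the heap, even a small sample estimates that share closely, which is consistent with the quote’s point that sampling preserves detection while cutting the scanning cost.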
Not Just for Browsers
When the researchers started work on the Nozzle project, they had browsers in mind. But when they heard about the PDF exploit earlier this year, they tried an experiment.
Adds Livshits: “The point is that Nozzle raises the bar considerably for the state of the art in this space. It is the first defensive technology to explicitly go after heap spraying. Plus, there is evidence to show that it can be an effective and reliable general tool.”
Always a Next Step
Zorn and Livshits would never suggest that Nozzle alone is sufficient protection against heap spraying. Defense in depth is their recommendation, a combination of tactics to counteract memory exploits.
“Nozzle is orthogonal to other defensive strategies,” Zorn says. “Defensive programming will always be important, and as more systems support mechanisms such as Data Execution Prevention (DEP), there will be more obstacles to heap exploits.”
But doesn’t enabling DEP to prevent execution of code within the heap effectively block all instances of heap exploits?
“There are technical and compatibility issues that prevent DEP from being used in some environments,” Livshits says. “Plus, we are already hearing of attacks that start by turning off DEP. Also, there have been code-injection-spraying attacks in areas where DEP can’t be used. So DEP is not the silver bullet—and neither is Nozzle.”
There is no doubt that once heap-memory exploits are a thing of the past, other threats will appear. Livshits and Zorn relish the challenge, though, because the results of their work are so satisfyingly obvious and demonstrable.
For now, they are interested in those recent code-injection-spraying attacks that foil DEP. They plan to show that Nozzle also can be effective in detecting such attacks, and that the forces for good can count on another tool to help keep software secure.