Applications of Grammatical Inference in Security and Program Analysis


August 15, 2011


Domagoj Babic


University of California


Grammatical inference is a form of machine learning focusing on induction of formal grammars. Such induction is particularly useful for learning abstractions of software systems, either actively through queries, or passively through observation of behavior. In this talk, I will give three examples of how grammatical inference can be applied in security and software analysis.

First, I will discuss inference of protocol state machines in the realistic high-latency network setting, and applications of such inference to the analysis of botnet Command and Control (C&C) protocols. The proposed technique enables an order of magnitude reduction in the number of queries and time needed to learn a proprietary botnet C&C protocol compared to classic algorithms. I will also show that the computed protocol state machines enable formal analysis for botnet defense.

Second, I will show how the inference of protocol state machines can be combined with Directed Automated Random Testing (DART). The combination, called Model-inference-Assisted Concolic Execution (MACE) learns a state machine that abstracts the implemented protocol state machine. Using the inferred state machine, MACE guides further search and keeps refining the abstraction. Experiments show that MACE finds 7x more vulnerabilities than the baseline approach, achieves up to 58% higher instruction coverage, and gets significantly deeper into the search space.

Third, I will discuss a novel approach to automated malware detection and classification based on tree automata inference. After a brief introduction to malware analysis, I’m going to present a technique for inference of k-testable tree automata from kernel system call dependency graphs and show how such automata can be used for detection and classification of malware.

Finally, I will discuss possible directions for the future research.


Domagoj Babic

Domagoj Babic received his Dipl.Ing. in Electrical Engineering and M.Sc. in Computer Science from the Zagreb University (Faculty of Electrical Engineering and Computing) in 2001 and 2003. He received his Ph.D. in Computer Science in 2008 from the University of British Columbia. After spending some time in industry, he joined UC Berkeley, as a research scientist.

Domagoj’s research focuses on various aspects of software analysis (software verification, security, and testing), decision procedures, grammatical inference, and applied formal methods in general.

He is a recipient of the Canada’s NSERC PDF Research Fellowship (2010-2012), Microsoft Graduate Research Fellowship (2005-2007), Fulbright Fellowship (declined to attend UBC), and several awards at international programming competitions (1st place at the 2007 Satisfiability Modulo Theories competition in the bit-vector arithmetic category and 3rd place at the 2005 Satisfiability Testing competition in the satisfiable-crafted instances category).

For more, see: