Better k-best Parsing, Hypergraphs, and Dynamic Programming


December 7, 2005


Liang Huang


University of Pennsylvania


Joint work with David Chiang (University of Maryland)

K-best parsing (and k-best processing in general) has become a popular technique in natural language processing. However, fast and exact k-best algorithms are largely unknown to the parsing community. In this work, we develop efficient algorithms for exact k-best derivation trees in the framework of directed monotonic hypergraphs. To demonstrate the efficiency, scalability and accuracy of these algorithms, we present experiments on Bikel’s implementation of Collins’ lexicalized PCFG model, and on Chiang’s CFG-based decoder for hierarchical phrase-based translation. We show in particular how the improved output of our algorithms has the potential to improve results from parse reranking systems and other applications.

These algorithms have been re-implemented by other researchers in the field, including Eugene Charniak for his n-best parser, Ryan McDonald and Microsoft Research for dependency parsers, and the USC/ISI team for the CKY-based decoder in their syntax-based machine translation system. All of these experiments confirmed the findings in our work.

Liang Huang and David Chiang (2005). Better k-best Parsing. Proceedings of the 9th International Workshop on Parsing Technologies (IWPT).


Liang Huang

Liang Huang is a PhD student at the University of Pennsylvania, under Aravind Joshi. He is mainly interested in algorithmic problems in computational linguistics, esp. parsing and syntax-based machine translation. His most recent work has been on efficient integration of n-gram models for syntax-based decoding while visiting USC/ISI under Kevin Knight. He also works on applying dynamic programming and formal grammars to structural biology, in collaboration with Ken Dill of UCSF.