Kneser-Ney smoothing with a correcting function for small data sets


February 22, 2008


Peter Taraba


Smart Desktop


We present a technique that improves the Kneser-Ney smoothing algorithm for bigrams on small data sets, and we develop a numerical algorithm that computes the parameters of the heuristic formula with a correction. We motivate the formula with correction using a simple example. Using the same example, we show the difficulties one may run into with the numerical algorithm. Applying the algorithm to test data, we show how the new formula improves cross-entropy.


Peter Taraba

Peter Taraba received the M.S. degree in electrical engineering from the Department of Automatic Control, Slovak Technical University, Bratislava, Slovakia, in 2002. Currently, he is a Machine Learning Software Engineer with Smart Desktop, Seattle, WA. He was with Taureus (currently Ardaco), where he worked on the PDMark solution, and with ST Microelectronics (currently Upek), where he worked on algorithms for a TouchStrip fingerprint sensor (currently filed as a patent). He was also an intern with Microsoft in Redmond and Haifa, where he worked in the Hardware User Experience Group, the Speech Server Group, and the Haystack Team. He has research experience from Slovak Technical University (control theory) and J. W. Goethe University, Frankfurt, Germany (applied mathematics). He has published several papers in journals, conferences, and workshops. His main research interests are mathematical models, applied mathematics, control theory, speech recognition, moving finite elements, and image processing.