Capturing Long Distance Dependency in Language Modeling: An Empirical Study
This paper presents an extensive empirical study of two language modeling techniques, linguistically-motivated word skipping and predictive clustering, both of which capture long-distance word dependencies beyond the scope of a word trigram model. We compare these techniques to others previously proposed for the same purpose, and we evaluate the resulting models on the task of Japanese Kana-Kanji conversion. We show that the two techniques, while simple, outperform the existing methods studied in this paper and lead to language models that perform significantly better than a word trigram model. We also investigate how factors such as training corpus size and genre affect the performance of the models.
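As a rough illustration of the kind of dependency at issue (not the paper's implementation; all names here are hypothetical), a count-based skip-bigram sketch can condition a word on a single context word several positions back, which a standard trigram model cannot do:

```python
from collections import defaultdict

def train_skip_bigram(corpus, max_skip=3):
    """Count (context_word, target) pairs where context_word lies
    1..max_skip positions before target, so dependencies longer
    than the trigram window are still recorded."""
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for sent in corpus:
        for i, w in enumerate(sent):
            for d in range(1, max_skip + 1):
                if i - d >= 0:
                    h = sent[i - d]  # skipped-to history word
                    counts[h][w] += 1
                    totals[h] += 1
    return counts, totals

def skip_prob(counts, totals, history, word):
    """Relative-frequency estimate P(word | history occurred
    within the last max_skip positions)."""
    if totals[history] == 0:
        return 0.0
    return counts[history][word] / totals[history]

corpus = [["the", "cat", "sat", "on", "the", "mat"]]
counts, totals = train_skip_bigram(corpus)
```

In practice such skip estimates would be smoothed and interpolated with the trigram model rather than used alone; this sketch only shows how skipping widens the conditioning window.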