A Comparison of Bayesian Estimators for Unsupervised Hidden Markov Model POS Taggers

There is growing interest in applying Bayesian techniques to NLP problems. There are a number of different estimators for Bayesian models, and it is useful to know what kinds of tasks each does well on. This paper compares a variety of Bayesian estimators for Hidden Markov Model POS taggers with various numbers of hidden states on data sets of different sizes. Recent papers have given contradictory results when comparing Bayesian estimators to Expectation-Maximization (EM) for unsupervised HMM POS tagging, and we
show that the difference in reported results is largely due to differences in the size of the training data and the number of states in the HMM. We investigate a variety of samplers for HMMs, including some that these earlier papers did not study. We find that all of the Gibbs samplers do well with small data sets and few states, and that Variational Bayes does well on large data sets and is competitive with the Gibbs samplers. In terms of convergence time, we find that Variational Bayes was the fastest of all the estimators, especially on large data sets, and that the explicit Gibbs samplers (both pointwise and sentence-blocked) were generally faster than their collapsed counterparts on large data sets.