Estimating the Reliability of MDP Policies: a Confidence Interval Approach

Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics


Data sparsity is one of the major issues NLP researchers perennially wrestle with: does one have enough data to draw reliable conclusions from an experiment? Using Reinforcement Learning to improve a spoken dialogue system is no exception. Past approaches in this area have either simply assumed that enough data had been collected to derive reliable dialogue control policies, or have used thousands of simulated users to overcome the sparsity issue. In this paper we present a methodology for numerically constructing confidence bounds on the expected reward of a learned policy, and use these bounds to better estimate that policy's reliability. We apply this methodology to a prior experiment that used MDPs to predict the best features to include in a model of the dialogue state. Our results show that the policies developed in the prior work were not as reliable as previously determined, but the overall ranking of features remains the same.
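The abstract does not spell out the numerical construction, so the following is only a plausible sketch of one common way to build such bounds: estimate a Dirichlet posterior over each state's transition distribution from the observed counts, sample many plausible MDPs from that posterior, evaluate the fixed policy on each sample, and take percentiles of the resulting expected-reward estimates. All function names, the uniform prior, and the single-start-state assumption here are illustrative choices, not the paper's actual procedure.

```python
import numpy as np

def policy_value(P, R, gamma=0.95):
    """Expected discounted reward of a fixed policy on an MDP.

    P: transition matrix under the policy, shape (S, S)
    R: expected reward per state under the policy, shape (S,)
    Solves the linear system V = R + gamma * P V.
    """
    S = P.shape[0]
    return np.linalg.solve(np.eye(S) - gamma * P, R)

def value_confidence_interval(counts, R, gamma=0.95,
                              n_samples=1000, alpha=0.05, seed=0):
    """Numerical confidence bounds on a policy's expected reward.

    counts[s, s']: observed transitions from state s to s' under the
    fixed policy. Each row's transition distribution is sampled from a
    Dirichlet posterior (uniform prior, hence the +1), the policy is
    evaluated on every sampled MDP, and the (alpha/2, 1 - alpha/2)
    percentiles of the start-state value give the interval.
    """
    rng = np.random.default_rng(seed)
    S = counts.shape[0]
    values = np.empty(n_samples)
    for i in range(n_samples):
        # Draw one plausible transition matrix from the posterior.
        P = np.vstack([rng.dirichlet(counts[s] + 1) for s in range(S)])
        # Value of the (assumed) start state 0 on this sampled MDP.
        values[i] = policy_value(P, R, gamma)[0]
    lo, hi = np.quantile(values, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```

With sparse counts the sampled transition matrices vary widely and the interval is wide; as counts grow, the posterior concentrates and the bounds tighten, which is exactly the behavior that lets such intervals diagnose whether a learned policy's estimated reward is reliable.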