Leveraging Web Query Logs to Learn User Intent Via Bayesian Latent Variable Model

ICML Workshop on Combining Learning Strategies to Reduce Label Cost

A key task in Spoken Language Understanding (SLU) is interpreting user intentions from speech utterances. This task is typically treated as a classification problem, with the goal of categorizing a given speech utterance into one of many semantic intent classes. Due to substantial utterance variation, a significant quantity of labeled utterances is needed to build robust intent detection systems. In this paper, we approach intent detection as a two-stage semi-supervised learning problem that utilizes a large number of unlabeled queries collected from internet search engine click logs. We first capture the underlying structure of the user queries using a Bayesian latent feature model. We then propagate this structure onto the unlabeled queries to obtain quality training data via a graph summarization algorithm. Our approach improves intent detection compared to our baseline, which uses a standard classification model with actual features.