Abstract

Understanding the behavior of satisfied and unsatisfied Web search users is very important for improving users search experience. Collecting labeled data that characterizes search behavior is a very challenging problem. Most of the previous work used a limited amount of data collected in lab studies or annotated by judges lacking information about the actual intent. In this work, we performed a large scale user study where we collected explicit judgments of user satisfaction with the entire search task. Results were analyzed using sequence models that incorporate user behavior to predict whether the user ended up being satisfied with a search or not. We test our metric on millions of queries collected from real Web search traffic and show empirically that user behavior models trained using explicit judgments of user satisfaction outperform several other search quality metrics. The proposed model can also be used to optimize different search engine components. We propose a method that uses task level success prediction to provide a better interpretation of clickthrough data. Clickthough data has been widely used to improve relevance estimation. We use our user satisfaction model to distinguish between clicks that lead to satisfaction and clicks that do not. We show that adding new features derived from this metric allowed us to improve the estimation of document relevance.