Abstract

We address the task of learning rankings of documents from search engine logs of user behavior. Previous work on this problem has relied on passively collected clickthrough data. In contrast, we show that an active exploration strategy can provide data that leads to much faster learning. Specifically, we develop a Bayesian approach for selecting rankings to present users so that interactions result in more informative training data. Our results using the TREC-10 Web corpus, as well as synthetic data, demonstrate that a directed exploration strategy quickly leads to users being presented improved rankings in an online learning setting. We find that active exploration substantially outperforms passive observation and random exploration.