Query performance prediction estimates the effectiveness of a query in advance of human judgements. Accurate prediction could be used, for example, to trigger special processing, select query variants, or choose whether to search at all.
Evaluations of prediction have not distinguished effects due to query wording from effects due to the underlying information need, or from effects due to the performance of the retrieval system itself. Here we use five rankers, 100 tasks, and 28,869 queries to distinguish these three effects over six pre-retrieval predictors. We find that task effects dominate those due to query or ranker; that many “query performance predictors” are in fact predicting task difficulty; and that this makes it difficult to use these algorithms.