Relevance judgments sit at the core of test collection construction, and are assumed to model the utility of documents to real users. However, comparisons of judgments with signals of relevance obtained from real users, such as click counts and dwell time, have demonstrated a systematic mismatch.
In this paper, we study one important source of the mismatch between user data and relevance judgments: the high degree of effort required of users to identify and consume the information in a document. Information retrieval relevance judges are trained to search for evidence of relevance when assessing documents. For complex documents, this can lead to judges spending substantial time considering each document. In practice, however, search users are often much more impatient: if they do not see evidence of relevance quickly, they tend to give up.
Our results demonstrate that the amount of effort required to find the relevant information in a document plays an important role in the utility of that document to a real user. This effort is ignored in the way relevance judgments are currently obtained, despite the expectation that judges inform us about real users. We propose that if the goal is to evaluate the likelihood of utility to the user, effort as well as relevance should be taken into consideration, and possibly characterized independently, when judgments are obtained.