Challenges for Supporting Faceted Search in Large, Heterogeneous Corpora like the Web

Faceted search systems help people find what they are looking for by allowing them to specify not just keywords related to their information need, but also metadata. While such systems hold great potential and have been used successfully in vertical domains, extending them to large, heterogeneous collections like the Web, corporate intranets, or federated search engines that access many different data silos raises many challenges. In this position paper we discuss these challenges in greater detail. The challenges we have identified stem from two properties of such datasets. First, they are very large, which makes it difficult to assign quality metadata to every document and to retrieve the full set of results and their associated metadata at query time. Second, they are heterogeneous, which makes it difficult to apply the same metadata to every result or every query.