How do they Compare? Automatic Identification of Comparable Entities on the Web
- Alpa Jain
- Patrick Pantel
IEEE Conference on Information Reuse and Integration (IEEE-IRI-11). Las Vegas, NV.
Published by IEEE
People love comparing things: from home mortgages and digital cameras to travel destinations and political philosophies. Today, we are mostly limited to browsing documents after issuing comparative queries to Web search engines, such as “15-year vs. 30-year mortgage”, “Nikon D90 / Canon 40D”, “Oahu or Maui”, and “communism vs. fascism”. There is an opportunity to improve the search experience by automatically offering comparisons to users. In this paper, we propose a first step towards this goal of comparative analysis by mining a broad class of comparable entities from search query logs and a large Web crawl. Example comparables that we extract include medicines, appliances, electronics, vacation destinations, and many more. We present an extensive empirical analysis showing that our methods generate comparables with high precision and recall, and that Web search query logs are a better source for mining such entities than Web pages, which are typically used for extraction tasks. We further compare the performance of our methods with the “related entities” reported by Google Sets, and show a gain of 39% in average precision and a gain of 30% in NDCG.
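To make the idea of mining comparables from query logs concrete, the following is a minimal sketch, not the authors' method. It assumes that comparative queries can be spotted with a handful of connectors ("vs", "versus", "or", "/"), modeled on the example queries above; the connector list, the function names `extract_pair` and `mine_comparables`, and the `min_count` threshold are all hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical connector set, modeled on the example queries in the abstract.
# The paper's actual extraction patterns are not described here.
_PATTERN = re.compile(
    r"^(?P<a>.+?)\s+(?:vs\.?|versus|or|/)\s+(?P<b>.+)$",
    re.IGNORECASE,
)


def extract_pair(query):
    """Return a normalized (entity, entity) pair, or None if the query
    does not look like a comparison."""
    m = _PATTERN.match(query.strip())
    if m is None:
        return None
    a = m.group("a").strip().lower()
    b = m.group("b").strip().lower()
    if not a or not b or a == b:
        return None
    # Sort so that "oahu vs maui" and "maui or oahu" count as the same pair.
    return tuple(sorted((a, b)))


def mine_comparables(queries, min_count=2):
    """Count candidate pairs across queries and keep the frequent ones."""
    counts = Counter()
    for q in queries:
        pair = extract_pair(q)
        if pair is not None:
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    log = ["Oahu or Maui", "oahu vs maui", "Nikon D90 / Canon 40D", "weather in maui"]
    print(mine_comparables(log, min_count=2))
```

A frequency threshold is used here only as a crude stand-in for the precision filtering a real system would need; a pattern such as "or" will also match many non-comparative queries.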