Improving Similarity Measures for Short Segments of Text

Scott Wen-tau Yih, Chris Meek

Proceedings of AAAI 2007 |

Published by American Association for Artificial Intelligence

View Publication

In this paper we improve previous work on measuring the similarity of short segments of text in two ways. First, we introduce a Web-relevance similarity measure and demonstrate its effectiveness. This measure extends the Web-kernel similarity function introduced by Sahami and Heilman (2006) by using relevance weighted inner-product of term occurrences rather than TF×IDF. Second, we show that one can further improve the accuracy of similarity measures by using a machine learning approach. Our methods outperform other state-of-the-art methods in a general query suggestion task for multiple evaluation metrics.