Abstract

Entity synonyms are critical for many applications like information retrieval and named entity recognition in documents. The current trend is to automatically discover entity synonyms using statistical techniques on web data. Prior techniques suffer from several limitations like click log sparsity and inability to distinguish between entities of different concept classes. In this paper, we propose a general framework for robustly discovering entity synonym with two novel similarity functions that overcome the limitations of prior techniques. We develop efficient and scalable techniques leveraging the MapReduce framework to discover synonyms at large scale. To handle long entity names with extraneous tokens, we propose techniques to effectively map long entity names to short queries in query log. Our experiments on real data from different entity domains demonstrate the superior quality of our synonyms as well as the efficiency of our algorithms. The entity synonyms produced by our system is in production in Bing Shopping and Video search, with experiments showing the significance it brings in improving search experience.