To Link or Not to Link? A Study on End-to-End Tweet Entity Linking

NAACL-HLT 2013

Information extraction from microblog posts is an important task, as microblogs capture an unprecedented amount of information and offer a view into the pulse of the world. Because the current definition of named entity recognition is too limited for this purpose, this paper considers the task of Twitter entity linking.
In the current entity linking literature, mention detection and entity disambiguation are frequently cast as equally important but distinct problems. In our task, however, we find that mention detection is often the performance bottleneck. This is because microblog messages are short, noisy, informal texts with little context that often contain phrases with ambiguous meanings.
To rigorously address the Twitter entity linking problem, we propose a structural SVM algorithm for entity linking that jointly optimizes mention detection and entity disambiguation as a single end-to-end task. By combining structural learning with a variety of first-order, second-order, and context-sensitive features, our system outperforms existing state-of-the-art entity linking systems by 15% F1.
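The joint formulation can be illustrated with a toy sketch. This is not the authors' system: the candidate dictionary, the feature templates (`pair=mention|entity`, `bias_link`, `len=k`), and the hyperparameters are hypothetical, and training uses plain subgradient descent on the margin-rescaled structural hinge loss with a Hamming loss. Each candidate n-gram receives either an entity or NIL, so mention detection (link or NIL) and disambiguation (which entity) are decided by a single argmax. Because the sketch uses only first-order features, that argmax decomposes per candidate; second-order and context-sensitive features would couple the decisions and require a joint search, which is not shown.

```python
from collections import defaultdict

NIL = None  # label meaning "this candidate n-gram is not linked"


def features(mention, entity):
    """Hypothetical first-order features for linking `mention` to `entity`."""
    if entity is NIL:
        return {}  # unlinked candidates contribute nothing to the score
    return {
        f"pair={mention}|{entity}": 1.0,
        "bias_link": 1.0,
        f"len={len(mention.split())}": 1.0,
    }


def joint_features(y):
    """Phi(x, y): sum of per-candidate features over a full assignment."""
    phi = defaultdict(float)
    for mention, entity in y.items():
        for k, v in features(mention, entity).items():
            phi[k] += v
    return phi


def score(w, mention, entity):
    return sum(w.get(k, 0.0) * v for k, v in features(mention, entity).items())


def infer(w, candidates, gold=None):
    """Jointly label every candidate with an entity or NIL.

    With first-order features only, the joint argmax decomposes per
    candidate. If `gold` is given, each wrong label gets +1 (Hamming
    loss), turning this into loss-augmented inference.
    """
    y = {}
    for mention, entities in candidates.items():
        best, best_val = NIL, None
        for entity in [NIL] + entities:
            val = score(w, mention, entity)
            if gold is not None and entity != gold.get(mention, NIL):
                val += 1.0
            if best_val is None or val > best_val:
                best, best_val = entity, val
        y[mention] = best
    return y


def train(data, epochs=50, lr=0.1, lam=0.01):
    """Subgradient descent on the L2-regularized structural hinge loss."""
    w = defaultdict(float)
    for _ in range(epochs):
        for candidates, gold in data:
            gold_full = {m: gold.get(m, NIL) for m in candidates}
            y_hat = infer(w, candidates, gold_full)  # most violating output
            phi_gold, phi_hat = joint_features(gold_full), joint_features(y_hat)
            for k in w:  # shrink weights: gradient of (lam/2) * ||w||^2
                w[k] *= 1.0 - lr * lam
            for k in set(phi_gold) | set(phi_hat):
                w[k] += lr * (phi_gold[k] - phi_hat[k])
    return w


# Hypothetical toy tweet: three candidate n-grams with candidate entities.
candidates = {
    "apple": ["Apple_Inc.", "Apple_(fruit)"],
    "new": ["New_(album)"],
    "iphone": ["IPhone"],
}
gold = {"apple": "Apple_Inc.", "iphone": "IPhone"}  # "new" stays unlinked
w = train([(candidates, gold)])
print(infer(w, candidates))
```

In the toy data, "new" has a plausible candidate entity but a gold label of NIL, so the same objective that ranks "Apple_Inc." above "Apple_(fruit)" also has to learn not to link "new"; this is how the mention-detection decision is folded into a single structured prediction.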