Relational Duality: Unsupervised Extraction of Semantic Relations between Entities on the Web
Extracting semantic relations among entities is an important first step in various tasks in Web mining and natural language processing, such as information extraction, relation detection, and social network mining. A relation can be expressed extensionally by stating all the instances of that relation, or intensionally by defining all the paraphrases of that relation. For example, consider the ACQUISITION relation between two companies. An extensional definition of ACQUISITION contains all pairs of companies in which one company is acquired by another (e.g., (YouTube, Google) or (Powerset, Microsoft)). On the other hand, we can intensionally define ACQUISITION as the relation described by lexical patterns such as "X is acquired by Y" or "Y purchased X", where X and Y denote two companies. We use this dual representation of semantic relations to propose a novel sequential co-clustering algorithm that can extract numerous relations efficiently from unlabeled data. We provide an efficient heuristic to find the parameters of the proposed co-clustering algorithm. Using the clusters produced by the algorithm, we train an L1-regularized logistic regression model to identify the representative patterns that describe the relation expressed by each cluster. We evaluate the proposed method in three different tasks: measuring relational similarity between entity pairs, open information extraction (Open IE), and classifying relations in a social network system. Experiments conducted using a benchmark dataset show that the proposed method improves on existing relational similarity measures. Moreover, the proposed method significantly outperforms the current state-of-the-art Open IE systems in terms of both precision and recall.
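The dual representation described in the abstract can be illustrated with a small sketch. The code below is not the sequential co-clustering algorithm itself; it only shows, under simplifying assumptions, how the same co-occurrence data can be read extensionally (each pattern is described by the entity pairs it occurs with) or intensionally (each entity pair is described by the patterns it occurs with). The observations, the third entity pair ("Alice", "Acme"), the "X is the CEO of Y" pattern, and the function names `pair_vectors`, `pattern_vectors`, and `cosine` are all hypothetical and chosen for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt

# Toy, hand-written observations (not real corpus counts): each item says
# that an entity pair (X, Y) was seen with a given lexical pattern.
observations = [
    (("YouTube", "Google"), "X is acquired by Y"),
    (("YouTube", "Google"), "Y purchased X"),
    (("Powerset", "Microsoft"), "X is acquired by Y"),
    (("Powerset", "Microsoft"), "Y purchased X"),
    (("Alice", "Acme"), "X is the CEO of Y"),  # hypothetical pair and pattern
]


def pair_vectors(obs):
    """Intensional view: describe each entity pair by its patterns."""
    vectors = defaultdict(Counter)
    for pair, pattern in obs:
        vectors[pair][pattern] += 1
    return vectors


def pattern_vectors(obs):
    """Extensional view: describe each pattern by its entity pairs."""
    vectors = defaultdict(Counter)
    for pair, pattern in obs:
        vectors[pattern][pair] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


by_pair = pair_vectors(observations)
by_pattern = pattern_vectors(observations)

# Pairs that share patterns look alike; so do patterns that share pairs.
sim_pairs = cosine(by_pair[("YouTube", "Google")], by_pair[("Powerset", "Microsoft")])
sim_patterns = cosine(by_pattern["X is acquired by Y"], by_pattern["Y purchased X"])
sim_unrelated = cosine(by_pair[("YouTube", "Google")], by_pair[("Alice", "Acme")])
```

In this toy data the two acquisition pairs share both patterns and the two acquisition patterns share both pairs, so each is highly similar in its own view, while the unrelated pair shares nothing with them. A co-clustering method would exploit both views together rather than comparing vectors one pair at a time as done here.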
Danushka Bollegala is an assistant professor in the Graduate School of Engineering at the University of Tokyo, Japan. He obtained his PhD in Information Science from the University of Tokyo in 2009 under the supervision of Professor Mitsuru Ishizuka. His research interests are Artificial Intelligence, Computational Linguistics, and Web Mining. He has worked, and continues to work, on various topics in these fields, including measuring semantic and relational similarity from Web data, personal name disambiguation, name alias extraction, and information ordering in multi-document text summarization.
- Danushka Bollegala
- The University of Tokyo