Abstract

Humans understand the world by classifying objects into categories at an appropriate level of abstraction. This process is often automatic and subconscious. Psychologists and linguists call it Basic-level Categorization (BLC). BLC can benefit many applications, such as knowledge panels, advertising, and recommendation. However, how to quantify basic-level concepts is still an open problem. Recently, much work has focused on constructing knowledge bases or semantic networks from web-scale text corpora, which makes it possible for the first time to analyze computational approaches for deriving BLC. In this paper, we introduce a method for BLC based on typicality and pointwise mutual information (PMI). We compare it with a few existing measures, such as normalized PMI (NPMI) and commute time, to understand its essence, and conduct extensive experiments to show the effectiveness of our approach. We also give a real application example to show how BLC can help sponsored search.
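
As a rough illustration of the quantities named above, the following sketch computes typicality, PMI, and NPMI from concept-entity co-occurrence counts using their standard textbook definitions. It is not the scoring function proposed in the paper. The toy counts and all function names are hypothetical and chosen only for this example, and typicality is assumed here to be the conditional probabilities P(e|c) and P(c|e) estimated from counts.

```python
import math

# Hypothetical toy counts n(concept, entity): how often an entity is
# observed as an instance of a concept. Not data from the paper.
counts = {
    ("company", "microsoft"): 80,
    ("company", "apple"): 60,
    ("fruit", "apple"): 40,
    ("software company", "microsoft"): 20,
}
total = sum(counts.values())


def concept_count(c):
    """Total observations of concept c across all entities."""
    return sum(n for (ci, _), n in counts.items() if ci == c)


def entity_count(e):
    """Total observations of entity e across all concepts."""
    return sum(n for (_, ei), n in counts.items() if ei == e)


def typicality_entity_given_concept(e, c):
    """P(e|c): how typical entity e is among instances of concept c."""
    return counts.get((c, e), 0) / concept_count(c)


def typicality_concept_given_entity(c, e):
    """P(c|e): how typical concept c is among concepts of entity e."""
    return counts.get((c, e), 0) / entity_count(e)


def pmi(e, c):
    """Pointwise mutual information: log P(e,c) / (P(e) P(c))."""
    p_ec = counts.get((c, e), 0) / total
    if p_ec == 0:
        return float("-inf")  # the pair never co-occurs
    p_e = entity_count(e) / total
    p_c = concept_count(c) / total
    return math.log(p_ec / (p_e * p_c))


def npmi(e, c):
    """Normalized PMI in [-1, 1]: PMI(e,c) / -log P(e,c)."""
    p_ec = counts.get((c, e), 0) / total
    if p_ec == 0:
        return -1.0  # limiting value when the pair never co-occurs
    if p_ec == 1:
        return 1.0  # avoid dividing by -log(1) = 0
    return pmi(e, c) / -math.log(p_ec)


for concept in ("company", "fruit"):
    print(
        concept,
        typicality_entity_given_concept("apple", concept),
        typicality_concept_given_entity(concept, "apple"),
        pmi("apple", concept),
        npmi("apple", concept),
    )
```

In this toy data, "apple" is the only instance of "fruit", so P(apple|fruit) is high, while "company" is the more frequent concept for "apple", so P(company|apple) is high. Association measures such as PMI and NPMI give one way to weigh these two directions against each other.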

Thank you for your interest in this paper. Please also see our ACL 2016 tutorial on short text understanding: Understanding Short Texts – ACL 2016 Tutorial, presented by Zhongyuan Wang.