Concept-based Short Text Classification and Ranking
- Fang Wang
- Zhongyuan Wang
- Zhoujun Li
- Ji-Rong Wen
ACM International Conference on Information and Knowledge Management (CIKM)
Published by ACM - Association for Computing Machinery
Most existing approaches to text classification represent texts as vectors of words, known as “Bag-of-Words.” This representation results in a very high-dimensional feature space and frequently suffers from surface mismatching. These issues are even more serious for short texts because of their brevity and sparsity. In this paper, we propose using “Bag-of-Concepts” for short text representation, aiming to avoid surface mismatching and to handle the synonymy and polysemy problems. Based on “Bag-of-Concepts,” a novel framework is proposed for lightweight short text classification applications. By leveraging a large taxonomy knowledge base, it learns a concept model for each category and conceptualizes a short text into a set of relevant concepts. A concept-based similarity mechanism is presented to classify a given short text into the most similar category. One advantage of this mechanism is that it facilitates short text ranking after classification, which is needed in many applications, such as query or ad recommendation. We demonstrate the use of our proposed framework through a real online application: Channel-based Query Recommendation. Experiments show that our framework can map queries to channels with a high degree of precision (avg. precision = 90.3%), which is critical for recommendation applications.
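The pipeline described above (learn a concept model per category, conceptualize a short text, classify by concept similarity, then rank) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tiny `TAXONOMY` table stands in for the large taxonomy knowledge base, the concept weights are invented, and cosine similarity is used as a placeholder because the abstract does not specify the similarity measure or how the concept models are learned.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy taxonomy: term -> {concept: weight}.
# A stand-in for the large is-a knowledge base; weights are made up.
TAXONOMY = {
    "apple": {"fruit": 0.6, "company": 0.4},
    "banana": {"fruit": 1.0},
    "iphone": {"phone": 0.7, "product": 0.3},
    "microsoft": {"company": 1.0},
    "tiger": {"animal": 1.0},
}


def conceptualize(text):
    """Map a short text to a weighted bag of concepts."""
    concepts = Counter()
    for token in text.lower().split():
        for concept, weight in TAXONOMY.get(token, {}).items():
            concepts[concept] += weight
    return concepts


def cosine(a, b):
    """Cosine similarity between two concept bags (assumed measure)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def learn_concept_models(training):
    """Build one concept model per category by summing the concept bags
    of that category's training texts (a simplification)."""
    models = {}
    for category, texts in training.items():
        model = Counter()
        for text in texts:
            model.update(conceptualize(text))
        models[category] = model
    return models


def classify(text, models):
    """Return the most similar category and its similarity score."""
    bag = conceptualize(text)
    scores = {category: cosine(bag, model) for category, model in models.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


def rank_within_category(texts, category, models):
    """Order texts by similarity to a category's concept model."""
    scored = [(cosine(conceptualize(t), models[category]), t) for t in texts]
    return [t for _, t in sorted(scored, key=lambda pair: pair[0], reverse=True)]
```

In this sketch the same similarity score serves both steps: `classify` picks the highest-scoring category, and `rank_within_category` reuses the score to order candidates such as queries within a channel. Note that "apple" and "banana" share the concept "fruit" even though they share no surface words, which is the kind of mismatch a Bag-of-Words representation would miss.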
© ACM. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version can be found at http://dl.acm.org.