Understanding the intent behind a user’s query can help a search engine
automatically route the query to the corresponding vertical search engines and
obtain particularly relevant content, thus greatly improving user satisfaction.
There are three major challenges in the query intent classification problem: (1)
intent representation; (2) domain coverage; and (3) semantic interpretation.
Current approaches to predicting the user’s intent mainly use machine learning
techniques. However, meeting all these challenges with statistical machine
learning approaches is difficult and often requires much human effort. In this paper,
we propose a general methodology for the problem of query intent classification.
With very little human effort, our method can discover large quantities of intent
concepts by leveraging Wikipedia, one of the best human knowledge bases. The
Wikipedia concepts are used as the intent representation space, so each intent
domain is represented as a set of Wikipedia articles and categories. The intent of
any input query is identified by mapping the query into the Wikipedia
representation space. Compared with previous approaches, our proposed method
can achieve much better coverage when classifying queries in an intent domain, even
though the number of seed intent examples is very small. Moreover, the method is
very general and can be easily applied to various intent domains. We demonstrate
the effectiveness of this method in three different applications: travel, job, and
person name. In each of the three cases, only a couple of seed intent queries are
provided. We perform quantitative evaluations in comparison with two baseline
methods, and the experimental results show that our method significantly
outperforms the other methods in each intent domain.
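
As a rough illustration of the idea of representing each intent domain as a set of Wikipedia concepts and mapping a query into that space, the following minimal sketch may help. It is a toy under stated assumptions, not the method evaluated in this paper: the concept sets in INTENT_CONCEPTS are invented for illustration, the function names are hypothetical, and exact n-gram matching against concept titles stands in for whatever query-to-concept mapping is actually used.

```python
# Hypothetical toy data: each intent domain is a small set of Wikipedia-style
# concept titles (articles and categories). These sets are invented for
# illustration and are not taken from the paper.
INTENT_CONCEPTS = {
    "travel": {"hotel", "airline", "travel agency", "category:tourism"},
    "job": {"employment", "recruitment", "job interview", "category:occupations"},
}


def ngrams(tokens, max_n=3):
    """Yield all word n-grams of length 1..max_n as space-joined strings."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def map_query_to_concepts(query):
    """Map a query to the known concept titles it mentions (exact n-gram match).

    Assumption: exact matching is a stand-in for a real query-to-concept mapper.
    """
    tokens = query.lower().split()
    known = set().union(*INTENT_CONCEPTS.values())
    return {gram for gram in ngrams(tokens) if gram in known}


def classify_intent(query):
    """Return the domain whose concept set overlaps most with the query, or None."""
    concepts = map_query_to_concepts(query)
    scores = {domain: len(concepts & members)
              for domain, members in INTENT_CONCEPTS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Under these assumptions, calling classify_intent on a query that mentions a known concept title returns that concept's domain, and a query that matches no concept returns None. A realistic system would need a far larger concept set per domain and a softer matching step.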