Abstract

Past work on query segmentation has exclusively focused on at or non-hierarchical segmentation, where query words are simply partitioned into non-overlapping contiguous chunks of words. Such an approach suffers from the problem of granularity, and consequent difficulties in IR application. Here, we explore nested or hierarchical query segmentation, where segments are defined recursively as consisting of contiguous sequences of segments or query words, as an effective and powerful alternative representation of a query. We design a lightweight and unsupervised nested segmentation scheme, and propose how to use the tree arising out of the nested representation of a query to improve retrieval performance. We examine several aspects of the IR application framework and show that nested segmentation can be suitably exploited for the re-ranking of documents leading to significant gains over baselines that include the state-of-the-art at segmentation strategies, and can provide benefits when combined with the latest term proximity model.