Microsoft Research Blog

The Microsoft Research blog provides in-depth views and perspectives from our researchers, scientists and engineers, plus information about noteworthy events and conferences, scholarships, and fellowships designed for academic and scientific communities.

CIKM 2015: Best paper awards announced

October 19, 2015 | Posted by Microsoft Research Blog

CIKM 2015

Best paper

Assessing the impact of syntactic and semantic structures for answer passages reranking

Kateryna Tymoshenko (University of Trento); Alessandro Moschitti (Qatar Computing Research Institute)

In this paper, we extensively study the use of syntactic and semantic structures obtained with shallow and deeper syntactic parsers in the answer passage reranking task. We propose several dependency-based structures enriched with Linked Open Data (LD) knowledge for representing pairs of questions and answer passages. We use such tree structures in learning to rank (L2R) algorithms based on tree kernel. The latter can represent questions and passages in a tree fragment space, where each substructure represents a powerful syntactic/semantic feature. Additionally since we define links between structures, tree kernels also generate relational features spanning question and passage structures. We derive very important findings, which can be useful to build state-of-the-art systems: (i) full syntactic dependencies can outperform shallow models also using external knowledge and (ii) the semantic information should be derived by effective and high-coverage resources, e.g., LD, and incorporated in syntactic structures to be effective. We demonstrate our findings by carrying out an extensive comparative experimentation on two different TREC QA corpora and one community question answer dataset, namely Answerbag. Our comparative analysis on well-defined answer selection benchmarks consistently demonstrates that our structural semantic models largely outperform the state of the art in passage reranking.

Runner-up best paper

Private analysis of infinite data streams via retroactive grouping

Rui Chen, Yilin Shen, Hongxia Jin (Samsung Research America)

With the rapid advances in hardware technology, data streams are being generated daily in large volumes, enabling a wide range of real-time analytical tasks. Yet data streams from many sources are inherently sensitive, and thus providing continuous privacy protection in data streams has been a growing demand. In this paper, we consider the problem of private analysis of infinite data streams under differential privacy. We propose a novel data stream sanitization framework that periodically releases histograms summarizing the event distributions over sliding windows to support diverse data analysis tasks. Our framework consists of two modules, a sampling-based change monitoring module and a continuous histogram publication module. The monitoring module features an adaptive Bernoulli sampling process to accurately track the evolution of a data stream. We for the first time conduct error analysis of sampling under differential privacy, which allows to select the best sampling rate. The publication module features three different publishing strategies, including a novel technique called retroactive grouping to enjoy reduced noise. We provide theoretical analysis of the utility, privacy and complexity of our framework. Extensive experiments over real datasets demonstrate that our solution substantially outperforms the state-of-the-art competitors.

Best student paper

Struggling and success in web search

Daan Odijk (University of Amsterdam); Ryen B. White, Ahmed Hassan Awadallah, Susan T. Dumais (Microsoft Research)

Web searchers sometimes struggle to find relevant information. Struggling leads to frustrating and dissatisfying search experiences, even if searchers ultimately meet their search objectives. Better understanding of search tasks where people struggle is important in improving search systems. We address this important issue using a mixed methods study using large-scale logs, crowd-sourced labeling, and predictive modeling. We analyze anonymized search logs from the Microsoft Bing Web search engine to characterize aspects of struggling searches and better explain the relationship between struggling and search success. To broaden our understanding of the struggling process beyond the behavioral signals in log data, we develop and utilize a crowd-sourced labeling methodology. We collect third-party judgments about why searchers appear to struggle and, if appropriate, where in the search task it became clear to the judges that searches would succeed (i.e., the pivotal query). We use our findings to propose ways in which systems can help searchers reduce struggling. Key components of such support are algorithms that accurately predict the nature of future actions and their anticipated impact on search outcomes. Our findings have implications for the design of search systems that help searchers struggle less and succeed more.

Runner-up best student paper

Personalized trip recommendation with POI availability and uncertain traveling time

Chenyi Zhang, Hongwei Liang, Ke Wang (Simon Fraser University); Jianling Sun (Zhejiang University)

As location-based social network (LBSN) services become increasingly popular, trip recommendation that recommends a sequence of points of interest (POIs) to visit for a user emerges as one of many important applications of LBSNs. Personalized trip recommendation tailors to users’ specific tastes by learning from past check-in behaviors of users and their peers. Finding the optimal trip that maximizes user’s experiences for a given time budget constraint is an NP hard problem and previous solutions do not consider two practical and important constraints. One constraint is POI availability where a POI may be only available during a certain time window. Another constraint is uncertain traveling time where the traveling time between two POIs is uncertain. This work presents efficient solutions to personalized trip recommendation by incorporating these constraints to prune the search space. We evaluated the efficiency and effectiveness of our solutions on real life LBSN data sets.

For more computer science research news, visit