Microsoft Information-Seeking Conversations (MISC) data set
We introduce the Microsoft Information-Seeking Conversations (MISC) data set, a collection of recordings of information-seeking conversations between human “seekers” and “intermediaries”. MISC includes audio and video signals; transcripts of the conversations; affectual and physiological signals; recordings of search and other computer use; and post-task surveys on emotion, success, and effort.
Phlat is a new interface for Windows Desktop Search that enables search through a user’s own e-mail, files, and viewed Web pages. Phlat makes it easy for users to specify queries and filters, and it attempts to integrate search and browsing in a single, intuitive interface.
This package implements several algorithms for language identification and includes two sets of pre-compiled language profiles. One set covers 52 languages and was trained on Wikipedia (i.e., a well-written corpus); the other covers 26 languages and was constructed from Twitter (i.e., a highly colloquial corpus). The language identifiers are packaged as a C# library,…
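To illustrate how a language profile can drive identification, here is a minimal Python sketch of one common technique: a naive Bayes classifier over character trigrams with add-one smoothing. This is an assumption for illustration only. The package's own algorithms, profile format, and C# API may differ, and the names `trigrams` and `NaiveBayesLangId` are hypothetical.

```python
import math
from collections import Counter


def trigrams(text):
    """Character trigrams of lower-cased text, with whitespace normalized and space padding."""
    t = " " + " ".join(text.lower().split()) + " "
    return [t[i:i + 3] for i in range(len(t) - 2)]


class NaiveBayesLangId:
    """Toy character-trigram language identifier (hypothetical, for illustration)."""

    def __init__(self, corpora):
        # corpora: mapping from language code to training text; each Counter is a "profile".
        self.profiles = {lang: Counter(trigrams(text)) for lang, text in corpora.items()}
        self.vocab = set().union(*self.profiles.values())
        self.totals = {lang: sum(c.values()) for lang, c in self.profiles.items()}

    def log_likelihood(self, text, lang):
        counts, total = self.profiles[lang], self.totals[lang]
        v = len(self.vocab) + 1  # +1 reserves mass for trigrams never seen in training
        # Add-one smoothing: unseen trigrams get count 1 instead of 0.
        return sum(math.log((counts[g] + 1) / (total + v)) for g in trigrams(text))

    def identify(self, text):
        # Uniform prior over languages, so pick the highest likelihood.
        return max(self.profiles, key=lambda lang: self.log_likelihood(text, lang))
```

With profiles built from large corpora, such as the Wikipedia and Twitter collections mentioned above, the same scoring rule applies; only the training text changes.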
The basic idea of AdaRank is to repeatedly construct “weak rankers” on reweighted training queries and to combine the weak rankers linearly to make ranking predictions. During learning, AdaRank minimizes a loss function defined directly on performance measures. Details can be found in the paper “AdaRank: A Boosting Algorithm for Information Retrieval.”
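The boosting loop described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the released implementation: each weak ranker is assumed to be a single feature, average precision serves as the performance measure E (so E lies in [0, 1]), and all function names are hypothetical. Following the paper's formulation as we understand it, each round picks the feature with the best weighted performance, gives it weight alpha = 0.5 * ln(sum_i P(i)(1 + E_i) / sum_i P(i)(1 - E_i)), and reweights queries in proportion to exp(-E) of the combined ranker, so poorly ranked queries receive more attention.

```python
import math


def average_precision(scores, labels):
    """AP of the ranking from sorting scores in descending order (binary labels).

    Ties keep input order because Python's sort is stable.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] > 0:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def linear_scores(model, docs):
    """Score each document (a feature vector) with a linear model."""
    return [sum(w * x for w, x in zip(model, doc)) for doc in docs]


def adarank(queries, num_features, rounds=10, eps=1e-10):
    """AdaRank-style boosting with single-feature weak rankers (hypothetical sketch).

    queries: list of (docs, labels); docs is a list of feature vectors.
    Returns the coefficients of the combined linear ranker f_t = sum_k alpha_k * h_k.
    """
    m = len(queries)
    p = [1.0 / m] * m             # distribution P_t(i) over training queries
    model = [0.0] * num_features  # accumulated alpha per feature

    def weighted_perf(k):
        # sum_i P_t(i) * E(ranking of query i by feature k)
        return sum(pi * average_precision([d[k] for d in docs], labels)
                   for pi, (docs, labels) in zip(p, queries))

    for _ in range(rounds):
        k = max(range(num_features), key=weighted_perf)
        e = [average_precision([d[k] for d in docs], labels) for docs, labels in queries]
        num = sum(pi * (1 + ei) for pi, ei in zip(p, e))
        den = sum(pi * (1 - ei) for pi, ei in zip(p, e))
        # eps keeps alpha finite when the weak ranker is perfect on every weighted query.
        model[k] += 0.5 * math.log(num / max(den, eps))
        # Reweight queries by exp(-E) of the combined ranker, then normalize.
        perf = [average_precision(linear_scores(model, docs), labels) for docs, labels in queries]
        z = sum(math.exp(-v) for v in perf)
        p = [math.exp(-v) / z for v in perf]
    return model
```

Because E is computed on whole rankings rather than on document pairs, the quantity being optimized is tied directly to the evaluation measure. Swapping in NDCG or another measure with values in [0, 1] only requires replacing `average_precision`.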
Pivot is an experimental application for exploring large data sets with smooth visual interactions. The application was originally released by Microsoft Live Labs in October 2009 and is being re-released by Microsoft Research so that the research community can continue to use it for experiments. If you have Internet Explorer 9 installed, disable GPU…
We offer a collection of common information-retrieval tools written in the DryadLINQ data-parallel language. The tools are useful to information-retrieval practitioners and instructive in the use of DryadLINQ.
Privacy Integrated Queries (PINQ) is a LINQ-like API for writing programs against sensitive data sets while providing differential privacy guarantees for the underlying records. This first release provides the PINQ infrastructure and several example data-analysis applications, and it should be suitable for prototyping many differentially private data analyses.
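The following Python sketch illustrates the kind of guarantee described above using the Laplace mechanism, a standard way to release a differentially private count. It is a toy, not PINQ: PINQ is a C# LINQ-style library, and the class `PrivateQueryable`, its methods, and the budget handling here are hypothetical. Adding or removing one record changes a filtered count by at most one, so adding Laplace noise with scale 1/epsilon gives epsilon-differential privacy, and the epsilon values spent on successive queries are summed against a shared budget (sequential composition).

```python
import random


class PrivateQueryable:
    """Toy LINQ-like wrapper: noisy counts plus a shared privacy budget (hypothetical)."""

    def __init__(self, records, budget, rng=None, _shared=None):
        self._records = list(records)
        # Every queryable derived from the same source draws on one budget.
        self._state = _shared if _shared is not None else {"remaining": budget}
        self._rng = rng or random.Random()

    def where(self, predicate):
        # Filtering is stable: one changed input record changes the output by at most one record.
        return PrivateQueryable((r for r in self._records if predicate(r)),
                                0.0, self._rng, self._state)

    def noisy_count(self, epsilon):
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self._state["remaining"]:
            raise RuntimeError("privacy budget exhausted")
        self._state["remaining"] -= epsilon
        scale = 1.0 / epsilon  # a count has sensitivity 1
        # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
        noise = self._rng.expovariate(1 / scale) - self._rng.expovariate(1 / scale)
        return len(self._records) + noise

    @property
    def remaining_budget(self):
        return self._state["remaining"]
```

For example, `PrivateQueryable(ages, budget=1.0).where(lambda a: a >= 18).noisy_count(0.5)` spends half of the budget. Raw records are never returned, only noisy aggregates.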
A collection of short programs to compute standard information-retrieval performance measures—Recall, Precision, F-measure, Mean Average Precision, Mean Reciprocal Rank, Normalized Discounted Cumulative Gain—in the presence of tied scores.
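One way to define these measures when scores are tied is as the average over all orderings of the tied documents. The Python sketch below assumes that definition (the package's exact conventions may differ) and relies on linearity of expectation: every document in a tied group is equally likely to occupy each of the group's positions, so each position contributes the group's mean gain. The function names are hypothetical, and gains are used as given, without a 2^rel - 1 transform.

```python
import math
from itertools import groupby


def tie_groups(scores, values):
    """Group per-document values by score, highest score first; tied documents share a group."""
    pairs = sorted(zip(scores, values), key=lambda p: -p[0])
    return [[v for _, v in grp] for _, grp in groupby(pairs, key=lambda p: p[0])]


def expected_dcg_at_k(scores, gains, k):
    """Expected DCG@k over all orderings of tied documents."""
    dcg, start = 0.0, 0  # start = 0-based position of the group's first document
    for group in tie_groups(scores, gains):
        mean_gain = sum(group) / len(group)
        for pos in range(start, min(start + len(group), k)):
            dcg += mean_gain / math.log2(pos + 2)  # discount log2(rank + 1), rank = pos + 1
        start += len(group)
        if start >= k:
            break
    return dcg


def expected_ndcg_at_k(scores, gains, k):
    """Expected NDCG@k; the ideal DCG does not depend on ties in the system's scores."""
    ideal = sorted(gains, reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return expected_dcg_at_k(scores, gains, k) / idcg if idcg > 0 else 0.0


def expected_precision_at_k(scores, labels, k):
    """Expected precision@k: a group cut by k contributes its relevant fraction per slot taken."""
    hits, start = 0.0, 0
    for group in tie_groups(scores, labels):
        taken = max(0, min(len(group), k - start))
        hits += taken * sum(group) / len(group)
        start += len(group)
    return hits / k
```

The closed form avoids enumerating permutations, whose number grows factorially with the size of a tied group.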