The importance of intelligibility and transparency in machine learning
Most real datasets have hidden biases. Being able to detect the impact of the bias in the data on the model, and then to repair the model, is critical if we are going to deploy machine learning in applications that affect people’s health, welfare, and social opportunities. This requires models that are intelligible. In machine learning, there is often a tradeoff between accuracy and intelligibility: the…
I am an Applied Researcher at Bing in Bellevue, Washington, working on core relevance. I also lead a small team of Applied Researchers who are embedded in Microsoft Research Cambridge in the UK.
I’m a coordinator of the TREC-2009 Web Track, which evaluates search relevance on a 1-billion-page web crawl [ClueWeb09]. This year we are conducting a “TREC adhoc” evaluation and a diversity-aware evaluation.
I am interested in Web search evaluation. I built the VLC, VLC2, WT2g and .GOV test collections, which have been made available to research groups around the world. David Hawking and I coordinated the TREC Web Track experiments.
I also work on effective Web search, which means making use of information in pages, link structure and URL structure to generate more useful Web search results.
My PhD was in distributed information retrieval, which means building a system on top of multiple search engines or databases that already exist.
Established: January 21, 2016
The Dual Embedding Space Model (DESM) is an information retrieval model that uses two word embeddings, one for query words and one for document words. It takes into account the vector similarity between each query word vector and all document word vectors. A key challenge for information retrieval is to model document aboutness. The traditional approach uses term frequency, with more occurrences of a query word indicating that the document is more likely to be…
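The scoring idea above can be sketched in a few lines. This is a minimal, non-authoritative sketch under stated assumptions, not necessarily the exact DESM formulation: query words are looked up in one embedding (the hypothetical `in_emb`) and document words in the other (the hypothetical `out_emb`); the document is summarized by the centroid of its unit-normalized word vectors, so each query word's similarity to *all* document words is captured by one cosine; and the per-query-word cosines are averaged. The toy embeddings `TOY_IN` and `TOY_OUT` are invented for illustration only.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def _norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def _cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na, nb = _norm(a), _norm(b)
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def document_centroid(doc_words: List[str], out_emb: Dict[str, Vector]) -> List[float]:
    """Assumption: a document is the mean of its unit-normalized document-side vectors.
    Words missing from the embedding are skipped."""
    vecs = [out_emb[w] for w in doc_words if w in out_emb]
    if not vecs:
        return []
    centroid = [0.0] * len(vecs[0])
    for v in vecs:
        n = _norm(v)
        if n == 0.0:
            continue
        for i, x in enumerate(v):
            centroid[i] += x / n
    return [c / len(vecs) for c in centroid]


def desm_score(query_words: List[str], doc_words: List[str],
               in_emb: Dict[str, Vector], out_emb: Dict[str, Vector]) -> float:
    """Average cosine between each known query-word vector (query-side embedding)
    and the document centroid (document-side embedding). Returns 0.0 if nothing matches."""
    centroid = document_centroid(doc_words, out_emb)
    q_vecs = [in_emb[q] for q in query_words if q in in_emb]
    if not centroid or not q_vecs:
        return 0.0
    return sum(_cosine(q, centroid) for q in q_vecs) / len(q_vecs)


# Hypothetical 2-d toy embeddings, for illustration only.
TOY_IN = {"cambridge": [1.0, 0.0]}
TOY_OUT = {"university": [1.0, 0.1], "giraffe": [0.0, 1.0]}

if __name__ == "__main__":
    for doc in (["university"], ["university", "giraffe"], ["giraffe"]):
        print(doc, round(desm_score(["cambridge"], doc, TOY_IN, TOY_OUT), 3))
```

With these toy vectors, a document whose words point in the same direction as the query word scores higher than one about an unrelated topic, and a document mixing both falls in between. This illustrates how a score based on embedding similarity, rather than term frequency alone, can reflect what a document is about.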