LETOR is a package of benchmark data sets for research on LEarning TO Rank. LETOR 3.0 contains standard features, relevance judgments, data partitioning, evaluation tools, and several baselines for the OHSUMED data collection and the .Gov data collection. Version 1.0 was released in April 2007, version 2.0 in December 2007, and version 3.0 in December 2008.
Recent Updates
- The similarity relation of the OHSUMED collection is released.
- The sitemap of the .Gov collection is released.
- The link graph of the .Gov collection is released.
What’s new in LETOR3.0?
LETOR3.0 contains several significant updates compared with version 2.0:
- Four new datasets are added: homepage finding 2003, homepage finding 2004, named page finding 2003, and named page finding 2004. Together with the three datasets (OHSUMED, topic distillation 2003, and topic distillation 2004) from LETOR2.0, there are seven datasets in LETOR3.0.
- A new document sampling strategy is used for each query; as a result, the three datasets carried over into LETOR3.0 differ from their LETOR2.0 versions.
- New low-level features are provided for learning.
- Meta data is provided for better investigation of ranking features.
- More baselines are included.
Introduction to LETOR3.0 datasets
Please access this page for download.
A brief description of the directory tree is as follows:
| Folder or file | Description |
| --- | --- |
| Letor.pdf | A document (still incomplete) about the whole dataset. |
| EvaluationTool | The evaluation tools. |
| Gov | Contains the 6 datasets built on the .Gov collection. |
| Gov\Meta | Meta data for all queries in the 6 .Gov datasets. This information can be used to extract new features. |
| Gov\Feature_null | Original feature files of the 6 .Gov datasets. Since some documents may not contain the query terms, "NULL" is used to indicate language-model features whose values would be minus infinity. |
| Gov\Feature_min | Replaces each "NULL" value in Gov\Feature_null with the minimal value of that feature under the same query. This data can be used directly for learning. |
| Gov\QueryLevelNorm | Query-level normalization applied to the data files in Gov\Feature_min. This data can be used directly for learning. |
| OHSUMED | Contains the OHSUMED dataset. |
| OHSUMED\Meta | Meta data for all queries in OHSUMED. This information can be used to extract new features. |
| OHSUMED\Feature_null | Original feature files of OHSUMED. Since some documents may not contain the query terms, "NULL" is used to indicate language-model features whose values would be minus infinity. |
| OHSUMED\Feature_min | Replaces each "NULL" value in OHSUMED\Feature_null with the minimal value of that feature under the same query. This data can be used directly for learning. |
| OHSUMED\QueryLevelNorm | Query-level normalization applied to the data files in OHSUMED\Feature_min. This data can be used directly for learning. |

More Information
After the release of LETOR3.0, we have received many valuable suggestions and much feedback. Based on these suggestions, we release more information about the datasets.
- Similarity relation of OHSUMED collection
Similarity relation. The data is organized by queries. The order of queries in the file is the same as that in OHSUMED\Feature_null\ALL\OHSUMED.txt. The documents of a query in the similarity file are also in the same order as in the OHSUMED\Feature_null\ALL\OHSUMED.txt file. The similarity graph among documents under a specific query is encoded by an upper triangular matrix. Here is the layout for one query:
============================
N
S(1,2) S(1,3) S(1,4) … S(1,N)
S(2,3) S(2,4) … S(2,N)
…
S(N-2,N-1) S(N-2,N)
S(N-1,N)
============================
in which N is the number of documents under this query and S(i,j) is the similarity between the i-th and j-th documents of the query. We simply use the cosine similarity between the contents of the two documents.
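As a sketch of how to consume this layout, the following Python function expands the per-query upper triangles into full symmetric matrices. The function name is ours, not part of the LETOR release, and it assumes exactly the format shown above (a line with N, then N-1 rows of S(i,i+1) … S(i,N)):

```python
def parse_similarity(lines):
    """Parse the OHSUMED similarity file into one full matrix per query.

    `lines` is any iterable of text lines: for each query, a line holding
    N followed by N-1 rows of the upper triangle S(i,i+1) ... S(i,N).
    """
    rows = [ln.split() for ln in lines if ln.strip()]
    queries, pos = [], 0
    while pos < len(rows):
        n = int(rows[pos][0])
        pos += 1
        # Full symmetric matrix; a document's cosine similarity with
        # itself is 1.0, so the diagonal is initialized to 1.0.
        sim = [[1.0] * n for _ in range(n)]
        for i in range(n - 1):
            for k, v in enumerate(float(x) for x in rows[pos]):
                sim[i][i + 1 + k] = sim[i + 1 + k][i] = v
            pos += 1
        queries.append(sim)
    return queries
```

Usage would be along the lines of `parse_similarity(open("similarity.txt"))`, where the file name is hypothetical; substitute the actual file from the download.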
- Sitemap of Gov collection
Sitemap. Each line is a web page. The first column is the MSRA doc id of the page, the second column is the depth of the URL (number of slashes), the third column is the length of the URL (without “http://”), the fourth column is the number of its child pages in the sitemap, and the fifth column is the MSRA doc id of its parent page (-1 indicates no parent page).
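A minimal Python reader for this five-column layout might look as follows. The record and function names are ours, and the columns are assumed to be whitespace-separated:

```python
from typing import NamedTuple

class SitemapEntry(NamedTuple):
    doc_id: int        # MSRA doc id of the page
    url_depth: int     # depth of the URL (number of slashes)
    url_length: int    # length of the URL, without "http://"
    num_children: int  # number of child pages in the sitemap
    parent_id: int     # MSRA doc id of the parent page; -1 means no parent

def parse_sitemap(lines):
    """Parse sitemap lines, one whitespace-separated record per line."""
    return [SitemapEntry(*map(int, ln.split())) for ln in lines if ln.strip()]
```

Fields such as `url_depth` could then serve directly as document-level ranking features.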
Mapping from MSRA doc id to TREC doc id
- Link graph of Gov collection
Link graph. Each line is a hyperlink. The first column is the MSRA doc id of the source of the hyperlink, and the second column is the MSRA doc id of the destination of the hyperlink. Mapping from MSRA doc id to TREC doc id
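The two-column format above can be loaded into adjacency lists with a few lines of Python. This is a sketch under the stated assumptions (whitespace-separated source and destination ids per line); the function name is ours:

```python
from collections import defaultdict

def parse_link_graph(lines):
    """Build out-link and in-link adjacency lists from link-graph lines.

    Each non-empty line is assumed to hold two whitespace-separated MSRA
    doc ids: the source and the destination of one hyperlink.
    """
    out_links, in_links = defaultdict(list), defaultdict(list)
    for ln in lines:
        if ln.strip():
            src, dst = (int(x) for x in ln.split())
            out_links[src].append(dst)
            in_links[dst].append(src)
    return out_links, in_links
```

Such adjacency lists are the natural input for link-analysis features of the PageRank or HITS family.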
Additional Notes
- The old version of LETOR can be found here.
- The following people contributed to the construction of the LETOR dataset: Tao Qin, Tie-Yan Liu, Jun Xu, Chaoliang Zhong, Kang Ji, and Hang Li.
- If you have any questions or suggestions about this version, please let us know. Our goal is to make the dataset reliable and useful for the community.
Algorithms with linear ranking function
Algorithms with nonlinear ranking function
| Algorithm | TD2003 | TD2004 | NP2003 | NP2004 | HP2003 | HP2004 | OHSUMED | Prediction files on test set | Notes | Experiments by |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RankBoost | here | here | here | here | here | here | here | test scores | Algorithm details | Yong-Deok Kim |
| FRank | here | here | here | here | here | here | here | test scores | Algorithm details | Ming-Feng Tsai |

Recently added algorithms (with linear ranking function)
Please note that the above experimental results are still preliminary, since the result of almost every algorithm can be further improved. For example, for regression, we can add a regularization term to make it more robust; for RankSVM, we can run more iterations so as to guarantee better convergence of the optimization; for ListNet, we can also add a regularization term to its loss function to make it generalize better to the test set. Any updates to the above algorithms, or new ranking algorithms, are welcome. The following table lists the updated results of several algorithms (Regression and RankSVM) and a new algorithm, SmoothRank. We would like to thank Dr. Olivier Chapelle and Prof. Thorsten Joachims for kindly contributing the results.
Summary of all algorithms and datasets
How to compare with the baselines?
We note that different experimental settings may greatly affect the performance of a ranking algorithm. To make comparisons fair, we encourage everyone to follow these common settings when using LETOR; deviations from these defaults must be noted when reporting results.
- All reported algorithms use the “QueryLevelNorm” version of the datasets (i.e. query level normalization for feature processing). You are encouraged to use the same version and should indicate if you use a different one.
- The test set cannot be used in any manner to make decisions about the structure or parameters of the model.
- The validation set can only be used for model selection (setting hyper-parameters and model structure), but cannot be used for learning. Most baselines released on the LETOR website use MAP on the validation set for model selection; you are encouraged to use the same strategy and should indicate if you use a different one.
- All reported results must be produced with the provided evaluation utility. When running the evaluation script, please keep the original document order of the dataset: the evaluation tool (Eval-Score-3.0.pl) sorts documents with the same ranking score according to their input order, so it is sensitive to the document order in the input file.
- Please explicitly state the function class of your ranking model (e.g., linear model, two-layer neural net, or decision trees) in your work.
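The "QueryLevelNorm" feature processing mentioned above can be sketched as per-query min–max normalization. The snippet below assumes the standard LETOR line format `label qid:Q 1:v1 2:v2 ... #comment` and that every document in a query carries the same feature ids; the function name is ours, and an exact byte-for-byte match with the released files is not guaranteed (tie handling and rounding may differ):

```python
from collections import defaultdict

def query_level_normalize(lines):
    """Min-max normalize each feature to [0, 1] within each query."""
    # Parse each line into (label, qid, {feature id: value}, comment).
    parsed, by_qid = [], defaultdict(list)
    for line in lines:
        body, _, comment = line.partition("#")
        tokens = body.split()
        label, qid = tokens[0], tokens[1].split(":")[1]
        feats = {int(f): float(v) for f, v in (t.split(":") for t in tokens[2:])}
        by_qid[qid].append(len(parsed))
        parsed.append((label, qid, feats, comment.strip()))
    # Normalize per query; a constant feature maps to 0.0.
    out = [None] * len(parsed)
    for qid, idxs in by_qid.items():
        fids = sorted({f for i in idxs for f in parsed[i][2]})
        lo = {f: min(parsed[i][2][f] for i in idxs) for f in fids}
        hi = {f: max(parsed[i][2][f] for i in idxs) for f in fids}
        for i in idxs:
            label, _, feats, comment = parsed[i]
            norm = {
                f: 0.0 if hi[f] == lo[f] else (feats[f] - lo[f]) / (hi[f] - lo[f])
                for f in fids
            }
            feat_str = " ".join(f"{f}:{norm[f]:g}" for f in fids)
            out[i] = f"{label} qid:{qid} {feat_str}" + (f" #{comment}" if comment else "")
    return out
```

Normalizing within a query rather than globally matters because raw feature scales (e.g., BM25 scores) vary widely across queries.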
Additional Notes
- The prediction score files on the test set can be viewed with any text editor, such as Notepad.
- More algorithms will be added in the future.
- If you would like to publish the results of your algorithm here, please let us know.
To use the datasets, you must read and accept the online agreement. By using the datasets, you agree to be bound by the terms of its license.
Update: Due to a website update, all the datasets have been moved to the cloud (hosted on OneDrive) and can be downloaded here. Use the file names below to find the corresponding files in OneDrive. Please contact {taoqin AT microsoft DOT com} with any questions.
Download
“Gov.rar”, the .Gov dataset (about 1 GB);
“OHSUMED.rar”, the OHSUMED dataset (about 30 MB);
and “EvaluationTool.zip”, the evaluation tools (about 400 KB).