LETOR: Learning to Rank for Information Retrieval

Established: January 1, 2009

LETOR is a package of benchmark data sets for research on LEarning TO Rank, which contains standard features, relevance judgments, data partitioning, evaluation tools, and several baselines. Version 1.0 was released in April 2007, version 2.0 in December 2007, and version 3.0 in December 2008. This version, 4.0, was released in July 2009. Unlike the previous versions (V3.0 is an update of V2.0, which in turn is an update of V1.0), LETOR4.0 is an entirely new release. It uses the Gov2 web page collection (~25M pages) and two query sets from the Million Query track of TREC 2007 and TREC 2008, which we call MQ2007 and MQ2008 for short. MQ2007 contains about 1700 queries with labeled documents, and MQ2008 contains about 800 queries with labeled documents.

 


LETOR 4.0

Datasets

LETOR4.0 contains 8 datasets for four ranking settings, derived from the two query sets and the Gov2 web page collection. A 5-fold cross validation strategy is adopted, and the 5-fold partitions are included in the package. Each fold contains three subsets: a training set, a validation set, and a test set.

Setting                  Datasets
Supervised ranking       MQ2007, MQ2008
Semi-supervised ranking  MQ2007-semi, MQ2008-semi
Rank aggregation         MQ2007-agg, MQ2008-agg
Listwise ranking         MQ2007-list, MQ2008-list

Descriptions

  • Supervised ranking: There are three versions of each dataset in this setting: NULL, MIN, and QueryLevelNorm.
    • NULL version: Since some documents do not contain the query terms, we use “NULL” to indicate the language model features whose values would otherwise be minus infinity. This version of the data cannot be used directly for learning; the “NULL” values must be processed first.
    • MIN version: The “NULL” values in the NULL version are replaced with the minimum value of the corresponding feature under the same query. This data can be used directly for learning.
    • QueryLevelNorm version: Query-level normalization is applied to the data in the MIN version. This data can be used directly for learning. We further provide 5-fold partitions of this version for cross validation.

    Each row is a query-document pair. The first column is the relevance label of the pair, the second column is the query id, the following columns are features, and the end of the row is a comment about the pair, including the id of the document. The larger the relevance label, the more relevant the query-document pair. A query-document pair is represented by a 46-dimensional feature vector. A minimal Python sketch for parsing this line format is given after the four setting descriptions below. Here are several example rows from the MQ2007 dataset:

    =================================

    2 qid:10032 1:0.056537 2:0.000000 3:0.666667 4:1.000000 5:0.067138 … 45:0.000000 46:0.076923 #docid = GX029-35-5894638 inc = 0.0119881192468859 prob = 0.139842

    0 qid:10032 1:0.279152 2:0.000000 3:0.000000 4:0.000000 5:0.279152 … 45:0.250000 46:1.000000 #docid = GX030-77-6315042 inc = 1 prob = 0.341364

    0 qid:10032 1:0.130742 2:0.000000 3:0.333333 4:0.000000 5:0.134276 … 45:0.750000 46:1.000000 #docid = GX140-98-13566007 inc = 1 prob = 0.0701303

    1 qid:10032 1:0.593640 2:1.000000 3:0.000000 4:0.000000 5:0.600707 … 45:0.500000 46:0.000000 #docid = GX256-43-0740276 inc = 0.0136292023050293 prob = 0.400738

    =================================

  • Semi-supervised ranking: The data format in this setting is the same as in the supervised ranking setting. The only difference is that the datasets in this setting contain both judged and unjudged query-document pairs (in the training set, but not in the validation and test sets), while the datasets for supervised ranking contain only judged query-document pairs. The relevance label “-1” indicates that a query-document pair is not judged. An example is shown below.

==============================

-1 qid:18219 1:0.022594 2:0.000000 3:0.250000 4:0.166667 … 45:0.004237 46:0.081600 #docid = GX004-66-12099765 inc = -1 prob = 0.223732

0 qid:18219 1:0.027615 2:0.500000 3:0.750000 4:0.333333 … 45:0.010291 46:0.046400 #docid = GX004-93-7097963 inc = 0.0428115405134536 prob = 0.860366

-1 qid:18219 1:0.018410 2:0.000000 3:0.250000 4:0.166667 … 45:0.003632 46:0.033600 #docid = GX005-04-11520874 inc = -1 prob = 0.0980801

==============================

  • Rank aggregation: In this setting, a query is associated with a set of input ranked lists. The task of rank aggregation is to output a better final ranked list by aggregating the multiple input lists. A row in the data represents a query-document pair. Several rows are shown below.

==============================

0 qid:10002 1:1 2:30 3:48 4:133 5:NULL … 25:NULL #docid = GX008-86-4444840 inc = 1 prob = 0.086622

0 qid:10002 1:NULL 2:NULL 3:NULL 4:NULL 5:NULL … 25:NULL #docid = GX037-06-11625428 inc = 0.0031586555555558 prob = 0.0897452

2 qid:10032 1:6 2:96 3:88 4:NULL 5:NULL … 25:NULL #docid = GX029-35-5894638 inc = 0.0119881192468859 prob = 0.139842

==============================

The first column is the relevance label of the pair, the second column is the query id, the following columns are the ranks of the document in the input ranked lists, and the end of the row is a comment about the pair, including the id of the document. In the above example, 2:30 means that the rank value of the document in the second input list is 30. Note that larger rank values mean higher (top) positions in an input ranked list, and “NULL” means the document does not appear in that list. The larger the relevance label, the more relevant the query-document pair. There are 21 input lists in the MQ2007-agg dataset and 25 input lists in the MQ2008-agg dataset.

  • Listwise ranking

The data format in this setting is very similar to that of supervised ranking. The difference is that the ground truth in this setting is a permutation of the documents for a query, instead of multi-level relevance judgments. As shown in the following examples, the first column is the relevance degree of a document in the ground-truth permutation; a larger relevance degree means a higher position of the document in the permutation. The other columns are the same as in the supervised ranking setting.

==============================

1008 qid:10 1:0.004356 2:0.080000 3:0.036364 4:0.000000 … 46:0.000000 #docid = GX057-59-4044939 inc = 1 prob = 0.698286

1007 qid:10 1:0.004901 2:0.000000 3:0.036364 4:0.333333 … 46:0.000000 #docid = GX235-84-0891544 inc = 1 prob = 0.567746

1006 qid:10 1:0.019058 2:0.240000 3:0.072727 4:0.500000 … 46:0.000000 #docid = GX016-48-5543459 inc = 1 prob = 0.775913

1005 qid:10 1:0.004901 2:0.160000 3:0.018182 4:0.666667 … 46:0.000000 #docid = GX068-48-12934837 inc = 1 prob = 0.659932

==============================
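For readers who want to load these files programmatically, here is a minimal parsing sketch in Python. It is not part of the LETOR package and the function names are only illustrative; it assumes the MIN/QueryLevelNorm-style line layout shown above. The “-1” labels of the *-semi datasets and the permutation-style labels of the *-list datasets are read the same way.

==============================

from collections import defaultdict

def parse_letor_line(line, num_features=46):
    """Parse one line: <label> qid:<qid> 1:<v1> ... 46:<v46> #docid = <docid> ..."""
    body, _, comment = line.partition("#")
    tokens = body.split()
    label = float(tokens[0])                 # -1 marks an unjudged pair in the *-semi data
    qid = tokens[1].split(":", 1)[1]
    features = [0.0] * num_features
    for tok in tokens[2:]:
        idx, val = tok.split(":")
        features[int(idx) - 1] = float(val)
    docid = None
    if comment:                              # "docid = GX029-35-5894638 inc = ... prob = ..."
        parts = comment.split()
        docid = parts[parts.index("docid") + 2]
    return label, qid, features, docid

def load_letor_file(path):
    """Group the (label, features, docid) triples of a feature file by query id."""
    queries = defaultdict(list)
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                label, qid, features, docid = parse_letor_line(line)
                queries[qid].append((label, features, docid))
    return queries

==============================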

More low level information

Update: Due to a website update, all the datasets have been moved to the cloud (hosted on OneDrive) and can be downloaded here. You can get the file name from the following links and find the corresponding file in OneDrive. Please contact {taoqin AT microsoft DOT com} if you have any questions.

  • Meta data: Meta data for all queries in the two query sets. This information can be used to reproduce some features such as BM25 and LMIR, and can also be used to construct new features. Meta data for the MQ2007 query set (~60M). Meta data for the MQ2008 query set (~50M). Collection info (~1K).
  • Link graph (~480M) of the Gov2 collection: Each line contains the inlinks of a web page. The first column is the MSRA doc id of the web page, the second column is the number of inlinks of the page, and the following columns list the MSRA doc ids of all the inlinks of the page. The mapping from MSRA doc id to TREC doc id can be found here (~140M).
  • Sitemap (~65M) of the Gov2 collection: Each line is a web page. The first column is the MSRA doc id of the page, the second column is the depth of the url (number of slashes), the third column is the length of the url (without “http://”), the fourth column is the number of its child pages in the sitemap, and the fifth column is the MSRA doc id of its parent page (-1 indicates no parent page).
  • Similarity relation of the Gov2 collection

The data is organized by queries. Similarity for the MQ2007 query set (~4.3G); similarity for the MQ2008 query set (part1 and part2, ~4.9G). The order of queries in the two files is the same as that in Large_null.txt in the MQ2007-semi and MQ2008-semi datasets.

The order of documents of a query in the two files is also the same as that in Large_null.txt in the MQ2007-semi and MQ2008-semi datasets.

Each row in the similarity files describes the similarity between a page and all the other pages under the same query. Note that the i-th row in the similarity files corresponds exactly to the i-th row in Large_null.txt in the MQ2007-semi or MQ2008-semi dataset. Here is an example line:

============================

qid:10002 qdid:1 406:0.785623 178:0.785519 481:0.784446 63:0.741556 882:0.512454 …

============================

The first column shows the query id, and the second column shows the page index under the query. For example, for a query with 1000 web pages, the page index ranges from 1 to 1000. The following columns show the similarities between this page and the other pages. For example, 406:0.785623 indicates that the similarity between this page (with index 1 under the query) and the page with index 406 under the query is 0.785623. The pages are sorted in descending order of similarity. The similarity between two pages is the cosine similarity between their contents.
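For illustration, a small Python sketch for reading one line of these similarity files could look as follows. This is not an official LETOR tool and the function name is hypothetical.

==============================

def parse_similarity_line(line):
    """Return (qid, page_index, [(other_page_index, cosine_similarity), ...])."""
    tokens = line.split()
    qid = tokens[0].split(":", 1)[1]               # "qid:10002" -> "10002"
    page_index = int(tokens[1].split(":", 1)[1])   # "qdid:1"    -> 1
    neighbours = []
    for tok in tokens[2:]:                         # e.g. "406:0.785623"
        other, sim = tok.split(":")
        neighbours.append((int(other), float(sim)))
    return qid, page_index, neighbours

# With the example line above:
# parse_similarity_line("qid:10002 qdid:1 406:0.785623 178:0.785519 481:0.784446")
# -> ("10002", 1, [(406, 0.785623), (178, 0.785519), (481, 0.784446)])

==============================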

Additional Notes

  • The following people contributed to the construction of the LETOR4.0 dataset: Tao Qin, Tie-Yan Liu, Wenkui Ding, Jun Xu, Hang Li, Ben Carterette, Javed Aslam, James Allan, Stephen Robertson, Virgil Pavlu, and Emine Yilmaz.
  • We would like to thank the following teams for kindly and generously sharing their runs submitted to TREC 2007/2008: the NEU team, U. Massachusetts team, I3S_Group_of_ICT team, ARSC team, IBM Haifa team, MPI-d5 team, Sabir.buckley team, HIT team, RMIT team, U. Amsterdam team, and U. Melbourne team.
  • If you have any questions or suggestions about this version, please kindly let us know. Our goal is to make the dataset reliable and useful for the community.

Baselines

Baselines for supervised ranking

Update: Due to a website update, all the datasets have been moved to the cloud (hosted on OneDrive) and can be downloaded here. You can get the file name from the following link and find the corresponding file in OneDrive. Please contact {taoqin AT microsoft DOT com} if you have any questions.

Baselines for rank aggregation

Algorithms MQ2007-agg MQ2008-agg Notes Experiments by
BordaCount here here algorithm details Tao Qin
CPS-KendallTau here here algorithm details Xiubo Geng
CPS-SpearmanFootrule here here algorithm details Xiubo Geng
CPS-SpearmanRankCorrelation here here algorithm details Xiubo Geng
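As a rough illustration of the simplest baseline above, here is a hedged Borda-count sketch in Python over the *-agg line format described earlier; the exact implementation behind the published BordaCount results may differ. It follows the convention stated in the data description: a larger rank value means a higher position in an input list, and “NULL” means the document is absent from that list.

==============================

def borda_count(agg_lines):
    """Aggregate the *-agg lines of ONE query into a single ranked list of docids.
    Each document's score is the sum of its rank values over all input lists;
    "NULL" entries (document absent from a list) contribute nothing."""
    scores = {}
    for line in agg_lines:
        body, _, comment = line.partition("#")
        tokens = body.split()
        docid = comment.split()[2]            # "docid = GX008-86-4444840 inc = ..."
        total = 0.0
        for tok in tokens[2:]:                # skip the relevance label and qid
            _, rank = tok.split(":")
            if rank != "NULL":
                total += float(rank)
        scores[docid] = total
    return sorted(scores, key=scores.get, reverse=True)

==============================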

Download

To use the datasets, you must read and accept the online agreement. By using the datasets, you agree to be bound by the terms of its license.

Datasets

Update: Due to a website update, all the datasets have been moved to the cloud (hosted on OneDrive) and can be downloaded here. You can get the file name from the following table and fetch the corresponding file in OneDrive. Please contact {taoqin AT microsoft DOT com} if you have any questions.

Note that the two semi-supervised ranking datasets have been updated on Jan. 7, 2010. Please download the new version if you are using the old ones.

Setting                  Dataset          Size
Supervised ranking       MQ2007.rar       ~ 65M
                         MQ2008.rar       ~ 15M
Semi-supervised ranking  MQ2007-semi.rar  ~ 940M
                         MQ2008-semi.rar  ~ 650M
Rank aggregation         MQ2007-agg.rar   ~ 20M
                         MQ2008-agg.rar   ~ 4M
Listwise ranking         MQ2007-list.rar  ~ 950M
                         MQ2008-list.rar  ~ 670M

The feature list for supervised ranking, semi-supervised ranking, and listwise ranking can be found in this document.

Evaluation tools

The evaluation scripts for LETOR4.0 are a little different from those for LETOR3.0.

Please do not use the tools across LETOR3.0 and LETOR4.0.

Evaluation script for supervised ranking, semi-supervised ranking and rank aggregation

Evaluation script for listwise ranking

Significance test script for all the four settings
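The scripts report measures such as MAP and NDCG at several cut-offs. The official numbers should always be produced with the scripts above, but for orientation only, here is a minimal Python sketch of the two measures over a list of relevance labels already sorted by a model's scores. The exact gain, discount, and binarization used by the LETOR scripts may differ slightly, and documents with tied scores should keep their input order, as noted for the LETOR3.0 tool later in this document.

==============================

import math

def average_precision(labels):
    """labels: graded relevance labels sorted by the model's score (descending).
    Labels greater than 0 are treated as relevant (a common convention)."""
    hits, precision_sum = 0, 0.0
    for i, rel in enumerate(labels, start=1):
        if rel > 0:
            hits += 1
            precision_sum += hits / i
    return precision_sum / hits if hits else 0.0

def ndcg_at_k(labels, k):
    """NDCG@k with the (2^rel - 1) gain commonly used for graded judgments."""
    def dcg(ls):
        return sum((2 ** rel - 1) / math.log2(i + 1)
                   for i, rel in enumerate(ls, start=1))
    ideal = dcg(sorted(labels, reverse=True)[:k])
    return dcg(labels[:k]) / ideal if ideal > 0 else 0.0

==============================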

Possible issues: If you are using a Linux machine and encounter problems with the scripts, you may try the solution from Sergio Daniel below. Thanks to Sergio for sharing!

————————-

The evaluation script (http://research.microsoft.com/en-us/um/beijing/projects/letor//LETOR4.0/Evaluation/Eval-Score-4.0.pl.txt) isn’t working for me on the letor 4.0 MQ2008 dataset. I use perl v5.14.2 on a linux machine. I made a little modification and now it is running =)

I replaced the line:

if ($lnFea =~ m/^(\d+) qid\:([^\s]+).*?\#docid = ([^\s]+) inc = ([^\s]+) prob = ([^\s]+)$/)

with:

if ($lnFea =~ m/^(\d+) qid\:([^\s]+).*?\#docid = ([^\s]+) inc = ([^\s]+) prob = ([^\s]+).$/)

Sergio.

————————-

Low level information

Category              Data                             Size
Meta data             Meta data for MQ2007 query set   ~ 60M
                      Meta data for MQ2008 query set   ~ 50M
                      Collection info                  ~ 1K
Relation information  Link graph of Gov2 collection    ~ 480M
                      Sitemap of Gov2 collection       ~ 65M
                      Similarity for MQ2007 query set  ~ 4.3G
                      Similarity for MQ2008 query set  ~ 4.9G

LETOR 3.0

Datasets

LETOR is a package of benchmark data sets for research on LEarning TO Rank. LETOR3.0 contains standard features, relevance judgments, data partitioning, evaluation tools, and several baselines for the OHSUMED data collection and the ‘.gov’ data collection. Version 1.0 was released in April 2007, version 2.0 in December 2007, and version 3.0 in December 2008.

Recent Updates

What’s new in LETOR3.0?

LETOR3.0 contains several significant updates compared with version 2.0:

  • Four new datasets are added: homepage finding 2003, homepage finding 2004, named page finding 2003, and named page finding 2004. Together with the three datasets from LETOR2.0 (OHSUMED, topic distillation 2003, and topic distillation 2004), there are seven datasets in LETOR3.0.
  • A new document sampling strategy is used for each query, so the three datasets carried over from LETOR2.0 are different from their LETOR2.0 versions.
  • New low-level features for learning.
  • Meta data is provided for better investigation of ranking features.
  • More baselines.

Introduction to LETOR3.0 datasets

Please access this page for download.

A brief description about the directory tree is as follows:

Folder or file Description
Letor.pdf An incomplete document about the whole dataset.
EvaluationTool The evaluation tools.
Gov Contains the 6 datasets built on the .Gov collection.
Gov\Meta Meta data for all queries in the 6 .Gov datasets. The information can be used to extract new features.
Gov\Feature_null Original feature files of the 6 .Gov datasets. Since some documents do not contain the query terms, we use “NULL” to indicate the language model features whose values would otherwise be minus infinity.
Gov\Feature_min The “NULL” values in Gov\Feature_null are replaced with the minimum value of the corresponding feature under the same query. This data can be used directly for learning.
Gov\QueryLevelNorm Query-level normalization is applied to the data files in Gov\Feature_min. This data can be used directly for learning.
OHSUMED Contains the OHSUMED dataset.
OHSUMED\Meta Meta data for all queries in the OHSUMED dataset. The information can be used to extract new features.
OHSUMED\Feature_null Original feature files of OHSUMED. Since some documents do not contain the query terms, we use “NULL” to indicate the language model features whose values would otherwise be minus infinity.
OHSUMED\Feature_min The “NULL” values in OHSUMED\Feature_null are replaced with the minimum value of the corresponding feature under the same query. This data can be used directly for learning.
OHSUMED\QueryLevelNorm Query-level normalization is applied to the data files in OHSUMED\Feature_min. This data can be used directly for learning.

More Information

After the release of LETOR3.0, we received many valuable suggestions and much feedback. Based on these suggestions, we release more information about the datasets.

  • Similarity relation of OHSUMED collection

Similarity relation. The data is organized by queries. The order of queries in the file is the same as that in OHSUMED\Feature_null\ALL\OHSUMED.txt. The documents of a query in the similarity file are also in the same order as in the OHSUMED\Feature_null\ALL\OHSUMED.txt file. The similarity graph among the documents under a specific query is encoded as an upper triangular matrix. Here is the example for a query:

============================

N

S(1,2) S(1,3) S(1,4) … S(1,N)

S(2,3) S(2,4) … S(2,N)

…

S(N-2,N-1) S(N-2,N)

S(N-1,N)

============================

in which N is the number of documents under the query, and S(i,j) is the similarity between the i-th and j-th documents of the query. We simply use the cosine similarity between the contents of the two documents. (A small reading sketch in Python is given after this list.)

  • Sitemap of Gov collection

Sitemap. Each line is a web page. The first column is the MSRA doc id of the page, the second column is the depth of the url (number of slashes), the third column is the length of the url (without “http://”), the fourth column is the number of its child pages in the sitemap, and the fifth column is the MSRA doc id of its parent page (-1 indicates no parent page).

Mapping from MSRA doc id to TREC doc id

  • Link graph of Gov collection

Link graph. Each line is a hyperlink. The first column is the MSRA doc id of the source of the hyperlink, and the second column is the MSRA doc id of the destination of the hyperlink. Mapping from MSRA doc id to TREC doc id.
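For illustration, a small Python sketch for reading one query's upper-triangular similarity block could look as follows. This is not an official tool; it assumes the values on each line are whitespace-separated, and it sets the (unstored) diagonal to 1.0 for self-similarity, which is an assumption.

==============================

def read_similarity_block(f):
    """Read one query's block (a line with N, then N-1 lines holding
    S(i,i+1) .. S(i,N)) from an open text file and return a full symmetric
    N x N similarity matrix."""
    n = int(f.readline())
    sim = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n - 1):                   # row i holds S(i+1, j) for j = i+2 .. N (1-based)
        values = [float(v) for v in f.readline().split()]
        for offset, s in enumerate(values):
            j = i + 1 + offset               # 0-based column index
            sim[i][j] = sim[j][i] = s
    return sim

==============================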

Additional Notes

  • The old version of LETOR can be found here.
  • The following people contributed to the construction of the LETOR dataset: Tao Qin, Tie-Yan Liu, Jun Xu, Chaoliang Zhong, Kang Ji, and Hang Li.
  • If you have any questions or suggestions about this version, please kindly let us know. Our goal is to make the dataset reliable and useful for the community.

Baselines

Algorithms with linear ranking function

TD2003 TD2004 NP2003 NP2004 HP2003 HP2004 OHSUMED Prediction files on test set Notes Experiments by
Regression here here here here here here here test scores Algorithm details Da Kuang
RankSVM here here here here here here here test scores Algorithm details Chaoliang Zhong
ListNet here here here here here here here test scores Algorithm details Da Kuang
AdaRank-MAP here here here here here here here test scores Algorithm details Chaoliang Zhong
AdaRank-NDCG here here here here here here here test scores Algorithm details Chaoliang Zhong
SVMMAP here here here here here here here not available Algorithm details Yisong Yue

Algorithms with nonlinear ranking function

TD2003 TD2004 NP2003 NP2004 HP2003 HP2004 OHSUMED Prediction files on test set Notes Experiments by
RankBoost here here here here here here here test scores Algorithm details Yong-Deok Kim
FRank here here here here here here here test scores Algorithm details Ming-Feng Tsai

Recently added algorithms (with linear ranking function)

Please note that the above experimental results are still preliminary, since the result of almost every algorithm can be further improved. For example, for regression, we can add a regularization term to make it more robust; for RankSVM, we can run more iterations to guarantee better convergence of the optimization; for ListNet, we can also add a regularization term to its loss function to make it generalize better to the test set. Any updates to the above algorithms, or new ranking algorithms, are welcome. The following table lists the updated results of several algorithms (Regression and RankSVM) and a new algorithm, SmoothRank. We would like to thank Dr. Olivier Chapelle and Prof. Thorsten Joachims for kindly contributing these results.

TD2003 TD2004 NP2003 NP2004 HP2003 HP2004 OHSUMED Prediction files on test set Notes Experiments by
Regression+L2 reg here here here here here here here Algorithm details Dr. Olivier Chapelle
RankSVM-Primal here here here here here here here Algorithm details Dr. Olivier Chapelle
RankSVM-Struct here here here here here here here Algorithm details Prof. Thorsten Joachims
SmoothRank here here here here here here here Algorithm details Dr. Olivier Chapelle

Summary of all algorithms and datasets

Excel file

How to compare with the baselines?

We note that different experimental settings may greatly affect the performance of a ranking algorithm. To make fair comparisons, we encourage everyone to follow the common settings below while using LETOR; deviations from these defaults must be noted when reporting results. (A schematic of the recommended protocol is given after the list.)

  • All reported algorithms use the “QueryLevelNorm” version of the datasets (i.e. query level normalization for feature processing). You are encouraged to use the same version and should indicate if you use a different one.
  • The test set cannot be used in any manner to make decisions about the structure or parameters of the model.
  • The validation set can only be used for model selection (setting hyper-parameters and model structure) and cannot be used for learning. Most baselines released on the LETOR website use MAP on the validation set for model selection; you are encouraged to use the same strategy and should indicate if you use a different one.
  • All reported results must use the provided evaluation utility. While using the evaluation script, please use the original dataset. The evaluation tool (Eval-Score-3.0.pl) sorts documents with the same ranking score according to their input order; that is, it is sensitive to the document order in the input file.
  • Please explicitly show the function class of ranking models (e.g. linear model, two layer neural net, or decision trees) in your work.
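To make the protocol concrete, here is a schematic Python sketch of the intended 5-fold procedure. It is illustrative only: train_model and evaluate_map are placeholders for your own learner and for the official evaluation script, and folds stands for the five provided partitions.

==============================

def run_five_fold(folds, hyperparameter_grid, train_model, evaluate_map):
    """folds: the five provided partitions, each with .train/.vali/.test file paths.
    train_model(train_path, params) -> a trained model;
    evaluate_map(model, data_path)  -> MAP as computed by the official script."""
    test_maps = []
    for fold in folds:
        best_model, best_vali_map = None, float("-inf")
        for params in hyperparameter_grid:
            model = train_model(fold.train, params)    # learn on the training set only
            vali_map = evaluate_map(model, fold.vali)  # model selection on validation MAP
            if vali_map > best_vali_map:
                best_model, best_vali_map = model, vali_map
        test_maps.append(evaluate_map(best_model, fold.test))  # test set touched only once
    return sum(test_maps) / len(test_maps)             # report the average over the 5 folds

==============================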

Additional Notes

  • The prediction score files on the test set can be viewed with any text editor, such as Notepad.
  • More algorithms will be added in the future.
  • If you would like to publish the results of your algorithm here, please let us know.

Download

To use the datasets, you must read and accept the online agreement. By using the datasets, you agree to be bound by the terms of its license.

Update: Due to a website update, all the datasets have been moved to the cloud (hosted on OneDrive) and can be downloaded here. You can get the file names as listed below and find the corresponding files in OneDrive. Please contact {taoqin AT microsoft DOT com} if you have any questions.

Download

“Gov.rar”,  the .Gov dataset (about 1G),

“OHSUMED.rar”, the OHSUMED dataset (about 30M),

and “EvaluationTool.zip”, the evaluation tools (about 400k).

Resources

Papers

Because of the fast development of this area, it is difficult to keep the list up-to-date and comprehensive. For a comprehensive list and more recent papers, please refer to

If your paper is not listed, please let us know at taoqin@microsoft.com.

  1. James Petterson, Tiberio Caetano, Julian McAuley and Jin Yu. Exponential Family Graph Matching and Ranking. In NIPS 2009.
  2. C. Rudin. The P-Norm Push: A Simple Convex Ranking Algorithm that Concentrates at the Top of the List. Journal of Machine Learning Research, 10 (2009) 2233-2271.
  3. C. Rudin, and R. Schapire. Margin-Based Ranking and an Equivalence Between AdaBoost and RankBoost. Journal of Machine Learning Research, 10 (2009) 2193-2232.
  4. C. Rudin, R. Passonneau, A. Radeva, H. Dutta, S. Ierome, and D. Isaac. A Process for Predicting Manhole Events in Manhattan. To appear, Machine Learning, 2010.
  5. S. Agarwal, T. Graepel, R. Herbrich, S. Har-Peled, and D. Roth. Generalization bounds for the area under the ROC curve. Journal of Machine Learning Research, 6:393-425, 2005.
  6. S. Agarwal and P. Niyogi. Stability and generalization of bipartite ranking algorithms. In COLT 2005, pages 32-47, 2005.
  7. E. Agichtein, E. Brill, S. T. Dumais, and R. Ragno. Learning user interaction models for predicting web search result preferences. In SIGIR 2006, pages 3-10, 2006.
  8. N. Ailon and M. Mohri. An efficient reduction from ranking to classification. In COLT 2008, 2008.
  9. H. Almeida, M. Goncalves, M. Cristo, and P. Calado. A combined component approach for finding collection-adapted ranking functions based on genetic programming. In SIGIR 2007, pages 399-406, 2007.
  10. M.-R. Amini, T.-V. Truong, and C. Goutte. A boosting algorithm for learning bipartite ranking functions with partially labeled data. In SIGIR 2008, pages 99-106, 2008.
  11. M.-F. Balcan, N. Bansal, A. Beygelzimer, D. Coppersmith, J. Langford, and G. B. Sorkin. Robust reductions from ranking to classification. In COLT 2007, 2007.
  12. B. Bartell, G. W. Cottrell, and R. Belew. Learning to retrieve information. In SCC 1995, 1995.
  13. C. J. Burges, R. Ragno, and Q. V. Le. Learning to rank with nonsmooth cost functions. In NIPS 2006, pages 395-402, 2006.
  14. C. J. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender. Learning to rank using gradient descent. In ICML 2005, pages 89-96, 2005.
  15. G. Cao, J. Nie, L. Si, J. Bai, Learning to Rank Documents for Ad-Hoc Retrieval with Regularized Models, SIGIR 2007 workshop: Learning to Rank for Information Retrieval, 2007
  16. Y. Cao, J. Xu, T.-Y. Liu, H. Li, Y. Huang, and H.-W. Hon. Adapting ranking svm to document retrieval. In SIGIR 2006, pages 186-193, 2006.
  17. Z. Cao, T. Qin, T.-Y. Liu, M.-F. Tsai, and H. Li. Learning to rank: from pairwise approach to listwise approach. In ICML 2007, pages 129-136, 2007.
  18. V. R. Carvalho, J. L. Elsas, W. W. Cohen, and J. G. Carbonell. A meta-learning approach for robust rank learning. In SIGIR 2008 workshop on Learning to Rank for Information Retrieval, 2008.
  19. S. Chakrabarti, R. Khanna, U. Sawant, and C. Bhattacharyya. Structured learning for non-smooth ranking losses. In SIGKDD 2008, pages 88-96, 2008.
  20. O. Chapelle, Q. Le, and A. Smola. Large margin optimization of ranking measures. In NIPS workshop on Machine Learning for Web Search 2007, 2007.
  21. W. Chu and Z. Ghahramani. Gaussian processes for ordinal regression. Journal of Machine Learning Research, 6:1019-1041, 2005.
  22. W. Chu and Z. Ghahramani. Preference learning with Gaussian processes. In ICML 2005, pages 137-144, 2005.
  23. W. Chu and S. S. Keerthi. New approaches to support vector ordinal regression. In ICML 2005, pages 145-152, 2005.
  24. S. Clémençon, G. Lugosi, and N. Vayatis. Ranking and scoring using empirical risk minimization. In COLT 2005, 2005.
  25. W. W. Cohen, R. E. Schapire, and Y. Singer. Learning to order things. In NIPS 1998, volume 10, pages 243-270, 1998.
  26. C. Cortes, M. Mohri, et al. Magnitude-preserving ranking algorithms. In ICML 2007, pages 169-176, 2007.
  27. D. Cossock and T. Zhang. Subset ranking using regression. In COLT 2006, pages 605-619, 2006.
  28. K. Crammer and Y. Singer. Pranking with ranking. In NIPS 2002, pages 641-647, 2002.
  29. K. Duh and K. Kirchhoff. Learning to rank with partially-labeled data. In SIGIR 2008, pages 251-258, 2008.
  30. W. Fan, E. A. Fox, P. Pathak, and H. Wu. The effects of fitness functions on genetic programming based ranking discovery for web search. Journal of American Society for Information Science and Technology, 55(7):628-636, 2004.
  31. W. Fan, M. Gordon, and P. Pathak. Discovery of context-specific ranking functions for effective information retrieval using genetic programming. IEEE Transactions on Knowledge and Data Engineering, 16(4):523-527, 2004.
  32. W. Fan, M. Gordon, and P. Pathak. A generic ranking function discovery framework by genetic programming for information retrieval. Information Processing and Management, 40(4):587-602, 2004.
  33. W. Fan, M. Gordon, and P. Pathak. Genetic programming-based discovery of ranking functions for effective web search. Journal of Management of Information Systems, 21(4):37-56, 2005.
  34. W. Fan, M. Gordon, and P. Pathak. On linear mixture of expert approaches to information retrieval. Decision Support System, 42(2):975-987, 2006.
  35. W. Fan, M. D. Gordon, W. Xi, and E. A. Fox. Ranking function optimization for effective web search by genetic programming: an empirical study. In HICSS 2004, page 40105, 2004.
  36. Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer. An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research, 4:933-969, 2003.
  37. N. Fuhr. Optimum polynomial retrieval functions based on the probability ranking principle. ACM Transactions on Information Systems, 7(3):183-204, 1989.
  38. G. Fung, R. Rosales, and B. Krishnapuram, Learning Rankings via Convex Hull Separation, NIPS 2005 workshop on Learning to Rank, 2005
  39. J. Gao, H. Qi, X. Xia, and J. Nie. Linear discriminant model for information retrieval. In SIGIR 2005, pages 290-297, 2005.
  40. X.-B. Geng, T.-Y. Liu, and T. Qin. Feature selection for ranking. In SIGIR 2007, pages 407-414, 2007.
  41. X.-B. Geng, T.-Y. Liu, T. Qin, H. Li, and H.-Y. Shum. Query-dependent ranking using k-nearest neighbor. In SIGIR 2008, pages 115-122, 2008.
  42. J. Guiver and E. Snelson. Learning to rank with softrank and gaussian processes. In SIGIR 2008, pages 259-266, 2008.
  43. E. F. Harrington. Online ranking/collaborative filtering using the perceptron algorithm. In ICML 2003, pages 250-257, 2003.
  44. R. Herbrich, K. Obermayer, and T. Graepel. Large margin rank boundaries for ordinal regression. In Advances in Large Margin Classifiers, pages 115-132, 2000.
  45. R. Jin, H. Valizadegan, and H. Li. Ranking refinement and its application to information retrieval. In WWW 2008, pages 397-406, 2008.
  46. T. Joachims. Optimizing search engines using clickthrough data. In KDD 2002, pages 133-142, 2002.
  47. T. Joachims. A support vector method for multivariate performance measures. In ICML 2005, pages 377-384, 2005.
  48. S. Kramer, G. Widmer, B. Pfahringer, and M. D. Groeve. Prediction of ordinal classes using regression trees. Fundamenta Informaticae, 34:1-15, 2000.
  49. J. Lafferty and C. Zhai. Document language models, query models and risk minimization for information retrieval. In SIGIR 2001, pages 111-119, 2001.
  50. Y. Lan, T.-Y. Liu, T. Qin, Z. Ma, and H. Li. Query-level stability and generalization in learning to rank. In ICML 2008, pages 512-519, 2008.
  51. G. Lebanon and J. Lafferty. Cranking: Combining rankings using conditional probability models on permutations. In ICML 2002, pages 363-370, 2002.
  52. P. Li, C. Burges, and Q. Wu. Mcrank: Learning to rank using multiple classification and gradient boosting. In NIPS 2007, 2007.
  53. T.-Y. Liu, J. Xu, T. Qin, W.-Y. Xiong, and H. Li. LETOR: Benchmark dataset for research on learning to rank for information retrieval. In SIGIR ’07 Workshop on learning to rank for information retrieval, 2007.
  54. Y. Liu, T.-Y. Liu, T. Qin, Z. Ma, and H. Li. Supervised rank aggregation. In WWW 2007, pages 481-490, 2007.
  55. I. Matveeva, C. Burges, T. Burkard, A. Laucius, and L. Wong. High accuracy retrieval with multiple nested ranker. In SIGIR 2006, pages 437-444, 2006.
  56. D. A. Metzler and W. B. Croft. A Markov random field model for term dependencies. In SIGIR 2005, pages 472-479, 2005.
  57. D. A. Metzler, W. B. Croft, and A. McCallum. Direct maximization of rank based metrics for information retrieval. In CIIR Technical Report, 2005.
  58. D. A. Metzler and T. Kanungo. Machine learned sentence selection strategies for query-biased summarization. In SIGIR 2008 workshop on Learning to Rank for Information Retrieval, 2008.
  59. T. Minka and S. Robertson. Selection bias in the LETOR datasets. In SIGIR 2008 workshop on Learning to Rank for Information Retrieval, 2008.
  60. R. Nallapati. Discriminative models for information retrieval. In SIGIR 2004, pages 64-71, 2004.
  61. T. Pahikkala, E. Tsivtsivadze, A. Airola, J. Boberg, T. Salakoski, Learning to Rank with Pairwise Regularized Least-Squares, SIGIR 2007 workshop: Learning to Rank for Information Retrieval, 2007
  62. L. Rigutini, T. Papini, M. Maggini, and F. Scarselli. Learning to rank by a neural-based sorting algorithm. In SIGIR 2008 workshop on Learning to Rank for Information Retrieval, 2008.
  63. T. Qin, T.-Y. Liu, W. Lai, X.-D. Zhang, D.-S.Wang, and H. Li. Ranking with multiple hyperplanes. In SIGIR 2007, pages 279-286, 2007.
  64. T. Qin, T.-Y. Liu, M.-F. Tsai, X.-D. Zhang, and H. Li. Learning to search web pages with query-level loss functions. Technical Report, MSR-TR-2006-156, 2006.
  65. T. Qin, T.-Y. Liu, X.-D. Zhang, D. Wang, and H. Li. Learning to rank relational objects and its application to web search. In WWW 2008, pages 407-416, 2008.
  66. T. Qin, T.-Y. Liu, X.-D. Zhang, D.-S. Wang, and H. Li. Global ranking using continuous conditional random fields. In NIPS 2008, 2008.
  67. T. Qin, T.-Y. Liu, X.-D. Zhang, M.-F. Tsai, D.-S. Wang, and H. Li. Query-level loss functions for information retrieval. Information Processing & Management, 44(2):838-855, 2007.
  68. T. Qin, T.-Y. Liu, J. Xu, and H. Li. How to make LETOR more useful and reliable. In SIGIR 2008 workshop on Learning to Rank for Information Retrieval, 2008.
  69. F. Radlinski and T. Joachims. Query chain: Learning to rank from implicit feedback. In KDD 2005, pages 239-248, 2005.
  70. F. Radlinski and T. Joachims. Active exploration for learning rankings from clickthrough data. In KDD 2007, 2007.
  71. F. Radlinski, R. Kleinberg, and T. Joachims. Learning diverse rankings with multi-armed bandits. In ICML 2008, pages 784-791, 2008.
  72. S. Rajaram and S. Agarwal. Generalization bounds for k-partite ranking. In NIPS 2005 WorkShop on Learning to Rank, 2005.
  73. S. Robertson and H. Zaragoza. On rank-based effectiveness measures and optimization. Information Retrieval, 10(3):321-339, 2007.
  74. C. Rudin, C. Cortes, M. Mohri, and R. E. Schapire, Margin-Based Ranking Meets Boosting in the Middle, COLT 2005.
  75. A. Shashua and A. Levin. Ranking with large margin principles: Two approaches. In NIPS 2002, pages 937-944, 2002.
  76. M. Taylor, J. Guiver, et al. SoftRank: optimising non-smooth rank metrics. In WSDM 2008, pages 77-86, 2008.
  77. M. Taylor, H. Zaragoza, N. Craswell, S. Robertson, and C. J. Burges. Optimisation methods for ranking functions with multiple parameters. In CIKM 2006, pages 585-593, 2006.
  78. A. Trotman. Learning to rank. Information Retrieval, 8(3):359-381, 2005.
  79. M.-F. Tsai, T.-Y. Liu, T. Qin, H.-H. Chen, and W.-Y. Ma. Frank: a ranking method with fidelity loss. In SIGIR 2007, pages 383-390, 2007.
  80. A. Veloso, H. M. de Almeida, M. A. Gonçalves, and W. M. Jr. Learning to rank at query-time using association rules. In SIGIR 2008, pages 267-274, 2008.
  81. N. Usunier, V. Truong, M. R. Amini, and P. Gallinari, Ranking with Unlabeled Data: A First Study, NIPS 2005 workshop:Learning to Rank, 2005.
  82. W. Xi, J. Lind, and E. Brill, Learning effective ranking functions for newsgroup search, SIGIR 2004.
  83. F. Xia, T.-Y. Liu, J. Wang, W. Zhang, and H. Li. Listwise approach to learning to rank – theorem and algorithm. In ICML 2008, pages 1192-1199, 2008.
  84. J. Xu, Y. Cao, H. Li, and Y. Huang. Cost-sensitive learning of SVM for ranking. In ECML 2006, pages 833-840, 2006.
  85. J. Xu and H. Li. Adarank: a boosting algorithm for information retrieval. In SIGIR 2007, pages 391-398, 2007.
  86. J. Xu, T.-Y. Liu, M. Lu, H. Li, and W.-Y. Ma. Directly optimizing IR evaluation measures in learning to rank. In SIGIR 2008, pages 107-114, 2008.
  87. J.-Y. Yeh, J.-Y. Lin, et al. Learning to rank for information retrieval using genetic programming. In LR4IR 2007, 2007.
  88. H. Yu. SVM selective sampling for ranking with application to data retrieval. In KDD 2005, pages 354-363, 2005.
  89. Y. Yue, T. Finley, F. Radlinski, and T. Joachims. A support vector method for optimizing average precision. In SIGIR 2007, pages 271-278, 2007.
  90. Y. Yue and T. Joachims. Predicting diverse subsets using structural SVM. In ICML 2008, pages 1224-1231, 2008.
  91. C. Zhai and J. Lafferty. A risk minimization framework for information retrieval. Information Processing and Management, 42(1):31-55, 2006.
  92. Z. Zheng, K. Chen, G. Sun, and H. Zha. A regression framework for learning ranking functions using relative relevance judgments. In SIGIR 2007, pages 287-294, 2007.
  93. Z. Zheng, H. Zha, and G. Sun. Query-level learning to rank using isotonic regression. In SIGIR 2008 workshop on Learning to Rank for Information Retrieval, 2008.
  94. Z. Zheng, H. Zha, et al. A general boosting method and its application to learning ranking functions for web search. In NIPS 2007, 2007.
  95. K. Zhou, G.-R. Xue, H. Zha, and Y. Yu. Learning to rank with ties. In SIGIR 2008, pages 275-282, 2008.
  96. O. Zoeter, M. Taylor, E. Snelson, J. Guiver, N. Craswell, and M. Szummer. A decision theoretic framework for ranking using implicit feedback. In SIGIR 2008 workshop on Learning to Rank for Information Retrieval, 2008.
  97. Jonathan L. Elsas, Vitor R. Carvalho, Jaime G. Carbonell. “Fast Learning of Document Ranking Functions with the Committee Perceptron,” Proceedings of the First ACM International Conference on Web Search and Data Mining (WSDM 2008), 2008.
  98. Ronan Cummins and Colm O’Riordan. An axiomatic comparison of learned term-weighting schemes in information retrieval: clarifications and extensions. Artificial Intelligence Review Journal.
  99. Ronan Cummins and Colm O’Riordan. Evolving local and global weighting schemes in information retrieval. Journal of Information Retrieval.

Tutorials

Tutorial talks

Tutorial papers/books

Events

 

 

Other

Community

Research Groups

Learning to rank has become a hot research topic in recent years. The following research groups are very active in this field.

If you want to add your own group to this list, please send email to letor@microsoft.com with the name of your group and a brief description.

Tech Blogs

Contact Us

LETOR Team

Tao Qin, Tie-Yan Liu, Jun Xu, and Hang Li.

Learning to Rank Project, Microsoft Research Asia

Email: letor@microsoft.com

MSRA