MIND News Recommendation Challenge

About

MIND: A competition on news recommendation with the world’s biggest news dataset

Put a battery-operated talking unicorn in your online shopping cart, and you may get an alert suggesting some AA cells to juice up the conversation. Binge-watch a few action movies and you may see titles of martial arts cinema fill up your must-see list.

Recommendation engines try to discern habits, likes, and other affinity traits to anticipate what you may need or want based on past actions. News consumption can fall into these patterns: We know, for instance, that people go to search engines to find out more about a story, be it more background or further developments. Certain types of news and feature stories lend themselves to typical user behaviors: An impending hurricane, for instance, triggers preparation research (if you’re in its path), offers to donate supplies or blood (if you’re nearby), and curiosity about past hurricanes. Even celebrity stories inspire certain common impulses: An engagement announcement will send some to seek a peek at the ring, others to check out past (failed) relationships.

How might a news recommendation system offer more stories, yet not fall into the trap of filter bubbles and echo chambers? The first thing you need is a high-quality benchmark dataset. That’s where MIND comes in: a mammoth collection of anonymized data from the user behavior logs of about 1 million people. Few companies in the world attract those kinds of numbers, and Microsoft News is one of them.

When you work at the scale that Microsoft News does — in 140 countries around the world — the challenge is not to overload its half billion readers. In the not-so-distant past, newspaper print space and radio & TV time constrained how news was reported, displayed and ranked. When the Internet smashed conventions of information delivery, audiences had to assume the responsibility for their own news diet. That diet can be hard to maintain when thousands of news stories come at people every day — especially when you throw in social media.

And while people want to seek news on their own, they also expect that information relevant to their locality (local news), habits and interests will find its way to them. The challenge becomes shared among the newsroom creators, the distribution channels and the readers.

Let the MIND Games begin

Our data scientists and engineering team behind MIND presented this dataset at the prestigious Association for Computational Linguistics 2020 conference. There, they presented their paper, MIND: A Large-scale Dataset for News Recommendation.

Like all researchers, they’re eager to share: The team has issued a call to the science world and news publishers to dive into this dataset and come up with ways to rank news articles that align with users’ interests. How might the dataset be used to study the performance of different kinds of models? Might metrics to evaluate a recommendation’s effectiveness include diversity, “political” balance, a dose of serendipity?

And just to make it interesting, #cashprizes. Grand prize is $10,000, with two second-place prizes at $3,000 and four third-place prizes at $1,000.

Competition registration opens July 20, after which participants can download the training dataset to test their algorithms. The actual competition starts August 21, when the real test data will be made available. Technical reports are encouraged but not required, although the winning entries will be published online and presented at a future workshop. The deadline is midnight September 4 (UTC).

The dataset is free to download for research purposes on the MIND website, and baseline algorithms are available in the Microsoft Recommenders repository. There are two offerings of the dataset: the whole kit’n’caboodle and a smaller subset of 50,000 users called MIND-small, but the goals the Microsoft data scientists aim to achieve aren’t small-minded at all.
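To give a feel for what the behavior logs look like, here is a minimal sketch of parsing one record from a MIND-style behaviors.tsv file. The column layout assumed here (impression ID, user ID, time, click history, then candidate impressions encoded as "NewsID-label") is based on the publicly documented MIND format; the dataset website remains the authoritative reference, and the sample line below is invented for illustration.

```python
def parse_behavior_line(line):
    """Split one tab-separated MIND-style behavior log line into its fields."""
    impression_id, user_id, time, history, impressions = line.rstrip("\n").split("\t")
    # The history field is a space-separated list of previously clicked news IDs.
    clicked_history = history.split() if history else []
    # Each candidate is "NewsID-label", where label 1 = clicked, 0 = skipped.
    candidates = []
    for item in impressions.split():
        news_id, label = item.rsplit("-", 1)
        candidates.append((news_id, int(label)))
    return {
        "impression_id": impression_id,
        "user_id": user_id,
        "time": time,
        "history": clicked_history,
        "candidates": candidates,
    }

# Hypothetical sample line in the assumed format:
sample = "1\tU82271\t11/11/2019 3:28:58 PM\tN3130 N11621\tN29038-0 N55689-1"
record = parse_behavior_line(sample)
```

A recommendation model would then score each entry in `record["candidates"]` given the user's `history`, and be evaluated on how highly it ranks the clicked items.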

Timeline

July 20, 2020: Competition is open. Participant registration begins.
July 20 – August 20, 2020: Dev phase. Participants can submit their results on the dev set to obtain official evaluation scores.
August 21 – September 4, 2020: Test phase. Test data can be downloaded and participants can submit their results on the test set.
September 11, 2020: Competition results announcement.

*All deadlines are at 11:59 PM UTC on the corresponding day.

Prizes

Within 7 days following the Entry Period, one Grand Prize, two Second Place and four Third Place participants will be selected from among all eligible entries received.

  • Grand Prize: The winner will receive US $10,000.
  • Second Place Prizes: Each winner will receive US $3,000.
  • Third Place Prizes: Each winner will receive US $1,000.

In addition, we invite each winner to submit a system description paper and present their work after the end of the competition (other participants’ submissions are also highly encouraged).

Organizing team

Who is behind this competition?

This competition is collaboratively organized by Microsoft News and Microsoft Research Asia teams.

Many people are helping to make this happen, but the core team consists of:

For any inquiries please see our FAQ section or send us an email.

FAQ

Before you participate in the competition, please read the rules of this competition and confirm that you agree to them (you need to send your agreement in your registration email). In addition, the MIND dataset is free to download for research purposes under Microsoft Research License Terms. Please read these terms and confirm that you agree to them before you download the dataset. Feel free to contact us if you have any questions or need clarification regarding the rules of this competition or the licensing of the data.

How do I participate?

Go to our competition page and click “Join Competition”. You will be directed to the CodaLab platform. From there, please follow the detailed steps under the “Overview” tab on the CodaLab competition page.

How do I submit my entry to the competition?

Please follow the detailed steps under the “Submission Guidelines” tab on the competition page. We may update details throughout the development and final test phases.

Do you have some example code or a smaller dataset for trial?

Please find more information on our dataset website. You can find a smaller dataset for trial there, and you can download the evaluation script here: evaluation.py
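The official evaluation.py on the dataset website is the authoritative implementation; as an illustrative sketch only, here is how two ranking metrics commonly reported for news recommendation, MRR and nDCG@k, can be computed for a single impression from binary click labels and model scores. Function names and the averaging choices are our own, not necessarily those of the official script.

```python
import math

def mrr_score(labels, scores):
    """Reciprocal rank of clicked items, averaged over the number of clicks."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranked = [labels[i] for i in order]
    # Each clicked item at rank r (1-based) contributes 1/r.
    rr = [lab / (rank + 1) for rank, lab in enumerate(ranked)]
    return sum(rr) / sum(labels)

def dcg_at_k(labels, scores, k):
    """Discounted cumulative gain over the top-k items by score."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
    return sum(labels[i] / math.log2(rank + 2) for rank, i in enumerate(order))

def ndcg_at_k(labels, scores, k):
    """DCG normalized by the best achievable DCG (ideal ordering)."""
    ideal = dcg_at_k(labels, labels, k)  # ranking by the labels themselves
    return dcg_at_k(labels, scores, k) / ideal

# One impression: the clicked item is ranked second by the model.
labels = [1, 0, 0]
scores = [0.4, 0.9, 0.1]
```

With these inputs the clicked item sits at rank 2, so `mrr_score` returns 0.5; a model that ranked it first would score 1.0 on both metrics.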

Can I publish results of my experiments?

Yes, we encourage academic publications relating to the dataset and competition. If you do decide to publish results, please cite MIND as:

@inproceedings{wu-etal-2020-mind,
    title = "{MIND}: A Large-scale Dataset for News Recommendation",
    author = "Wu, Fangzhao and Qiao, Ying and Chen, Jiun-Hung and Wu, Chuhan and Qi, Tao and Lian, Jianxun and Liu, Danyang and Xie, Xing and Gao, Jianfeng and Wu, Winnie and Zhou, Ming",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.331",
    pages = "3597--3606",
}

Where can I get help?

For any inquiries please see our FAQ section or send us an email.