MIND probe: A competition on news recommendations with the world’s biggest news dataset
Put a battery-operated talking unicorn in your online shopping cart, and you may get an alert suggesting some AA cells to juice up the conversation. Binge-watch a few action movies and you may see titles of martial arts cinema fill up your must-see list.
Recommendation engines try to discern habits, likes, and other affinity traits to anticipate what you may need or want based on past actions. News consumption can fall into these patterns: We know, for instance, that people go to search engines to find out more about a story, be it more background or further developments. Certain types of news and feature stories lend themselves to typical user behaviors: An impending hurricane, for instance, triggers preparation research (if you’re in the path), offers to donate supplies or blood (if you’re nearby) and historical curiosity about past hurricanes. Even celebrity stories inspire certain common impulses: An engagement announcement will launch some to seek a peek at the ring, others to check out past (failed) relationships.
How might a news recommendation system offer more stories, yet not fall into the trap of filter bubbles and echo chambers? The first thing is, you need a high-quality benchmark dataset. That’s where MIND comes in: a mammoth collection of anonymized data from user behavior logs of about 1 million people. Few companies in the world attract those kinds of numbers, and Microsoft News is one of them.
When you work at the scale that Microsoft News does — in 140 countries around the world — the challenge is not to overload its half billion readers. In the not-so-distant past, newspaper print space and radio & TV time constrained how news was reported, displayed and ranked. When the Internet smashed conventions of information delivery, audiences had to assume the responsibility for their own news diet. That diet can be hard to maintain when thousands of news stories come at people every day — especially when you throw in social media.
And while people want to seek news on their own, they also expect that information relevant to their locality (local news), habits and interests will find its way to them. The challenge is shared among the newsroom creators, the distribution channels and the readers.
Let the MIND Games begin
Our data scientists and engineering team behind MIND presented this dataset at the prestigious Association for Computational Linguistics 2020 conference. There, they covered their white paper, MIND: A Large-scale Dataset for News Recommendation.
Like all researchers, they’re eager to share: The team has issued a call to the science world and news publishers to dive into this dataset and come up with ways to rank news articles that align with users’ interests. How might the dataset be used to study the performance of different kinds of models? Might the metrics used to evaluate a recommendation’s effectiveness include diversity, “political” balance, or a dose of serendipity?
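To make the idea of a diversity metric concrete, here is a minimal sketch in Python. It is only an illustration, not the competition's scoring method: the article IDs, categories, scores and function names are all hypothetical, and Shannon entropy over news categories is just one of many ways diversity could be measured.

```python
from collections import Counter
from math import log2


def top_k(scored_articles, k):
    """Return the k highest-scored (article_id, category, score) tuples."""
    return sorted(scored_articles, key=lambda a: a[2], reverse=True)[:k]


def category_entropy(categories):
    """Shannon entropy (in bits) of the category mix in a recommended list.

    0 means every story comes from one category; higher means a more even mix.
    """
    counts = Counter(categories)
    total = len(categories)
    return -sum((c / total) * log2(c / total) for c in counts.values())


# Hypothetical candidates: (article_id, category, model score).
candidates = [
    ("N1", "sports", 0.91),
    ("N2", "sports", 0.88),
    ("N3", "weather", 0.75),
    ("N4", "finance", 0.60),
    ("N5", "sports", 0.55),
]

recommended = top_k(candidates, 4)
diversity = category_entropy([category for _, category, _ in recommended])
print(diversity)  # 1.5 bits: two sports stories, one weather, one finance
```

A list drawn entirely from one category would score 0, so a number like this could sit alongside an accuracy measure to flag recommendations that drift toward a filter bubble.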
And just to make it interesting, #cashprizes. Grand prize is $10,000, with two second-place prizes at $3,000 and four third-place prizes at $1,000.
Competition registration opens July 20, after which participants can download the training dataset to test their algorithms. The actual competition starts August 21, when the real test data will be made available. Technical reports are encouraged but not required, although the winning entries will be published online and presented at a future workshop. The deadline is midnight September 4 (UTC).
The dataset is free to download for research purposes on the MIND website, and baseline algorithms are available in the Microsoft Recommenders repository. There are two offerings of the dataset: the whole kit’n’caboodle and a smaller subset of 50,000 users called MIND-small, but the goals the Microsoft data scientists aim to achieve aren’t small-minded at all.