MSR Image Recognition Challenge (IRC) @ IEEE ICME 2016

Established: February 23, 2016


MSR Image Recognition Challenge (IRC) @ IEEE ICME 2016 (past)

ICME 2016 Image Recognition Grand Challenge Session:

Time: 10:00-11:40, Wednesday, July 13, 2016 
Room: Grand III

  • Deep Multi-Context Network for Fine-Grained Visual Recognition

Xinyu Ou1,2,3, Zhen Wei2,4, Hefei Ling1, Si Liu2, Xiaochun Cao2

1Huazhong University of Science and Technology

2Chinese Academy of Sciences

3Yunnan Open University

4University of Electronic Science and Technology of China

  • Ensemble Deep Neural Networks for Domain-Specific Image Recognition

Wenbo Li, Chuan Ke

Chinese Academy of Sciences

  • Improve Dog Recognition by Mining More Information from Both Click-through Logs and Pre-Trained Models

Guotian Xie1, Kuiyuan Yang2, Yalong Bai3, Min Shang4, Yong Rui2, Jianhuang Lai1

1Sun Yat-Sen University

2Microsoft Research

3Harbin Institute of Technology

4Tsinghua University

  • Learning to Recognition from Bing Clickture Data

Chenghua Li, Qiang Song, Yuhang Wang, Hang Song, Qi Kang, Cheng Jian, Hanqing Lu, Chinese Academy of Sciences

Important Updates

  • We have finished this evaluation! More details below. You can find information about current and past challenges here.
  • The dataset for this challenge is described here and can be downloaded here.
  • Feb 23, 2016: Registration web site is opened.
  • Feb 23, 2016: ICME submission site is open; please register a placeholder for your final report: select Track = “Grand Challenges” and Subject Area = “MSR Grand Challenge: Image Recognition Challenge”
  • Feb 24, 2016: Update about data sets, and faq
  • Feb 26, 2016: Update about sample codes, and faq
  • March 3, 2016: Update about test tool, team keys, and faq
  • March 7~10, 2016: Dry run traffic sent to your system for testing/verification, and faq
  • March 10, 2016: Update about final evaluation and faq
  • March 14, 2016: Evaluation started, please keep your system running stably
  • March 16, 2016: Evaluation ends (0:00am PDT)
  • March 21, 2016: Evaluation results announced (see the rank table below)
  • April 3, 2016: Grand Challenge Paper and Data Submission
  • April 28: Paper acceptance notification
  • May 13: Paper camera ready version due


There are 30+ teams registered for IRC@ICME2016, and 10+ teams successfully connected to the IRC Gateway and finished the evaluation. Compared to the last IRC@MM2015, three times as many teams participated to tackle the challenge. We see good progress in both participation and benchmark metrics. Thank you all for the strong support, and congratulations on your accomplishments!

Team ybt_bj (TeamID=16) is a team from Microsoft Research Asia. To be fair, their result will be evaluated but not awarded, even though they did not leverage any internal data. Thanks to team ybt_bj and all other teams for the understanding.

We encourage all participants to submit papers to the ICME Grand Challenge track to share your experiences. Although recognition accuracy is the focus of this evaluation, discussions of other aspects will also be highly appreciated, including, but not limited to, data, algorithms, and performance. Please note: the paper submission deadline is April 3.

Dog breed recognition is only one of the Image Recognition Grand Challenges. In the coming ACM Multimedia 2016, we are releasing a new MS-Celeb-1M dataset and challenging the community with recognizing one million celebrities on the web. Please stay tuned!

Update Details

Update 3/16/2016: final evaluation finished

The final evaluation finished on 3/16 at 0:00 PDT. Among the 31 registered teams, 11 successfully connected and finished the task during the evaluation window. We are now calculating the recognition accuracies and will send them out shortly after careful verification. The final ranks will be announced on 3/23, as scheduled.

Again, thank you all for your great efforts over the past several weeks. Here is a bonus for all of you: an online dog recognizer demo of your own algorithm!

Please replace “YourServiceGUID” in the above link with your classifier’s serviceGUID (NOT the providerGUID). Open the link in your browser, or on any mobile phone; you can then take a picture or upload an image to your dog recognizer and see its results immediately.

(Tip: You can also use a URL-shortening service to convert your demo URL into a shorter, easier-to-remember one.)

We know you have spent a lot of time tuning the algorithm and building the system. Now it is time to show off your work to your family, friends, colleagues, and the whole world.

(Please use the CommandLineTool to verify that your classifier is running, so it is able to respond to the demo requests. We are hosting the IRC gateway as a free service, for research/demo purposes only. There is no SLA guaranteed. The Microsoft IRC Organization Team reserves the right to disconnect any classifier if suspicious behavior is identified.)

Update 3/14/2016: final evaluation started

We are now sending the evaluation traffic to the classifiers connected to the IRC gateway; so far the progress is pretty good. Thank you all for the great efforts!

Please keep your classifiers running during the evaluation time window, between 3/14 0:00 PDT and 3/16 0:00 PDT. We will send several rounds of evaluation traffic to ensure there are no connection/timeout issues. Please don’t change your classifier during the evaluation, as that may affect your final score calculation. We will send out a notification once the evaluation is done.

Update 3/10/2016: remarks for final evaluation

There are 30 teams participating in the competition worldwide this year. Many teams have successfully connected their dog classifiers to the IRC gateway and returned reasonable results, which indicates that they are ready for the final evaluation. Thank you for the great efforts!

Here are some remarks for the final evaluation:

The final evaluation will be conducted between 3/14 and 3/16 (PDT, i.e., US Pacific Time). Please note that the evaluation traffic may reach your system at any time during this period, so please keep your system running stably. During the evaluation, we will send test images one by one to your system through the IRC gateway and record your recognition responses. We then compare them with the ground truth to calculate accuracy@1 and accuracy@5.

About 10K images in total will be sent to your classifier. If your classifier is slow (i.e., the CommandLineTool takes more than 2 seconds to recognize an image), you may want to consider running multiple recognizer instances on one or more machines. The IRC gateway will then send more requests concurrently to improve the overall throughput. You don’t need to do anything else for load balancing.

Besides, as mentioned before, if you used extra data other than the provided Clickture-Dog data for training, please describe it to us before the final evaluation starts.

Update 3/7/2016: dry run traffic started

Between March 7 and March 10, we are sending some random test traffic to the classifiers connected to the IRC gateway. So, please:

  1. Keep your dog classifier up and running so that it can process incoming recognition requests. You can:
    1. use the CommandLineTool we shared earlier to ping your recognizer from time to time;
    2. or add the classifier as a background service, so it always runs after machine restarts and crashes;
    3. or build a guardian process to monitor your classifier program and restart it when necessary.
  2. Observe and fix issues with your dog classifier, including:
    1. crashes/freezes;
    2. overload (e.g., high CPU/memory utilization);
    3. other unexpected behaviors.

We will also contact you directly if we find unexpected responses from your classifiers. Please note, we will only check the response format, NOT the recognition accuracy.

Update 3/3/2016: test CommandLineTool and team Guids

In our last update, we shared the Windows/Linux sample codes to register your classifiers with the IRC gateway, which turns your classifier into a globally accessible service so that we can send test requests to you and evaluate your algorithms. Please follow the steps below to change this sample code, register your recognizer, and verify it is working before the dry run starts.

  1. Change the sample codes to identify your classifier:

(1) The following lines need to be customized in the sample codes:


(2) Please re-compile the sample codes after the above changes and start them. Then you can use the tool described in the next section to test your classifier.

  2. Use the CommandLineTool shared here to verify your classifier (under Windows):

(1) ping IRC gateway, to verify it is up and running:

CommandLineTool.exe Ping -pHub

You are expected to see a response like this: Checked and received expected response!

(2) Ping the classifier you registered in step #1. Please note the serviceGuid we sent you is for your classifier only; don’t disclose it to anybody else.

CommandLineTool.exe Check -pHub -serviceGuid xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

You are expected to see a response like this: pinged and verified classifier xxxx is available, you can now send test request to it.

(3) send a single image and confirm the recognition response (format/value) is expected

CommandLineTool.exe Recog -pHub -serviceGuid xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx -providerGuid xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx -infile c:\dog.jpg

You are expected to see a response like this:

Response: tag1:0.95;tag2:0.32;tag3:0.05;tag4:0.04;tag5:0.01

  3. Please make sure that your recognition result string follows the format shown in the sample code: “tag1:0.95;tag2:0.32;tag3:0.05;tag4:0.04;tag5:0.01”, where:

(1) Replace tag1, tag2, …, tag5 with the predicted dog breed names, which should be the same as the dog breed labels in the Clickture-Dog dataset; we will use them to match our ground truth;

(2) Replace the numbers after each tag with your prediction confidence;

(3) We will only evaluate your first 5 results, so please sort them by confidence score;
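The formatting rules above can be sketched in a few lines of illustrative Python (not part of the official sample code; `format_response` and `predictions` are made-up names):

```python
# Illustrative sketch of the required result-string format; the
# "tag:confidence" layout and the top-5/sorted-by-confidence rules come
# from the instructions above, everything else is hypothetical.
def format_response(predictions, k=5):
    """predictions: dict mapping dog breed label -> confidence score."""
    # Sort by confidence (highest first) and keep only the top k entries.
    top = sorted(predictions.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return ";".join(f"{tag}:{score:.2f}" for tag, score in top)

print(format_response({"affenpinscher": 0.95, "akita": 0.32, "beagle": 0.05,
                       "boxer": 0.04, "pug": 0.01, "chow chow": 0.005}))
# affenpinscher:0.95;akita:0.32;beagle:0.05;boxer:0.04;pug:0.01
```

Note that entries beyond the fifth are dropped, matching the rule that only the first 5 results are evaluated.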

If you have any questions during the above steps, please send us the changed code from step 1 and the command line you used in step 2, together with a screenshot of the unexpected response. We will work with you on troubleshooting.

Update 2/29/2016: update about sample codes

Some DLL files in the sample codes were corrupted. Please re-download the zip file if you encounter any build issues.

Update 2/26/2016: sample codes

As we mentioned earlier, we will use an open multimedia hub (Prajna Hub) for evaluation. That is, you need to register your recognizer with Prajna Hub, which essentially turns your recognition program into a cloud service. Then your algorithm can be evaluated remotely. The benefit of using Prajna Hub is that it brings your recognizer to the cloud and makes it readily accessible to public users, e.g., mobile apps. Note that your algorithm is still running on your local machine and you have full control over it.

To help you register your recognizers, we have written sample codes for both Windows and Linux operating systems. You can download the codes and refer to the samples to choose the right way to integrate your recognizer. Please see the Readme.txt under each project folder for more details.

We will send you separately a GUID representing your team’s ID, which is needed when you register your recognizer. Please do use your own GUID to replace the default one “EA73374D-FFBD-466E-90CE-4C8CFB4BF0CE” in the sample code.

We will also share a test tool (an EXE file) shortly to let you check whether you have registered your recognizer successfully.

Update 2/24/2016: about datasets

Thanks again for your interest in MSR-Bing IRC @ ICME2016! The training data sets are ready for download here. The complete Clickture dataset can be downloaded from here. Both are shared from Azure Blob Storage. You can use any web browser, the Azure SDK, or third-party Azure clients (e.g., CloudBerry Explorer for Azure Blob Storage) to download the data files (.tsv files).

Training Set: The Clickture-Dog dataset contains ~95K entries related to 344 dog breeds, extracted from the Clickture dataset. Please note: Clickture-Dog is just one way to collect training data. You can also filter the Clickture-Full dataset yourself to find more data. You can use the above data sets to train/tune your algorithms. Again, if you used data sets other than Clickture during training, you must disclose this clearly in your submission, so that the results can be fairly compared among all teams.

Evaluation Set: We encourage participants to train a recognizer that can recognize as many dog breeds as possible. The evaluation set will include ~100 categories, randomly sampled from the above 344 dog breeds based on their popularity.

Trial Set: (coming next) We will release the trial data/code for all participants to verify your algorithm, system, and the result format.

Please refer to the FAQ page for answers to some frequently asked questions.


With the success of the previous MSR Image Retrieval/Recognition Challenges (MSR IRC) at the IEEE International Conference on Multimedia and Expo (ICME) 2014 and 2015, Microsoft Research is happy to announce MSR IRC at ICME 2016, based on a real-world large-scale dataset and an open evaluation system.

Thanks to advances in deep learning algorithms, great progress has been made in visual recognition in the past several years. But there is still a big gap between these academic innovations and practical intelligent services, due to the lack of: (1) real-world large-scale data of good quality for training and evaluation; and (2) a public platform to conduct fair, efficient evaluations and make recognition results reproducible and accessible.

To further motivate and challenge the academic and industrial research community, Microsoft has released Clickture, a large-scale real-world image click dataset based on search engine logs, to the public. The dataset contains 40M images labelled with query terms and click counts, which can be used to train and evaluate both image recognition and retrieval algorithms.

Moreover, Microsoft Research has developed Prajna Hub, an open multimedia gateway, to convert the latest algorithms into online services that can be accessed by anybody, from anywhere, and make evaluation/test results reproducible and comparable.

By participating in this challenge, you can:

  • Leverage “unlimited” click data to mine and model semantics;
  • Try out your image recognition system using real world data;
  • See how it compares to the rest of the community’s entries;
  • Get to be a contender for ICME 2016 Grand Challenge;


This year we will focus on the visual recognition task. Contestants are asked to develop an image recognition system, based on the datasets provided by the Challenge (as training data) and any other public/private data, to recognize a wide range of objects, scenes, events, etc. in images. For evaluation purposes, we will use dog breeds as this year’s topic. A contesting system is asked to produce 5 labels for each test image, ordered by confidence score. Top-1 and top-5 accuracies will be evaluated against a pre-labeled image dataset during the evaluation stage.


The data is based on queries received by Bing Image Search in the EN-US market and comprises two parts: (1) the Clickture-Full dataset, a sample of the Bing user click log, and (2) the Clickture-Dog dataset, a subset of Clickture-Full containing images related to dog breed queries. The two datasets are intended for contestants’ debugging and evaluation. The table below shows the dataset statistics.

Data Set Name  | Total Size | Image # | Download | Description                     | Note
Clickture-Full | 600GB      | 40M     | link     | Full Clickture data             | Split into 60+ files (~10GB each) for easier downloading
Clickture-Dog  | 1.3GB      | 95K     | link     | Dog-related images in Clickture | MSR Challenge on Image Recognition Datasets

Please see the dataset document for more details about the dataset, including the download links. A paper introducing this dataset can be found here.

Use of both datasets is optional. That is, systems based on Clickture and/or any other private/public datasets will all be evaluated for the final award, as long as the participants disclose which datasets they used.

Evaluation Metric

Top-1 and top-5 recognition accuracies over a test set will be used to evaluate the performance of the visual recognition systems. The final rank will be determined by top-5 recognition accuracy, i.e., P@5.
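As a concrete illustration (not the official scoring code), top-k accuracy counts a test image as correct when the ground-truth label appears among the system's first k labels. A minimal sketch, with made-up labels:

```python
# Sketch of the top-k accuracy metric described above; the ground-truth
# labels and predictions below are made-up examples, not real evaluation data.
def topk_accuracy(ground_truth, predictions, k):
    """ground_truth: list of labels; predictions: list of label lists,
    each ordered by descending confidence."""
    hits = sum(1 for gt, preds in zip(ground_truth, predictions) if gt in preds[:k])
    return hits / len(ground_truth)

gt = ["beagle", "pug", "akita"]
preds = [["beagle", "pug", "akita", "boxer", "chow"],   # correct at rank 1
         ["akita", "pug", "beagle", "boxer", "chow"],   # correct at rank 2
         ["beagle", "pug", "boxer", "chow", "collie"]]  # missed entirely
print(topk_accuracy(gt, preds, 1))  # accuracy@1 = 1/3
print(topk_accuracy(gt, preds, 5))  # accuracy@5 = 2/3
```

Since the final rank uses top-5 accuracy, a correct breed anywhere in the returned 5 labels counts, which is why sorting by confidence matters for top-1 but not top-5.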


You’re encouraged to build a generic system for recognizing a wide range of “things” (objects, scenes, events, etc.). However, for evaluation purposes, we will use dog breeds as this year’s topic. The number of candidate labels will be relatively large, perhaps a few hundred, and will be provided to the participants for data filtering and training.

Please note: an open multimedia hub, Prajna Hub, will be used for the evaluation. It will turn your recognition program into a cloud service, so that your algorithm can be evaluated remotely. A similar methodology was used in the last several IRCs and was well received. This time, we have made it even easier, with extra bonuses including:

  • Your recognizer will be readily accessible to public users, e.g., web pages and mobile apps. But the core recognition algorithm will still run on your own machines/clusters (or any public cluster, if preferred), so you always have full control;
  • Sample codes for web/phone apps will also be available through open source, so that your recognition algorithms can be used across devices (PC/Tablet/Phone) and platforms (Windows Phone, Android, iOS). That is, you will have a mobile app to demonstrate your dog breed recognizer without needing to write mobile app code, or with only simple modifications;
  • Sample codes will be provided to help participants convert existing recognition algorithms into a cloud service, which can be accessed from anywhere in the world, with load balancing and geo-redundancy.

The recognizer can be running on either Windows or Linux this year.


The Challenge is a team-based contest. Each team can have one or more members, and an individual can be a member of multiple teams. No two teams, however, can share more than half of their members. The team membership must be finalized and submitted to the organizers prior to the Final Challenge starting date.

At the end of the Final Challenge, all entries will be ranked based on the metrics described above. The top three teams will receive award certificates. At the same time, all accepted submissions are qualified for the conference’s grand challenge award competition.

Paper Submission

Please follow the guideline of ICME 2016 Grand Challenge for the corresponding paper submission.

Detailed Timeline

  • Now: Dataset available for download Clickture-Full and Clickture-Dog.
  • Now: Registration started
  • Jan. 1, 2016: Details about evaluation announced/delivered
  • March 1, 2016: Registration deadline
  • March 7, 2016: Dry run starts (trial requests sent to participants)
  • March 14, 2016: Evaluation starts (evaluation requests start at 0:00am PDT)
  • March 16, 2016: Evaluation ends (0:00am PDT)
  • March 23, 2016: Evaluation results announced.
  • April 3, 2016: Grand Challenge Paper and Data Submission

More information

Challenge Contacts

Questions related to this challenge should be directed to:


Here are some frequently asked questions regarding IRC and the Clickture-Dog data:

Q: Can I use ImageNet or other data for network pre-training?

A: Yes, you can use pre-trained CNN models (trained on ImageNet or other datasets), as long as you fine-tune them well to fit the dog breed categories. Basically, we treat these pre-trained CNN models as a feature extraction layer. Your system/algorithm output will be evaluated and compared with other teams’ results.

However, if you use data other than the Clickture dataset as positive/negative examples during training/tuning, please describe this clearly and notify us before the evaluation starts. Your algorithm/system will still be evaluated; we are considering ranking such systems in a separate track for fairness.

Q: We found some noisy/conflicting labels in the Clickture-Dog dataset. Is this expected?

A: You may find that ~15K images appear under multiple dog breeds. The reason is as follows: we use the clicked queries as the “ground truth” of dog breeds, but sometimes the same image is returned/clicked for multiple dog breed queries. We intentionally did NOT remove these 15K images from the data, because we treat automatically removing the “wrong/conflicting” samples while keeping the “correct” labels, during data collection and training, as an important and interesting topic.

Q: How is the evaluation data set constructed?

A: We extracted all the dog breeds that have matching names in pairs in Clickture-Full, resulting in 344 dog breeds in the Clickture-Dog dataset. The evaluation dataset will include 100 categories, mostly drawn from the 106 dog breeds that have more than 100 samples in Clickture-Dog. But we will also include a few categories with a small number (i.e., <100) of samples. This is to encourage participants to train a recognizer that can recognize as many dog breeds as possible. Clickture-Dog is just one way to collect training data; you can also filter Clickture-Full to find more data. This year, we also allow participants to collect training data outside of Clickture by themselves, but we will only control the evaluation set to define the problem.
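The per-breed statistics mentioned above (344 breeds, 106 of them with more than 100 samples) can be reproduced with a short script. This is a hedged sketch assuming the clickture_dog_thumb.tsv layout described in the next answer; `breed_counts` is a made-up helper name:

```python
from collections import Counter

def breed_counts(lines):
    # Each clickture_dog_thumb.tsv line is "label<TAB>key<TAB>base64image";
    # the breed label is the first column.
    counts = Counter()
    for line in lines:
        counts[line.split("\t", 1)[0]] += 1
    return counts

# Tiny made-up sample standing in for the real ~95K-line file:
sample = ["beagle\t/key1\tAAAA", "beagle\t/key2\tBBBB", "pug\t/key3\tCCCC"]
print(breed_counts(sample))  # Counter({'beagle': 2, 'pug': 1})

# On the real file, the well-populated breeds would be found with:
#   counts = breed_counts(open("clickture_dog_thumb.tsv", encoding="utf-8"))
#   large = [b for b, n in counts.items() if n > 100]
```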

Q: How do we extract the images and associated class labels from the clickture_dog_thumb.tsv file?

A: For the Clickture-Dog dataset, there are 95,119 lines in the clickture_dog_thumb.tsv file, each of which is a data sample. There are three columns (delimited by “\t”) in each line. A sample line from the clickture_dog_thumb.tsv file is shown below:

affenpinscher /LUkKqfrtLwqEw /9j/4AAQSkZJRgABAQEAAAAAAAD/2wBDAAoHBwgHBgoICAgLCgoLDhgQDg0NDh0VFhEYIx8lJCI…

The first column (“affenpinscher”) is the dog breed label. If you count all the unique dog breed strings in the first column, there are 344 dog breeds in this Clickture-Dog dataset.

The second column (“/LUkKqfrtLwqEw”) is the unique key for this record (which can be used to locate the corresponding record in the Clickture-Full dataset).

The third column (“/9j/4AAQSkZJR…”) is the JPEG image encoded as base64, which can be saved to a file with File.WriteAllBytes("Sample.jpg", Convert.FromBase64String(ImageData)) in C#. You can also use an online base64 decoder to decode the image data and save it to a .jpg file manually.
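For reference, here is an equivalent sketch in Python (the record layout comes from this answer; the demo line below uses a fabricated payload, not a real dataset record, and `save_record` is a made-up helper name):

```python
import base64

def save_record(line, out_path):
    """Split one clickture_dog_thumb.tsv line ("label\tkey\tbase64image")
    and write the decoded JPEG bytes to out_path."""
    label, key, image_b64 = line.rstrip("\n").split("\t")
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(image_b64))
    return label, key

# Demo with a fabricated 3-byte payload (NOT a real image from the dataset):
demo_line = "affenpinscher\t/LUkKqfrtLwqEw\t" + base64.b64encode(b"\xff\xd8\xff").decode()
print(save_record(demo_line, "Sample.jpg"))  # ('affenpinscher', '/LUkKqfrtLwqEw')
```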

Q: We are trying to run the sample server on our Linux machine and we get an unhandled System.TypeLoadException. What should we do?

A: Please install F# first (“sudo apt-get install fsharp”) and try again. If it still doesn’t work, please run the program with “MONO_LOG_LEVEL=debug mono [***the exe program***]” and send us all the output.

Q: We updated the sample code with our team name and GUIDs; it compiled and started successfully. We ran the CommandLineTool to test it, and both the Ping and Check steps work well, but we see “Error: Connected to Prajna Hub Gateway, but get empty response, please check classifier xxxx and its response format” when sending the Recog request in step 2.(3).

A: Please check:

    1. Use a network monitoring tool to check whether CommandLineTool.exe sends out the request. Since both 2.(1) and 2.(2) worked well, it is very unlikely that 2.(3) is blocked;
    2. Use a network monitoring tool to check whether your machine received the recognition request from the gateway when you used 2.(3). It is possible that your firewall blocked the incoming requests from the Prajna Gateway;
    3. Put a breakpoint in PredictionFunc() to see whether your classifier wrapper (the sample code) really received the request, and which exe or function (which should be your classifier) it called to recognize the image. Here you may have put in a wrong exe file path, your classifier DLL may have an issue, or you may have forgotten to build IRC.SampleRecogCmdlineCSharp.exe, which the sample code calls by default to return dummy results;
    4. If the image is sent to your classifier.exe, check the classifier side to see whether it gets executed correctly and returns a result string like “tag1:0.95;tag2:0.32;tag3:0.05;tag4:0.04;tag5:0.01”.

Q: When we try to run the sample code on Linux, we encounter an error message containing: “Mono: Could not load file or assembly ‘FSharp.Core, Version=, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a’ or one of its dependencies.”

A: For Linux/Ubuntu users, please follow the instructions in the readme file to install up-to-date versions of mono and fsharp.

Please note that the sample code has to be run with the newest version of fsharp (4.3.1). The default mono and fsharp packages you get from Ubuntu may not be up to date; please follow the readme file provided with the sample codes to install the newest mono and fsharp packages.

If you have already installed the old version, please do the following to fix the problem:

sudo apt-get autoremove mono-complete
sudo apt-get autoremove fsharp
sudo apt-key adv --keyserver hkp:// --recv-keys 3FA7E0328081BFF6A14DA29AA6A19B38D3D831EF
echo "deb wheezy main" | sudo tee /etc/apt/sources.list.d/mono-xamarin.list
sudo apt-get update
sudo apt-get install mono-complete fsharp

Q: When we use the CommandLineTool, we encounter an error message when checking the availability of our classifier:

Error: Connected to Prajna Hub Gateway, but get empty response, please check classifier 12345678-abcd-abcd-abcd-123456789abc and its response format.

A: Some teams confused serviceGuid with providerGuid and used the providerGuid to address a classifier in step #2, where serviceGuid should be used instead. The serviceGuid identifies your classifier, while the providerGuid identifies your team. Please use the exact command lines we sent you in the earlier email for the 3/3 update; no need to change anything.

Step#2: CommandLineTool.exe Check -pHub -serviceGuid "GUID_Of_Your_Classifier"
Step#3: CommandLineTool.exe Recog -pHub -serviceGuid "GUID_Of_Your_Classifier" -providerGuid "GUID_Of_Your_Team" -infile c:\dog.jpg