Here are some frequently asked questions regarding IRC and clickture-dog data:
Q: Can I use ImageNet or other data for network pre-training?
A: Yes, you can use pre-trained CNN models (by ImageNet or other dataset), as long as you can fine tune that well to fit for the dog breed categories. Basically, we treat these pre-trained CNN models as feature extraction layer. Your system/algorithm output will be evaluated and compared with other team’s results.
But, if you use some data other than Clickture dataset as positive/negative examples during the training/tuning, please describe this clearly and notify us before the evaluation starts. Your algorithm/system will still be evaluated, we are considering to rank these systems in separate tracks for fairness.
Q: We found there are some noise/conflicting labels in Clickure-Dog dataset. Is this expected?
A: You may find that there are ~15K images appear in multiple dog breeds. The reason is as follows: we use the clicked queries as the “groundtruth” of dog breeds, but sometimes the same images are returned/clicked by multiple dog breed queries. We intentionally to NOT remove these 15K images from the data. Because we treat this as an important/interest topic to automatically remove the “wrong/conflicting” samples and keep the “correct” labels, during data collection and training.
Q: How is the evaluation data set constructed?
A: We extracted all the dog breeds that have matched names in pairs in Clickture Full, which result in 344 dog breeds in Clickture-Dog data set. The evaluation dataset will include 100 categories, including part of these 106 dog breeds which have more than 100 samples in Clickture-Dog dataset. But we will also include a few categories with small number (i.e. <100) of samples. This is to encourage participants to train a recognizer that can recognize as many as possible dog breeds. The Clickture-Dog is just one way to collect training data. You can also filter the Clickture-Full to find more data. This year, we also allow participants to collect training data outside of Clickture by themselves, but we will only control the evaluation set to define the problem.
Q: How to extract the images and associating class labels from clickture_dog_thumb.tsv file?
A: For the Clickture-Dog dataset, there are 95119 lines in the clickture_dog_thumb.tsv files, each is a data sample. There are three columns (delimited by “\t”) in each line. A sample line from clickture_dog_thumb.tsv file is shown below:
affenpinscher /LUkKqfrtLwqEw /9j/4AAQSkZJRgABAQEAAAAAAAD/2wBDAAoHBwgHBgoICAgLCgoLDhgQDg0NDh0VFhEYIx8lJCI…
the first column (“affenpinscher”) is the dog breed label, If you count all the unique dog breed strings in the first column, there are 344 dog breeds in this Clickture-Dog dataset.
the second column (“/LUkKqfrtLwqEw”) is the unique key for this record (which can be used to locate the corresponded record in the Clickture-Full dataset).
the third column (“/9j/4AAQSkZJR…”) is the jpeg image encoded as base64, which can be saved to file by “File.WriteAllBytes (“Sample.jpg”, Convert.FromBase64String(ImageData))” in C#. You can also use this website to decode this base64Encoded image data and save it to a jpg file manually.
Q: We are trying to run the sample server on our Linux machine and we are getting an Unhandled System.TypeLoadException?
A: Please install FSharp first and give it a try again: “sudo apt-get install fsharp”. If it still doesn’t work, please run the program with: “MONO_LOG_LEVEL=debug mono [***the exe program***]” and send us all the output.
Q: We updated the sample code with team name and GUIDs, it compiled and started successfuly, we run the CommandLineTool to test it, both Ping and Check steps work well, butwe see “Error: Connected to Prajna Hub Gateway, but get empty response, please check classifier xxxx and its response format” when sending the Recog request in step 2.(3),
A: Please check:
- Use some network monitoring tool to check whether the CommandLineTool.exe send out the request. Since both 2.(1) and 2.(2) worked well, it is very unlikely that 2.(3) is blocked;
- Use some network monitoring tool to check whether your machine received the recognition request from the gateway, when you use 2.(3) to send the recognition. It is possible that your firewall blocked this incoming requests from Prajna Gateway;
- Put a break point in the PredictionFunc(), to see whether your classifier wrapper (the sample code) really received the request, and which exe or function (which should be your classifier) it called to recognizer the image, here you may have put a wrong exe file path, or your classifier DLL may have some issue, or you forgot to build the IRC.SampleRecogCmdlineCSharp.exe, which the sample code will call by default to return dummy results.
- If the image is sent to your classifier.exe, check your classifier side to see whether it get executed correctly and return the result string like “tag1:0.95;tag2:0.32;tag3:0.05;tag4:0.04;tag5:0.01”
Q: When we try to run the sample code on linux, we encountered error message containing: “Mono: Could not load file or assembly ‘FSharp.Core, Version=22.214.171.124, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a’ or one of its dependencies.”
A: For Linux/Ubuntu users, please follow the instructions we provided in the readme file to install the up-to-date version of mono and fsharp.
Please note that the sample code has to be run with the newest version of fsharp (4.3.1) . The default mono and fsharp packages you get from Ubuntu may not be up-to-date. Please follow the readme file provided with the sample codes to install the newest mono and fsharp package.
If you have already install the old version, please do the following to fix the problem.
sudo apt-get autoremove mono-completesudo apt-get autoremove fsharpsudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 3FA7E0328081BFF6A14DA29AA6A19B38D3D831EF echo "deb http://download.mono-project.com/repo/debian wheezy main" | sudo tee /etc/apt/sources.list.d/mono-xamarin.list sudo apt-get update esudo apt-get install mono-complete fsharp
Q: When we use CommandLineTest tool, we encountered error message when checking the the availability of our classifier:
Error: Connected to Prajna Hub Gateway vm-Hub.trafficManager.net, but get empty response, please check classifier 12345678-abcd-abcd-abcd-123456789abc and its response format.
A: Some teams confused serviceGuid with providerGuid, so they used the providerGuid to hit a classifier in step#2, which should be serviceGuid instead. ServiceGuid is used to identify your classifier, while providerGuid is used to identify your team. Please use the exact command lines we sent you in earlier email of update 3/3. No need to change anything.
Step#2: CommandLineTool.exe Check -pHub vm-Hub.trafficManager.net -serviceGuid "GUID_Of_Your_Classifier" Step#3: CommandLineTool.exe Recog -pHub vm-Hub.trafficManager.net -serviceGuid "GUID_Of_Your_Classifier" -providerGuid "GUID_Of_Your_Team" -infile c:\dog.jpg