AutoML Challenge: A leap forward for machine learning competitions
By Isabelle Guyon, Professor, University Paris-Saclay, and President, ChaLearn
If you are attending this year’s ICML conference in New York City, June 19–24, be sure to drop by the AutoML workshop and congratulate Team AAD Freiburg, the winners of the Automatic Machine Learning (AutoML) Challenge. Led by Frank Hutter, who co-developed SMAC and Auto-WEKA, the winning team delivered auto-sklearn, an open-source tool that provides a wrapper around the Python library scikit-learn. Running head to head with them in most phases, the Intel team, led by Eugene Tuv, used a proprietary solution: a fast C/C++ implementation of tree-based methods.
In recent years, challenges have emerged as a means of crowdsourcing machine learning. Naturally, some participants have started trying to automate the process of entering competitions in order to save time and maximize profit.
CodaLab Competitions, an open-source challenge platform, makes it easy to organize machine learning challenges with code submission. Running on Microsoft Azure, the platform provides free compute time and enables unbiased evaluation by executing all submitted code under identical conditions. This made it possible for the AutoML Challenge to test whether machine learning code could operate without any human intervention under strict execution-time and memory-usage constraints.
The AutoML Challenge took place over the course of 18 months, from 2014 to 2016. The challenge participants worked to develop fully automatic “black-box” learning machines for feature-based classification and regression problems. Over five consecutive rounds, the participants were exposed to 30 datasets from a wide variety of application domains; in each new round, their code underwent a blind test on five new datasets. Several teams succeeded in delivering real AutoML software capable of being trained and tested without human intervention within 20 minutes on an 8-core machine, regardless of the type of dataset, which ranged widely in complexity.
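The “train and test within a fixed time budget, with no human in the loop” constraint can be sketched with a toy, standard-library-only search loop. Everything below is illustrative (the data, the k-nearest-neighbour model, and the random search over k are stand-ins), not code from any actual challenge entry; real entries such as auto-sklearn search far richer model and hyperparameter spaces.

```python
import random
import time

def knn_predict(train, k, point):
    """Classify a 1-D point by majority vote among its k nearest training points."""
    neighbours = sorted(train, key=lambda xy: abs(xy[0] - point))[:k]
    votes = sum(label for _, label in neighbours)
    return 1 if votes * 2 >= len(neighbours) else 0

def accuracy(train, k, data):
    """Fraction of (point, label) pairs in `data` that k-NN classifies correctly."""
    return sum(knn_predict(train, k, x) == y for x, y in data) / len(data)

def automl_search(train, valid, budget_seconds=0.5, seed=0):
    """Random search over k, keeping the best validation score until time runs out."""
    rng = random.Random(seed)
    deadline = time.monotonic() + budget_seconds
    best_k, best_score = 1, -1.0
    while time.monotonic() < deadline:
        k = rng.randrange(1, len(train) + 1)
        score = accuracy(train, k, valid)
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score

# Toy 1-D data: label 1 for points above 0.5.
train = [(x / 10, int(x / 10 > 0.5)) for x in range(10)]
valid = [(0.15, 0), (0.35, 0), (0.65, 1), (0.85, 1)]
k, score = automl_search(train, valid, budget_seconds=0.2)
```

The key design point mirrored from the challenge is that the search is bounded by wall-clock time rather than by a fixed number of trials, so the procedure always returns a usable model when the budget expires.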
Participants could also enter the challenge without submitting code by running the learning machines on their own local computers and submitting only results: Following each AutoML phase, the newly introduced datasets were released (labeled training data and unlabeled validation and test data), and the participants were able to manually tune their models for over a month during “Tweakathon” phases. We have more details on the solutions and what we learned in a paper, which will be presented at the ICML AutoML workshop.
When we look closely at the results of the challenge, we can see that there is still significant room for improvement. For one thing, there’s a significant gap between Tweakathon and AutoML results, indicating that the “automatic” algorithms can be further optimized. Nonetheless, this challenge has resulted in a leap forward for the field in terms of automation.
Please join us in congratulating the AutoML Challenge winners. By making their solution publicly available, AAD Freiburg has set a great precedent. We are grateful for their contribution. Imagine what the impact on the data science industry would be if all the successful software were shared.
If you missed the challenge, or just want to know more about the details, the winners’ code and the presentation material from several satellite events (hackathons and workshops) are available at ChaLearn’s website. By the way, if you think you can beat the winners, the CodaLab platform remains open for post-challenge submissions!