Contest promotes automation of machine learning


Machine learning is the cornerstone of modern data analysis. The gurus of “big data” analytics are all well versed in machine learning, but most domain specialists still must hire data scientists to meet their data-analysis needs. It’s inevitable, though, that the data-modeling chain will become largely automated, simplified to the point where off-the-shelf data transformation tools will be as pervasive as those for word processing and spreadsheets. Data analysis will then be like driving a car: the user will focus on the route to the destination, without worrying about how the engine works.

We refer to this vision as the automation of machine learning, or AutoML for short. To help advance towards this grand goal, ChaLearn, an organization that promotes machine-learning challenges, has launched a contest to help democratize machine learning. Built on the new CodaLab platform, the contest offers US$30,000 in prizes donated by Microsoft. More than 60 teams have already entered the contest during the Prep round, and now, until October 15, 2015, you can enter any of five additional rounds: novice, intermediate, advanced, expert, or master. Visit the ChaLearn Automatic Machine Learning Challenge site to see the deadlines for each round. You can enter even if you have not participated in previous rounds.

Five rounds remain in the Automatic Machine Learning Challenge, each round consisting of AutoML and Tweakathon phases.

The contest problems are drawn from a variety of domains. They include challenges in the classification of text, the prediction of customer satisfaction, the recognition of objects in photographs, the recognition of actions in video data, as well as problems involving speech recognition, credit ratings, medical diagnoses, drug effects, and the prediction of protein structures.

Five datasets of progressive difficulty are introduced during each round. The rounds alternate between (1) AutoML phases, during which submitted code is blind tested under a time limit on our platform, using datasets you have never seen before; and (2) Tweakathon phases, in which you are given time to improve your methods by tweaking them on those datasets and running them on your own systems, with no limit on computational resources and no requirement to submit code.
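To give a feel for what working under a time limit can mean in an AutoML phase, here is a minimal sketch of budgeted model selection. It is an illustration only, not the challenge's actual harness or starter code: the function name `best_within_budget`, the `(name, evaluate)` candidate format, and the assumption that higher scores are better are all hypothetical.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def best_within_budget(
    candidates: Iterable[Tuple[str, Callable[[], float]]],
    budget_seconds: float,
) -> Optional[Tuple[str, float]]:
    """Evaluate candidates in order until the time budget runs out.

    Each candidate is a (name, evaluate) pair. evaluate() is assumed to
    train a model and return a validation score where higher is better.
    Returns the best (name, score) seen, or None if nothing was evaluated.
    """
    deadline = time.monotonic() + budget_seconds
    best: Optional[Tuple[str, float]] = None
    for name, evaluate in candidates:
        # The budget is only checked between candidates, so a single slow
        # evaluate() call can still run past the deadline.
        if time.monotonic() >= deadline:
            break
        score = evaluate()
        if best is None or score > best[1]:
            best = (name, score)
    return best
```

Ordering the candidates from cheapest to most expensive is one simple way to make sure some answer is available before the budget is spent.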

During the novice round, which runs through April 14, you will encounter only binary classification problems, with no missing values and no categorical variables. All the datasets are formatted as simple data tables, with no sparse matrix format, though one dataset does include a lot of zeros. The classes are balanced. The number of features does not exceed 2,000, and the number of examples does not exceed 6,000. The evaluation metric is simply classification accuracy.
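As a rough illustration of the novice-round setup described above, the sketch below computes classification accuracy and checks a dense table against the stated limits. The function names and the list-of-rows data layout are assumptions made for this example; the challenge's own data format and scoring code may differ.

```python
from typing import Optional, Sequence

# Limits quoted above for the novice round.
MAX_FEATURES = 2000
MAX_EXAMPLES = 6000


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot score an empty set of labels")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def fits_novice_round(
    rows: Sequence[Sequence[Optional[float]]], labels: Sequence[int]
) -> bool:
    """Check a dense table: binary labels, no missing values, size limits."""
    if len(rows) == 0 or len(rows) != len(labels):
        return False
    n_features = len(rows[0])
    rectangular = all(len(r) == n_features for r in rows)
    # A value is treated as missing if it is None or NaN (NaN != NaN).
    no_missing = all(v is not None and v == v for r in rows for v in r)
    binary = len(set(labels)) == 2
    within_limits = len(rows) <= MAX_EXAMPLES and n_features <= MAX_FEATURES
    return rectangular and no_missing and binary and within_limits
```

Because the classes are balanced in this round, a classifier that always predicts the same class scores about 0.5 accuracy, which makes a convenient baseline to beat.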

For more details, read our white paper.

Enter the AutoML challenge for a rich learning and research experience, and a chance to win!

Isabelle Guyon, President, ChaLearn; Evelyne Viegas, Director, Microsoft Research; Rich Caruana, Senior Researcher, Microsoft Research
