Microsoft Research Blog



Machine learning, data mining and rethinking knowledge at KDD 2018

September 6, 2018 | By Microsoft blog editor


A group of Microsoft employees attending KDD 2018.

KDD 2018, the 24th ACM Conference on Knowledge Discovery and Data Mining, took place August 19-23 in London, United Kingdom, in the heart of the city's historic Royal Docks. KDD is one of the top conferences in machine learning and data mining, bringing together researchers and practitioners from across computer science and a wide range of industries. This year's KDD was the largest ever, with more than 3,400 participants from 99 countries and 1,588 submissions, and it included a strong showing by Microsoft.

In addition to an astonishing program of peer-reviewed papers, workshops, hands-on tutorials, a Deep Learning Day and a Health Day (an entire day dedicated to discussing machine learning trends and challenges in healthcare), attendees were treated to outstanding keynote talks by David Hand, Emeritus Professor of Mathematics at Imperial College London; Nobel Laureate Alvin Roth; Jeannette Wing, Director of the Data Science Institute at Columbia University; and Yee Whye Teh, Professor at the University of Oxford.

Professor Hand focused on data science for financial applications and on the importance of understanding the data in this domain. On the reliability of data, one quote in particular stood out: "If data can speak for themselves, they can also lie for themselves." He distinguished two types of models: data-driven models, which are based on relationships observed in the data and come with a statistical theory; and theory-driven models, which rest on an underlying theoretical model and, once fitted to data using statistical ideas, can be used to understand that data. He then drew some general lessons about the limitations of models and the importance of understanding those limitations before a model is applied. Any algorithm will produce a number if data is thrown at it, so purely data-based approaches are fragile. Many financial datasets suffer from non-stationarity (the relationships in the data change over time), which makes many algorithms unsuitable for these use cases. Thought-provoking stuff.
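Hand's point about non-stationarity can be made concrete with a toy experiment. The sketch below uses purely synthetic data; the two regimes, their slopes, the noise level and the helper names `fit_line` and `mse` are illustrative assumptions, not anything presented in the talk. A least-squares line fitted to one regime happily returns coefficients, but its error grows sharply once the underlying relationship shifts.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mse(intercept, slope, xs, ys):
    """Mean squared error of the fitted line on (xs, ys)."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)

# Hypothetical "before" regime: y rises with x (slope +2, small noise).
xs_before = [rng.uniform(0, 1) for _ in range(200)]
ys_before = [2.0 * x + rng.gauss(0, 0.1) for x in xs_before]

# Hypothetical "after" regime: the relationship reverses (slope -2).
xs_after = [rng.uniform(0, 1) for _ in range(200)]
ys_after = [2.0 - 2.0 * x + rng.gauss(0, 0.1) for x in xs_after]

intercept, slope = fit_line(xs_before, ys_before)
in_sample = mse(intercept, slope, xs_before, ys_before)
out_of_sample = mse(intercept, slope, xs_after, ys_after)

# The fit always returns numbers; nothing in it flags the regime change.
print(f"slope={slope:.2f}  in-sample MSE={in_sample:.3f}  after-shift MSE={out_of_sample:.3f}")
```

The in-sample error stays close to the noise variance, while the error after the shift is roughly two orders of magnitude larger. The fitting step itself gives no warning, which is exactly why understanding how the data was generated matters before trusting the model.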

Professor Chris Ré of Stanford University talked about Software 2.0 and the Snorkel project. Manually creating labeled training data is expensive and slow in real-world applications, and it requires domain expertise. Snorkel aims to rapidly create, model and manage the large training sets that are essential to the success of machine learning models. It takes noisy labeling functions written by users and automatically models the labeling process by learning which labeling functions are more accurate.
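The idea of combining noisy labeling functions can be illustrated with a small sketch. This is not Snorkel's API: the spam task, the four labeling functions and the helper names are hypothetical, and Snorkel itself fits a generative label model to the agreement and disagreement patterns among labeling functions. Here a much cruder heuristic stands in for that model: each function's accuracy is estimated from how often it agrees with an unweighted majority vote, and those estimates become log-odds weights for the final label.

```python
import math

# Label values; ABSTAIN means "this labeling function has no opinion".
ABSTAIN, HAM, SPAM = -1, 0, 1


# Hypothetical labeling functions for a toy spam task (not from the talk).
def lf_contains_link(text):
    return SPAM if "http" in text else ABSTAIN


def lf_says_free(text):
    return SPAM if "free" in text.lower() else ABSTAIN


def lf_short_message(text):
    return HAM if len(text.split()) < 5 else ABSTAIN


def lf_mentions_meeting(text):
    return HAM if "meeting" in text.lower() else ABSTAIN


LFS = [lf_contains_link, lf_says_free, lf_short_message, lf_mentions_meeting]


def apply_lfs(texts):
    """Build the label matrix: one row per example, one column per labeling function."""
    return [[lf(t) for lf in LFS] for t in texts]


def majority_vote(row):
    """Unweighted vote; ties (including all-abstain rows) give ABSTAIN."""
    spam, ham = row.count(SPAM), row.count(HAM)
    if spam == ham:
        return ABSTAIN
    return SPAM if spam > ham else HAM


def estimate_accuracies(matrix, smoothing=1.0):
    """Crude stand-in for a learned label model: smoothed agreement with the majority vote."""
    votes = [majority_vote(row) for row in matrix]
    accs = []
    for j in range(len(LFS)):
        agree = total = 0
        for row, y in zip(matrix, votes):
            if row[j] != ABSTAIN and y != ABSTAIN:
                total += 1
                agree += int(row[j] == y)
        # Laplace-style smoothing keeps every estimate strictly between 0 and 1.
        accs.append((agree + smoothing) / (total + 2 * smoothing))
    return accs


def weighted_label(row, accs):
    """Combine votes with log-odds weights, so more accurate functions count for more."""
    score = 0.0
    for vote, acc in zip(row, accs):
        if vote == ABSTAIN:
            continue
        weight = math.log(acc / (1 - acc))
        score += weight if vote == SPAM else -weight
    if score == 0:
        return ABSTAIN
    return SPAM if score > 0 else HAM


texts = [
    "Get FREE money now http://spam.example",
    "Free tickets at http://x.example today",
    "see you at the meeting",
    "ok thanks",
    "Free lunch at the meeting today, bring your team along",
    "Click http://y.example for your reward today friend",
    "meeting moved, thanks",
    "meeting at noon",
]
matrix = apply_lfs(texts)
accs = estimate_accuracies(matrix)
labels = [weighted_label(row, accs) for row in matrix]
print(labels)  # [1, 1, 0, 0, 0, 1, 0, 0]
```

In the fifth message the "free" and "meeting" functions disagree, so a plain majority vote is tied; the weighted vote resolves it in favor of the function estimated to be more accurate. A real label model would estimate these accuracies more carefully, but the principle of learning which sources to trust is the same.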

Microsoft had a strong and dynamic presence at the conference, with multiple oral and poster presentations, tutorials and workshops. Joseph Sirosh, Corporate Vice President and CTO for AI, gave a well-attended invited talk titled "Planet-scale Land Cover Classification with FPGAs," in which he demonstrated how Azure Machine Learning and Project Brainwave use deep neural networks (DNNs) to classify terabytes of aerial land-cover imagery, and how the same approach can tackle use cases such as recognizing wildlife poachers.

For the full list of Microsoft’s contributions at KDD 2018, check out https://www.microsoft.com/en-us/research/event/kdd-2018/ and be sure to watch some of the videos if you were unable to attend!
