Microsoft Research Blog

Machine learning, data mining and rethinking knowledge at KDD 2018

September 6, 2018 | By Microsoft blog editor

A group of Microsoft employees attending KDD 2018.

KDD 2018, the 24th ACM Conference on Knowledge Discovery and Data Mining, took place on August 19-23 in London, United Kingdom, in the heart of the city’s historic Royal Docks. KDD is one of the top conferences in the machine learning and data mining domain, bringing together researchers and practitioners across computer science and all verticals. This year’s KDD was the largest ever, with more than 3,400 participants from 99 countries and 1,588 submissions, and it included a strong showing by Microsoft.

In addition to an astonishing program featuring peer-reviewed papers, workshops, hands-on tutorials, Deep Learning Day and Health Day (an entire day dedicated to discussing machine learning trends and addressing challenges in healthcare), attendees were treated to outstanding keynote talks by Imperial College London Emeritus Professor of Mathematics David Hand, Nobel Laureate Alvin Roth, Columbia University’s Data Science Director Jeannette Wing and Oxford University Professor Yee Whye Teh.

Professor Hand focused on data science for financial applications and on the importance of understanding the data in this domain. In his remarks on the reliability of data, one quote in particular stood out: “If data can speak for themselves, they can also lie for themselves.” He identified two types of models: data-driven models, which are based on relationships observed in data and come with a statistical theory, and theory-driven models, which are based on an underlying theoretical model and can be used to understand the data once the model has been fit to it using statistical ideas.

He then presented some general lessons on the limitations of models and the importance of understanding these limitations before models are applied. Any algorithm will produce a number if data is thrown at it, so purely data-based approaches are fragile. Much of the data in the financial world is non-stationary, which makes many algorithms unsuitable for these use cases. Thought-provoking stuff.

Professor Chris Ré of Stanford University talked about Software 2.0 and the Snorkel project. In real-life applications, manually creating labeled training data is expensive and slow, and it requires domain expertise. The Snorkel project aims to rapidly create, model and manage large training sets, which are essential for the success of machine learning models. It takes noisy labeling functions from users and automatically models the labeling process by learning which labeling functions are more accurate.
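To make the idea of noisy labeling functions concrete, here is a minimal Python sketch. It does not use the Snorkel library or its actual model. As a simplified stand-in, it estimates each labeling function's accuracy from how often it agrees with the majority vote, then uses those estimates to weight the votes. The spam task, the three labeling functions and all helper names are hypothetical and chosen only for illustration.

```python
from collections import Counter

ABSTAIN = -1  # a labeling function may decline to vote


# Hypothetical noisy labeling functions for a toy spam task (1 = spam, 0 = not spam).
def lf_contains_free(text):
    return 1 if "free" in text.lower() else ABSTAIN


def lf_has_greeting(text):
    return 0 if text.lower().startswith(("hi", "hello")) else ABSTAIN


def lf_many_exclamations(text):
    return 1 if text.count("!") >= 2 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_free, lf_has_greeting, lf_many_exclamations]


def apply_lfs(texts):
    """Return one row of votes per text, one column per labeling function."""
    return [[lf(text) for lf in LABELING_FUNCTIONS] for text in texts]


def majority_vote(votes):
    """Most common non-abstaining vote, or ABSTAIN if every function abstained."""
    counted = Counter(v for v in votes if v != ABSTAIN)
    if not counted:
        return ABSTAIN
    return counted.most_common(1)[0][0]


def estimate_accuracies(vote_matrix):
    """Crude accuracy estimate: agreement rate of each function with the majority vote.

    This is an assumption of the sketch, not Snorkel's actual model.
    """
    n = len(LABELING_FUNCTIONS)
    agree, total = [0] * n, [0] * n
    for votes in vote_matrix:
        consensus = majority_vote(votes)
        if consensus == ABSTAIN:
            continue
        for j, vote in enumerate(votes):
            if vote != ABSTAIN:
                total[j] += 1
                agree[j] += int(vote == consensus)
    # Functions that never voted get a neutral 0.5.
    return [agree[j] / total[j] if total[j] else 0.5 for j in range(n)]


def weighted_label(votes, accuracies):
    """Combine votes, weighting each one by its function's estimated accuracy."""
    scores = {}
    for vote, accuracy in zip(votes, accuracies):
        if vote != ABSTAIN:
            scores[vote] = scores.get(vote, 0.0) + accuracy
    if not scores:
        return ABSTAIN
    return max(scores, key=scores.get)
```

Calling `apply_lfs` on a list of unlabeled texts, passing the result to `estimate_accuracies`, and then calling `weighted_label` on each row yields one training label per text. When two functions disagree on a text, the one that has agreed with the consensus more often wins.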

Microsoft had a strong and dynamic presence at the conference, with multiple oral and poster presentations, tutorials and workshops. Joseph Sirosh, Corporate Vice President and CTO for AI, gave a well-attended invited talk titled “Planet-scale Land Cover Classification with FPGAs,” in which he demonstrated the power of Azure Machine Learning and Project Brainwave in classifying terabytes of aerial land cover images using DNNs and in tackling use cases such as wildlife poacher recognition.

The full list of Microsoft’s contributions at KDD 2018 is available online; be sure to watch some of the videos if you were unable to attend!
