Microsoft Research Blog

The Microsoft Research blog provides in-depth views and perspectives from our researchers, scientists and engineers, plus information about noteworthy events and conferences, scholarships, and fellowships designed for academic and scientific communities.

Machine learning, data mining and rethinking knowledge at KDD 2018

September 6, 2018 | By Microsoft blog editor

A group of Microsoft employees attending KDD 2018.

KDD 2018, the 24th ACM Conference on Knowledge Discovery and Data Mining, took place August 19-23 in London, United Kingdom, in the heart of the city’s historic Royal Docks. KDD is one of the top conferences in machine learning and data mining, bringing together researchers and practitioners from across computer science and industry verticals. This year’s KDD was the largest ever, with more than 3,400 participants from 99 countries and 1,588 submissions, and it included a strong showing by Microsoft.

The program featured peer-reviewed papers, workshops, hands-on tutorials, Deep Learning Day and Health Day (an entire day dedicated to discussing machine learning trends and addressing challenges in healthcare). Attendees were also treated to outstanding keynote talks by Imperial College London Emeritus Professor of Mathematics David Hand, Nobel Laureate Alvin Roth, Columbia University’s Data Science Director Jeannette Wing and Oxford University Professor Yee Whye Teh.

Professor Hand focused on data science for financial applications and on the importance of understanding the data in this domain. In his remarks on the reliability of data, one quote in particular stood out: “If data can speak for themselves, they can also lie for themselves.” He identified two types of models: data-driven models, which are based on relationships observed in data and come with a statistical theory; and theory-driven models, which are based on an underlying theoretical model and can be used to understand the data once they have been fitted to it using statistical ideas. He then presented some general lessons on the limitations of models and the importance of understanding those limitations before a model is applied. Any algorithm will produce a number if data is thrown at it, so purely data-based approaches are fragile. Many problems in the financial world involve non-stationary data, which makes many algorithms unsuitable for these use cases. Thought-provoking stuff.

Professor Chris Ré of Stanford University talked about Software 2.0 and the Snorkel project. In real-life applications, creating labeled training data by hand is expensive and slow, and it requires domain expertise. The Snorkel project aims to rapidly create, model and manage large training sets, which are essential for the success of machine learning models. It takes noisy labeling functions written by users and automatically models the labeling process by learning which labeling functions are more accurate.
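To make the idea of noisy labeling functions a little more concrete, here is a minimal sketch in plain Python. It does not use the Snorkel library or its API, and it is not the model Snorkel actually fits. The spam/ham task, the three heuristic functions and all names below are hypothetical, chosen only for illustration. As a simple stand-in for learning which functions are more accurate, the sketch scores each function by how often it agrees with the majority vote and then uses those scores to weight its votes.

```python
import math

# Label convention for this illustration (an assumption, not Snorkel's API).
ABSTAIN, SPAM, HAM = 0, 1, -1


# Hypothetical labeling functions: each is a cheap, noisy heuristic
# that either votes for a label or abstains.
def lf_contains_link(text):
    return SPAM if "http" in text else ABSTAIN


def lf_mentions_prize(text):
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_message(text):
    return HAM if len(text.split()) <= 4 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_message]


def apply_lfs(texts):
    """Return one row of votes per text, one column per labeling function."""
    return [[lf(text) for lf in LABELING_FUNCTIONS] for text in texts]


def majority_vote(votes):
    """Unweighted vote: the sign of the summed votes, or ABSTAIN on a tie."""
    total = sum(votes)
    if total > 0:
        return SPAM
    if total < 0:
        return HAM
    return ABSTAIN


def estimate_accuracies(vote_matrix):
    """Score each function by its smoothed agreement rate with the majority vote.

    This is a crude proxy for accuracy; no ground-truth labels are used.
    """
    n_lfs = len(vote_matrix[0])
    agree = [1.0] * n_lfs  # Laplace smoothing keeps every score strictly in (0, 1)
    voted = [2.0] * n_lfs
    for votes in vote_matrix:
        consensus = majority_vote(votes)
        if consensus == ABSTAIN:
            continue  # no consensus to compare against
        for j, vote in enumerate(votes):
            if vote != ABSTAIN:
                voted[j] += 1.0
                agree[j] += 1.0 if vote == consensus else 0.0
    return [a / n for a, n in zip(agree, voted)]


def weighted_label(votes, accuracies):
    """Combine votes, weighting each function by the log-odds of its score."""
    score = sum(v * math.log(a / (1.0 - a)) for v, a in zip(votes, accuracies))
    if score > 0:
        return SPAM
    if score < 0:
        return HAM
    return ABSTAIN


if __name__ == "__main__":
    texts = [
        "Win a prize now http://x.example",
        "hello there",
        "Claim your prize today friend",
    ]
    matrix = apply_lfs(texts)
    accuracies = estimate_accuracies(matrix)
    for text, votes in zip(texts, matrix):
        print(text, "->", weighted_label(votes, accuracies))
```

The weighted labels could then serve as training labels for a downstream classifier, which is the role the large training sets described above play.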

Microsoft had a strong and dynamic presence at the conference, with multiple oral and poster presentations, tutorials and workshops. Joseph Sirosh, Corporate Vice President and CTO for AI, gave a well-attended invited talk titled “Planet-scale Land Cover Classification with FPGAs,” in which he demonstrated the power of Azure Machine Learning and Project Brainwave in classifying terabytes of aerial land cover images using deep neural networks (DNNs) and in tackling use cases such as wildlife poacher recognition.

For the full list of Microsoft’s contributions at KDD 2018, check out https://www.microsoft.com/en-us/research/event/kdd-2018/ and be sure to watch some of the videos if you were unable to attend!
