Microsoft Research Blog

Cloud computing aids researchers in solving the unsolvable in medical data labeling

March 1, 2019 | By Vani Mandava, Director, Data Science Outreach

Members of the Emory University team (from left): Ph.D. student Samaneh Nasiri and software engineers Amit Verma and Alvince Pongos

It’s not uncommon for physicians to disagree about a diagnosis. That’s why people often seek a second or third opinion when faced with a serious or complex health concern. What if, instead of a second opinion, hundreds of expert opinions could be collated? What if those experts were a combination of humans and AI algorithms, as is the case in a crowdsourced version of traffic model convergence? That’s the promise of work being done at Emory University in Atlanta by Dr. Gari Clifford, interim chair and associate professor in the university’s School of Medicine. We at Microsoft Research are excited to support him and his team in their innovative efforts to make AI diagnoses more accurate.

Through our partnership with the National Science Foundation, we have been able to provide them with cloud computing resources to better manage the growing number of algorithms and datasets they employ in tackling this goal.

The work: Creating a super algorithm

Dr. Clifford and his team target a variety of medical scenarios, including heart arrhythmias, which we’ll use as a case study here to explore their two-step approach. First, they have cardiologists help train algorithms by labeling electrocardiograms, recordings of the heart’s electrical activity, as normal, noisy, or abnormal in rhythm. They then use a mathematical process to determine which of the doctors are most accurate and weight each doctor’s labels in proportion to that accuracy. In the second step, the team conducts an international challenge in which the labeled data is made available to the wider research community, producing a collection of independent algorithms that learn from the labels and become almost as accurate as the doctors when labeling new data. The leading algorithms then “vote” on the labels, creating a super algorithm that is more accurate than any single one. The hope is that the eventual result will be an AI system that identifies heartbeat abnormalities with precision.
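The post does not specify the mathematical process used to weight the doctors, so the Python sketch below swaps in a deliberately simple stand-in, not the team’s method. It starts from a plain majority vote, scores each annotator by how often they agree with the weighted consensus, reweights, and repeats until the weights stop changing. The same weighted vote is then reused for the second step, with each challenge algorithm weighted by its accuracy against the consensus labels. All function names, annotator IDs (dr_a, algo_1, and so on), and labels are hypothetical.

```python
from collections import defaultdict


def weighted_vote(labels, weights):
    """Return the label with the largest total weight for one record.

    labels:  dict mapping annotator -> label ("normal", "noisy", "abnormal")
    weights: dict mapping annotator -> non-negative weight
    """
    scores = defaultdict(float)
    for annotator, label in labels.items():
        scores[label] += weights.get(annotator, 0.0)
    # Iterate labels alphabetically so ties resolve deterministically.
    return max(sorted(scores), key=lambda lab: scores[lab])


def estimate_weights(records, max_iter=10):
    """Hypothetical stand-in for expert weighting: weight = agreement with consensus.

    records: list of dicts, each mapping annotator -> label for one ECG.
    Returns (weights, consensus_labels).
    """
    annotators = sorted({a for rec in records for a in rec})
    weights = {a: 1.0 for a in annotators}  # first pass is a plain majority vote
    for _ in range(max_iter):
        consensus = [weighted_vote(rec, weights) for rec in records]
        new_weights = {}
        for a in annotators:
            pairs = [(rec[a], c) for rec, c in zip(records, consensus) if a in rec]
            new_weights[a] = sum(lab == c for lab, c in pairs) / len(pairs)
        if new_weights == weights:  # converged: weights stopped changing
            break
        weights = new_weights
    consensus = [weighted_vote(rec, weights) for rec in records]
    return weights, consensus


def accuracy(preds, reference):
    """Fraction of predictions that match the reference labels."""
    return sum(p == r for p, r in zip(preds, reference)) / len(reference)


# Step 1: toy expert labels for four ECG records (hypothetical data).
expert_labels = [
    {"dr_a": "normal", "dr_b": "normal", "dr_c": "abnormal"},
    {"dr_a": "abnormal", "dr_b": "abnormal", "dr_c": "abnormal"},
    {"dr_a": "noisy", "dr_b": "noisy", "dr_c": "normal"},
    {"dr_a": "normal", "dr_b": "abnormal", "dr_c": "abnormal"},
]
weights, consensus = estimate_weights(expert_labels)
print(weights)    # {'dr_a': 0.75, 'dr_b': 1.0, 'dr_c': 0.5}
print(consensus)  # ['normal', 'abnormal', 'noisy', 'abnormal']

# Step 2: weight each challenge algorithm by its accuracy on the consensus
# labels, then let the algorithms vote on a new record.
algo_preds = {
    "algo_1": ["normal", "abnormal", "noisy", "abnormal"],
    "algo_2": ["normal", "abnormal", "normal", "normal"],
    "algo_3": ["abnormal", "abnormal", "noisy", "abnormal"],
}
algo_weights = {name: accuracy(p, consensus) for name, p in algo_preds.items()}
new_record_votes = {"algo_1": "abnormal", "algo_2": "normal", "algo_3": "abnormal"}
print(weighted_vote(new_record_votes, algo_weights))  # abnormal
```

In a real challenge the algorithms would be scored on held-out records rather than the same records used to build the consensus, and published label-fusion methods typically model annotator reliability more carefully than a raw agreement rate.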

This approach is particularly useful in cases where experts disagree. Said Dr. Clifford, “In normal machine training exercises, if a subset of the data can’t be labeled (because the experts disagree), the computer scientists may just throw out that data. But when you’re dealing with people, with real diagnoses that experts disagree on, that’s where the most important data resides. Solving the currently unsolvable problems is what this project is trying to do.”

If the researchers commercialize this labeling system, it could ultimately review data from wearables, such as the latest smartwatches and fitness trackers, and alert consumers if they have an abnormal heartbeat. Today, a prototype cloud-based system facilitates the upload of medical data and algorithms to create an ever-growing database of arrhythmia events.

Said Dr. Clifford, “We have shown that this system can identify the minimum number of experts needed to provide accurate labels on the electrocardiogram.” This research has additional applications to other health scenarios, including critical care monitoring, sleep analysis, epilepsy seizure prediction, and perinatal monitoring.
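The post does not explain how the system identifies the minimum number of experts. As a purely illustrative substitute, not the team’s method, the classical Condorcet jury calculation answers a simplified version of the question: if each of n experts independently gives the correct binary label (normal vs. abnormal) with probability p, a majority vote is correct with the probability that more than half of them are right. The sketch assumes independent experts of equal accuracy and an odd panel size; the accuracy figures in the example are hypothetical.

```python
from math import comb


def majority_accuracy(n, p):
    """P(majority of n independent experts is correct), for odd n and per-expert accuracy p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1))


def min_experts(p, target, max_n=101):
    """Smallest odd panel size whose majority vote reaches the target accuracy, or None."""
    for n in range(1, max_n + 1, 2):
        if majority_accuracy(n, p) >= target:
            return n
    return None


# Hypothetical numbers: each cardiologist is right 80% of the time on a
# normal-vs-abnormal call, and we want the panel to be right 95% of the time.
for n in (1, 3, 5, 7):
    print(n, round(majority_accuracy(n, 0.8), 4))
print(min_experts(0.8, 0.95))  # 7
```

Real experts differ in skill and tend to make correlated mistakes on hard recordings, and correlated errors generally reduce the gain from adding experts, so an estimate like this is best read as a lower bound on panel size.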

Improving scalability with cloud computing

As the algorithms and datasets grew, the team needed more computing resources to respond quickly to its many contributing users. During the international challenges, teams wanted to run their algorithms on the same datasets at the same time and receive results within minutes or hours, so scalability was an important design consideration.
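The requirement above is a fan-out pattern: many submitted algorithms scored against the same dataset at once. The minimal sketch below shows the shape of that pattern on a single machine using Python’s standard-library ThreadPoolExecutor. It is an assumption-laden stand-in: the team’s actual system ran containers on Azure, and the post does not describe its scheduler. The toy records (mean heart rates), the reference labels, and both submissions are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def score_submission(name, predict, records, reference):
    """Run one submitted algorithm over every record and return (name, accuracy)."""
    preds = [predict(r) for r in records]
    acc = sum(p == t for p, t in zip(preds, reference)) / len(reference)
    return name, acc


def evaluate_all(submissions, records, reference, max_workers=4):
    """Score all submissions against the same dataset concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(score_submission, name, predict, records, reference)
            for name, predict in submissions.items()
        ]
        return dict(f.result() for f in futures)


# Toy dataset: each "record" is just a mean heart rate in beats per minute,
# with hypothetical reference labels. Real entries would process raw ECGs.
records = [72, 130, 65, 45]
reference = ["normal", "abnormal", "normal", "abnormal"]
submissions = {
    "always_normal": lambda bpm: "normal",
    "rate_rule": lambda bpm: "abnormal" if bpm > 100 or bpm < 50 else "normal",
}
print(evaluate_all(submissions, records, reference))
# {'always_normal': 0.5, 'rate_rule': 1.0}
```

Python threads do not speed up CPU-bound work because of the global interpreter lock, so a production version would isolate each submission in its own process or container, which is also what keeps one team’s dependencies from interfering with another’s.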

Dr. Clifford applied for and received $62,000 worth of Microsoft Azure resources through a grant from the National Science Foundation’s Big Data Regional Innovation Hubs program. The program facilitates collaboration among government, the research community, and the private sector in using data science to address societal needs, and in June 2016 we committed $3 million in Azure credits to it.

Dr. Clifford finds Azure both fast and scalable. With Azure Kubernetes Service (formerly Azure Container Service), containers require fewer resources than virtual machines, and because each container packages its runtime components, libraries, and operating-system dependencies, workloads are portable from machine to machine.

Curating these datasets with machine learning running on container-based Azure resources helps reduce labeling uncertainty, enabling more efficient and effective human-AI collaboration. Through this combination of cloud-enabled machine learning, competition, and expert labeling, Dr. Clifford and his team have demonstrated a novel approach.
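The post does not detail how machine learning reduces labeling uncertainty. One common pattern, shown here as a hypothetical sketch rather than the team’s pipeline, is uncertainty triage: measure how strongly the voters disagree on each record (here, the Shannon entropy of the weighted label distribution) and route the most contested recordings to human experts, while confidently labeled ones are handled automatically. This keeps the disagreements Dr. Clifford describes as the most important data in front of people instead of discarding them. The record IDs, voter names, and default weight of 1.0 are assumptions for illustration.

```python
import math
from collections import defaultdict


def label_entropy(votes, weights):
    """Shannon entropy (bits) of the weighted label distribution for one record.

    votes:   dict mapping voter (human or algorithm) -> label
    weights: dict mapping voter -> positive weight; missing voters default to 1.0
    """
    scores = defaultdict(float)
    for voter, label in votes.items():
        scores[label] += weights.get(voter, 1.0)
    total = sum(scores.values())
    return -sum((s / total) * math.log2(s / total) for s in scores.values() if s > 0)


def triage(records, weights, k):
    """Return the IDs of the k records whose labels are most uncertain."""
    ranked = sorted(records, key=lambda rid: label_entropy(records[rid], weights), reverse=True)
    return ranked[:k]


# Hypothetical votes from three algorithms on three ECG recordings.
records = {
    "ecg_001": {"algo_1": "normal", "algo_2": "normal", "algo_3": "normal"},
    "ecg_002": {"algo_1": "normal", "algo_2": "abnormal", "algo_3": "noisy"},
    "ecg_003": {"algo_1": "abnormal", "algo_2": "abnormal", "algo_3": "normal"},
}
print(triage(records, weights={}, k=2))  # ['ecg_002', 'ecg_003']
```

Unanimous records score zero bits, a three-way split scores log2(3) bits, and a two-to-one split falls in between, so expert time is spent first on the recordings where the voters disagree most.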

The proliferation of health-related data opens enormous opportunities for individuals, health care providers, and researchers. Addressing data labeling and similar challenges is an important step in turning that data into better health care outcomes, and Azure is here to help.
