Data Science for Research


Microsoft Research provides a continuously refreshed collection of free datasets, tools and resources designed to advance the state of the art of academic research in many areas of computer science, such as natural language processing and computer vision. In addition, you can browse datasets and apply for cloud-based compute cycles available under the Azure for Research program.

NSF Big Data Regional Innovation Hub Program

Microsoft Research is proud to support the US-wide National Science Foundation’s (NSF) Big Data Regional Innovation Hubs (BD Hubs) program by awarding $3M in Microsoft Azure cloud computing credits.

NSF supports four regional hubs for data science innovation, called the BD Hubs, throughout the United States. The consortia are coordinated by leading data scientists at a multitude of top universities around the country.

Spend more time doing computer science and less time looking for the datasets and computational cycles you need to push the state of the art.


Upcoming Events

  • Microsoft Azure Training | Midwest Big Data Hub All Hands, 5 Oct 2017, Omaha, Nebraska


Recent Events

Members of the data science team were recently at these events:

Dataset directory

Datasets are ordered by recency (more recent at the top) and by their primary research area. If you don’t find what you are looking for in one area, be sure to browse the others.

Human-computer interaction

These datasets cover methods of interaction between humans and computers, such as gesture, speech, and touch.

Audio / video

These datasets are video, continuous images, or audio files.

Data mining / information retrieval

These datasets are semantic maps or image-retrieval data.

Geospatial / location

These datasets contain GPS traces for a variety of applications: validation, activity analysis, and location obfuscation.

Natural language processing

These datasets span a range of NLP applications, including semantic analysis, entity extraction, video captioning, question answering, and many others.

Robotics / computer vision

These datasets are designed for image and video recognition. They contain footage captured by still cameras, depth cameras, robots, and standard video. The clips and images usually include tags or captions to assist with training recognition systems.


Learning resources

Get started with Azure

Hands-on labs based on research datasets

Microsoft Azure for Research Online


Case studies

Creating intelligent water systems to unlock the potential of Smart Cities

The newspaper headlines about “Bangalore’s looming water crisis” have been ominous, with one urban planning expert proclaiming that Bangalore will become “unlivable” in a few years because of water scarcity. This is a critical issue that threatens the future of one of India’s fastest-growing cities. In fact, water availability is a cause for worry in the entire country. According to an estimate by The Asian Development…

Democratizing AI to improve citizen health

Doctors make life-saving — and life-changing — decisions every day. But how do they know that they are making the best decisions? Can artificial intelligence (AI) help? “Before evidence-based medicine, decision-making in health care was heavily reliant on the expertise and knowledge of the health professional, usually a doctor. What has happened in the last 20, 30 years is that the health…

Cloud computing changes the way we practice public speaking

People often rank public speaking as the number one fear that they face. New cloud-based technology from researchers at the University of Rochester lets speakers polish and practice at home in front of their computer camera, while the analysis provides instant feedback about improvement. Leading this effort, known as ROC Speak, is M. Ehsan Hoque, an assistant professor of…

Preventing flood disasters with Cortana Intelligence Suite

On October 31, 2013, the city of Austin, Texas, faced a destructive flood. At the time, I was visiting David Maidment, Chaired Professor of the Civil Engineering Center for Research in Water Resources on site at the University of Texas at Austin. The day before the flood, we had been discussing research and analytics around the long-standing drought conditions across western Texas. Overnight…

Securing safe water through Microsoft’s Intelligent Cloud

Jacob Katuva used to get up at dawn to cycle 12 miles from his village to collect water with his uncles and cousins when he was growing up in Kenya. Now he is part of a research team at the University of Oxford using cloud computing and mobile sensors to monitor water wells and help ensure that thousands of villages in rural Africa and Asia have a safe, secure supply of water. Millions of people across the world fear not having…

Predicting ocean chemistry using Microsoft Azure

Shellfish farmer Bill Dewey remembers the first year he heard of ocean acidification, a phrase that means a change in chemistry for ocean water. It was around 2008, and Dewey worked for Taylor Shellfish, a company that farms oysters in ocean waters off the coast of Washington. That year, thousands of tiny “seed” oysters died off suddenly. Today, a cloud-based predictive system from the University of Washington…

All that RaaS: saving lives and transforming healthcare economics

Stuart, a 66-year-old man with diabetes, felt lousy—constantly fatigued, nauseated, and short of breath after just the slightest exertion. His daughter, worried by his increasing frailty, took him to the emergency room at the local hospital. Her concern was amply justified: Stuart was suffering from heart failure. Like 5.1 million other…

Microsoft Azure helps researchers predict traffic jams

More than half of the world’s population now lives in cities and suburbs, and as just about any of these billions of people can tell you, urban traffic can be a nightmare. Cars stack up bumper-to-bumper, clogging our highways, jangling our nerves, taxing our patience, polluting our air, and taking a toll on our productivity. In short, traffic jams impair our emotional, physical, and economic…

Cloud computing helps make sense of cloud forests

The forests that surround Campos do Jordão are among the foggiest places on Earth. With a canopy shrouded in mist much of the time, these are the renowned cloud forests of the Brazilian state of São Paulo. It is here that researchers from the São Paulo Research Foundation (better known by its Portuguese acronym, FAPESP) have partnered with Rafael Olivier, professor of ecology…