Data Science for Research

About

Microsoft Research provides a continuously refreshed collection of free datasets, tools and resources designed to advance the state of the art of academic research in many areas of computer science, such as natural language processing and computer vision. In addition, you can browse datasets and apply for cloud-based compute cycles available under the Azure for Research program.

Winners to be announced soon for Cloud AI Research Challenge!

The deadline for submissions is now closed. Thanks to the students, academics, and employees of public and private organizations for their submissions to build AI applications on Microsoft AI services. Check back soon to find out who the prize winners are!

NSF Big Data Regional Innovation Hub Program

Microsoft Research is proud to support the US-wide National Science Foundation’s (NSF) Big Data Regional Innovation Hubs (BD Hubs) program by awarding $3M in Microsoft Azure cloud computing credits.

NSF supports four regional hubs for data science innovation, called the BD Hubs, throughout the United States. The consortia are coordinated by top data scientists at a multitude of universities around the country.

 

Spend more time doing computer science and less time looking for the datasets and computational cycles needed to push the state of the art.

Events

Recent Events

Members of the data science team were recently at these events:

Dataset directory

Datasets are ordered by recency (more recent at the top) and by their primary research area. If you don’t find what you are looking for in one area, be sure to browse the others.

Human computer interaction

These datasets concern methods of interaction between humans and computers, such as gesture, speech, and touch.

Audio / video

These datasets are video, continuous images, or audio files.

Data mining / information retrieval

These datasets are semantic maps or image-retrieval data.

Geospatial / location

These datasets contain GPS traces for a variety of applications: validation, activity analysis, and location obfuscation.

Natural language processing

These datasets span a range of NLP applications, including semantic analysis, entity extraction, video captioning, question answering, and many others.

Robotics / computer vision

These datasets are designed for image and video recognition, with data captured by still cameras, depth cameras, robots, and standard video. The clips and images usually include tags or captions to assist with training recognition systems.

Learn

Learning resources

Get started with Azure

Hands-on labs based on research datasets

Microsoft Azure for Research Online

 

Case studies

Creating intelligent water systems to unlock the potential of Smart Cities

The newspaper headlines about “Bangalore’s looming water crisis” have been ominous, with one urban planning expert proclaiming that Bangalore will become “unlivable” in a few years because of water scarcity. This is a critical issue that threatens the future of one of India’s fastest-growing cities. In fact, water availability is a cause for worry in the entire country. According to an estimate by The Asian Development…

Democratizing AI to improve citizen health

Doctors make life-saving — and life-changing — decisions every day. But how do they know that they are making the best decisions? Can artificial intelligence (AI) help? “Before evidence-based medicine, decision-making in health care was heavily reliant on the expertise and knowledge of the health professional, usually a doctor. What has happened in the last 20, 30 years is that the health…

Cloud computing changes the way we practice public speaking

People often rank public speaking as their number one fear. New cloud-based technology from researchers at the University of Rochester lets speakers polish and practice at home in front of their computer camera, while the analysis provides instant feedback about improvement. Leading this effort, known as ROC Speak, is M. Ehsan Hoque, an assistant professor of…

Preventing flood disasters with Cortana Intelligence Suite

On October 31, 2013, the city of Austin, Texas, faced a destructive flood. At the time, I was visiting David Maidment, Chaired Professor of the Civil Engineering Center for Research in Water Resources on site at the University of Texas at Austin. The day before the flood, we had been discussing research and analytics around the long-standing drought conditions across western Texas. Overnight…

Securing safe water through Microsoft’s Intelligent Cloud

Jacob Katuva used to get up at dawn to cycle 12 miles from his village to collect water with his uncles and cousins when he was growing up in Kenya. Now he is part of a research team at the University of Oxford using cloud computing and mobile sensors to monitor water wells and help ensure that thousands of villages in rural Africa and Asia have a safe, secure supply of water. Millions of people across the world fear not having…

Predicting ocean chemistry using Microsoft Azure

Shellfish farmer Bill Dewey remembers the first year he heard of ocean acidification, a phrase that means a change in chemistry for ocean water. It was around 2008, and Dewey worked for Taylor Shellfish, a company that farms oysters in ocean waters off the coast of Washington. That year, thousands of tiny “seed” oysters died off suddenly. Today, a cloud-based predictive system from the University of Washington…

All that RaaS: saving lives and transforming healthcare economics

Stuart, a 66-year-old man with diabetes, felt lousy—constantly fatigued, nauseated, and short of breath after just the slightest exertion. His daughter, worried by his increasing frailty, took him to the emergency room at the local hospital. Her concern was amply justified: Stuart was suffering from heart failure. Like 5.1 million other…

Microsoft Azure helps researchers predict traffic jams

More than half of the world’s population now lives in cities and suburbs, and as just about any of these billions of people can tell you, urban traffic can be a nightmare. Cars stack up bumper-to-bumper, clogging our highways, jangling our nerves, taxing our patience, polluting our air, and taking a toll on our productivity. In short, traffic jams impair our emotional, physical, and economic…

Cloud computing helps make sense of cloud forests

The forests that surround Campos do Jordao are among the foggiest places on Earth. With a canopy shrouded in mist much of the time, these are the renowned cloud forests of the Brazilian state of São Paulo. It is here that researchers from the São Paulo Research Foundation—better known by its Portuguese acronym, FAPESP—have partnered with Rafael Olivier, professor of ecology…

News & Blog

Microsoft partners with National Science Foundation to empower data science breakthroughs

Over the past decade, Microsoft has partnered with the National Science Foundation (NSF) on three separate programs, first in 2010, and more recently through a commitment of $6M in cloud credits across two NSF-supported data science programs – with the Big Data Regional Innovation Hubs and as part of the NSF BigData solicitation… | February 13, 2018

Calling All AI Innovators – Join the ‘Cloud AI Challenge’ for a Chance to Win $25,000

The AI revolution is poised to unleash unprecedented innovation and impact on our society. Several research and development groups across Microsoft have hit their stride in delivering world-changing impact through the power of AI. Working together, we are creating a comprehensive Microsoft AI platform and a set of AI services that will enable the next generation of intelligent applications that will augment human intelligence… | December 13, 2017

Transportation Data Science at Microsoft

The National Science Foundation (NSF)-supported Big Data Innovation Hubs launched a National Transportation Data Challenge with a kickoff event in Seattle in May 2017. Microsoft Outreach, through its partnership with the Big Data Hubs, organized an Azure workshop and participated in a panel discussion on ‘How Cloud Computing Can Enable Transportation Data Science.’… | July 13, 2017

NSF Big Data Innovation Hubs collaboration — looking back after one year

Significant technical advancements in cloud computing have led to lower infrastructure costs, making possible big storage and big computing. Big data technology, though, requires cross-discipline research within and beyond non-computing domains. This is where domain experts collaborate with computing teams, industry, and government… | June 8, 2017