Data Deluge and Digital Watersheds

Commodity sensors and Internet connectivity have created a veritable data deluge. Yet it remains a challenge to find, access, clean, and reuse data. That is particularly true when data from different sources are needed for synthesis science, which brings many diverse observations together to create a larger, holistic view.

Scientists at the Berkeley Water Center, led by James Hunt, worked with the National Marine Fisheries Service on a synthesis challenge in the Russian River area. The watershed is the breeding ground for several species of fish, but wine-grape farming, urbanization, gravel mining, and other factors have affected the river. As a result, the fish have become endangered, and habitat restoration is critical.

To enable such studies, Microsoft Research’s Catharine van Ingen and Lawrence Berkeley National Laboratory researcher Deb Agarwal built a digital watershed. Constructed on a Microsoft SQL Server database and SQL Server Analysis Services data cube, the digital watershed enables simple interactive browsing of the assembled diverse sensor and field observations. Data updates are automatically harvested from available government websites and are ingested from smaller spreadsheets or other sources. Some of the data are historic, dating back more than 100 years. Others are real-time measurements with only transient availability over the Internet.
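
The kind of browsing a data cube supports can be illustrated with a minimal sketch. This is not the SQL Server Analysis Services implementation described above; it is an in-memory stand-in, assuming observations carry hypothetical site, variable, and year dimensions. The `roll_up` helper and all sample values are invented for illustration.

```python
from collections import defaultdict

# Hypothetical observations: (site, variable, year, value). Not real data.
observations = [
    ("site_a", "stage_m", 2008, 1.50),
    ("site_a", "stage_m", 2008, 1.30),
    ("site_a", "stage_m", 2009, 1.40),
    ("site_b", "stage_m", 2008, 2.00),
]

def roll_up(rows, dims):
    """Average the value column, grouped by the chosen dimension positions."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, count]
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key][0] += row[3]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

# Browse at two levels of detail: per site and year, then per site only.
by_site_year = roll_up(observations, dims=(0, 2))
by_site = roll_up(observations, dims=(0,))
```

Changing the `dims` argument moves between coarser and finer views of the same observations, which is the basic idea behind interactive cube browsing.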

Among the questions examined with the digital watershed was the impact of “frost dips.” During the season when wine-grape buds are setting, local farmers use sprinklers to avoid frost damage. Pumping from the river to supply the sprinklers causes transient dips in the river water level. Such dips can strand small hatchling fish, making them susceptible to predators or oxygen deprivation.
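
One way such dips might be picked out of a river-stage record is sketched below. This is an assumption for illustration, not the researchers' method: it flags a reading as a dip when it falls a chosen amount below the median of the preceding readings. The `find_dips` function, the window size, the threshold, and the sample readings are all hypothetical.

```python
from statistics import median

def find_dips(stages, window=6, min_drop=0.1):
    """Return indices where the stage falls at least `min_drop` below
    the median of the preceding `window` readings."""
    dips = []
    for i in range(window, len(stages)):
        baseline = median(stages[i - window:i])
        if baseline - stages[i] >= min_drop:
            dips.append(i)
    return dips

# Hypothetical hourly river stage readings in meters (not real data).
stages = [1.50, 1.51, 1.50, 1.49, 1.50, 1.50, 1.32, 1.30, 1.41, 1.49, 1.50]
dip_indices = find_dips(stages)
print(dip_indices)  # [6, 7]
```

A median baseline is used here because a brief dip barely moves it, so the readings during the dip itself are still compared against the normal water level.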

This research led to a fundamental change in how hydrologists use advanced computing methods to get timely answers to their questions. It established a model for how scientists can hand data-processing problems to technology so that they can remain focused on their science.

Primary Researchers

Catharine van Ingen

Catharine van Ingen, Ph.D., is partner architect in the Microsoft Research eScience group. Her research explores how commercial software and tools can be used to enable synthesis science in environmental research. A key challenge in such studies is addressing not only very large datasets from satellites and ground sensors, but also the small, irregular, ancillary, and categorical data that are necessary for scientific understanding.

James Hunt

James Hunt is professor of Civil and Environmental Engineering at the University of California, Berkeley. A fundamental challenge across his research topics is how to deal with vast and widely distributed data. Hunt believes that collaborations with Microsoft Research and Lawrence Berkeley National Laboratory (LBNL) were essential in developing appropriate data-harvesting and management tools that permit data synthesis.