Digital Watersheds
Science at Microsoft
 

Data Deluge and Digital Watersheds

Commodity sensors and Internet connectivity have created a veritable data deluge. Yet it remains a challenge to find, access, clean, and reuse data. That is particularly true when data from different sources is needed for synthesis science—bringing many diverse observations together to create a larger, holistic view.

Scientists at the Berkeley Water Center, led by James Hunt, worked with the National Marine Fisheries Service on a synthesis challenge in the Russian River area. The watershed is the breeding ground for several species of fish, but wine-grape farming, urbanization, gravel mining, and other factors have affected the river. As a result, the fish have become endangered, and habitat restoration is critical.

To enable such studies, Microsoft Research’s Catharine van Ingen and Lawrence Berkeley National Laboratory researcher Deb Agarwal built a digital watershed. Constructed on a Microsoft SQL Server database and a SQL Server Analysis Services data cube, the digital watershed enables simple interactive browsing of the diverse sensor and field observations it assembles. Data updates are harvested automatically from available government websites and ingested from smaller spreadsheets and other sources. Some of the data are historical, dating back more than 100 years. Others are real-time measurements that are available only transiently over the Internet.
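
To give a feel for what browsing a data cube involves, here is a minimal Python sketch of a cube-style roll-up. It is not the SQL Server schema or the Analysis Services cube used in the project, which this article does not describe. The site names, variable names, and values are invented for illustration, and the choice of site, variable, and year as dimensions is an assumption.

```python
from collections import defaultdict
from datetime import date

# Hypothetical observations: (site, variable, date, value).
# Every name and number here is invented for illustration.
OBSERVATIONS = [
    ("site_a", "stage_m", date(2008, 3, 1), 1.20),
    ("site_a", "stage_m", date(2008, 3, 2), 1.10),
    ("site_a", "stage_m", date(2009, 3, 1), 0.90),
    ("site_b", "temp_c", date(2008, 3, 1), 11.5),
]


def roll_up(observations):
    """Aggregate observations into (site, variable, year) cells.

    Each cell holds the number of observations and their mean value,
    which is the kind of summary a cube lets a user browse quickly.
    """
    sums = defaultdict(lambda: [0, 0.0])  # cell key -> [count, running total]
    for site, variable, day, value in observations:
        cell = sums[(site, variable, day.year)]
        cell[0] += 1
        cell[1] += value
    return {
        key: {"count": count, "mean": total / count}
        for key, (count, total) in sums.items()
    }


CUBE = roll_up(OBSERVATIONS)
```

Looking up `CUBE[("site_a", "stage_m", 2008)]` returns the count and mean for that site, variable, and year. A real cube would precompute many such summaries across more dimensions.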

Among the questions examined with the digital watershed was the impact of “frost dips.” During the season when wine-grape buds are setting, local farmers use sprinklers to avoid frost damage. Pumping from the river to supply the sprinklers causes transient dips in the river water level. Such dips can strand small hatchling fish, making them susceptible to predators or oxygen deprivation.
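
One simple way to look for such transient dips in a series of water-level readings is sketched below in Python. This is an illustrative approach, not the method the researchers used, which the article does not describe. The function name `find_dips`, the window length, the 0.10 threshold, and the sample readings are all assumptions chosen for the example.

```python
from statistics import median


def find_dips(levels, window=4, min_drop=0.10):
    """Return indices of readings that fall well below the recent baseline.

    A reading counts as a dip when it is at least `min_drop` below the
    median of the `window` readings immediately before it. Both
    parameters are illustrative defaults, not calibrated values.
    """
    dips = []
    for i in range(window, len(levels)):
        baseline = median(levels[i - window:i])
        if baseline - levels[i] >= min_drop:
            dips.append(i)
    return dips


# Hypothetical river stage readings in meters, invented for illustration.
STAGE_M = [1.00, 1.01, 1.00, 0.99, 1.00, 0.85, 0.80, 0.98, 1.00, 1.01]
DIP_INDICES = find_dips(STAGE_M)
```

Using the median of the preceding readings as the baseline keeps a single low reading from dragging the baseline down with it, so a dip that lasts two or three readings is still flagged.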

Primary Researchers

Catharine van Ingen

Catharine van Ingen, Ph.D., is a partner architect in the Microsoft Research eScience group. Her research explores how commercial software and tools can be used to enable synthesis science in environmental research. A key challenge in such studies is handling not only very large datasets from satellites and ground sensors, but also the small, irregular, ancillary, and categorical data that are necessary for scientific understanding. Prior to coming to Microsoft Research in 2005, Catharine was the Windows architect primarily concerned with storage management, including distributed consumer archive. She has also served as co-system architect for an early 64-bit server and performance architect for two processors, worked on a large physics detector data acquisition system, and simulated the Missouri River. She holds degrees in civil and environmental engineering from the University of California, Irvine (B.S.), University of California, Berkeley (M.S.), and California Institute of Technology (Ph.D.).

James Hunt

James Hunt, Ph.D., was trained in environmental engineering at the University of California, Irvine (B.S.), Stanford University (M.S.), and the California Institute of Technology (Ph.D.), and has been in the Civil and Environmental Engineering Department at the University of California, Berkeley, since 1980. His teaching interests emphasize many aspects of water resources engineering, including water treatment and hydrology. Research topics have included particle dynamics in marine systems, estuarine sediment transport, contaminant transport processes in the subsurface, and hydrologic science. In all instances, initial efforts have been constrained by data management challenges of finding the existing data, documenting the source of that data, and then using models as a means of scaling that data from one location to another. With the vast and widely distributed data that are available in hydrologic sciences, collaborations with Microsoft Research and Lawrence Berkeley National Laboratory (LBNL) were essential in developing appropriate data harvesting and management tools that permit data synthesis.