DopplerSource: A .NET Framework for Accessing Doppler Radar Data

  • Beth Plale | Indiana University

Doppler radar data, which has proven its value in meteorology research, has tremendous potential for use in many other research endeavors, if only it weren’t so difficult to work with. In DopplerSource we are removing the hurdles that prevent broader use of the data by building a service-based framework for storing, operating on, and serving it. The 130 WSR-88D (Doppler) radars located throughout the United States generate Level II data continuously, 24×7. The data has been valuable in many aspects of meteorology research and education: for example, in real-time warnings of hazardous spring and winter weather, in initializing numerical weather prediction models, and in verifying the occurrence of past events, such as the location of damaging hail. But it has broader potential. Level II data is also used in studies of bird and insect migration, in bird-strike avoidance, in studies of urban pollution transport, and in tracking hazardous atmospheric releases. This larger goal of opening additional avenues of science cannot be fully realized without significant improvements in the accessibility and availability of the data over what exists today.

In this project, partially funded through Microsoft e-Science, we are constructing a .NET framework for storing, operating on, and serving NEXRAD Level II data and the knowledge products derived from it. Our pilot project targets the six radars nearest Bloomington, Indiana. The project focuses on the following areas:

• Storing and indexing large volumes of streaming data in a SQL Server database
• Generating metadata on the fly to describe the data and capture features of the time sequence in which it arrived
• Simple retrieval of Doppler data through a spatial-temporal interface: the user selects a region of interest and specifies a temporal range
• Support services to query, process, clean, filter, and fuse data on the fly
• Authentication mechanisms to prevent denial-of-service abuse that over-taxes the computational resources
• Scalability: a level of performance that balances continuous input-stream arrival, computationally intensive user services, and rich query access over highly correlated temporal and spatial data
• Log analysis to characterize data arrival and anticipate user workload, using logs from related meteorology services to identify patterns of use that help predict future demand
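The first three focus areas (indexed storage, on-the-fly arrival metadata, and spatial-temporal retrieval) can be illustrated with a small in-memory sketch. This is not the project's implementation: the `VolumeScan` record and its fields, and the `RadarCatalog` class with its `ingest` and `query` methods, are hypothetical stand-ins for what would live in the SQL Server index, and the station names, coordinates, and times are made up. On ingest, each volume is tagged with the number of seconds since the previous volume from the same radar; a query then filters by a latitude/longitude bounding box and a time range.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class VolumeScan:
    """Hypothetical record for one Level II volume scan (illustrative fields only)."""
    station: str                   # radar identifier (made-up names in the example below)
    lat: float                     # radar latitude, degrees
    lon: float                     # radar longitude, degrees
    scan_time: datetime            # time the volume was collected
    gap_s: Optional[float] = None  # arrival metadata: seconds since this station's previous volume


class RadarCatalog:
    """In-memory stand-in for a database index of arriving volumes (an assumption)."""

    def __init__(self) -> None:
        self._scans: List[VolumeScan] = []
        self._last_seen: Dict[str, datetime] = {}

    def ingest(self, scan: VolumeScan) -> VolumeScan:
        # Generate metadata on the fly: interval since the same station's last volume.
        # Assumes each station's volumes arrive in time order.
        prev = self._last_seen.get(scan.station)
        if prev is not None:
            scan.gap_s = (scan.scan_time - prev).total_seconds()
        self._last_seen[scan.station] = scan.scan_time
        self._scans.append(scan)
        return scan

    def query(self, lat_min: float, lat_max: float, lon_min: float, lon_max: float,
              start: datetime, end: datetime) -> List[VolumeScan]:
        # Spatial-temporal selection: radar inside the bounding box, scan time in [start, end].
        return [
            s for s in self._scans
            if lat_min <= s.lat <= lat_max
            and lon_min <= s.lon <= lon_max
            and start <= s.scan_time <= end
        ]


# Example with made-up stations and coordinates.
catalog = RadarCatalog()
t0 = datetime(2006, 5, 1, 12, 0)
catalog.ingest(VolumeScan("RADAR_A", 39.7, -86.3, t0))
catalog.ingest(VolumeScan("RADAR_A", 39.7, -86.3, t0 + timedelta(minutes=5)))
catalog.ingest(VolumeScan("RADAR_B", 42.0, -80.0, t0))
hits = catalog.query(38.0, 41.0, -88.0, -85.0, t0, t0 + timedelta(hours=1))
print([(s.station, s.gap_s) for s in hits])  # [('RADAR_A', None), ('RADAR_A', 300.0)]
```

Selecting by radar location is a simplification; a real region-of-interest query would more likely intersect the user's region with each radar's coverage area, and a database would index station and time rather than scanning a list.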

The storage needs for the pilot radars alone are substantial. The six radars generate 27.5 TB per year of raw Level II data, which can be compressed to 1/25th of its size, requiring roughly 1 TB/yr of storage. A useful transformation of the data is into the binary netCDF format; the converted data adds another 2.5 TB/yr. The arriving data products are tagged with metadata to facilitate searching, and the metadata for the pilot data products is estimated at 170 GB/yr. The knowledge products generated on demand by statistical analysis and data mining services are estimated at 0.5 TB/yr. This places the total storage need at about 4.5 TB/yr.

The tools used include a web service framework (.NET), a database management system (SQL Server), an XML metadata schema (leveraging the LEAD Metadata Schema from the NSF LEAD project), and Integrated Radar Data Services (IRaDS) support for the Doppler streams. The hardware testbed includes 16 dual-Opteron nodes with 16 GB RAM each; a 3.5 TB SAN storage array; a dual-Opteron database server with 4 GB RAM and a 2 TB RAID 1 disk, running Windows Server 2003; and the Indiana University MDSS fault-tolerant mass store server, with a collective 1 petabyte of storage.
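The storage estimate for the pilot radars is simple arithmetic and can be tallied directly. The sketch below uses only the figures quoted in the text (27.5 TB/yr raw, 25:1 compression, 2.5 TB/yr netCDF, 170 GB/yr metadata, 0.5 TB/yr knowledge products); the variable names are illustrative.

```python
# Annual storage estimates for the six pilot radars, in TB/year, as quoted in the text.
RAW_LEVEL2_TB = 27.5
COMPRESSION_RATIO = 25      # raw Level II data compresses to 1/25th of its size
NETCDF_TB = 2.5             # converted netCDF copies
METADATA_TB = 0.170         # 170 GB/yr of metadata
KNOWLEDGE_TB = 0.5          # knowledge products generated on demand

compressed_tb = RAW_LEVEL2_TB / COMPRESSION_RATIO  # about 1.1 TB/yr

components = {
    "compressed Level II": compressed_tb,
    "netCDF": NETCDF_TB,
    "metadata": METADATA_TB,
    "knowledge products": KNOWLEDGE_TB,
}
total_tb = sum(components.values())

# Print one line per component, then the total.
for name, tb in components.items():
    print(f"{name:>20}: {tb:5.2f} TB/yr")
print(f"{'total':>20}: {total_tb:5.2f} TB/yr")
```

Itemized this way, the components sum to about 4.27 TB/yr (about 4.17 TB/yr if the compressed raw data is rounded to 1 TB/yr, as in the text), slightly below the quoted 4.5 TB/yr, so the quoted total appears to include some margin beyond the itemized components.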

Speaker Details

Beth Plale is an Assistant Professor in the Department of Computer Science at Indiana University. Before joining Indiana University, Professor Plale was a postdoctoral researcher in the Center for Experimental Research in Computer Systems at Georgia Tech. She received her Ph.D. in computer science from the State University of New York at Binghamton. She earned an M.S. in computer science from Temple University in 1991, an MBA from the University of La Verne in 1986, and a B.Sc. in computer science from the University of Southern Mississippi in 1984. Professor Plale’s interest in experimental systems was heavily influenced by time spent as a software engineer in the defense industry in the 1980s. Her research interests include data-driven applications, parallel and distributed computing, data management, and grid computing.

      Jeff Running