Managing the Scientific Data Explosion: a Response to the OSTP Digital Data RFI
Scientists agree that there’s a lot of data out there and that we could be using it more efficiently. Now the White House has asked for input on how to do just that.
Data from scientific research is important to a diverse array of user communities, from researchers, governments, and companies to wildlife managers, transportation managers, hospitals, and teachers. As the quantity of data in individual and community collections grows, so does its potential value; unfortunately, so do the associated challenges of data access, privacy, storage, and archiving. These challenges are social, economic, and technical, and solving them will require collaborative contributions from universities, federal agencies, companies, scientific societies, and other organizations.
Effective approaches to realizing the benefits of scientific data are likely to require many elements, including:
- Providing incentives and rewards for sharing data
- Creating and disseminating software tools and online services that enable users to find and analyze data of interest
- Developing and using standard metadata schemas, well-documented data formats, and access protocols to enable data re-use and cross-domain fusing of data
- Facilitating systems by which funding agencies and users can contribute to the costs of data storage, sharing, and analysis
- Developing systems and metrics to determine when and how data is worth preserving and sharing
Microsoft believes that these challenges are worth tackling and that coordinated efforts are urgently needed to advance our ability to curate, preserve, and use digital scientific data so as to maximize the societal and economic impact of research. Therefore, on January 12, 2012, we submitted our input in response to the White House Office of Science and Technology Policy (OSTP) request for information (RFI) on Public Access to Digital Data Resulting From Federally Funded Scientific Research.
The Microsoft response emphasizes two areas: economic models, and software tools and online services. We argue that, to facilitate research and realize its societal benefits, nations should create environments in which innovation can occur around the critical elements that enable data sharing, retention, and use, and that the costs should be shared among the various groups that benefit from the data and the associated discoveries. In some cases, disseminating and using specific data sets is necessary to meet high-priority scientific, policy, economic, or societal goals, and relevant government agencies should therefore support it. In other cases, there are opportunities to create a tool or service infrastructure that enhances the value of the data and allows the provider to monetize access at a level sufficient to cover the investment made in creating or maintaining the data archive. We emphasize that, in determining which data to share and how, it is important to recognize that consumers of a particular data set may be outside the research community that created it (for example, in another scientific field or at a commercial enterprise). These consumers should still help define the value of the data and drive the creation of tools that facilitate its cross-domain use, and they should also share in paying its maintenance costs. Overall, we stress the value that innovations in information technology, including emerging cloud services, can bring to data sharing and analysis and to enabling collaborative, multidisciplinary, and international science.
While the Microsoft response to the OSTP RFI on access to digital scientific data focuses on a few specific areas, it builds on collaborative work already done in this area by the research community and federal agencies. Experts from Microsoft regularly participate in and support such efforts. In particular, we remain committed to the conclusions of the National Science Foundation’s Advisory Committee for Cyberinfrastructure’s Task Force on Data and Visualization and of the Blue Ribbon Task Force on Sustainable Digital Preservation and Access. We also agree with many of the challenges described and conclusions reached in the National Science Board’s draft Data Policies Report, released on January 5, 2012.
The above reports and activities focus on the policy side of realizing the value of scientific data. Microsoft is also working to create, demonstrate, and implement solutions to the technical side of these challenges. In the book The Fourth Paradigm, the authors identify a range of opportunities where access to data is fundamentally changing the way science is conducted. Microsoft, in partnership with the academic community, is working to put these ideas into practice. Examples include WorldWide Telescope; Layerscape, a new earth-science data explorer; the Eye on Earth network for environmental maps; and data analytics tools such as Daytona and Excel DataScope.
—Elizabeth Grossman, Technology Policy Group, Microsoft Corporation
January 31, 2012, update: The White House Office of Science & Technology Policy (OSTP) has publicly posted all of the responses to the RFI.
- OSTP RFI: Public Access to Digital Data Resulting From Federally Funded Scientific Research
- Microsoft Corporation Response to OSTP RFI
- NSF Task Force on Data and Visualization Final Report, March 2011
- Blue Ribbon Task Force on Sustainable Digital Preservation and Access
- National Science Board’s Data Policies Report
- The Fourth Paradigm: Data-Intensive Scientific Discovery
- Education and Scholarly Communication at Microsoft Research Connections