Coping with Data Deluge
Overwhelmed by data? You’re not alone.
Data mining has become one of the most critical research processes in this era of data-intensive science. In many areas of science, however, the sheer size of the datasets limits how useful data mining can be. Consequently, scientists are looking for new tools that can dig into the data faster and deeper. In the rapidly developing field of synoptic sky surveys, for example, transient signals from a variety of interesting astrophysical phenomena must be detected and characterized in (near) real time. The resulting wealth of data is invaluable to researchers seeking new discoveries, but those researchers need better computational methods to manage and analyze so much data.
It was in response to such needs that Caltech’s Keck Institute for Space Studies sponsored a workshop, Digging Faster and Deeper: Algorithms for Computationally Limited Problems in Time-Domain Astronomy, from December 12 to 13. Bringing together more than 50 distinguished participants, the workshop focused on some of the unresolved data mining issues for future studies in time-domain astronomy and related fields.
I was privileged to give two talks during day two of the workshop. In “Discovery of Hidden Patterns in Data through Interactive Search,” I presented the Environmental Informatics Framework (EIF), a strategy and technology platform that the Microsoft Research Connections Earth, Energy, and Environment group developed to help advance data exploration in environmental research. I demonstrated Microsoft PivotViewer, a faceted search technology included in EIF that enables users to visually and interactively search and discover hidden patterns in massive data or image sets.
I was pleased to receive positive feedback from attendees about the work that Microsoft Research is doing for data-intensive sciences. As one participant wrote to me in an email, “I have to admit that I wasn’t aware of the work that Microsoft Research was doing, but I was very impressed with what I saw yesterday. The work you’ve been doing on data visualization can only be described as stunning!”
In “Building a Better Scientist,” my second talk of the day, I discussed how the fourth paradigm for data-intensive scientific discovery is changing the way scientists conduct research, and is, therefore, creating a need for a new generation of scientists with advanced computational mindsets. The presentation stimulated passionate discussions, and, as event chair George Djorgovski pointed out, it is a topic closely related to how fast and deep we can go with our data.
—Yan Xu, Senior Research Program Manager, Microsoft Research Connections
- Keck Institute for Space Studies
- Digging Faster and Deeper: Algorithms for Computationally Limited Problems in Time-Domain Astronomy
- Microsoft Environmental Informatics Framework (EIF)
- Earth, Energy, and Environment at Microsoft Research Connections
- Microsoft PivotViewer
- The Fourth Paradigm: Data-Intensive Scientific Discovery