Data Mining the SDSS SkyServer Database
- J. Gray
- A.S. Szalay
- A. Thakar
- P. Kunszt
- C. Stoughton
- D. Slutz
- J. Vandenberg
MSR-TR-2002-01
Distributed Data and Structures 4: Records of the 4th International Meeting
An earlier paper described the Sloan Digital Sky Survey’s (SDSS) data management needs [Szalay1] by defining twenty database queries and twelve data visualization tasks that a good data management system should support. We built a database and interfaces to support both that query load and a website for ad-hoc access. This paper reports on the database design, describes the data loading pipeline, and reports on the query implementation and performance. Each query typically translated to a single SQL statement. Most queries run in less than 20 seconds, allowing scientists to explore the database interactively. This paper is an in-depth tour of those queries. Readers should first study the companion overview paper “The SDSS SkyServer – Public Access to the Sloan Digital Sky Server Data” [Szalay2].