The schematics for the SQL to CUDA integration
An image of the 2D galaxy correlation based upon computing 600 trillion(!) galaxy pairs
Data-Intensive Computing at Johns Hopkins
At the Institute for Data Intensive Engineering and Science (IDIES) at Johns Hopkins, we have been working closely with Microsoft to move data-intensive scientific computing inside the database. Using SQL Server 2008 and its CLR integration features, we have built sophisticated web services that perform image processing, numerical computations and various statistical functions inside the database, callable as User Defined Functions (UDFs) in T-SQL. These techniques have been applied to several scientific domains, in particular astronomy, turbulence simulations, radiation oncology, wireless sensor networks for environmental monitoring, and genomics.
SQL-CUDA Integration
To exploit the new capabilities offered by emerging GPGPU architectures, we are experimenting with GPUs in a cluster of database servers, with the goal of developing a set of software tools for the analysis of large data sets, in particular in astronomy and genomics.
The implementation uses an out-of-process server that communicates with both SQL Server and the CUDA-based GPU resource through a traditional IPC mechanism. The server runs as a Windows Service and maintains control of the threads. It loads a DLL containing the user code for the CUDA part of the function. The SQLCLR procedure first executes a SQL statement on the server that fetches the result set into shared memory, then calls the DLL ‘hooks’ in the server to execute the CUDA part of the function, passing a pointer to the data in shared memory. The result is written back to shared memory, and the return values are transferred to SQL from there.
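The data flow described above can be sketched in a few lines of Python. This is only a single-process illustration under stated assumptions, not the actual Windows Service or SQLCLR code: `run_query_into_shared_memory`, `cuda_hook_stub` and `call_udf` are hypothetical names, the "query" is a plain list of numbers, and the GPU kernel is replaced by a CPU sum of squares.

```python
import struct
from multiprocessing import shared_memory


def run_query_into_shared_memory(rows, shm):
    """Hypothetical stand-in for the SQL step: pack the result set into the buffer."""
    struct.pack_into(f"<{len(rows)}d", shm.buf, 0, *rows)
    return len(rows)


def cuda_hook_stub(buf, n_values):
    """Hypothetical stand-in for a DLL hook; a real hook would launch a CUDA kernel.

    It receives the shared buffer (the 'pointer to the data') and the row count.
    """
    values = struct.unpack_from(f"<{n_values}d", buf, 0)
    return sum(v * v for v in values)


def call_udf(rows):
    """Mimic the round trip: fetch -> shared memory -> hook -> shared memory -> caller."""
    # Room for the input values plus one output slot (8 bytes per double).
    shm = shared_memory.SharedMemory(create=True, size=8 * (len(rows) + 1))
    try:
        n = run_query_into_shared_memory(rows, shm)
        result = cuda_hook_stub(shm.buf, n)
        # The hook's result goes back into shared memory, after the input block.
        struct.pack_into("<d", shm.buf, 8 * n, result)
        # The caller (SQL side) reads the return value from shared memory.
        (returned,) = struct.unpack_from("<d", shm.buf, 8 * n)
        return returned
    finally:
        shm.close()
        shm.unlink()


if __name__ == "__main__":
    print(call_udf([1.0, 2.0, 3.0]))
```

The point of the design is that the bulk data never passes through the call interface: only a reference to the shared buffer and a row count cross the boundary between the database side and the compute side.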
One of our efforts has been to modify the well-known Smith-Waterman-Gotoh dynamic-programming algorithm for sequence alignment so that it can be implemented more efficiently in CUDA[1]. The modification increases the data parallelism of the dynamic-programming computation so as to fully exploit GPU thread parallelism.
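For orientation, the textbook Smith-Waterman-Gotoh recurrence (local alignment with affine gap penalties) can be written as the following sequential Python sketch. It is the standard formulation, not the modified version presented in [1], and the function name `swg_score` and the scoring parameters are illustrative assumptions.

```python
def swg_score(a, b, match=2, mismatch=-1, gap_open=3, gap_extend=1):
    """Best local alignment score of a and b with affine gaps (textbook Gotoh).

    gap_open is the cost of the first gap position, gap_extend the cost of
    each additional position. All default values are illustrative only.
    """
    neg_inf = float("-inf")
    prev_h = [0] * (len(b) + 1)        # H values of the previous row
    prev_f = [neg_inf] * (len(b) + 1)  # vertical-gap scores of the previous row
    best = 0
    for i in range(1, len(a) + 1):
        h = [0] * (len(b) + 1)
        f = [neg_inf] * (len(b) + 1)
        e = neg_inf                    # horizontal-gap score, carried along the row
        for j in range(1, len(b) + 1):
            # Either extend an existing gap or open a new one.
            e = max(e - gap_extend, h[j - 1] - gap_open)
            f[j] = max(prev_f[j] - gap_extend, prev_h[j] - gap_open)
            diag = prev_h[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are clamped at zero.
            h[j] = max(0, diag, e, f[j])
            best = max(best, h[j])
        prev_h, prev_f = h, f
    return best


if __name__ == "__main__":
    print(swg_score("ACGT", "ACGT"))
```

Each cell depends on its left, upper and upper-left neighbours, so a naive row-by-row loop like this one is inherently serial within a row. That dependency pattern is what any GPU formulation has to work around.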
In another effort, we have used this framework to compute the galaxy-galaxy correlation function for the Sloan Digital Sky Survey data on very large scales. In a few days of analysis we have computed 600 trillion galaxy pairs over the real and Monte-Carlo catalogs of galaxies, yielding a result of unprecedented resolution[2].
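The core of such a computation is counting pairs in separation bins over the real and random (Monte-Carlo) catalogs. The brute-force Python sketch below illustrates the idea on a toy scale. It assumes flat 2D Euclidean coordinates and combines the counts with the Landy-Szalay estimator, one common choice; the text above does not state which estimator or geometry was actually used, and the function names are hypothetical.

```python
import math
from itertools import combinations, product


def pair_counts(points_a, points_b, bin_edges, same_catalog):
    """Histogram of pair separations; bins are half-open [edge_k, edge_k+1)."""
    counts = [0] * (len(bin_edges) - 1)
    # Within one catalog count each unordered pair once; across catalogs count all pairs.
    pairs = combinations(points_a, 2) if same_catalog else product(points_a, points_b)
    for p, q in pairs:
        d = math.dist(p, q)
        for k in range(len(counts)):
            if bin_edges[k] <= d < bin_edges[k + 1]:
                counts[k] += 1
                break
    return counts


def landy_szalay(data, randoms, bin_edges):
    """Landy-Szalay estimate (DD - 2DR + RR) / RR per bin, using normalized counts.

    Assumes each catalog holds at least two points. Bins with no random-random
    pairs are returned as None.
    """
    nd, nr = len(data), len(randoms)
    dd = pair_counts(data, data, bin_edges, True)
    rr = pair_counts(randoms, randoms, bin_edges, True)
    dr = pair_counts(data, randoms, bin_edges, False)
    n_dd, n_rr, n_dr = nd * (nd - 1) / 2, nr * (nr - 1) / 2, nd * nr
    xi = []
    for d, c, r in zip(dd, dr, rr):
        if r == 0:
            xi.append(None)
        else:
            xi.append((d / n_dd - 2 * c / n_dr + r / n_rr) / (r / n_rr))
    return xi


if __name__ == "__main__":
    data = [(0.0, 0.0), (1.0, 0.0)]
    randoms = [(0.0, 3.0), (1.0, 3.0)]
    print(landy_szalay(data, randoms, [0.5, 1.5, 3.5]))
```

The number of pairs grows quadratically with catalog size, and every pair can be evaluated independently of the others, which is why this workload maps well onto GPU threads.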
Data-Scope
We are in the process of building the Data-Scope, a new instrument for data-intensive research. It is a large (5 PB) cluster of servers that combines extreme I/O performance (450 GBytes/sec) with a large number of GPU cards in the server backplanes, providing extremely high-throughput coupling between the low-level I/O system and the GPUs. Each server also has a large number of Solid State Disks (SSDs) to improve performance for random-access workloads.
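To put the two headline figures in relation, the short Python sketch below estimates how long one sequential pass over the full capacity would take at the quoted aggregate rate. It assumes decimal units (1 PB = 10^15 bytes, 1 GByte = 10^9 bytes) and that the 450 GBytes/sec figure is sustained across the whole cluster; both are assumptions made for this illustration, and `full_scan_hours` is a hypothetical helper.

```python
def full_scan_hours(capacity_bytes, throughput_bytes_per_s):
    """Hours needed to read capacity_bytes once at a constant aggregate rate."""
    return capacity_bytes / throughput_bytes_per_s / 3600


if __name__ == "__main__":
    # 5 PB at 450 GBytes/sec, decimal units assumed.
    hours = full_scan_hours(5e15, 450e9)
    print(f"{hours:.1f} hours")  # roughly 3.1 hours under these assumptions
```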
[1] Wilton, R., Szalay, A.S., 2010, “Modified Smith-Waterman-Gotoh Algorithm for CUDA Implementation”, NVIDIA GTC Conference, San Jose, Sep 2010.
[2] Tian, H., Neyrinck, M., Budavari, T., Szalay, A.S., 2010, “Evidence for Baryon Acoustic Oscillations through Angular Tomography of the SDSS”, Astrophys. J., submitted.