Bridging the Gap between the Cloud and an eScience Application Platform

  • Yogesh Simmhan
  • Catharine van Ingen

MSR-TR-2009-2021

The widely discussed scientific data deluge creates not only a need to scale an application computationally from a local desktop or cluster to a supercomputer, but also a need to cope with data loads that vary over time. Cloud computing offers a scalable, economical, on-demand model that is well matched to the evolving needs of eScience. Yet cloud computing also creates gaps that must be bridged before science applications can move to the Cloud. In this article, we propose a Generic Worker framework to deploy and invoke science applications in the Cloud with minimal user effort and predictable, cost-effective performance. Our framework is an evolution of the Grid computing application factory pattern and addresses the distinct challenges posed by the Cloud, such as efficient data transfer to and from the Cloud and the transient nature of its VMs. We present an implementation of the Generic Worker for the Microsoft Azure Cloud and evaluate its use in a genome sequencing application pipeline. Our results show that the user overhead of porting the application and running it seamlessly across the desktop and the Cloud can be substantially reduced without significant performance penalties, while providing on-demand scalability.