Cloud Computing for e-Science

June 12, 2008
Paul Watson | School of Computing Science, Newcastle University

CARMEN is a $9M project building a scalable science cloud. It focuses on supporting neuroscientists, who will use it to store, share and analyse hundreds of terabytes (TBs) of data.

Understanding how the brain works is a major scientific challenge whose solution will benefit medicine, biology and computer science. Globally, over 100,000 neuroscientists are working on this problem. However, the data that forms the basis for their work is rarely shared, even though it is difficult and expensive to produce.

The CARMEN project (www.carmen.org.uk) is addressing these challenges by developing a scalable cloud architecture that enables data sharing, integration, and analysis, all supported by metadata. An expandable range of services is provided in the cloud to extract value from raw and transformed data. This promotes the sharing of analysis services as well as data, and allows services to execute close to the data on which they operate. Running services close to the data is essential to avoid shipping vast quantities (TBs) of data out of the cloud to the user’s machine for analysis.

Internally, the CARMEN cloud is built as a set of Web Services. Through experience of a wide variety of e-science projects over the past 8 years, we have identified a core set of generic services that we believe are needed to support science. These are: a data repository for files and structured data, a metadata repository to allow users to locate and interpret data, a service repository with dynamic deployment onto compute resources, a workflow enactment engine, and a security infrastructure.

The talk will describe the design of the CARMEN system and explain how it supports thousands of users analysing TBs of data. We will describe a typical neuroscience scenario and show how it is supported by the CARMEN prototype.

Speaker Details

Paul Watson is Professor of Computer Science at Newcastle University (UK) and Director of the North East Regional e-Science Centre. He graduated in 1983 with a BSc in Computer Engineering from Manchester University, followed by a PhD in 1986. In the 1980s, as a Lecturer at Manchester University, he was a designer of the Alvey Flagship and Esprit EDS parallel systems. In 1990 he moved to industry, working for ICL as a system designer of the Goldrush MegaServer parallel database server, which was released as a product in 1994.

In August 1995 he moved to Newcastle University, where he has been an investigator on research projects worth approximately $20M. His research has focussed on parallel and distributed systems, in particular on database servers. In recent years his work has focussed on e-science, especially on methods of accessing and integrating large amounts of data held in distributed databases.

In total, he has authored over forty refereed publications and three patents. He is a Chartered Engineer, a Fellow of the British Computer Society, and a member of the UK Computing Research Committee.