Computing tools for the life sciences
I recently sponsored an event in Manizales, Colombia, training biologists on .NET Bio and BioHPC, two projects that make computational research easier in the life sciences. As part of the training, Jarek Pillardy, the head of the Cornell University Bioinformatics Facility (CBSU), and some of his staff presented various aspects of BioHPC. I had the opportunity to sit down with Jarek, who is not only the developer of BioHPC but also a long-time user of the .NET Bio project. Here is a recap of that conversation.
Simon: You lead the CBSU—what activities does it support?
Jarek: CBSU is the Cornell University Bioinformatics Facility, and its mission is to support biological research with advanced computational infrastructure and bioinformatics tools and techniques. The facility’s main activities can be divided into maintaining extensive computational infrastructure configured for bioinformatics; providing easy access to the infrastructure through the web via BioHPC Web or interactively through BioHPC Lab; training, mainly through workshops and consulting; direct research collaborations, ranging from small projects to participating in major grants as co-principal investigators; and software and LIMS development.
Simon: What prompted you to develop BioHPC, and what does it do?
Jarek: BioHPC is our main way to deliver computational infrastructure to biologists. It is not easy for an experimental biologist to use computing tools directly and navigate the complicated maze of schedulers, command-line tools, data-storage methods, and other infrastructure. BioHPC simplifies both access to the infrastructure, through the web and interactively, and its management (hardware and software). We created BioHPC to make our life easier and to provide services for many more researchers. BioHPC Web gives users a simple way to submit data for processing and to manage jobs and data. BioHPC Lab is a tool to organize access to interactive machines, reserve time, and manage associated resources, like storage and computing time. For us, it provides a convenient platform to deliver computational resources (hardware and software combined) and a set of tools to manage them.
Simon: Do you have any plans to extend the capabilities of BioHPC in the future?
Jarek: BioHPC is constantly evolving to meet changing needs in bioinformatics and to adapt to technological change. Currently, we are supporting a diverse array of local and remote clusters, but we are planning to add the capacity to run computations in the cloud. We are in the final stages of adding Windows Azure to our supported computing infrastructure. We will also be adding new software.
Figure: Architectural overview of the BioHPC schema
Simon: How do you see the Windows Azure cloud being used in bioinformatics?
Jarek: For direct research computing, I can see two main scenarios. First, there will be advanced users running their own virtual machines. These will probably be a minority of users. Second, there will be researchers who access Azure resources via an intermediate service like BioHPC. This scenario will involve a lot of task-focused services (for example, analyzing population data, assembling and annotating sequences, or handling a particular software pipeline) running on Azure, with the end user not necessarily even aware of it. Azure also provides an opportunity to bring data closer to the computing infrastructure.
Simon: How has BioHPC been able to help the Colombian BIOS Center?
Jarek: I think BioHPC may deliver the same benefits for them that it does for us: an easy-to-use tool that provides convenient access to infrastructure and simplified management. They are still in the process of setting up and organizing, and we are in close contact with them, providing consultation and help. What BIOS aims to provide to Colombian researchers is very similar to what our facility provides to Cornell, so our tools should be very useful to them. I hope they will be able to improve and expand BioHPC to meet their particular needs, which will make it much better.
—Simon Mercer, Director of Health and Wellbeing, Microsoft Research Connections