eScience Workshop 2010

About

Microsoft Research—in partnership with the Berkeley Water Center, the Colleges of Engineering and Natural Resources at UC Berkeley, and the Lawrence Berkeley National Laboratory—held the 2010 Microsoft Research eScience Workshop on October 11–13 in Berkeley, California.

Scaling the Science

Current opportunities in the physical and biological sciences and their technological applications require the means to fundamentally understand processes at the molecular level and to extend those processes to predict performance at larger scales. eScience is developing approaches for conducting this scaling and has been essential in addressing fundamental questions in biology and astronomy. While additional applications remain in the basic sciences, these fields have demonstrated pathways for advances in the applied environmental and social sciences where the linkages between scales and disciplines require focused contributions from the eScience community.

This “Scaling the Science” workshop provided opportunities to observe how eScience has provided scaling across various fields and to explore some of the challenges that remain for realizing the ambitions of the fourth paradigm.

About the Workshop

The goal of this seventh annual cross-disciplinary workshop was to bring together scientists from diverse research disciplines to share their research and discuss how computing is transforming their work. The event also included the presentation of the third annual Jim Gray eScience Award to a researcher who has made an especially significant contribution to the field of data-intensive computing.

Primary support for the workshop was provided by Microsoft External Research, headed by Corporate Vice President Tony Hey.

 

Highlights

Jim Gray eScience Award 2010

Each year, Microsoft Research presents the Jim Gray eScience Award to a researcher who has made an outstanding contribution to the field of data-intensive computing. Find out who the recipient of this year’s award is.

Jim Gray eScience Award

From 2007 to 2014, the Jim Gray eScience Award recognized eight researchers for their outstanding work in the field of eScience. Recognizing these pioneers in data-intensive science has helped advance the prestige of the field and strengthen the community.

Past award recipients

2014 award

Paul Watson was awarded the 2014 Jim Gray eScience Award

Dr. Paul Watson is professor of Computer Science and director of the Digital Institute at Newcastle University, UK, where he also directs the $20M RCUK Digital Economy Hub on Social Inclusion through the Digital Economy. As a Lecturer at Manchester University, he was a designer of the Alvey Flagship and Esprit EDS systems. From 1990 to 1995, he worked in industry for ICL as a designer of the Goldrush MegaServer parallel database server. In August 1995, he moved to Newcastle University, where he has been an investigator on a wide range of eScience projects. His research interest is in scalable information management, with a current focus on cloud computing. Professor Watson is a Chartered Engineer and a Fellow of the British Computer Society. Learn more…

2013 award

Tony Hey presents David Lipman with the 2013 Jim Gray eScience Award

David Lipman, M.D., is director of the National Center for Biotechnology Information (NCBI). Under his leadership, NCBI has become one of the world’s premier repositories of biomedical and molecular biology data, providing invaluable information to both the research community and the public. Every day, more than 3 million users access NCBI’s more than 40 databases. Learn more…

2012 award

Antony John Williams receives the Jim Gray eScience Award from Tony Hey at the 2012 Microsoft Research eScience Workshop

Antony John Williams is vice president of strategic development and head of Chemoinformatics for the Royal Society of Chemistry. He has pursued a career built on rich experience in experimental techniques, implementation of new nuclear magnetic resonance (NMR) technologies, research and development, and teaching, as well as analytical laboratory management. He has been a leader in making chemistry publicly available through collective action: his work on ChemSpider helps provide fast text and structure search access to data and links on more than 28 million chemicals, and this resource is freely available to the scientific community and the general public. Learn more…

2011 award

Tony Hey presents Mark Abbott with the Jim Gray eScience Award at the 2011 Microsoft Research eScience Workshop

Mark Abbott is dean and professor in the College of Oceanic and Atmospheric Sciences at Oregon State University. He is also serving a six-year term on the National Science Board, which oversees the National Science Foundation and provides scientific advice to the White House and to Congress. Throughout his career, Mark has contributed to integrating biological and physical science, made early innovations in data-intensive science, and provided educational leadership. Learn more…

2010 award

Phil Bourne accepts the Jim Gray eScience Award from Tony Hey at the 2010 eScience Workshop

Phil Bourne, the recipient of the third annual Jim Gray eScience Award, is a professor in the Department of Pharmacology and Skaggs School of Pharmacy and Pharmaceutical Sciences at the University of California at San Diego. Phil is also the Associate Director of the RCSB Protein Data Bank, an Adjunct Professor at the Burnham Institute, and a past president of the International Society for Computational Biology. “Phil’s contributions to open access in bioinformatics and computational biology are legion, and are exactly the sort of groundbreaking accomplishments in data-intensive science that we celebrate with the Jim Gray Award,” notes Tony Hey, Corporate Vice President of External Research. Learn more…

2009 award

Jeff Dozier accepts the Jim Gray eScience Award from Tony Hey at the 2009 eScience Workshop

Jeff Dozier was presented the 2009 award in recognition of his achievements in advancing environmental science through leading multi-disciplinary research and collaboration. While presenting the award, Tony Hey stated, “Jeff Dozier’s work epitomizes what the Jim Gray eScience Award is all about … using data-intensive computing to accelerate scientific discovery and, ultimately, to help solve some of society’s greatest challenges. By combining environmental science with computer science technologies, Jeff brings a new level of understanding to climate change and its impact on our planet.” Learn about Dozier’s thoughts on environmental science in The Fourth Paradigm: Data-Intensive Scientific Discovery, pages 13–19.

2008 award

Tony Hey presents Carole Goble with the 2008 Jim Gray eScience Award

At the 2008 Microsoft eScience Workshop, the Jim Gray eScience Award was presented to Carole Goble in recognition of her contributions to the development of workflow tools to advance data-centric research. To learn about her work and the role of workflow tools in scientific research, see myExperiment and The Fourth Paradigm: Data-Intensive Scientific Discovery, pages 137–145.

2007 award

Tony Hey presents Alex Szalay with the 2007 Jim Gray eScience Award

The winner of the first Jim Gray eScience Award was Alex Szalay, professor in the Department of Physics and Astronomy at The Johns Hopkins University. Alex was recognized for his foundational contributions to interdisciplinary advances in the field of astronomy and groundbreaking work with Jim Gray.

Using Software to Enhance Healthcare

Johnson & Johnson Pharmaceutical R&D is using Microsoft Research’s Microsoft Biology Foundation to design new chemical compounds that could improve the health and quality of life of patients around the world.

Using Software to Enhance Healthcare

By Rob Knies

October 12, 2010 6:00 AM PT

Researchers at Johnson & Johnson Pharmaceutical Research and Development (J&J PRD) faced a challenge. Over the years, they have built a state-of-the-art platform to enable discovery of small-molecule drugs, but the expanding role of biologics in pharmaceutical research required a new set of tools to handle large-molecule compounds.

Developing such functionality from scratch was a daunting proposition. It would take time and resources while delaying development of novel treatments for debilitating diseases and disorders.

Researchers at Microsoft Research had a solution. Their new, open-source library of bioinformatics functions, the Microsoft Biology Foundation (MBF), part of the Microsoft Biology Initiative, was designed to address just such a challenge. When the J&J PRD researchers learned about this, they immediately became intrigued.

This confluence of need and opportunity occurred in late November 2009. Now, less than a year later, the benefit has become manifestly apparent. Instead of spending costly time building a foundation for the new biological infrastructure, J&J PRD was able to focus on delivering value-added functionality needed to facilitate development of innovative treatments that have the potential of improving the health and quality of life of patients around the world.

“By using MBF, we were able to provide our users with a greater level of functionality in less time for our initial development phase in the large-molecule space,” says Jeremy Kolpak, J&J PRD senior analyst, who will be discussing his team’s MBF deployment during the 2010 eScience Workshop, being held in Berkeley, Calif., from Oct. 11–13. “It allowed us to focus on value-added functionality for our scientists and has helped us adapt to new requests quite easily.”

Such testimony brings a smile to the face of Simon Mercer, director of Health and Wellbeing for External Research, a division of Microsoft Research.

“The principal advantage of MBF,” Mercer says, “is that, because it’s free and open-source, as a programmer, you get a certain amount of prewritten functionality that you can just build on top of. It gives you more time to do the real science, because we’ve already supplied the basics.”

It didn’t take long for J&J PRD to grasp the implications of MBF.

“We were in the process of developing our own infrastructure to work with sequences,” Kolpak explains. “This was part of a larger move in our organization to improve how R&D with large molecules was performed and integrate that process with an existing and mature framework for working with small molecules.

“We have been using MBF from the day we heard of it.”

That is precisely the focus of the Health and Wellbeing effort within External Research: to collaborate openly with the bioinformatics community by applying advanced computing technologies to provide unprecedented insight into disease and human healthcare.

MBF, built on the Microsoft .NET Framework and aimed at making it easier to implement biological applications on the Windows platform, was launched in Boston on July 9 during the 11th annual Bioinformatics Open Source Conference. Since then, thousands of bioinformaticians have downloaded the tool kit.

“There are a lot of biologists who start as post-docs but don’t end up going into biological research themselves,” Mercer says. “They end up managing the data and writing the scientific applications that the biologists need to do research. They can be anywhere on the continuum from full biologists with no computing background to full computer scientists with little or no biological background.

“They work alongside the biological scientists, but they won’t necessarily be those scientists. They’ll write scripts and write programs to help the lab run, and they’ll also probably do some data analysis.”

Companies and academics that pursue such work, naturally, are more concerned with the value they can derive from using software tools than with building the tools themselves.

“I’ve heard it over and over again from executives of different pharmaceutical companies,” Mercer says. “Possibly 90 percent of their software stack has been developed in house but offers them no competitive advantage. The real crown jewels in bioinformatics are relatively small compared with the huge bulk of software they have to maintain.

“They’re often in a situation where they want to exchange data with other pharmaceutical companies on a pre-compete level, and they find that hard, because their processing pipelines are uniquely their own. A lot of commercial companies are looking for things like MBF to adopt as a common platform, so they are using the same tools, analyzing the data in the same way, and they are able to share data sets and cut costs.”

In other words, MBF helps make bioinformaticians’ work a bit simpler. That certainly appears to be the case at J&J PRD.

“We have integrated it into our data-analysis and -visualization platform, Third Dimension Explorer, which has been developed in house,” Kolpak says. “This platform is used in a multitude of different contexts.”

With regard to J&J PRD’s large-molecule exploration, he lists five distinct tasks the integration enables:

  • View sequences with their associated assay data to see how variations across compounds impact targets.
  • Align multiple sequences.
  • View aligned sequences and their associated metadata, such as complementarity-determining regions.
  • Extract and translate regions of sequences.
  • Work with sequences of different formats to provide a generic platform for scientists to import and analyze them in one place.

“The goal,” Kolpak says, “is to capture operations that are performed routinely and make them extremely efficient to execute in one place. But at the same time, we are not trying to replace existing sequence-analysis tools for the more complex and less-used operations.”
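
As a concrete illustration of one such routine operation, the following sketch extracts and translates a region of each sequence in a FASTA file. It uses the open-source Biopython library rather than MBF or Third Dimension Explorer, and the file name and coordinates are purely illustrative.

    # Illustrative sketch of a routine sequence operation (extract and translate a
    # region), using Biopython rather than MBF/3DX; the file name and coordinates
    # are hypothetical placeholders.
    from Bio import SeqIO

    for record in SeqIO.parse("antibody_sequences.fasta", "fasta"):
        region = record.seq[96:129]    # extract a region of interest (made-up coordinates)
        protein = region.translate()   # translate the nucleotide region into amino acids
        print(record.id, protein)

MBF packages this same kind of prewritten building block (parsers, sequence objects, alignment and translation routines) for .NET code, which is what spared the J&J PRD team from writing that layer from scratch.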

At Johnson & Johnson Pharmaceutical R&D, there are hundreds of users of the Third Dimension Explorer tool. The MBF-related development is still being completed and rolled out, but 40 people already are using the enhanced data-analysis platform—and deriving significant benefits.

“It’s hard to quantify the amount of time it has saved us,” Kolpak says, “due to the fact we work with an agile development methodology and, for each iteration, we are finding new functionality in MBF that we can utilize. I would say that, for our initial rollout, which required a large amount of framework implementation, it saved us around three months during a six-month initial development cycle.”

Biological work might not be the first thing that comes to mind when people think about Microsoft, but the company supports such scientists nevertheless.

“Inside Microsoft Research, we’ve done lots of biology,” Mercer says. “It’s not what everybody would expect, but a lot of researchers apply their computer-science research in the biological domain for healthcare. How can you apply Microsoft technologies to scientific research? We often do that through collaborations with academics, where the academic brings the biology, in this case, and Microsoft brings the computer science. Together, hopefully, we advance further than either side would have done independently.

“Eventually, you have to ask yourself the question, ‘Why don’t we just build a platform so that all of the common elements are written once and don’t need to be written again for every single project?’ And once that platform exists, and it’s open-source and free, why not give it away to the community so it can benefit?”

There are specific ways in which MBF can assist in the biological domain, such as with modularity, extensibility, and code maintenance.

“Those sorts of things that professional programmers think of aren’t necessarily the first things in the minds of those who are writing scripts to support a lab,” Mercer continues. “MBF sits in the middle, with prewritten functionality in nice, digestible chunks, very standardized.”

There are quite a few other biological libraries akin to MBF already in use, some of them for a decade or more. But over time, they have grown unwieldy, making it hard to extend them. And they tend to be written in script-based languages that have no type checking. MBF, on the other hand, offers type checking and guarantees, and it’s built atop the common-language runtime, providing the flexibility to handle any of the more than 70 languages that work with .NET, thereby making it easy for a heterogeneous community to use without having to conform to a single language.

“We’ve also wrapped the individual bits of MBF as workflow activities for our Trident workflow workbench,” Mercer adds, “which is also free and downloadable. You don’t even have to be a programmer to use MBF. You can just drag and drop and connect the building blocks together to build workflow pipelines.”

External Research attempts to understand the precise scientific challenge encountered by its MBF partners, a methodology termed scenario-based development that identifies areas where MBF can be made more useful. That methodology will be a key component of the next wave of the tool’s enhancement.

“We’re approaching our partners in the academic community and the commercial world to define those scenarios,” Mercer says, “and that’s what’s driving the direction in MBF v2. We encourage the wider community—people who download the source code, understand it, and start developing their own extensions to support their own science—to participate, because the more of those we get, the more broadly we can develop MBF. It will grow by the actions of the community, to support the science that the community wants to support.”

That, in the example of J&J PRD, is exactly what is happening.

“A lot of what is on our wish list we have been developing in stride,” Kolpak says, “mainly a visualization tool for viewing sequences, in addition to support for some other sequence file formats that contain more than just sequence data. These are all things we plan to contribute back to MBF development.”

And the community at which MBF is focused expects to use open-source code.

“If we want to run a project that would be recognizable and familiar in form to the academic community,” Mercer says, “then that would be a software-development project that is open-source, because open-source is a very common model there. We want to get contributions from as broad a set of people as possible.

“We want scientists to get value out of using Windows,” he concludes. “We want scientists to pick up different tools that we have and understand that they can help them do their research more effectively and reach insights more quickly than they would otherwise manage to do. We’ve got a lot of value to offer in that area.”

The folks at Johnson & Johnson Pharmaceutical Research and Development couldn’t agree more.

“I am a software developer by trade,” Kolpak says, “and by using MBF, I have the confidence that what I am providing our users is not just solid code, but also that the science behind it is accurate.”

Studying the Breathing of the Biosphere

Researchers at University of California, Berkeley, work with Microsoft Research to analyze vast amounts of data without supercomputers.

Keynotes and Presentations

Monday, October 11

Keynote Presentations

UK e-Science: a Jewel or a Thousand Flowers

Malcolm Atkinson, e-Science Institute

The global digital revolution provides a fertile and turbulent ecological environment in which e-Science is a small but vital element. There is a deep history of e-Science, but coining the term and injecting leadership and modest funds had a huge impact. A veritable explosion of activity has led to a global burst of new e-Science species. Our challenge is to understand what will enable them to thrive and yield maximum benefit as the digital revolution continues to be driven by commerce and media.

Webcast

Jim Gray eScience Award Presentation

This year, Microsoft Research presents the Jim Gray eScience Award to a researcher who has made an outstanding contribution to the field of data-intensive computing. The award—named for Jim Gray, a Technical Fellow for Microsoft Research and a Turing Award winner who disappeared at sea in 2007—recognizes innovators whose work truly makes science easier for scientists.

Webcast

Making Open Science Real

Adam Bly, Seed

The future of science is open, not because it ought to be but because it needs to be. Today, science’s potential is hindered by the disconnected nature of the world’s scientific information and the closed architecture of science itself. So how do we get from here to there? How can technology make open science real?

Webcast

Tutorials

Tutorial MT1

Microsoft Biology Foundation: An Open-Source Library of Re-usable Bioinformatics Functions and Algorithms Built on the .NET Platform

Webcast

Tutorial MT2

Scientific Data Visualization using WorldWide Telescope

Webcast

Tutorial MT3

Data-Intensive Research: Dataset Lifecycle Management for Scientific Workflow, Collaboration, Sharing, and Archiving

Webcast

Tutorial MT4

Parallel Computing with Visual Studio 2010 and the .NET Framework 4

Webcast

Sessions

Session MA1 | Senses Across Scales

Webcast

Exploration of Real-Time Provenance-Aware Virtual Sensors Across Scales for Studying Complex Environmental Systems

Yong Liu, National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign

Development and Application of Network of Geosensors for Environmental Monitoring

Rafael Santos, INPE – Brazilian National Institute for Space Research

Session MA2 | Data Analysis Through Visualization

Webcast

BLAST Atlas: A Function-Based Multiple Genome Browser

Lawrence Buckingham, Queensland University of Technology

DIVE: A Data Intensive Visualization Engine

Dennis Bromley, University of Washington

Session MA3 | Health & Wellbeing I

Webcast

Simplifying Oligonucleotide Primer Design Software to Keep Pace with an Ever Increasing Demand for Assay Formats

Kenneth “Kirby” Bloom, Illumina Corporation

Integration of Sequence Analysis into Third Dimension Explorer Leveraging the Microsoft Biology Framework

Jeremy Kolpak, Janssen Pharmaceutical Companies of Johnson & Johnson

Session MA5 | From Environmental Science to Public Policy

Webcast

Achieving an Ecosystem Based Approach to Planning in the Puget Sound

Stephen Stanley, Washington Department of Ecology

Adapting Environmental Science Methods to Public Policy and Decision Support

Rob Fatland, Microsoft Research

Session MA6 | Complex Biological Systems in Action

Webcast

An Interactive Modeling Environment for Systems Biology of Aging

Pat Langley, Arizona State University

Session MA7 | Data-Intensive Science

Webcast

Analyzing the Process of Knowledge Dynamics in Sustainability Innovation: Towards a Data-Intensive Approach to Sustainability Science

Masaru Yarime, University of Tokyo

Data-Intensive Science for Safety, Trust, and Sustainability

Shuichi Iwata, The University of Tokyo

Session MA8 | Health & Wellbeing II

Webcast

BL!P: A Tool to Automate NCBI BLAST Searches and Customize the Results for Exploration in Live Labs Pivot

Vince Forgetta, McGill University

GenoZoom: Browsing the Genome with Microsoft Biology Foundation, Deep Zoom, and Silverlight

Xin-Yi Chua, Queensland University of Technology

Tuesday, October 12

Keynote Presentation

The Reaming of Life

Philip Bourne, University of California, San Diego

Anyone can punch a hole in a piece of metal, but a reamer is needed to accurately size and finish that hole. Digital computers are the reamers of life, bringing together a vast array of disparate bits of data to provide an accurate picture of life that can be smoothly transcended across scales, from molecules to populations. Sounds heady, so why do we not fully understand the molecular basis of cancer? Why can’t we accurately model the impact of an oil spill on marine life? Why can’t we decide whether there is a tree of life or a network of life? “Well tonight we are going to sort it all out, for tonight it’s the reaming of life.”

Webcast

Sessions

Session TM1 | Data from Ocean to Stars

Webcast

Data, Data, Everywhere, nor Any Drop to Drink: New Approaches to Finding Events of Interest in High Bandwidth Data Streams

Mark Abbott, Oregon State University

Extreme Database-centric Computing in Science

Alex Szalay, Johns Hopkins University

Session TM2 | Health & Wellbeing III

Webcast

Model-Driven Cloud Services for Cancer Research

Marty Humphrey, University of Virginia

Cloud-Based Map-Reduce Architecture for Nuclear Magnetic Resonance-Based Metabolomics

Paul Anderson, Wright State University

Session TM3 | Tools to Get Science Done

Webcast

MyExperimentalScience, Extending the “Workflow”

Jeremy Frey, University of Southampton

The Conversion Software Registry

Michal Ondrejcek, National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign

Session TM5 | Cloud Computing and Chemistry

Webcast

oreChem: Planning and Enacting Chemistry on the Semantic Web

Mark Borkum, University of Southampton

Accelerating Chemical Property Prediction with Cloud Computing

Hugo Hiden, Newcastle University

Session TM6 | Health & Wellbeing IV

Webcast

Remote Computed Tomography Reconstruction Service on GPU-Equipped Computer Clusters Running Microsoft HPC Server 2008

Timur Gureyev, Commonwealth Scientific and Industrial Research Organisation (CSIRO)

e-LICO: Delivering Data Mining to the Life Science Community

Simon Jupp, University of Manchester

Session TM7 | Database Diversity

Webcast

SQL is Dead; Long Live SQL: Lightweight Query Services for Ad Hoc Research Data

Bill Howe, University of Washington

SinBiota 2.0 – Planning a New Generation Environmental Information System

João Meidanis, University of Campinas

Session TA1 | Enabling Scientific Discovery

Webcast

Enhancing the Quality and Trust of Citizen Science Data

Jane Hunter, The University of Queensland

Scientist-Computer Interfaces for Data-Intensive Science

Cecilia Aragon, Lawrence Berkeley National Laboratory

Enabling Scientific Discovery with Microsoft SharePoint

Kenji Takeda, University of Southampton

Session TA2 | Health & Wellbeing V

Webcast

Genome-Wide Association of ALS in Finland

Bryan Traynor, National Institute on Aging, National Institutes of Health

A Framework for Large-Scale Modelling of Population Health

John Ainsworth, University of Manchester

GREAT.stanford.edu: Generating Functional Hypotheses from Genome-Wide Measurements of Mammalian Cis-Regulation

Gill Bejerano, Stanford University

Session TA3 | Virtual Research Environments and Collaboration

Webcast

Medici: A Scalable Multimedia Environment for Research

Joe Futrelle, National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign

BlogMyData: A Virtual Research Environment for Collaborative Visualization of Environmental Data

Andrew Milsted, University of Southampton

RightField: Rich Annotation of Experimental Biology Through Stealth Using Spreadsheets

Matthew Horridge, University of Manchester

Session TA4 | Applications in Digital Humanities

Webcast

musicSpace: Improving Access to Musicological Data

mc schraefel, University of Southampton

Quantifying Historical Geographic Knowledge from Digital Maps

Tenzing Shaw, National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign

Data Intensive Research in Computational Musicology

David De Roure, Oxford e-Research Centre

Session TA5 | Agriculture, Digital Watersheds and Heterogeneous Climate Data

Webcast

Scaling Information on ‘Biosphere Breathing’ from Chloroplast to the Globe

Dennis Baldocchi, University of California-Berkeley

Agrodatamine: Integrating Analysis of Climate Time Series and Remote Sensing Images

Humberto Razente, UFABC

Session TA6 | Health & Wellbeing VI

Webcast

Correction for Hidden Confounders in Genetic Analyses

Jennifer Listgarten, Microsoft Research

BioPatML.NET and Its Pattern Editor: Moving into the Next Era of Biology Software

James Hogan, Queensland University of Technology

Session TA7 | eScience in Systems

Webcast

GRAS Support Network, Its Implementation, Operation, and Use

Fritz Wollenweber, EUMETSAT

Data Intensive Frameworks for Astronomy

Jeffrey Gardner, University of Washington

Session TA8 | Archaeo Informatics

Experiences and Visions on Archaeo Informatics

Christiaan Hendrikus van der Meijden, IT Group, Veterinary Faculty, Ludwig Maximilians University; Peer Kröger and Hans-Peter Kriegel, Database Systems Group, Department of Computer Science, Ludwig Maximilians University

Wednesday, October 13

Keynote Presentations

OpenSource & Microsoft: Beyond Interoperability

Sam Ramji, Apigee

Microsoft’s open source strategy has shifted over the years, from ignore to fight to interoperate. Recently, the company has changed course to use open source as an engine of innovation and growth for core businesses. This talk will cover details of projects that showcase the shifts in strategy and expose the underlying dynamics of open source in the software industry.

Webcast

Scaling the Science

Garrison Sposito, U.C. Berkeley; Mark Stacey, U.C. Berkeley; Stephanie Carlson, U.C. Berkeley; Charlotte Ambrose, NOAA’s National Marine Fisheries Service; James Hunt, U.C. Berkeley

The current opportunities in the physical and biological sciences and their technological applications require the means to fundamentally understand processes at the molecular scale and to extend those processes to predict performance at larger scales. As examples, material science is using resolution at the scale of an atom to predict and design devices that are orders of magnitude larger, and biological processes are dictated by interactions at molecular, cellular, organismal, population, and ecosystem levels. Spatial and temporal scaling across orders of magnitude requires analysis tools that are available for computation, aggregation, and visualization. eScience is developing approaches for conducting this scaling and has been essential in addressing fundamental questions in biology and astronomy. While additional applications remain in the basic sciences, these fields have demonstrated pathways for advances in the applied environmental and social sciences, where the linkages between scales and disciplines require focused contributions from the eScience community. This workshop provides opportunities to observe how eScience has provided the scaling across various fields and to explore some of the challenges that remain.

Webcast

Sessions

Session WM2 | Challenges of Data Standards & Tools

Webcast

Panel: Challenges of Data Standards and Tools

Deb Agarwal, LBNL/UCB; Bill Howe, University of Washington; Alex James, Microsoft; Yong Liu, National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign; Maryann Martone, UCSD; Yan Xu, Microsoft Research

Session WM3 | Data and Visualization

Webcast

Scientific Data Sharing and Archiving at UC3/CDL: the Excel Add-in Project and More

John Kunze, California Digital Library/California Curation Center; Tricia Cruse, California Digital Library/California Curation Center

Visualizing All of History with Chronozoom

David Shimabukuro, University of California-Berkeley; Roland Saekow, University of California-Berkeley

Session WM4 | Health & Wellbeing VII

Webcast

Proteome-Scale Protein Isoform Characterization with High Performance Computing

Jake Chen, Indiana University

Answering Biological Questions by Querying k-Mer Databases

Paul Greenfield, CSIRO Mathematics, Informatics and Statistics

Tutorials

Tutorial WT1

CoSBiLab: Enabling Simulation-Based Science

Webcast

Tutorial WT2

Scientific Data Visualization using WorldWide Telescope

Webcast

Tutorial WT3

Data-Intensive Research: Dataset Lifecycle Management for Scientific Workflow, Collaboration, Sharing, and Archiving

Webcast

Tutorial WT4

OData – Open Data for the Open Web

Webcast

Abstracts

Exploration of Real-Time Provenance-Aware Virtual Sensors Across Scales for Studying Complex Environmental Systems

Yong Liu, Alejandro Rodriguez, Joe Futrelle, Rob Kooper, and Jim Myers, National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign

In this position paper, we present our extended concept of, and preliminary work on, “Real-Time Provenance-Aware Virtual Sensors” across scales for studying complex environmental systems, especially sensor-driven real-time environmental decision support and situational awareness. Real-time provenance-aware virtual sensors can re-publish transformed “data, information, and knowledge” streams as virtual sensor streams, with associated provenance information that describes their causal relationships and derivation history in real time. An early implementation of Open Provenance Model-compliant provenance capture across heterogeneous layers of workflows, system daemons, and user interactions, together with the re-publishing of the provenance-aware virtual sensors, is presented to illustrate the value for environmental systems research and the improved interoperability with the Open Geospatial Consortium’s Sensor Web Enablement standards.

Development and Application of Network of Geosensors for Environmental Monitoring

Rafael Santos, INPE – Brazilian National Institute for Space Research

Some of the goals of the Brazilian National Institute for Space Research relate to research on space and the environment in general and to the development of tools and methods to support that research. One of these research areas is the modeling and study of the interaction between the Earth’s atmosphere and the terrestrial biosphere, which plays a fundamental role in the climate system and in biogeochemical and hydrological cycles through the exchange of energy and mass (for example, water and carbon) between the vegetation and the atmospheric boundary layer.

The main focus of many environmental studies is to quantify this exchange over several terrestrial biomes.

Over natural surfaces like the tropical forests, factors like spatial variations in topography or in the vegetation cover can significantly affect the air flow and pose big challenges for the monitoring of the regional carbon budget of terrestrial biomes.

With this motivation, a partnership involving INPE, FAPESP (the Research Council for the State of São Paulo), Microsoft Research, the Johns Hopkins University, and the University of São Paulo was created to research, develop, and deploy prototypes of environmental sensors (geosensors) in the Atlantic coastal forest and the Amazonian rain forest in Brazil, forming sensor networks with high spatial and temporal resolution, and to develop software tools for data quality control, integration with other sensor data, data mining, visualization, and distribution.

This short talk presents some concepts, approaches, solutions and challenges on the computational aspects of our project.

BLAST Atlas: A Function-Based Multiple Genome Browser

Lawrence Buckingham, Queensland University of Technology

BLAST Atlas is a visual analysis system for comparative genomics that supports genome-wide gene characterization, functional assignment and function-based browsing of one or more chromosomes. Inspired by applications such as the WorldWide Telescope, Bing Maps 3D and Google Earth, BLAST Atlas uses novel three-dimensional gene and function views that provide a highly interactive and intuitive way for scientists to navigate, query and compare gene annotations. The system can be used for gene identification and functional assignment or as a function-based multiple genome comparison tool which complements existing position-based comparison and alignment viewers.

DIVE: A Data Intensive Visualization Engine

Dennis Bromley, Steven Rysavy, David Beck, and Valerie Daggett, University of Washington

Data-driven research is a rapidly emerging commonality throughout scientific disciplines. Recently, with the proliferation of inexpensive commodity computing clusters, synthetic data sources such as modeling and simulation are capable of producing a continuous stream of terascale data. Confronted with this data deluge, domain scientists are in need of data-intensive analytic environments. Dynameomics is a terascale simulation-driven research effort designed to enhance our understanding of protein folding and dynamics through molecular dynamics simulation and modeling. The project routinely involves exploratory analysis of 100+ terabyte datasets using an array of heterogeneous structural biology-specific tools. In order to accelerate the pace of discovery for the Dynameomics project, we have developed DIVE, a framework that allows for rapid prototyping and dissemination of domain-independent (e.g., clustering) and domain-specific analyses in an implicitly iterative workflow environment.

The information in the data warehouse is classified into three categories: raw data, derived data, and state data. Raw data are generated from simulations and models, derived data are produced through tools operating on the raw data, and state data constitute the record of the exploratory workflow, which has the added benefit of capturing the provenance of derived data.

DIVE empowers researchers by simplifying and expediting the overhead associated with shared tool use and heterogeneous datasets. Furthermore, the workflow provides a simple, interactive, and iterative data-oriented investigation paradigm that tightens the hypothesis generation loop. The result is an expressive, flexible laboratory informatics framework that allows researchers to focus on analysis and discovery instead of tool development.

Simplifying Oligonucleotide Primer Design Software to Keep Pace with an Ever Increasing Demand for Assay Formats

Kenneth “Kirby” Bloom, Illumina Corporation

As the pace of research and discovery in biotechnology continues to accelerate rapidly, oligonucleotide primer design software with plug-in algorithm architecture and scalable processing capabilities has become essential. With constantly changing algorithms leveraging a multitude of technologies for employing various chemistries and locus targeting techniques, the ability to manage, maintain, and extend the source code and data repositories became a hurdle for getting new products to market.

This challenge was met by creating a dynamic execution model that enables drag-and-drop component construction through the use of Microsoft Workflow to allow for simplicity and scalability in the application. This architecture had the effect of decreasing the time needed to deliver new assays to market by 60 percent. Identifying a generic workflow pattern to support primer design also helped structure an architecture yielding a more than 700 percent speed improvement and the ability to scale the solution across multiple servers to meet burst demand scenarios.

Integration of Sequence Analysis into Third Dimension Explorer Leveraging the Microsoft Biology Framework

Jeremy Kolpak, Michael Farnum, Victor Lobanov, and Dimitris Agrafiotis, Janssen Pharmaceutical Companies of Johnson & Johnson

Third Dimension Explorer (3DX) is a powerful, internally developed .NET platform designed to address a broad range of data analysis and visualization needs across Johnson & Johnson Pharmaceutical Research & Development. 3DX employs a plugin approach that allows the development of extensions for particular tasks while sharing a common set of core analytic and visualization functionality. This architecture has allowed us to extend the 3DX platform to many areas of pharmaceutical R&D, from early drug discovery (e.g., analysis of chemical structures and their associated biological properties) to mining of electronic medical records.

As 3DX became a foundational system for small molecule pharmaceutical R&D and its use became widespread throughout the company, the need to extend its capabilities to support biologics research quickly emerged. As with small molecules, we followed a two-pronged approach: 1) integrate large molecule discovery data into our existing global discovery data warehouse known as ABCD, and 2) develop a new set of advanced sequence-activity analysis and visualization tools under the 3DX framework to leverage and complement its existing capabilities. The end result is a unique offering: a single data warehouse integrating both small and large molecule data (ABCD), and a single end-user application for mining and visualizing that data (3DX).

However, expanding 3DX’s analysis capabilities to biologics was not a trivial task. Two options were available to us at that time: 1) re-implement the entire infrastructure ourselves from scratch, or 2) attempt to integrate existing tools built on disparate technology platforms. Neither option was appealing: the former because of resource constraints, and the latter because of the inherent maintenance and performance issues. Fortunately, it was at that time that MBF was released in beta, and it offered an excellent foundation for seamless integration into our native .NET platform, providing much of the core functionality needed to meet our researchers’ needs.

The functionality that we developed enables interactive visualization and editing of multiple sequence alignments (via a customized sequence viewer plugin) and integration of data mining and analytic capabilities (e.g., BLAST searching of sequence libraries, multiple sequence alignment, sequence editing and translation, segment extraction, and so forth). While the sequence viewer is most useful when integrated into a general data mining application like 3DX, it was designed as a 3DX-independent extension of the MBF, thus providing a generic platform for viewing sequences and their associated metadata. It is our intention to make it freely available for use under the Microsoft Public License.

Achieving an Ecosystem Based Approach to Planning in the Puget Sound

Stephen Stanley, Susan Shull, and Susan Grigsby, Washington Department of Ecology; Gino Luchetti, King County DNR; Margaret Macleod and Peter Rosen, City of Issaquah; Millie Judge, Lighthouse Natural Resources Consulting, Everett

Watershed research over the past 20 years has recognized that factors controlling the biological and physical functions at the site scale operate over multiple spatial and temporal scales (Naiman and Bilby, 1998; Beechie & Bolton, 1999; Hobbie, 2000; Benda, 2004; Simenstad et al., 2006; King County, 2007). This requires data at mid and broad scales from watersheds encompassing thousands of hectares. However, available data at mid and broad scales are often inaccurate and inconsistent in their coverage. This complicates the effort to understand the mechanistic relationship between the impacts of a land use activity upon a watershed process and site-scale functions and environmental responses, such as low survival of salmonid eggs or flooding. Furthermore, watershed assessments require the integration of knowledge from multiple scientific disciplines; there is a lack of a common language, however, and a mismatch between data sets in terms of forms of knowledge and different levels of precision and accuracy (Benda et al., 2002). As a result, the predictive ability and management utility of watershed assessment tools have been considered low (Beman, 2002). Because of these data/scale and integration issues, state and local governments have not developed a standard system for using watershed information to inform future development patterns in a manner that avoids significant long-range impacts to aquatic ecosystems. These issues have also prevented the public from adequately understanding the important role that broad-scale data could play in protecting and restoring aquatic resources.

To help incorporate watershed data and assessment into local planning efforts, the State of Washington is developing a watershed characterization and planning framework for Puget Sound. This includes methods to assess multiple watershed processes and integrate the results into “decision templates” (Stanley et al., 2009, in review). The templates help interpret and apply the characterization information appropriately.

Adapting Environmental Science Methods to Public Policy and Decision Support

Rob Fatland, Microsoft Research

Dozier and Gail posit a new Science of Environmental Applications, driven more by need than traditional scientific curiosity. I present here a brief elaboration on applying this idea to public policy and decision support based on an example of aquifer management on a small (22 km²) island in Puget Sound. I use the first person plural “we” to imply a community of environmental application problem solvers interested in sharing solutions in the way that scientists share research, from methods to results. In consequence, these remarks concern the sociology of integrating science with decision making, a process with attendant difficulties (today) in both sharing and adopting solutions.

An Interactive Modeling Environment for Systems Biology of Aging

Pat Langley, Arizona State University

In this paper, we describe an interactive environment for the representation, interpretation, and revision of qualitative but explanatory biological models. We illustrate our approach on the systems biology of aging, a complex topic that involves many interacting components. We also report initial experiences with using this environment to codify an informal model of aging. We close by discussing related efforts and directions for future research.

Analyzing the Process of Knowledge Dynamics in Sustainability Innovation: Towards a Data-Intensive Approach to Sustainability Science

Masaru Yarime, University of Tokyo

Sustainability science is an academic field that analyzes the processes of production, diffusion, and utilization of various types of knowledge with long-term consequences for innovation. Three components can be identified in the knowledge dynamics system in society. Knowledge has aspects of content, quantity, quality, and rate of circulation. Actors are characterized by their heterogeneity, linkages and networks, and the interactions among them. Institutions cover a diverse set of entities, ranging from informal ones such as norms and practices to more formal ones including rules and laws. Sustainability science thus deals with dynamic, complex interactions among diverse actors creating, transmitting, and applying various types of knowledge under institutional conditions. Several phases are identified in the production, diffusion, and utilization of knowledge, with different actors involved in each. Gaps and inconsistencies inevitably exist among the phases in terms of the quantity, quality, and rate of knowledge processed, and this effectively constitutes a major challenge in pursuing sustainability on a global scale. The phases of knowledge dynamics include problem discovery, scientific investigation, technological development, diffusion in society, and reactions from stakeholders in society. These phases are analyzed using a data-intensive approach, assembling and integrating a diverse set of data through bibliometric analysis of scientific articles published in academic journals, patent analysis of technologies, life cycle assessment of products, and discourse analysis of mass media. Case studies of innovation in photovoltaic and water treatment technologies are conducted by assembling and integrating various types of data on the different phases of knowledge dynamics. They suggest that gaps and inconsistencies in the knowledge circulation system pose serious challenges to the pursuit of sustainability innovation.

Data-Intensive Science for Safety, Trust, and Sustainability

Shuichi Iwata and Pierre Villars, The University of Tokyo

Thoughts on a “Data Commons” for data-intensive science are reported based on our preliminary studies of data-driven materials design, targeting not only materials themselves but also time-dependent properties such as the aging of engineering products and human bodies and the degradation of environments.

Our methods are not powerful enough to predict the time-dependent properties of complex systems, so we use causality and correlation in data to ensure that safety margins are adequate. Thus, in short, “safety” is confirmed by data, and “trust” is built by adequate margins, again confirmed by data. These subjects are data-intensive from the beginning due to their inherent complexity.

To deal with such complexity proactively and obtain a set of creative, holistic views on each time-dependent system, we propose a “Data Commons” as a platform for collective knowledge. It is to be constructed by addressing the following two challenges:

  1. Horizontal comparative approaches to gain perspectives through a set of two-dimensional maps of deep semantics, as demonstrated by our former project, the LPF (Linus Pauling File)
  2. Vertical converging (heuristic inverse/direct) approaches toward a concrete target, beyond “multi-scale modeling,” as attempted by VEMD (Virtual Experiments for Materials Design) to bridge gaps between data and models while allowing a rich diversity of scenarios

The third challenge is to drive abductive approaches so as to become free from “lock-in,” which can be attained by strategically organizing (1) and (2) through collective knowledge. A paradigm for data-centric science is discussed through a preliminary study along this approach. Commitment to collective knowledge is the key to sustainability.

BL!P: A Tool to Automate NCBI BLAST Searches and Customize the Results for Exploration in Live Labs Pivot

Vince Forgetta and Ken Dewar, McGill University; Moussa S. Diarra, Pacific Agri-Food Research Centre, Agriculture and Agri-Food Canada; Simon Mercer, Microsoft Research

NCBI BLAST is a tool widely used to annotate protein coding sequences. Current limitations in the annotation process are in part dictated by the methodology used. The manual inspection of BLAST results is slow, tedious and limited to static analysis of textual output, while automated analyses typically discard useful information in favor of increased speed and simplicity of analysis. These limitations can be addressed using data exploration and visualization software, such as Live Labs Pivot by Microsoft, a software application that allows for the fluid exploration of large datasets in an intuitive manner. We have created a Microsoft Windows application, BL!P [blip] or BLAST in Pivot, that automates NCBI BLAST searches, fetches associated GenBank records, and converts this information into a Pivot collection. Also, BL!P provides an interface to create customized images for each BLAST match, allowing the user to perform further customizations to meet their data exploration objectives.
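
As a rough, hedged illustration of the first steps BL!P automates (running an NCBI BLAST search and fetching GenBank records for the matches), the sketch below uses Biopython’s NCBI web-service wrappers rather than BL!P itself; the query file, database choices, hit count, and e-mail address are placeholders, and the final Pivot-collection step is not shown.

    # Illustrative sketch (not BL!P itself): run a remote NCBI BLAST search and
    # fetch GenBank/GenPept records for the top hits, using Biopython.
    from Bio import SeqIO, Entrez
    from Bio.Blast import NCBIWWW, NCBIXML

    Entrez.email = "you@example.org"  # placeholder; NCBI requests a contact address

    query = SeqIO.read("query_protein.fasta", "fasta")              # hypothetical query file
    result_handle = NCBIWWW.qblast("blastp", "nr", str(query.seq))  # remote BLAST search
    blast_record = NCBIXML.read(result_handle)

    for alignment in blast_record.alignments[:5]:                   # top five matches only
        handle = Entrez.efetch(db="protein", id=alignment.accession,
                               rettype="gb", retmode="text")
        print(handle.read()[:200])                                  # print the record header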

GenoZoom: Browsing the Genome with Microsoft Biology Foundation, Deep Zoom, and Silverlight

Xin-Yi Chua, Queensland University of Technology; Michael Zyskowski, Microsoft Research

Many current genome browsers face a number of limitations: they do not support smooth, rapid navigation of large-scale data from high to low resolutions; information is limited to a predefined set of genomic data; lengthy setup is required to display a user’s own genome sequences; and they do not support unformatted user annotations. GenoZoom is an investigation that attempts to address these limitations by utilizing the richness enabled by Silverlight [1] and Deep Zoom [2] technologies.

Data, Data, Everywhere, nor Any Drop to Drink: New Approaches to Finding Events of Interest in High Bandwidth Data Streams

Mark Abbott, Ganesh Gopalan, and Charles E. Sears, Oregon State University

The amount of unstructured data gathered and managed annually by organizations within both the research and the business sectors is growing exponentially. Qualitatively, this shift is even more radical, as the conceptual framework for data moves from a historic, disaggregated, and static perspective to one based on assumptions about the potential of dynamic data management and collaboration. Knowledge extraction will require new tools to enable new levels of collaboration, visualization, and synthesis. This is not just scaling up traditional compute workflows to accommodate greater volumes; it is about scaling out to broadly dispersed data and teams that come together to work on specific business and science issues. We are using high-definition (HD) data arrays derived from a range of observing systems and models as streaming data sets. The problem space is defined as detecting, annotating, and classifying events or features in the HD stream, linking these with an XML-based database, and providing web services to a broad range of network-aware devices, not just deskside workstations. We are developing a content-based high-definition video search engine that integrates multiple Microsoft technologies, including a multi-touch interface to query and navigate through video clips, WPF for transitions in the interface, a SQL Server back end with an HTTP endpoint to search through video using MPEG-7, and CLR stored-procedure integration to support MPEG-7 tasks directly within the database. Finding the data “drop” of interest will require new approaches, not simply “scaling up” the hardware and approaches we have used for the past decades. Instead, we must accommodate the “scaling out” of data sources, repositories, and users. Our research explores these new avenues to capture, analyze, visualize, distribute, and present large-scale digital e-science content.

Extreme Database-centric Computing in Science

Alex Szalay, Tamas Budavari, Laszlo Dobos, and Richard Wilton, Johns Hopkins University

Scientific computing is becoming increasingly about analyzing massive amounts of data. In a typical academic environment, managing data sets below 10 TB is easy; above 100 TB, it is very difficult. Databases offer many advantages for the typical patterns required for managing scientific data sets but lack a few important features. Here we present recent projects at JHU aimed at bridging the gap between databases and scientific computing. We have implemented a framework that enables us to execute SQL Server user-defined functions on GPGPUs, implemented a new array data type for SQL Server, and run several science analysis tasks using these features.

Model-Driven Cloud Services for Cancer Research

Marty Humphrey, University of Virginia

The cancer Biomedical Informatics Grid (caBIG) is a virtual network of interconnected data, individuals, and organizations. Overseen by the NIH National Cancer Institute (NCI), caBIG is redefining how research is conducted, care is provided, and patients/participants interact with the biomedical research enterprise. Given its ambitious goal and vision, caBIG faces a huge number of technical and economic challenges. The software underlying caBIG must be user-friendly, scalable, secure, evolvable and evolving, able to find and process the relevant information necessary to the computation at hand, interoperable with other platforms, cost-effective, and so forth. Delivering on these requirements has the potential to be truly transformative, revolutionizing cancer research and transforming patient health care into a highly personalized model.

However, it has been observed that the current software of caBIG is very restrictive—there is a tremendous learning curve, whereby researchers must often become familiar with a whole new set of tools and methodologies (based on Java). caBIG is fundamentally model-driven; however, the current modeling capabilities in caBIG are rigid and ineffective, and many of the potential benefits of a model-driven architecture are not being realized. Infrastructure costs (both with respect to software design/deployment and with respect to running deployed services) are starting to overwhelm caBIG as it seeks to expand.

In our prior work (Microsoft eScience Workshop 2008), we demonstrated how to create a caBIG data service based on ADO.NET Data Services and WCF. In this talk, we demonstrate how we address these challenges through the use of Microsoft SQL Server modeling technologies, the ADO.NET Entity Framework in .NET 4.0, OData, Microsoft Visual Studio 2010, and Windows Azure to deliver model-driven cloud services for cancer research.

Cloud-Based Map-Reduce Architecture for Nuclear Magnetic Resonance-Based Metabolomics

Paul Anderson, Satya Sahoo, Ashwin Manjunatha, Ajith Ranabahu, Nicholas Reo, Amit Sheth, and Michael Raymer, Wright State University; Nicholas DelRaso, Air Force Research Laboratory

The science of metabolomics is a relatively young field that requires intensive signal processing and multivariate data analysis for interpretation of experimental results. We present a scalable scientific workflow approach to data analysis, in which the individual cloud-based services exploit the inherent parallel structure of the algorithms. Two significant capabilities are the adaptation of an open-source workflow engine (Taverna), which provides flexibility in selecting the most appropriate data analysis technique regardless of its implementation details, and the implementation of several common spectral processing techniques in the cloud using a parallel map-reduce framework, Hadoop. Due to its parallel processing architecture and its fault-tolerant file system, Hadoop is ideal for analyzing large spectroscopic data sets.
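To make the map-reduce structure concrete, here is a small conceptual sketch written in C# with LINQ rather than Hadoop's Java API. The binning task and all names are hypothetical stand-ins for the spectral processing steps the abstract mentions; Hadoop distributes the same map, shuffle, and reduce phases across nodes.

```csharp
using System.Collections.Generic;
using System.Linq;

static class MapReduceSketch
{
    // Map: each (chemicalShiftPpm, intensity) pair is assigned to a bin.
    // Reduce: intensities falling in the same bin are summed.
    public static Dictionary<int, double> BinSpectrum(
        IEnumerable<(double ppm, double intensity)> peaks, double binWidth)
    {
        return peaks
            .Select(p => (bin: (int)(p.ppm / binWidth), p.intensity)) // map
            .GroupBy(x => x.bin)                                      // shuffle
            .ToDictionary(g => g.Key, g => g.Sum(x => x.intensity));  // reduce
    }
}
```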

MyExperimentalScience, Extending the 'Workflow'

Jeremy Frey, Andrew Milsted, Danius Michaelides, and David De Roure, University of Southampton

For the past few years there has been a great deal of activity in the preservation and dissemination of “in silico” experiments through the sharing of “workflows”. This term has been used to describe the processes performed by such experiments, but it can also apply to “real” in vitro experiments, by describing the experimental steps performed by the scientist. In the past, these workflows would have been recorded in a paper lab book, so the only way to share them was to write a journal paper around the procedure or to expose full pages of the lab book. With the introduction of Virtual Research Environments (VREs) and Electronic Laboratory Notebooks (ELNs), it is now possible to share these processes.

The MyExperimentalScience project linked the myExperiment platform with the LabBlog ELN. myExperiment is a collaborative environment in which scientists can safely publish their workflows and experimental plans, share them with groups, and find those of others. Workflows, other digital objects, and bundles (called Packs) can now be swapped, sorted, and searched like photos and videos on the web. Unlike Facebook or MySpace, myExperiment fully understands the needs of the researcher and makes it really easy for the next generation of scientists to contribute to a pool of scientific methods, build communities, and form relationships—reducing time-to-experiment, sharing expertise, and avoiding reinvention. myExperiment is now the largest public repository of scientific workflows.

The Conversion Software Registry

Michal Ondrejcek, Kenton McHenry, and Peter Bajcsy, National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign

We have designed a web-based Conversion Software Registry (CSR) for collecting information about software packages that are capable of file format conversions. The work is motivated by a community need for finding file format conversions that are not discoverable via current search engines, and by the specific need to support systems that can actually perform conversions, such as NCSA Polyglot. In addition, the value of the CSR lies in complementing existing file format registries, such as the Unified Digital Format Registry (UDFR, formerly GDFR) and PRONOM, and in introducing software quality information obtained by content-based comparisons of files before and after conversion. The contribution of this work is in the CSR data model design, which includes file-format-extension-based conversions as well as software scripts, software quality measures, and test-file-specific information for evaluating software quality. We have populated the CSR with the help of National Archives and Records Administration (NARA) staff. The Conversion Software Registry provides multiple search services. As of May 28, 2010, the CSR has been populated with 183,142 conversions, 544 software packages, 1,316 file format extensions associated with 273 MIME types, and 154 PRONOM identifications.

oreChem: Planning and Enacting Chemistry on the Semantic Web

Mark Borkum, Simon Coles, and Jeremy Frey, University of Southampton

This paper presents the oreChem Core Ontology (CO), an extensible ontology for the description of the planning and enactment of scientific methods. Currently, a high level of domain-specific knowledge is required to identify and resolve the implicit links that exist between the digital artefacts that are realised during the enactment of a scientific experiment. This creates a significant barrier to entry for independent parties that wish to discover and reuse the published data. The CO radically simplifies and clarifies the problem of representing a scientific experiment, facilitating the discovery and reuse of the raw, intermediate, and derived results in the correct context. In this paper, we present an overview of the CO and discuss its integration with the eCrystals repository for crystal structures.

Accelerating Chemical Property Prediction with Cloud Computing

Hugo Hiden, Paul Watson, David Leahy, Jacek Cala, Dominic Searson, Vladimir Sykora, and Simon Woodman, Newcastle University

This paper describes the use of cloud computing to accelerate the building of models to predict chemical properties. The chemists in the project have unique software—the Discovery Bus—that automatically builds quantitative structure-activity relationship (QSAR) models from chemical activity datasets. These models can then be used to design better, safer drugs, as well as more environmentally benign products.

Recently, there has been a dramatic increase in the availability of activity data, creating the opportunity to generate new and improved models. Unfortunately, the competitive workflow algorithm used by the Discovery Bus requires large computational resources to process data; for example, the chemists recently acquired some new datasets which would take more than five years to process on their current, single-server infrastructure.

This is potentially an ideal cloud application as large computational resources are required, but only when new datasets become available. Therefore, in the “Junior” project, we have designed and built a scalable, Windows Azure cloud-based infrastructure in which the competitive model-building techniques are explored in parallel on up to 100 nodes. As a result, the rate at which the Discovery Bus can process data has been accelerated by a factor of more than 100, and the new datasets can be processed in weeks rather than years.

Remote Computed Tomography Reconstruction Service on GPU-Equipped Computer Clusters Running Microsoft HPC Server 2008

Timur Gureyev, Yakov Nesterets, Darren Thompson, Alex Khassapov, Andrew Stevenson, Sheridan Mayo, and John Taylor, Commonwealth Scientific and Industrial Research Organisation (CSIRO); Dimitri Ternovski, Trident Software Pty. Ltd.

We describe a complete, integrated, thick-client system for remote computed tomography (CT) reconstruction, simulation, and visualization services utilising computer clusters optionally equipped with multiple graphics processing units (GPUs). All computers in our system, including the user PCs, web servers, file servers, and compute cluster nodes, run flavours of the Windows OS, which greatly simplifies the development, installation, administration, and replication of the system. Our design is also aimed at streamlining and simplifying user interaction with the system, which differentiates it from most software available on today’s compute clusters, which typically requires some familiarity with parallel computing environments from the user. We briefly describe the high-level architectural design of the system, as well as the two-level parallelization of the most computationally intensive modules utilising both the multiple CPU cores and the multiple GPUs available on the cluster. Finally, we present some results on the current system’s performance.

e-LICO: Delivering Data Mining to the Life Science Community

Simon Jupp, James Eales, Rishi Ramgolam, Alan Williams, Robert Stevens, and Carole Goble, University of Manchester; Simon Fischer, Rapid-I GmbH; Jorg-Uwe Kietz, University of Zurich

Life science research is generating a vast amount of data; data are produced at many granularities, from information about molecular interactions to planetary meteorological information. One of the challenges in bioinformatics is how best to provide biologists with the necessary tools and infrastructure to process, analyse, and explore these data.

e-LICO is a project that seeks to develop a collaborative environment using Taverna and myExperiment for scientists to build and share scientific workflows, with a specific focus on support for text and data mining. Data mining is a complicated process, resulting in workflows consisting of several steps for each of data gathering, integration, preparation, modeling, evaluation, and deployment. e-LICO utilizes existing e-science infrastructure (myExperiment, Taverna) along with integrated AI-planning techniques to build data-mining workflows (via case-based planning and hierarchical task-decomposition planning).

SQL is Dead; Long Live SQL: Lightweight Query Services for Ad Hoc Research Data

Bill Howe and Garret Cole, University of Washington

We find that relational databases remain underused in science, despite a natural correspondence between exploratory hypothesis testing and ad hoc “query answering.” The upfront costs to deploy a relational database prevent widespread use by small labs or individuals, while the development time for custom workflows or scripts is too high for interactive Q&A. We are exploring a new way to inject SQL into the scientific method, motivated by these observations:

  • We reject the conventional wisdom that “scientists won’t write SQL.” Rather, we implicate the process of data modeling, schema design, cleaning, and ingest in preventing the uptake of the technology by scientists.
  • We observe that cloud platforms, specifically the Windows Azure platform and Amazon’s EC2 service, drastically reduce the effort required to erect a production-quality database server.
  • We observe that simply sharing examples of SQL queries allows the scientists to self-train, bootstrapping the technological independence needed to allow our work to serve many labs simultaneously.

Guided by these premises, we have built a simple prototype that allows users to upload their data and immediately query it—no schema design, no reformatting, no DBAs, no obstacles. We provide a “starter kit” of SQL queries, translated from English questions provided by the researchers themselves, that demonstrate the basic idioms for retrieving and manipulating data. These queries are saved within the application, and can be copied, modified, and saved anew by the researchers. Beyond these core requirements, we seek novel features to facilitate authoring, sharing, and reuse of SQL statements, as well as analysis and visualization of results. A cloud-based deployment on Windows Azure allows us to establish a global, interdisciplinary corpus of example queries, which we mine to help users find relevant example queries, organize and integrate data, and construct new queries from scratch.
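For readers unfamiliar with the idea of a "starter kit" query, the fragment below shows the general flavor: an English question translated into SQL and executed against an uploaded table. The ocean_casts table, its columns, and the connection string are hypothetical; this is not the prototype's actual schema or API.

```csharp
using System;
using System.Data.SqlClient;

class StarterKitQuery
{
    static void Main()
    {
        // Hypothetical uploaded table: ocean_casts(station, depth_m, oxygen_ml_l).
        // English question: "What is the mean oxygen below 100 m at each station?"
        const string sql = @"
            SELECT station, AVG(oxygen_ml_l) AS mean_oxygen
            FROM ocean_casts
            WHERE depth_m > 100
            GROUP BY station
            ORDER BY mean_oxygen DESC;";

        using (var conn = new SqlConnection("<connection string>"))
        using (var cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
                while (reader.Read())
                    Console.WriteLine("{0}: {1}", reader["station"], reader["mean_oxygen"]);
        }
    }
}
```

Saved alongside the English question, such a query can be copied, modified, and re-saved by researchers, which is the self-training loop the abstract describes.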

SinBiota 2.0 – Planning a New Generation Environmental Information System

João Meidanis, Pedro Feijão, Cleber Mira, and Carlos Joly, University of Campinas

In March 1999, the State of São Paulo Research Foundation (FAPESP) launched a research program on the characterization, conservation, restoration, and sustainable use of the biodiversity of the state, known as the “BIOTA-FAPESP” Program. Over the years, this program accumulated about 100,000 records of observations and collections of biological material. A new journal was founded, and the program even influenced state laws regarding land use. Along with the program, an information system, called SinBiota, was developed to hold the data generated by its participants.

After ten years, the system is in need of a major reorganization. In this paper, we cover the steps being undertaken to achieve this goal, including consulting IT specialists, listening to the user community, and establishing a multi-phase plan. We also present the current state of affairs, which involves research in areas such as multimedia search, cloud computing, and database scalability, as well as the implementation of a prototype of the new system, in a project jointly funded by FAPESP and Microsoft Research.

Enhancing the Quality and Trust of Citizen Science Data

Jane Hunter and Abdulmonem Alabri, The University of Queensland; Catharine van Ingen, Microsoft Research

The Internet, Web 2.0, and social networking technologies are enabling citizens to actively participate in “citizen science” projects by contributing data to scientific programs via the web. However, the limited training, knowledge, and expertise of contributors can lead to poor-quality, misleading, or even malicious data being submitted. Consequently, the scientific community often perceives citizen science data as low quality and not worthy of being used in serious scientific research. In this paper, we describe a technological framework that combines data quality improvements and trust metrics to enhance the reliability of citizen science data. We describe how trust models can provide a simple and effective mechanism for measuring the trustworthiness of community-generated data. We also describe filtering services that remove unreliable or untrusted data and enable scientists to confidently re-use citizen science data. The resulting software services are evaluated in the context of the Coral Watch project—a citizen science project that uses volunteers to collect comprehensive data on coral reef health.
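As a purely illustrative sketch, and not the framework described in the paper, the following fragment shows one simple way a trust score and a trust-based filter could be computed, assuming a contributor's history of agreement with expert-validated observations is available.

```csharp
using System.Collections.Generic;
using System.Linq;

static class TrustSketch
{
    // Illustrative only: a contributor's trust score is the fraction of their
    // past observations that agreed with expert-validated measurements,
    // smoothed with a small prior so new contributors start near neutral.
    public static double TrustScore(int agreed, int total,
                                    double priorScore = 0.5, double priorWeight = 5.0)
    {
        return (agreed + priorScore * priorWeight) / (total + priorWeight);
    }

    // Observations from contributors below a threshold are filtered out
    // before the data are offered to scientists for re-use.
    public static IEnumerable<T> FilterByTrust<T>(
        IEnumerable<(T observation, double trust)> items, double threshold = 0.7)
    {
        return items.Where(i => i.trust >= threshold).Select(i => i.observation);
    }
}
```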

Scientist-Computer Interfaces for Data-Intensive Science

Cecilia Aragon, Lawrence Berkeley National Laboratory

Many of today’s important scientific breakthroughs are made by large, interdisciplinary collaborations of scientists working in geographically distributed locations, producing and collecting vast and complex datasets. Experimental astrophysics, in particular, has recently become a data-intensive science after many decades of relative data poverty. These large-scale science projects require software tools that support not only insight into complex data but also collaborative scientific discovery. Such projects do not easily lend themselves to fully automated solutions, requiring hybrid human-automation systems that facilitate scientist input at key points throughout the data analysis and scientific discovery process. This paper presents some of the issues to consider when developing such software tools, and describes Sunfall, a collaborative visual analytics system developed for the Nearby Supernova Factory, an international astrophysics experiment and the largest-data-volume supernova search currently in operation. Sunfall utilizes novel interactive visualization and analysis techniques to facilitate deeper scientific insight into complex, noisy, high-dimensional, high-volume, time-critical data. The system combines novel image processing algorithms, statistical analysis, and machine learning with highly interactive visual interfaces to enable collaborative, user-driven scientific exploration of supernova image and spectral data. Sunfall is currently in operation at the Nearby Supernova Factory; it is the first visual analytics system in production use at a major astrophysics project.

Enabling Scientific Discovery with Microsoft SharePoint

Kenji Takeda, Richard Boardman, Steven Johnston, Mark Scott, Leslie Carr, Simon Coles, Simon Cox, Graeme Earl, Jeremy Frey, Philippa Reed, Ian Sinclair, and Tim Austin, University of Southampton

Scientists, researchers, and engineers facing increasing amounts of data must create, execute, and navigate complex workflows, collaborate within and outside their organisations, and share their work with others. In this paper, we demonstrate how the Microsoft SharePoint platform provides an integrated feature set that can be leveraged to significantly improve the productivity of scientists and engineers. We investigate how SharePoint 2010 can be used, and extended, to manage data and workflow in a seamless way and to enable users to share their data with full access control. We describe, in detail, how we have used SharePoint 2010 as the IT infrastructure for a large, multi-user facility, the µ-Vis CT scanning centre. We also demonstrate how we are creating a user-centric data management system for archaeologists, and show how SharePoint 2010 can be integrated into the everyday lives of scientists and engineers for managing and publishing their data through our Materials Data Centre, which provides an easy-to-use data management system from lab bench to journal publication via EPrints.

Genome-Wide Association of ALS in Finland

Bryan Traynor, National Institute on Aging, National Institutes of Health

We performed a genome-wide association study of amyotrophic lateral sclerosis (ALS) in Finland to determine the genetic variants underlying the disease in this population. Finland is an ideal location for performing genetic studies of ALS, because it has one of the highest incidences of the disease in the world and because the population is known to be remarkably genetically homogeneous. We genotyped a cohort of 442 Finnish ALS patients and 521 Finnish control subjects using HumanHap370 arrays, which assay more than 300,000 SNPs across the human genome. This DNA was collected by our colleague Dr. Hannu Laaksovirta, who reviews nearly all patients diagnosed with this fatal neurodegenerative disease in the country. We were pleased to find two highly significant association peaks in our GWAS: one located on chromosome 21 near the SOD1 gene, which is known to have a particularly high prevalence in the Finnish population, and the other located on chromosome 9p21. Together, these two loci account for nearly the entire increased incidence of ALS in Finland.

A Framework for Large-Scale Modelling of Population Health

John Ainsworth, Iain Buchan, Nathan Green, Matthew Sperrin, Richard Williams, Philip Couch, Emma Carruthers, and Eleanora Fichera, University of Manchester; Martin O’Flaherty and Simon Capewell, University of Liverpool

Statistical and informatics methods for synthesising disparate sources of public health evidence are under-developed. This is in part due to the amount of human resource required to synthesise complex evidence, and in part due to a research environment that rewards the study of the independent effects of specific factors on health more than discovering the complexity of health. In particular, it remains difficult to compare the potential impacts of community-based prevention strategies, such as smoking cessation, with clinical treatments, such as lipid-lowering drugs. Thus there is a lack of usefully complex models that might underpin the full appraisal of health policy options by policy-makers. We present a system that enables health care professionals to collaborate on the design of complex models of population health, which can then be used to evaluate and compare the impact of interventions.

GREAT.stanford.edu: Generating Functional Hypotheses from Genome-Wide Measurements of Mammalian Cis-Regulation

Gill Bejerano and Cory Y. McLean, Stanford University

Recent technological advances in DNA sequencing provide an unprecedented view of the regulatory genome in action. We can now sequence all binding events of transcription factors and transcription-associated factors, examine the dynamics of different chromatin marks, assay for nucleosome positioning and open chromatin, and more. However, attempts to interpret these data using computational tools developed for microarray analysis often fall short, leaving researchers to manually scrutinize only handfuls of their copious data.

We developed the Genomic Regions Enrichment of Annotations Tool (GREAT) to provide the first computational tool that correctly analyzes whole-genome cis-regulatory data. Whereas microarray-based methods are forced to consider only binding proximal to genes, GREAT is able to properly incorporate distal binding sites, which greatly enhances the resulting interpretations. Applying GREAT to ChIP-seq data sets of multiple transcription-associated factors in different contexts, we recover many functions of these factors that are missed by existing gene-based tools, and we generate novel hypotheses that can be experimentally tested. GREAT can be similarly applied to any dataset of localized genomic markers enriched for known or putative cis-regulatory function.

GREAT incorporates biological annotations from 20 ontologies and has been made available to the scientific community as an intuitive web tool. Direct submission is also available from the UCSC Genome Browser via the Table Browser.

Medici: A Scalable Multimedia Environment for Research

Joe Futrelle, Luigi Marini, Rob Kooper, Joel Plutchak, Alan Craig, Terry McLaren, and Jim Myers, National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign

Large-scale community collections of images, videos, and other media are a critical resource in many areas of research and education including the physical sciences, biology, medicine, humanities, arts, and social sciences. Researchers face coupled problems in managing large amounts of data, analysis and visualization over such collections, and managing descriptive metadata and provenance information. NCSA is involved in a wide range of projects targeting collections that involve terabytes to petabytes of data, complex image processing pipelines, and rich provenance linking. Based on this experience, we have developed Medici—a general multimedia management environment based on Web 2.0 interfaces, semantic content management, and service/cloud-based workflow capabilities that can support a broad range of high-throughput research techniques and community data management. Medici provides scalable storage and media processing capabilities, simple desktop and web 2.0 user interfaces, social annotations, preprocessing and preview capabilities, dynamically extensible metadata, provenance support, and citable persistent data references. This talk will provide an overview of Medici’s capabilities and use cases in the humanities and microscopy as well as describe core research and development challenges in creating usable systems incorporating rich semantic context derived from distributed automated and manual sources.

BlogMyData: A Virtual Research Environment for Collaborative Visualization of Environmental Data

Andrew Milsted, Jeremy Frey, Jon Blower, and Adit Santokhee, University of Southampton

Understanding and predicting the Earth system requires the collaborative effort of scientists from many different disciplines and institutions. The National Centre for Earth Observation (NCEO) and the National Centre for Atmospheric Science Climate Group (NCAS-Climate) are both high-profile interdisciplinary research centres involving numerous universities and institutes around the UK and many international collaborators. Both groups make use of the latest numerical models of the climate and earth system, validated by observations, to simulate the environment and its response to forcings such as an increase in greenhouse gas emissions. Their scientists must work together closely to understand the various aspects of these models and assess their strengths and weaknesses.

At the present time, collaborations take place chiefly through face-to-face meetings, the scholarly literature and informal electronic exchanges of emails and documents. All of these methods suffer from serious deficiencies that hamper effective collaboration. For practical reasons, face-to-face meetings can be held only infrequently. The scholarly literature does not yet adequately link scientific results to the source data and thought processes that yielded them, and additionally suffers from a very slow turnaround time. Informal exchanges of electronic information commonly lose vital context; for example, scientists typically exchange static visualizations of data (as GIFs or PostScript plots for example), but the recipient cannot easily access the data behind the visualization, or customize the visualization in any way. Emails are rarely published or preserved adequately for future use. The recent adoption of “off the shelf” Wikis and basic blogs has addressed some of these issues, but does not usually address specific scientific needs or enable the interactive visualization of data.

RightField: Rich Annotation of Experimental Biology Through Stealth Using Spreadsheets

Matthew Horridge, Katy Wolstencroft, Stuart Owen, and Carole Goble, University of Manchester; Wolfgang Mueller and Olga Krebs, HITS gGmbH

RightField is an open-source application that provides a mechanism for embedding ontology annotation support for scientific data in Excel spreadsheets. It was developed during the SysMO-DB project to support a community of scientists who typically store and analyse their data using spreadsheets. It helps keep annotation consistent and compliant with community standards whilst making the annotation process quicker and more efficient.

RightField is an open-source, cross-platform Java application that is available for download.

musicSpace: Improving Access to Musicological Data

mc schraefel, David Bretherton, Daniel Smith, and Joe Lambert, University of Southampton

Efforts over the past decade to digitize scholarly musicological materials have revolutionized the research process; however, online research in musicology is now held back by the segregation of data into a plethora of discrete and disparate databases, and by the use of legacy or ad hoc metadata specifications that are unsuited to modern demands. Many real-world musicological research questions are rendered effectively intractable because there is insufficient metadata or metadata granularity, and a lack of data source integration. The “musicSpace” project has taken a dual approach to solving this problem: designing back-end services to integrate (and where necessary surface) available (meta)data for exploratory search from musicology’s key online data providers; and providing a front-end interface, based on the “mSpace” faceted browser, to support rich exploratory search interaction.

We unify our partners’ data using a multi-level metadata hierarchy and a common ontology. By using RDF for this, we make use of the many benefits of Semantic Web technologies, such as the facility to create multiple files of RDF at different times and using different tools, assert them into a single graph of a knowledge base, and query all of the asserted files as a whole. In many cases we were able to directly map a record field from a partner’s dataset to our combined type hierarchy, but in other cases some light syntactic and/or semantic analysis needed to be performed. This small amount of work in the pre-processing stage adds granularity that significantly enriches the data, allowing for more refined filtering and browsing of records via the search UI. Significantly, although all the data we extract is present in the original records, much of it is neither exposed to nor exploitable by the end-user via our data providers’ existing UIs. In musicSpace, however, all data surfaced can be used by the musicologist for the purposes of querying the dataset, and can thus aid the process of knowledge discovery and creation.

Our work offers an effective generalizable framework for data integration and exploration that is well suited for arts and humanities data. Our benchmarks have been (1) to make tractable previously intractable queries, and thereby (2) to accelerate knowledge discovery.

Quantifying Historical Geographic Knowledge from Digital Maps

Tenzing Shaw, Peter Bajcsy, Michael Simeone, and Robert Markley, National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign

An important question facing historians is how knowledge of different geographic regions varied between nations and over time. This type of question is often answered by examining historical maps created in different regions and at different times, and by evaluating the accuracy of these maps relative to modern geographic knowledge. Our research focuses on quantifying and automating the process of analyzing digitized historical maps in an effort to improve the precision and efficiency of this analysis.

In this paper, we describe an algorithmic workflow designed for this purpose. We discuss the application of this workflow to the problem of automatically segmenting Lake Ontario from French and British historical maps of the Great Lakes region created between the 16th and 19th centuries, and computing the surface area of the lake according to each map. Comparing these areas with the modern figure of 7,540 square miles provides a way of measuring the accuracy of French versus British knowledge of the geography of the Lake Ontario region at different points in time. Specifically, we present the results following the application of our algorithms to 40 historical maps. The procedure we describe can be extended to geographic objects other than Lake Ontario and to accuracy measures other than surface area.
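The area computation step described here can be sketched as follows, assuming a binary segmentation mask and a known map scale in miles per pixel. The 7,540-square-mile modern figure comes from the text above; the function names and inputs are hypothetical, not the authors' workflow.

```csharp
using System;

static class MapAreaSketch
{
    // Given a binary mask (true = pixel segmented as Lake Ontario) and the
    // map scale in miles per pixel, estimate the lake's surface area and its
    // relative error against the modern figure of 7,540 square miles.
    public static (double areaSqMiles, double relativeError) EstimateArea(
        bool[,] lakeMask, double milesPerPixel)
    {
        long lakePixels = 0;
        foreach (bool isLake in lakeMask)
            if (isLake) lakePixels++;

        double area = lakePixels * milesPerPixel * milesPerPixel;
        const double modernArea = 7540.0;
        return (area, Math.Abs(area - modernArea) / modernArea);
    }
}
```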

Data Intensive Research in Computational Musicology

David De Roure, Oxford e-Research Centre; J. Stephen Downie, University of Illinois at Urbana-Champaign; Ichiro Fujinaga, McGill University

The SALAMI (Structural Analysis of Large Amounts of Music Information) project applies computational approaches to the huge and growing volume of digital recorded music that is now available in large-scale resources such as the Internet Archive. It is set to produce a new and very substantive web-accessible corpus of music analyses in a common framework for use by music scholars, students and beyond, and to establish a methodology and tooling which will enable others to add to the resource in the future. The SALAMI infrastructure brings together workflow and Semantic Web technologies with a set of algorithms and tools for extracting features from recorded music which have been developed by the music information retrieval and computational musicology communities over the last decade, and the project uses “controlled crowd sourcing” to provide ground truth annotations of musical works.

Scaling Information on ‘Biosphere Breathing’ from Chloroplast to the Globe

Dennis Baldocchi, Youngryel Ryu, and Hideki Kobayashi, University of California-Berkeley; Catharine van Ingen, Microsoft Research

We describe the challenges of upscaling information on the ‘breathing of the biosphere’ from the scale of the chloroplasts of leaves to the globe. This task—the upscaling of carbon dioxide and water vapor fluxes—is especially challenging because the problem transcends fourteen orders of magnitude in time and space and involves a panoply of non-linear biophysical processes. This talk outlines the problem and describes the set of methods used. Our approach aims to produce information on the ‘breathing of the biosphere’ that is ‘everywhere, all of the time’.

The computational demands of this problem are daunting. At the stand scale, one must simulate the micro-habitat conditions of thousands of leaves, as they are displayed on groups of plants with a variety of angle orientations. Then one must apply the micro-habitat information (e.g., sunlight, temperature, humidity, CO2 concentration) to sets of coupled non-linear equations that simulate photosynthesis, respiration, and the energy balance of the leaves. Finally, this information must be added up.

At the regional to global scales, there is a need to acquire and merge multiple layers of remote sensing datasets at high resolution (1 km) and frequent intervals (daily) to provide the drivers of models that predict carbon dioxide and water vapor exchange. The global data products of ecosystem photosynthesis and transpiration produced with this system have high fidelity when validated against direct flux measurements, and they reveal complex spatial and temporal patterns that will prove valuable for environmental modelers and scientists studying climate change and the carbon and water cycles from local to global scales.
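Structurally, the stand-scale aggregation described above amounts to summing per-leaf fluxes weighted by leaf area. The sketch below shows only that scaffolding: the placeholder flux function stands in for the coupled, non-linear photosynthesis, respiration, and energy-balance equations, and all names are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Placeholder micro-habitat state for one leaf (sunlight, temperature, etc.).
record LeafEnvironment(double ParMicromol, double TemperatureC,
                       double RelativeHumidity, double Co2Ppm, double LeafAreaM2);

static class StandScaleSketch
{
    // Stand-scale flux = sum over leaves of (per-area flux) * (leaf area).
    // leafCo2FluxPerArea stands in for the coupled leaf-level equations.
    public static double StandCo2Flux(IEnumerable<LeafEnvironment> leaves,
                                      Func<LeafEnvironment, double> leafCo2FluxPerArea)
    {
        return leaves.Sum(leaf => leafCo2FluxPerArea(leaf) * leaf.LeafAreaM2);
    }
}
```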

Agrodatamine: Integrating Analysis of Climate Time Series and Remote Sensing Images

Humberto Razente and Maria Camila N. Barioni, UFABC; Daniel Y. T. Chino, Elaine P. M. Sousa, Robson Cordeiro, Santiago A. Nunes, Caetano Traina Jr., José F. Rodrigues Jr., Willian D. Oliveira, and Agma J. M. Traina, University of São Paulo; Luciana A. S. Romani, University of São Paulo & EMBRAPA Informatics; Marcela X. Ribeiro, Federal University of São Carlos; Renata R. V. Gonçalves, Ana H. Ávila, and Jurandir Zullo, CEPAGRI-UNICAMP

Although the scientific community has no doubts about global warming, quantifying and identifying the causes of the average increase in global temperature, and its consequences for ecosystems, remain urgent and of utmost importance. Mathematical and statistical models have been used to predict likely future scenarios and, as an outcome, a large amount of data has been generated. Technological progress has also led to improved sensors for several climate data measurements and for imaging the Earth's surface, contributing even more to the increasing volume and complexity of the data generated. In this context, we present new methods to filter, analyze, and extract association patterns between climate data and data extracted from remote sensing, which aim at aiding agricultural research.

Correction for Hidden Confounders in Genetic Analyses

Jennifer Listgarten, Carl Kadie, and David Heckerman, Microsoft Research; Eric E. Schadt, Pacific Biosciences

Understanding the genetic underpinnings of disease is important for screening, treatment, drug development, and basic biological insight. One way of getting at such an understanding is to find out which parts of our DNA, such as single-nucleotide polymorphisms, affect particular intermediary processes such as gene expression (eQTL), or endpoints such as disease status (GWAS). Naively, such associations can be identified using a simple statistical test on each hypothesized association. However, a wide variety of confounders lie hidden in the data, leading to both spurious associations and missed associations if not properly addressed. Our work focuses on novel statistical models that correct for these confounders. In particular, we present a novel statistical model that jointly corrects for two particular kinds of hidden structure—population structure (e.g., race, family-relatedness), and microarray expression artifacts (e.g., batch effects)—when these confounders are unknown. We also are working on models that robustly correct for confounders but which are cheap enough to be applied to extremely large data sets.

BioPatML.NET and Its Pattern Editor: Moving into the Next Era of Biology Software

James Hogan, Yu Toh, Lawrence Buckingham, Michael Towsey, and Stefan Maetschke, Queensland University of Technology

Existing XML-based bioinformatics pattern description languages are best seen as subsets or minor extensions of regular-expression-based models. In general, regular expressions are sufficient to solve many pattern-searching problems. However, their expressive power is insufficient to model complex structured patterns such as promoters, overlapping motifs, or RNA stem–loops. In addition, these languages often provide only minimal support for techniques common in bioinformatics, such as mismatch thresholds, weighted gaps, direct and inverted repeats, general similarity scoring, and position weight matrices. In this paper we introduce BioPatML.NET, a comprehensive search library that supports a wide variety of pattern components, ranging from simple motif, regular expression, or PROSITE patterns to their aggregation into more complex hierarchical structures. BioPatML.NET unifies the diversity of pattern description languages and fills a gap in the set of XML-based description languages for biological systems. As modern computational biology increasingly demands the sharing of sophisticated biological data and annotations, BioPatML.NET simplifies data sharing through the adoption of a standard XML-based format for representing pattern definitions and annotations. This approach not only facilitates data exchange, but also allows compiled patterns to be mapped easily onto database tables. The library is implemented in C# and builds upon the Microsoft Biology Foundation data model and file parsers. This paper also introduces an intuitive and interactive editor for the format, implemented in Silverlight 4, allowing drag-and-drop creation and maintenance of biological patterns and their preservation and re-use through an associated repository. (Refer to Appendix Fig 1.0 for a snapshot of the BioPatML Pattern Editor Tool.)
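To illustrate the general idea of declaring a search pattern in XML and evaluating it over a sequence, here is a small self-contained sketch. The XML schema, element names, and the TATA-box example are invented for illustration and are not BioPatML.NET's actual format or API.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;

static class PatternSketch
{
    // Hypothetical XML pattern definition (not BioPatML.NET's schema):
    // a named motif expressed as a regular expression over DNA symbols.
    const string PatternXml =
        "<Pattern name='TATA-box' type='Regex' expression='TATA[AT]A[AT]' />";

    static void Main()
    {
        XElement pattern = XElement.Parse(PatternXml);
        string name = (string)pattern.Attribute("name");
        var regex = new Regex((string)pattern.Attribute("expression"));

        const string sequence = "GGCTATAAATAGGCC";
        foreach (Match m in regex.Matches(sequence))
            Console.WriteLine($"{name} at {m.Index}: {m.Value}");
    }
}
```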

Availability: A demonstration video and the tool are available at the following links (requires the Silverlight 4 plug-in).

GRAS Support Network, Its Implementation, Operation, and Use

Fritz Wollenweber, Francois Montagner, Christian Marquardt, and Yago Andres, EUMETSAT; Maria Lorenzo and Rene Zandbergen, ESOC

This paper will present the GRAS Support Network (GSN) that was put in place to support the processing of the GRAS radio occultation instrument on board the Metop spacecraft. GRAS uses GPS satellite signals received by the instrument to perform retrievals of vertical profiles of refractivity, from which temperature profiles can be computed. The presentation will describe in detail the GRAS processing, the requirements that have to be fulfilled by the GSN, and the design and implementation of the GSN. Examples will be given from the operational use of this system over the past 3 years. Particular emphasis will be given to the details of the global GSN network, its communication links, and the GSN processing center. We will also address future evolutions of this network to cover changing and more demanding user requirements.

Data Intensive Frameworks for Astronomy

Jeffrey Gardner, Andrew Connolly, Keith Wiley, YongChul Kwon, Simon Krughoff, Magdalena Balazinska, Bill Howe, and Sarah Loebman, University of Washington

Astrophysics is addressing many fundamental questions about the nature of the universe through a series of ambitious wide-field optical and infrared imaging surveys (e.g., studying the properties of dark matter and the nature of dark energy) as well as complementary petaflop-scale cosmological simulations. Our research focuses on exploring emerging data-intensive frameworks like Hadoop and Dryad for astrophysical datasets. For observational astronomers, we are delivering new scalable algorithms for indexing and analyzing astronomical images. For computationalists, we are implementing cluster finding algorithms for identifying interesting objects in simulation particle datasets.

Experiences and Visions on Archaeo Informatics

Christiaan Hendrikus van der Meijden, Peer Kröger, and Hans-Peter Kriegel, Ludwig Maximilians University

The main problems in successfully establishing the new scientific branch of archaeo-informatics lie in standardization, in the understanding of advanced informatics (e.g., data mining) within the archaeo sciences, and in setting up data communication infrastructures. Our experiences are based on the development of OSSOBOOK, an intermittently synchronized database system that allows any authorized user to record data offline at the site and later synchronize this new data with a central data collection. Powerful data mining and similarity search tools have been integrated. The next development steps are establishing a standardized minimal electronic finding description and implementing an enhanced database connection interface for data mining communication techniques, to set up an archaeo data network. Another focus is on the modularization, visualization, and simplification of data mining tools. Learn more.

Panel: Challenges of Data Standards and Tools

Deb Agarwal, LBNL/UCB; Bill Howe, University of Washington; Alex James, Microsoft; Yong Liu, National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign; Maryann Martone, UCSD; Yan Xu, Microsoft Research

Environmental research involves multiple disciplines and players from academia, industry, and government agencies worldwide. By nature, environmental researchers are challenged with massive and heterogeneous data provided by various sources. If one “grand standard” will not work for dealing with all the required environmental data sources, how do we work together to define and adopt different data standards? What tools are essential for making the standards successful?

Scientific Data Sharing and Archiving at UC3/CDL: the Excel Add-in Project and More

John Kunze and Tricia Cruse, California Digital Library/California Curation Center

The University of California Curation Center (UC3), part of the California Digital Library (CDL), will be working with University of California researchers, the NSF DataONE community, and Microsoft (MS) Research to create open-source MS Excel extensions (“add-ins”) that will make it easier for scientists to record and export spreadsheet data in re-usable ways, fostering integration, new uses, and hence new science. We expect that creating such add-ins for as widely deployed a tool as Excel will help to transform the conduct of scientific research by enabling and promoting data publishing, sharing, and archiving. The Excel add-in project is the primary topic of this talk.

The talk will also address the larger context of this effort, which is one of four “fronts” on which UC3/CDL is working to establish data publishing, sharing, and archiving as common scientific practice. While this is a complex and ambitious undertaking, we hope that by chipping away at these tractable areas, we will reduce the size of the overall challenge. The most direct of these fronts is participation as an NSF DataONE member node, contributing University of California research data to NSF DataNet. We are also a founding member of the global DataCite consortium, which is working to create standards, tools, and incentives for data producers to publish citable datasets. Finally, with support from the Moore Foundation, we are writing up a comparative analysis of current practices across domains for publishing and preserving the methods, techniques, and credits involved in preparing the data used to draw conclusions in the published literature, information that is otherwise lost for want of standard practices for capturing this “appendix” material. We will conclude with a description of the newly released EZID (easy-eye-dee) service for creating and resolving persistent identifiers for data.

Visualizing All of History with Chronozoom

David Shimabukuro, Roland Saekow, and Walter Alvarez, University of California-Berkeley

Our knowledge of human history comprises a truly vast data set, much of it in the form of chronological narratives written by humanist scholars and difficult to deal with in quantitative ways. The last 20 years have seen the emergence of a new discipline called Big History, invented by the Australian historian David Christian, which aims to unify all knowledge of the past into a single field of study. Big History invites humanistic scholars and historical scientists from fields like geology, paleontology, evolutionary biology, astronomy, and cosmology to work together in developing the broadest possible view of the past. Incorporating everything we know about the past into Big History greatly increases the amount of data to be dealt with.

Big History is proving to be an excellent framework for designing undergraduate synthesis courses that attract outstanding students. A serious problem in teaching such courses is conveying the vast stretches of time from the Big Bang, 13.7 billion years ago to the present, and clarifying the wildly different time scales of cosmic history, Earth and life history, human prehistory, and human history. We present “ChronoZoom,” a computer-graphical approach to dealing with this problem of visualizing and understanding time scales, and presenting vast quantities of historical information in a useful way. ChronoZoom is a collaborative effort of the Department of Earth and Planetary Science at UC Berkeley, Microsoft Research, and originally Microsoft Live Labs.

Our first conception of ChronoZoom was that it should dramatically convey the scales of history, and the first version does in fact do that. To display the scales of history from a single day to the age of the Universe requires the ability to zoom smoothly by a factor of ~10^13, and doing this with raster graphics was a remarkable achievement of the team at Live Labs. The immense zoom range also allows us to embed virtually limitless amounts of text and graphical information.

We are now in the phase of designing the next iteration of ChronoZoom in collaboration with Microsoft Research. One goal will be to have ChronoZoom be useful to students beginning or deepening their study of history. We therefore show a very preliminary version of a ChronoZoom presentation of the human history of Italy designed for students, featuring (1) a hierarchical periodization of Italian history, (2) embedded graphics, and (3) an example of an embedded technical article. This kind of presentation should make it possible for students to browse history, rather than digging it out, bit by bit.

At a different academic level, ChronoZoom should allow scholars and scientists to bring together graphically a wide range of data sets from many different disciplines, to search for connections and causal relationships. As an example of this kind of approach, from geology and paleontology, we are inspired by TimeScale Creator.

ChronoZoom, by letting us move effortlessly through this enormous wilderness of time, getting used to the differences in scale, should help to break down the time-scale barriers to communication between scholars.

Proteome-Scale Protein Isoform Characterization with High Performance Computing

Jake Chen and Fan Zhang, Indiana University

The study of proteomes represents significant discovery and application opportunities in post-genome biology and medicine. In this work, we explore the use of high performance computing to characterize novel protein isoforms in tandem mass spectrometry (MS/MS) spectra derived from biological samples. We perform computational proteomics analysis of peptides by searching a new, large peptide database that we custom-built from all possible protein isoforms of a target proteome. There is therefore significantly higher complexity, both at the computational level and at the biological level, involved in the proteome-scale study of these protein isoforms than in standard approaches that involve only normal MS/MS protein search databases.

To discover novel protein isoforms in proteomics data, we developed a high performance computing and data analysis platform to support the following tasks: 1) conversion of raw data to open formats, 2) support for searching spectra and peptide identification, 3) conversion of search engine results to a unified format, 4) statistical validation of peptide and protein identifications, and 5) protein isoform marker annotation. By applying this platform, we show, through human fetal liver and breast cancer case studies, that it can markedly increase computational efficiency to support the identification of novel protein isoforms. Our results show promise for future diagnostic biomarker applications. They also point to new potential for real-time analysis of proteomics data with more powerful cloud computing.

Answering Biological Questions by Querying k-Mer Databases

Paul Greenfield, CSIRO Mathematics, Informatics and Statistics

Short DNA sequences (‘k-mers’) are effectively unique within and across bacterial species. Databases of such k-mers, derived from diverse sets of organisms, can be used to answer interesting biological questions. SQL queries can quickly show how organisms are related and find functions for hypothetical genes. Metagenomic applications include quickly partitioning reads by family and mapping reads onto possibly related reference genomes. Planned work includes functional improvements (searching over amino acid codons, querying over gene functions) and scaling the applications to work well on clusters, and possibly clouds.
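A minimal sketch of the underlying k-mer operations, assuming in-memory strings rather than the SQL databases described above; the k value and the shared-fraction measure are illustrative choices, not the author's implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class KmerSketch
{
    // Slide a window of length k across the sequence to enumerate its k-mers.
    public static HashSet<string> Kmers(string sequence, int k)
    {
        var set = new HashSet<string>();
        for (int i = 0; i + k <= sequence.Length; i++)
            set.Add(sequence.Substring(i, k));
        return set;
    }

    // The fraction of shared k-mers gives a crude relatedness signal between
    // two sequences, analogous to the database joins described above.
    public static double SharedFraction(string a, string b, int k = 25)
    {
        var ka = Kmers(a, k);
        var kb = Kmers(b, k);
        if (ka.Count == 0 || kb.Count == 0) return 0.0;
        return (double)ka.Intersect(kb).Count() / Math.Min(ka.Count, kb.Count);
    }
}
```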

Tutorials

Tutorial Abstracts

Session MT1: Microsoft Biology Foundation: An Open-Source Library of Bioinformatics Features Built on the .NET Platform

Mark Smith, JulMar Technology

The Microsoft Biology Initiative (MBI) is an effort in Microsoft Research to bring new technology and tools to the area of bioinformatics and biology. This initiative is comprised of two primary components, the Microsoft Biology Foundation (MBF) and the Microsoft Biology Tools (MBT).

The Microsoft Biology Foundation (MBF) is a language-neutral bioinformatics toolkit built as an extension to the Microsoft .NET Framework, initially aimed at the area of Genomics research. Currently, it implements a range of parsers for common bioinformatics file formats; a range of algorithms for manipulating DNA, RNA, and protein sequences; and a set of connectors to biological web services such as NCBI BLAST. MBF is available under an open source license, and executables, source code, demo applications, and documentation are freely downloadable.

The Microsoft Biology Tools (MBT) are a collection of tools targeted at helping the biology and bioinformatics researcher be more productive in making scientific discoveries. The tools provided here take advantage of the capabilities provided in the Microsoft Biology Foundation, and are good examples of how MBF can be used to create other tools.

This tutorial will provide an overview of the library, details about how to extend and re-use the library, and demonstrations of the tools released that use the library: The MSR Biology Extension for Excel and the MSR Sequence Assembler.

Sessions MT2 and WT2: Scientific Data Visualization using WorldWide Telescope

Dean Guo, Microsoft Corporation

As described in the book, The Fourth Paradigm: Data-Intensive Scientific Discovery, scientific breakthroughs will be increasingly powered by advanced computing capabilities that help researchers manipulate and explore massive datasets.

This tutorial uses three case studies to demonstrate the application of a wide range of technologies: .NET parallel extension on multicores, distributed computing on multiple nodes with Dryad/DryadLINQ, Windows Azure, HPC, data processing automation through workflows, and visualization in WorldWide Telescope. We hope that the techniques and technologies used are applicable in other data-intensive research.

WorldWide Telescope (WWT) enables your computer to function as a virtual telescope, bringing together imagery from the best ground and space-based telescopes in the world. We are working on extending WWT to visualize scientific data on Earth.

The three case studies are: WWT LCAPI (Loosely Coupled API), TeraPixel, and MODISAzure. We will demonstrate how to use WWT to visualize the results from these projects.

  1. LCAPI: The WorldWide Telescope “Loosely Coupled API” uses a RESTful communication style between a standalone application (SA) and a WorldWide Telescope Client (WTC). We will explore using this loosely coupled interface to read time-series event data into the SA, push this data to the WTC Layer Manager, control WTC layer-based data rendering, and control WTC state (location, perspective angles, time, and time rate). From this overview we will explore both what the LCAPI enables and the potential for future directions in visualization.
  2. Terapixel Sky image – creating the largest and clearest image of the sky from the Digitized Sky Survey data. We turned 1,800 pairs of red and blue individual image plates into 1,800 colored plates, adjusted brightness of each pixel on each plate, and stitched and smoothed them together into a terapixel sky image. The image is then visualized by the WorldWide Telescope.
  3. MODISAzure – accessing the vast and varied remote sensing data from MODIS (the Moderate Resolution Imaging Spectroradiometer) on NASA’s Terra satellite and other data sources to study evapotranspiration (ET), which is key to the water balance and hence key to understanding interactions between global climate change and the biosphere. We will demonstrate how we generated monthly time-series ET maps for the state of California from MODISAzure results and visualized them in WWT.

Sessions MT3 and WT3: Data-Intensive Research: Dataset Lifecycle Management for Scientific Workflow, Collaboration, Sharing, and Archiving

Alex Wade, Microsoft Research

Microsoft External Research strongly supports the process of research and its role in the innovation ecosystem, including developing and supporting efforts in open access, open tools, open technology, and interoperability. These projects demonstrate our ongoing work towards producing next-generation documents that increase productivity and empower authors to increase the discoverability and appropriate re-use of their work.

This workshop will provide a deep dive into several freely available tools from Microsoft External Research, and will demonstrate how these can help supplement and enhance current repository offerings. Come learn more about how the Microsoft Research tools can help extend the reach and utility of your repository efforts. Each session will include a hands-on component so that attendees can gain a deeper technical understanding of the available toolset.

Session MT4: Parallel Computing with Visual Studio 2010 and the .NET Framework 4

Stephen Toub, Microsoft Corporation

The Microsoft .NET Framework 4 and Visual Studio 2010 include new technologies for expressing, debugging, and tuning parallelism in managed applications. Dive into key areas of support, including Parallel Language Integrated Query (PLINQ), cutting-edge concurrency views in the Visual Studio profiler, and debugger tool windows for analyzing the state of concurrent code. In addition to exploring such features, we will examine some common parallel patterns prevalent in technical computing and how these features can be used to best implement such patterns.
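As a flavor of the material, the following small PLINQ example parallelizes a CPU-bound query across the available cores; the prime-counting workload is just a placeholder.

```csharp
using System;
using System.Linq;

class PlinqExample
{
    static void Main()
    {
        // Count primes below one million, letting PLINQ partition the range
        // across the available cores; ordering is irrelevant because only
        // the count is needed.
        int primeCount = Enumerable.Range(2, 999_998)
                                   .AsParallel()
                                   .Count(IsPrime);
        Console.WriteLine($"Primes below 1,000,000: {primeCount}");
    }

    static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int d = 2; d * d <= n; d++)
            if (n % d == 0) return false;
        return true;
    }
}
```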

Session WT1: CoSBiLab: Enabling Simulation-Based Science

Corrado Priami, University of Trento Centre for Computational and Systems Biology

CoSBiLab is a software platform implementing the new conceptual framework of algorithmic systems biology. It is centered on the idea of representing biological elements as programs and element interactions as message passing between the corresponding programs. This idea guides the programming paradigm supported by the new programming language BlenX. The approach is higher level and provides a component-based view of systems, rather than the reaction-based descriptions usually adopted in ODE or rewriting-system tools. CoSBiLab allows its users to exploit compositionality and stochasticity, addressing concurrency and complexity in a native way.

To make the approach intuitive, CoSBiLab has a tabular interface for modeling systems and for gathering data from databases, so that non-experts can use CoSBiLab; that is, they can program in BlenX without having programming skills. CoSBiLab also has tools that help infer missing data, perform network analysis, and visualize simulation outcomes. In addition to an introduction to the conceptual framework, demos will be provided to help in understanding the software that will support the e-scientists of the future in their work.

For more information, see “Algorithmic Systems Biology,” Communications of the ACM, 52(5):80–88, May 2009.

Session WT4: OData – Open Data for the Open Web

Alex James, Microsoft Corporation

There is a vast amount of data available today and data is now being collected and stored at a rate never seen before. Much, if not most, of this data, however, is locked into specific applications or formats and difficult to access or to integrate into new uses.

The Open Data Protocol (OData) is a web protocol for querying and updating data that provides a way to unlock your data and free it from silos that exist in applications today. OData is being used to expose and access information from a variety of sources including, but not limited to, relational databases, file systems, content management systems and traditional websites.

Join us in this tutorial to learn how OData can enable a new level of data integration and interoperability across a broad range of clients, servers, services, and tools. Bring your laptop and you will have a chance to work OData into your own projects on whatever platform you choose.
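As a taste of what the tutorial covers, the sketch below issues a query against a hypothetical OData endpoint using the standard $filter, $orderby, and $top query options; the URL and the Observations entity set are invented for illustration.

```csharp
using System;
using System.Net;

class ODataClientSketch
{
    static void Main()
    {
        // Hypothetical OData endpoint; $filter, $orderby, and $top are
        // standard OData query options evaluated by the service, so only
        // the requested slice of data crosses the wire.
        const string url =
            "https://example.org/research.svc/Observations" +
            "?$filter=Temperature gt 20&$orderby=Timestamp desc&$top=10";

        using (var client = new WebClient())
        {
            string feed = client.DownloadString(new Uri(url));
            Console.WriteLine(feed);
        }
    }
}
```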

Sessions MT4 and WT4—Part 2: F# for eScience: How to Take Advantage of a Managed Functional Language for Research

Cancelled