eScience Workshop 2009



New Book Expands on Jim Gray’s Vision

On October 16, The Fourth Paradigm: Data-Intensive Scientific Discovery was released at the eScience Workshop.

Jeff Dozier Accepts Jim Gray eScience Award

Dozier has gained a deep understanding of the role snowfall and snowmelt play in creating healthy ecosystems.

About the Event

The goal of this cross-disciplinary event is to bring together scientists from diverse research disciplines and give them the opportunity to share their research and their experiences of how computing is shaping their work—and thus to provide new insights into how data-driven computing is facilitating scientific discovery. The discussion will center on how we can take advantage of “big data” and create computing technologies that enable scalable solutions that address a broad range of scientific challenges.

The event includes the presentation of the second Jim Gray eScience award. The award is presented to a researcher who has made an especially significant contribution to the field of data-intensive computing.


Tony Hey

Keynotes and Presentations

Friday, October 16, 2009

Click the linked titles to view the presentation videos.


The Fourth Paradigm: Realizing Jim Gray’s Vision for Data-Intensive Scientific Discovery

View Presentation by Tony Hey, Jeff Dozier, Ben Shneiderman, Timo Hannay, Christopher Southan

Chair: Daron Green

eOcean: The Challenges Ahead and Below the Waves

View Presentation by Ellen Prager, President, Earth2Ocean, Inc.

Chair: Harold Javid

Parallel Track Sessions

Genomics | Chair: Iain Buchan

Computational Methods for Large-Scale DNA Data Analysis

Xiaohong Qiu, Jaliya Ekanayake, Geoffrey C. Fox, Thilina Gunarathne, Scott Beason – Indiana University

Tools for Scalable Genome Haplotyping in the Windows Azure Cloud

Girish Subramanian – Indiana University

Yogesh Simmhan – Microsoft Research

eResearch Services & Applications | Chair: Lee Dirks

Generating Intelligent Multimedia Presentations from Semantic Mashups Using OAI-ORE and SMIL

Jane Hunter, Anna Gerber – The University of Queensland

Comment by Sketch: A Picture Says a Million Words

Stephen Wilson, Jeremy Frey – University of Southampton

eScience Topics | Chair: Yan Xu

Observing Round-the-Clock Expulsion of Matter from a Black Hole: Global Jet Watch

Katherine Blundell – Oxford University

Towards Complete Functional and Structural Imaging of Cortical Circuits

Clay Reid, Davi Bock, Wei-Chung Lee – Harvard Medical School

Advances in Software Support | Chair: Judith Bishop

Software Support for Hybrid Computing

Shujia Zhou – University of Maryland, Baltimore County

Scaling Simulations Through Declarative Processing

Alan Demers, Oliver Gao, Johannes Gehrke, Christoph Koch, Marcos Vaz Salles, Walker White – Cornell University

Tools for Biology | Chair: Simon Mercer

Enhancing BLAST Comprehension with SilverMap

Peter Ansell, Lawrence Buckingham, Xin-yi Chua, James Hogan, Scott Mann, Paul Roe – Queensland University of Technology

Web Service Extension of Computational Biology Application Suite

Robert Bukowski, Jaroslaw Pillardy – Computational Biology Service Unit, Cornell University

Systems for Author & Document Analysis | Chair: Lee Dirks

Scalable Solution for a Comprehensive Appraisal of Contemporary Documents

Rob Kooper, Peter Bajcsy, Kenton McHenry – National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign

Author Identity and Social Networking at arXiv

Simeon Warner, Nathan Woody, Paul Ginsparg – Cornell University

Thorsten Schwander – Los Alamos National Laboratory

Earth & Environment | Chair: Stewart Tansley

Listening to Nature: Acoustic Monitoring of the Environment

Michael Towsey, Birgit Planitz, Paul Roe, Jiro Sumitomo, Ian Williamson, Jason Wimmer, Jinglan Zhang – Queensland University of Technology

3D Geographical Environments and Geospatial Data Exploration Using Flight Simulators and Geodatabases

Alexandra Diehl, Horacio Abbate, Marta Mejail, Mercedes Sanchez – Universidad de Buenos Aires

Claudio Delrieux – Universidad Nacional del Sur, Argentina

Provenance Management for Scientific Data | Chair: Dean Guo

Simple Provenance in Scientific Databases

Nolan Li, Alexander Szalay – Johns Hopkins University

Provenir Ontology: Towards a Framework for eScience Provenance Management

Satya Sahoo, Amit Sheth – Wright State University

eScience Research at Microsoft | Chair: David Heckerman

The Vedea Visualization Language

Martin Calsyn – Microsoft Research

Tools and Techniques for Computational Biology

Carl Kadie – Microsoft Research

Performing Science in the Cloud: A Haplotype Phasing Case Study

Yogesh Simmhan – Microsoft Research

Collaboration | Chair: Michael Zyskowski

Generalized eScience Collaboration Through SharePoint

Marty Humphrey – University of Virginia

Catharine Van Ingen – Microsoft Research

Deb Agarwal – Lawrence Berkeley National Laboratory

Collaborative Data Analysis with Taverna Workflows

Andrea Wiggins, Kevin Crowston – Syracuse University

Lessons from myExperiment: Research Objects for Data Intensive Research

David De Roure – University of Southampton

Carole Goble – University of Manchester

Semantic Repositories and Modeling | Chair: Evelyne Viegas

Facilitating Next Generation Data Intensive Science Using Semantic Technologies

Deborah McGuinness, Peter Fox – Rensselaer Polytechnic Institute

Integrating Streaming Data and Semantic Repositories

Alejandro Rodriguez, Yong Liu, James Myers – University of Illinois at Urbana-Champaign

Using Multipartite Graphs for Recommendation and Discovery

Michael Kurtz, Edwin Henneken, Alberto Accomazzi – Harvard-Smithsonian Center for Astrophysics

Data Intensive Computing | Chair: Peter Lee

Data Intensive Scalable Computing: Applying Google-Style Computing to eScience

Randal E. Bryant – Carnegie Mellon University

Understanding and Maturing the Data-Intensive Scalable Computing Storage Substrate

Garth A. Gibson – Carnegie Mellon University

Low Power Amdahl Blades for Data Intensive Computing

Alexander Szalay, Andreas Terzis, Alainna White, Jan Vandenberg – Johns Hopkins University

Gordon Bell, Jose Blakeley – Microsoft

Howie Huang – George Washington University

Saturday, October 17, 2009


Some Challenges in Natural Science

View Presentation by Stephen Emmott

Chair: Tony Hey

From Flops to Petabytes: the Expanding Role of Data in NSF Cyberinfrastructure

View Presentation by Ralph Roskies and Michael Levine

Chair: Peter Lee

Parallel Track Sessions

Bioinformatics | Chair: Yogesh Simmhan

Bayesian Network Modelling of the Factors Influencing the Development of Obesity

David Hoyle, Nicholas Harding, Iain Buchan – University of Manchester

Navigating Across Multi-Scale Biological Systems Using Cognitive Modeling and Systems Approaches Conceptualized in Product Engineering

Nagasuma Chandra – Indian Institute of Science

Satish Chandra – National Aerospace Laboratories

Improvements in the Determination of Fundamental Principles of RNA Structure and the Prediction of RNA Secondary and Tertiary Structure

Robin Gutell, David Gardner – University of Texas, Austin

Stuart Ozer – Microsoft

Development in Search & Semantics for Chemistry | Chair: Daron Green

Navigating the Complex Web of Chemistry Using ChemSpider

Antony Williams, Valery Tkachenko – Royal Society of Chemistry

The oreChem Project: Integrating Chemistry Scholarship with the Semantic Web and Web 2.0

Carl Lagoze – Cornell University

Prasenjit Mitra, William Brouwer – Penn State University

Mark Borkum – University of Southampton

Integrative Data Mining For Drug Discovery Using the Semantic Web

Qian Zhu, Marlon Pierce, Geoffrey C. Fox, David J. Wild – Indiana University

Michael S. Lajiness – Eli Lilly and Company

Extraction of NMR Spectra and Structural Data from Documents for Semantic Representation and Reuse

Mark Borkum – University of Southampton

William Brouwer – Penn State University

Democratizing Semantics for the Scientist | Chair: Evelyne Viegas

Social and Semantic Computing to Support Citizen Science

Joel Sachs, Tejas Lagvankar, Tim Finin – University of Maryland, Baltimore County

Tupelo: a Framework for E-Science Knowledge Spaces

Joe Futrelle, Jeff Gaynor, Joel Plutchak, James Myers, Robert McGrath – National Center for Supercomputing Applications

Data Integration for E-Science Using Correlated Concepts

Lushan Han, Tim Finin, Anupam Joshi, Yelena Yesha – University of Maryland, Baltimore County

Cloud Computing | Chair: Dean Guo

Construction of a MODIS Scientific Data Reprojection and Reduction Pipeline in Windows Azure Platform

Jie Li – University of Virginia

Youngryel Ryu – University of California, Berkeley

Keith Jackson, Deb Agarwal – Lawrence Berkeley National Laboratory

Catharine Van Ingen – Microsoft Research

Cloud Workflow Service: Automatically Scaling Scientific Workflows on Demand

Cui Lin – Wayne State University

Roger Barga, Dean Guo, Jared Jackson – Microsoft External Research

Cloud Computing for Planetary Defense

Steven Johnston, Kenji Takeda, Hugh Lewis, Simon Cox, Graham Swinerd – University of Southampton

Medical Imaging | Chair: Stewart Tansley

Segmentation of Confocal Stacks and Synaptic Regions for the Connectome Project

Amelio Vazquez-Reina, Won-Ki Jeong, Roanna Ruiz, Bo Wang, Hanspeter Pfister – Harvard University

Eric Miller – Tufts University

A Web-Based System for Biomedical Image Storage, Annotation, Content-Based Retrieval and Exploration

Jorge Camargo, Juan Caicedo, Angel Cruz Roa, Eduardo Romero, Clara Spinel, Fabio Gonzalez – Universidad Nacional de Colombia

David Seligmann, Jessica Forero – Politecnico Grancolombiano

Incorporating Problem-Based Learning in Medical Image Processing: A Case Study on Computing-Centric Engineering Education

Chang Quo, May Wang – Georgia Institute of Technology

Global Public Health | Chair: Kristin Tolle

Current Status of e-Health in Peru

Lady Murrugarra – Universidad Peruana Cayetano Heredia

Model-Driven Support for a Vaccine Study in Kathmandu

Jeremy Gibbons, Jim Davies, Steve Harris, Jane Metz, Matthew Snape, Andrew Pollard – University of Oxford

Advances in Data Intensive Science | Chair: Yan Xu

A New Partnership for Cross-Scale, Cross-Domain eScience

Bill Howe – University of Washington

Extracting Natural Laws from Data: Invariants Are Better Than Predictive Models

Michael Schmidt, Hod Lipson – Cornell University

From Flops to (Peta)Bytes: the Expanding Role of Data in Scientific Analysis

Nick Nystrom – Pittsburgh Supercomputing Center

Biomedical Modeling | Chair: Kristin Tolle

SysMO-DB: Just Enough Exchange for Systems Biology Data and Models

Carole Goble, Katy Wolstencroft, Stuart Owen, Sergejs Aleksejevs – University of Manchester

Wolfgang Müller, O. Krebs, Isabel Rojas – EML

Jacky Snoep – University of Stellenbosch

Healthcare e-Labs: Opening and Integrating Models of Health

Iain Buchan, Carole Goble, David Hoyle, John Ainsworth, Mark Delderfield, Gareth Smith, Lee Kitching – University of Manchester

John Winn, Christopher Bishop – Microsoft Research

Computational Thinking Research | Chair: Tom McMail

How Optimized Environmental Sensing Helps Address Information Overload on the Web

Carlos Guestrin – Carnegie Mellon University

Computational Thinking for a Modern Kidney Exchange

Tuomas Sandholm – Carnegie Mellon University

Parallel Thinking

Guy Blelloch – Carnegie Mellon University

Data Driven Environmental Science | Chair: Christophe Poulain

Opportunities for Integration of Multi-Model Data for the Water Sciences Community

Yong Liu, David Hill, Barbara Minsker – National Center for Supercomputing Applications

Modelling Data-Driven CO2 Sequestration Using Distributed HPC CyberInfrastructure

Yaakoub El Khamra – University of Texas at Austin

Shantenu Jha – Louisiana State University

Beyond Sensors: Curating Ancillary Data for Carbon-Climate Science

Catharine Van Ingen – Microsoft Research

Marty Humphrey – University of Virginia

Deb Agarwal – Lawrence Berkeley National Laboratory

The Fourth Paradigm

The Fourth Paradigm: Data-Intensive Scientific Discovery

Presenting the first broad look at the rapidly emerging field of data-intensive science

Increasingly, scientific breakthroughs will be powered by advanced computing capabilities that help researchers manipulate and explore massive datasets.

The speed at which any given scientific discipline advances will depend on how well its researchers collaborate with one another, and with technologists, in areas of eScience such as databases, workflow management, visualization, and cloud computing technologies.

The Fourth Paradigm: Data-Intensive Scientific Discovery is a collection of essays that expands on the vision of pioneering computer scientist Jim Gray for a new, fourth paradigm of discovery based on data-intensive science and offers insights into how it can be fully realized.

Critical praise for The Fourth Paradigm

“The individual essays—and The Fourth Paradigm as a whole—give readers a glimpse of the horizon for 21st-century research and, at their best, a peek at what lies beyond. It’s a journey well worth taking.”

James P. Collins
School of Life Sciences, Arizona State University

From the back cover

“The impact of Jim Gray’s thinking is continuing to get people to think in a new way about how data and software are redefining what it means to do science.”

Bill Gates, Chairman, Microsoft Corporation

“I often tell people working in eScience that they aren’t in this field because they are visionaries or super-intelligent—it’s because they care about science and they are alive now. It is about technology changing the world, and science taking advantage of it, to do more and do better.”

Rhys Francis, Australian eResearch Infrastructure Council

“One of the greatest challenges for 21st-century science is how we respond to this new era of data-intensive science. This is recognized as a new paradigm beyond experimental and theoretical research and computer simulations of natural phenomena—one that requires new tools, techniques, and ways of working.”

Douglas Kell, University of Manchester

“The contributing authors in this volume have done an extraordinary job of helping to refine an understanding of this new paradigm from a variety of disciplinary perspectives.”

Gordon Bell, Microsoft Research

Microsoft Research is honored to provide initial website hosting for this book launch.

New Book Expands on Jim Gray’s Vision

By Rob Knies

October 16, 2009 7:00 AM PT

During the 2009 Microsoft eScience Workshop, held Oct. 15-17, Jeff Dozier, professor in the Donald Bren School of Environmental Science & Management at the University of California, Santa Barbara, was named winner of the second annual Jim Gray eScience Award. A leader in his field, Dozier is also a contributor to The Fourth Paradigm, a new book published by Microsoft Research. The External Research Division announced the book’s availability on Oct. 16. In it, a collection of academic visionaries and Microsoft researchers discuss the implications of Gray’s Fourth Paradigm in science. Gray postulated that data exploration, or, as he termed it, eScience, is the evolutionary next step in scientific exploration, following the original, empirical stage and its subsequent theoretical and computational phases. Gray, a Turing Award-winning computer scientist, was lost at sea in late January 2007 while sailing his 40-foot yacht Tenacious.

The Fourth Paradigm, the book, was edited by Tony Hey, corporate vice president of the External Research Division, along with Stewart Tansley, a senior research program manager in Hey’s group, and Kristin Tolle, a director in the same group. The book features a total of 70 authors, 43 of them from outside Microsoft, representing 20 separate institutions. Hey recently took a few moments to discuss Dozier’s award and the new book, dedicated simply as “For Jim.”

Q: What does the concept of the Fourth Paradigm mean to you personally, and what is its importance to Microsoft?

Tony Hey: The Fourth Paradigm is something that Jim Gray realized after working with a variety of scientists—biologists, chemists, physicists, astronomers, and engineers. It became clear to Jim that their problems were as much about data as about computation and that they needed new skills to manipulate, visualize, and manage large amounts of scientific data.

It was Ken Wilson, Nobel Prize winner in physics, who coined the phrase Third Paradigm to refer to computational science and the need for computational researchers to know about algorithms, numerical methods, and parallel architectures. The skills needed for manipulating, visualizing, managing, and, finally, conserving and archiving scientific data are very different.

The Fourth Paradigm is also an opportunity for Microsoft, because we have technologies that can democratize the way we do science in the future. We can have usable, extensible, and interoperable technologies that can really make a difference to the lives of working scientists. I think Jim’s emphasis on it being a new paradigm is really right.

Q: Why are you releasing the book? What do you hope to gain from this effort?

Hey: Much of the focus of both funding agencies and the computer-science community is on the need for more computational power, going from petaflops to exaflops. While this focus on computation is clearly important, I think it is also important to show that there is a need for a second focus, on the technologies required for data-intensive science.

In this book, we have paired distinguished scientists with computer scientists to give their vision of how they see their fields being transformed in the next five years. In many cases, some research fields genuinely will go from being data-poor to data-rich during this time frame. This will present scientists with new challenges and the need to manipulate, visualize, and combine data sets. In some ways, the book complements the vision contained in Towards 2020 Science, an influential report from Microsoft Research Cambridge that was the brainchild of Stephen Emmott, head of Computational Science in our Cambridge lab.

In addition to highlighting the requirements of data-intensive science, I firmly believe that Microsoft can make a great contribution to helping scientists in their research by raising the level of abstraction—so that they do not need to write lots of low-level scripts to manipulate their data.

The other revolution the book talks about is in scholarly communication. At the moment, when you want to get data from a scientific paper, you often have to actually take a ruler to the published paper and directly measure the data points on a graph. In future electronic versions of a scientific paper, you should be able to click on a point on a graph and go directly to the data or click on the curve and go to the program that produced the curve.

Documents can and will be much more interactive in the future. In addition, besides links to the data, there will also be many types of contextual information associated with a peer-reviewed paper, such as wikis, blogs, and social networks. Just as there is a revolution happening in data-intensive science, there is also a revolution happening in scholarly communication.

Q: Why are you making the book available for free, and why is it being published under a Creative Commons license?

Hey: We wish this book to be maximally useful and widely cited, and what better way to achieve this than making all of the content available free, for reuse under a Creative Commons license?

We want to spread the debate to the largest possible audience, and we hope that this will be a way of broadcasting the content and generating the widest possible circulation of the book. Wherever you are, you should be able to get hold of a copy; you can download it from the Web, get a print-on-demand version, or maybe download a version for the Amazon Kindle or the Sony Reader.

Q: Is Microsoft Research working on any projects mentioned in the book?

Hey: Microsoft Research is working on quite a few of them, but not all of them. Many articles have paired a research scientist with a Microsoft researcher, but others have no specific Microsoft connection.

The book is not intended to be specifically about Microsoft projects, although some of our projects are used as illustrative exemplars. Our projects are only used to illustrate and are not meant to be definitive.

Q: Many of the authors represented in the book are from universities or other parts of the research community. Do you feel this book is typical of how Microsoft Research collaborates with that community?

Hey: To a large extent, yes. We wanted the contributors to be leaders of the field who were capable of looking a few years into the future and who could give a credible vision of how their fields would develop as a result of imminent advances in IT. The authors are pretty special individuals and are, indeed, typical of the sort of scientists with whom we want to engage.

Q: How does the finished product compare to your original vision?

Hey: It’s pretty close. I have to say that I am very pleased with the way the book has turned out, and I think it succeeds in being both interesting and informative. We asked people to write essays who don’t usually write essays, in order that their contributions would be readable, even by non-experts in their fields. I think all the articles support the prescience of Jim’s vision that, in the future of many research fields, the manipulation and management of scientific data will be the key bottleneck.

One of the particularly good things about the book, to my mind, was Gordon Bell’s insistence that the introductory article should be produced from the transcript of Jim Gray’s last talk, to the National Research Council’s Computer Science and Telecommunications Board, given two weeks before he disappeared. The talk was all about Jim’s vision for data-intensive science and the scholarly-communication revolution, and I do not think we could have had a better introduction.

Q: How does the winner of this year’s Jim Gray eScience Award reflect the contributions made by Jim’s work?

Hey: We want the winner of the Jim Gray eScience Award to epitomize Jim’s understanding of the importance of data-intensive science. Alex Szalay [Alumni Centennial Professor in the Department of Physics and Astronomy at Johns Hopkins University] received a Lifetime eScience Award from us for his contributions to data-intensive science a year before the award was renamed the Jim Gray eScience Award. Alex was a longtime collaborator of Jim’s, and his research on such things as the Sloan Digital Sky Survey typifies the sort of significant contribution we are looking for in a Jim Gray eScience Award winner.

Carole Goble [professor of Computer Science at the University of Manchester and winner of the first Jim Gray eScience Award] is an expert database researcher who has been applying her computer-science skills to problems in biology, in projects such as myGrid and myExperiment. I believe that Jim would thoroughly have approved of her role in developing powerful workflow and provenance-tracking technologies with the biologists.

Jeff Dozier, this year’s winner, is from the environmental-science community. His article in the book [The Emerging Science of Environmental Applications] is particularly interesting, because he talks about how environmental sciences in the ’80s were split into geophysics and other small disciplines. Then the community realized these subfields all overlapped and interacted with each other, so in the ’90s, the field evolved to become earth-systems science.

Jeff is now calling for a science of environmental applications. Scientists now have to use their knowledge to solve problems that the world cares about. Scientists need to use all their scientific-research data for a specific action, to try to help solve or alleviate the problems of climate change and global warming.

On a personal note, Jeff also was enormously helpful to me way back in 2001, when I was leading the U.K. eScience program. The U.K. Natural Environment Research Council set up an eScience committee and asked Jeff to be its chairman. In this way, Jeff was able to have enormous positive impact on the U.K. eScience program. He is definitely a worthy winner of the Jim Gray eScience Award!

Q: If Jim were still with us today, what would he think about the book?

Hey: I hope Jim would be extremely pleased. It is really a validation of his vision for data-intensive science.

Workshop Tutorials

On Thursday, October 15, 2009, in collaboration with Carnegie Mellon University, Microsoft Research hosted four tutorials that are designed to facilitate scientific discovery by extending the reach and utility of eScience efforts. Below are brief descriptions of the eScience research technologies presented in these tutorial sections, followed by links to the full video presentations.

Session M1: Project Trident: A Scientific Workflow Workbench

Dean Guo and Yan Xu, Microsoft Research

Keith Grochow, University of Washington

Workflow is a major component of almost every major science project today, covering a wide range of domains. Project Trident leverages Windows Workflow Foundation (WWF)—which is part of the Microsoft .NET Framework—for core workflow support and implements only the functionality required for scientific workflow. Project Trident will be an open source scientific workflow workbench. Currently, the binary is available for free download at Project Trident: A Scientific Workflow Workbench.

This tutorial describes Project Trident architecture, design, and key features. It helps you use the Trident scientific workflow workbench to accomplish the following:

  1. Author workflows and customize workflow activities
  2. Manage workflows on a desktop and scale out to Windows HPC (High Performance Computing) clusters
  3. Provide runtime services, such as provenance and workflow runtime monitoring
  4. Manage workflow versioning and a personalized workflow catalog
  5. Expose Project Trident runtime as a Web service and run workflows from a Microsoft Silverlight-enabled browser or other application (such as Microsoft Office Word) for reproducible research
  6. Use myExperiment as a portal for sharing workflows
  7. Use Microsoft Project Trident Connection Point to request features and bug fixes and to share best practices

View tutorial on Project Trident: A Scientific Workflow Workbench

Session M2: Microsoft Cloud Computing Frameworks for Research

Jared Jackson and Christophe Poulain, Microsoft Research

Simon Woodman and Hugo Hiden, Newcastle University

Computing-enabled scientific and engineering research has emerged as the third pillar of the scientific process, complementing theory and experiment. The challenge of satisfying the ever-rising demand for research computing and data management, the enabler of scientific discovery, continues to grow. Fortuitously, the emergence of cloud computing, meaning software and services hosted by networks of commercial data centers and accessible over the Internet, offers a solution to this conundrum.

This tutorial describes cloud technologies that broaden the scientific and research community’s access to data- and compute-intensive resources. We start by providing an overview of cloud computing today. Then we examine cloud application frameworks by looking at Dryad, DryadLINQ, and Microsoft Azure. Throughout the tutorial, scientific examples illustrate the potential applications.

View tutorial on Microsoft Cloud Computing Frameworks for Research

Session A1: Tools to Support e-Research – Microsoft Research and the Scholarly Information Ecosystem

Oscar Naim and Lee Dirks, Microsoft Research

Microsoft External Research strongly supports the process of research and its role in the innovation ecosystem, including developing and supporting efforts in open access, open tools, open technology, and interoperability. Microsoft External Research collaborates with universities, national libraries, publishers, and governmental organizations to help develop tools and services to evolve the scholarly information lifecycle. These projects demonstrate our ongoing work towards producing next-generation documents that increase productivity and empower authors to increase the discoverability and appropriate re-use of their work.

This workshop provides a deep view into several freely available tools from Microsoft External Research and demonstrates how they can help supplement and enhance your e-research. The hands-on component of this session helps you gain a deeper technical understanding of the available toolset, which includes the following resources:

  • Research Information Centre (RIC): An online virtual research environment for collaborative work
  • Tools for authors
    • Structured document authoring (based on the NLM-DTD)
    • Ontology integration and markup
    • Repository search integration
    • ORE resource map authoring
    • Chem4Word
    • Article repository submission workflow (via REST and SWORD interfaces)
  • Zentity: A research-output repository platform
    • Version 1.0 is available
  • Other related services
    • Bing Translator (Web service)
    • Document/file format conversion (Web service)

View tutorial on Tools to Support e-Research

Session A2: The Microsoft Biology Initiative: An Open Source Framework and Toolset for Bioinformatics Research

Michael Zyskowski, Simon Mercer, and Jared Jackson, Microsoft Research

Chris Wu, Carnegie Mellon University

Jim Hogan and Lawrence Buckingham, Queensland University of Technology

Jaroslaw Pillardy and Robert Bukowski, Cornell University

The Microsoft Biology Initiative (MBI) has two distinct components: the Microsoft Biology Framework (MBF) and the Microsoft Biology Toolset (MBT). MBF is a set of Microsoft .NET assemblies implementing file parsers and writers for common formats, common algorithms, and access to a set of common Web services used in bioinformatics, specifically in the domains of DNA sequencing, assembly, annotation, and analysis. It is a framework that provides a common object model for the representation, analysis, and visualization of DNA, RNA, and protein sequences. MBT is a set of tools, some of which are already built upon MBF, for directed scientific analysis and the discovery of relationships related to the human genome. The tools and framework comprise a set of extensible, open source technologies that will enable Microsoft and third parties to conduct rich genomic science research on the Windows platform. This workshop is intended to introduce the audience to the foundational components of MBF; to show how the framework can be extended to address specific scientific analysis problems; and to provide an overview of the underlying code, which, because the project is open source, can be extended into areas not yet addressed.

View tutorial on the Microsoft Biology Initiative