Trident: Scientific Workflow Workbench for Oceanography
We are at the beginning of a new era of “e-science.” Advances in technology are transforming discovery in nearly all scientific fields in two key ways. First, massive experiments are being carried out to simulate the real world on computer systems with thousands of processors. Second, large numbers of sensors are being deployed to gather data on the sea floor, on glaciers in the Swiss Alps, and across the landscape on which we walk. These approaches share a common trait: they produce enormous amounts of data that must be captured, transported, stored, accessed, visualized and interpreted to extract knowledge. This computational knowledge extraction is at the heart of 21st-century discovery.
Scientific workflows are proving to be the preferred vehicle for computational knowledge extraction and for enabling science at a large scale. Workflows give scientists a flexible way to author complex data analysis pipelines composed of heterogeneous steps, from capturing data from sensors or computer simulations, through data cleaning, to transport and storage, and they provide a foundation upon which results can be analyzed and validated. However, building and maintaining robust scientific workflow systems is proving to be extremely costly, and the long-term sustainability of academic research prototypes is an open question. In this project we are implementing a scientific workflow workbench on top of a commercial workflow engine, Microsoft Windows Workflow Foundation, to leverage its existing functionality. Trident, our scientific workflow workbench, implements only the functionality and services required for scientific workflow management. In doing so, it offers a robust platform that lets science groups spend more time on their science and less time writing code.
Scientific Workflow for Project NEPTUNE
NEPTUNE is the first regional cabled ocean observatory, located on the Juan de Fuca plate off the coast of Washington. NEPTUNE will place thousands of chemical, geological and biological sensors on 2000 kilometers of fiber optic cable on the sea floor, continuously streaming data back to shore for analysis. NEPTUNE will transform oceanography from a data-poor to a data-rich science. It will help unlock secrets about the ocean’s ability to absorb greenhouse gases and about how stresses on the sea floor cause earthquakes and tsunamis along Pacific coastlines.
Trident is part of a collaborative project between the University of Washington, the Monterey Bay Aquarium Research Institute, and Microsoft to provide Project NEPTUNE with a scientific workflow workbench for oceanography. Implemented on top of Windows Workflow Foundation, Trident allows scientists to explore and visualize oceanographic data in real time and provides an environment to visually compose, run, and catalog workflows. Trident uses Microsoft Silverlight, a freely downloadable cross-browser, cross-platform, and cross-device plug-in for delivering .NET-based applications over the Web, so a scientist using Windows, Mac OS, or Linux can use Trident to compose, run, and catalog experiments from a web browser.
Other features in Trident for data-intensive research include automatic provenance capture, "smart" re-running of different versions of a workflow, on-the-fly updatable parameters, cost estimation of the resources a workflow will require, monitoring of long-running tasks, and support for fault tolerance and recovery from failures.
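Trident's internals are not described here, but one common way to combine provenance capture with "smart" re-running is to fingerprint each step by its name, parameters, and input values, log every execution, and reuse a cached output whenever a fingerprint has been seen before. The Python sketch below assumes that design. `ProvenanceStore`, `despike`, and `mean` are hypothetical names, and step inputs and parameters must be JSON-serializable for the fingerprint to work.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List


class ProvenanceStore:
    """Hypothetical store: logs every step run and caches outputs by fingerprint."""

    def __init__(self) -> None:
        self.cache: Dict[str, Any] = {}
        self.log: List[Dict[str, Any]] = []

    @staticmethod
    def fingerprint(step: str, params: Dict[str, Any], inputs: List[Any]) -> str:
        # Identical step name + parameters + input values give an identical key.
        payload = json.dumps({"step": step, "params": params, "inputs": inputs},
                             sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, step: str, fn: Callable[..., Any],
            params: Dict[str, Any], inputs: List[Any]) -> Any:
        key = self.fingerprint(step, params, inputs)
        reused = key in self.cache
        if not reused:
            self.cache[key] = fn(*inputs, **params)
        # Provenance record: what ran, with which parameters, and whether it was reused.
        self.log.append({"step": step, "params": params, "key": key, "reused": reused})
        return self.cache[key]


def despike(samples: List[float], threshold: float) -> List[float]:
    # Drop readings whose magnitude exceeds the threshold.
    return [s for s in samples if abs(s) <= threshold]


def mean(samples: List[float]) -> float:
    return sum(samples) / len(samples)


store = ProvenanceStore()
raw = [10.2, 10.4, 99.0, 10.3]
for threshold in (50.0, 50.0, 20.0):  # the second run repeats the first version
    cleaned = store.run("despike", despike, {"threshold": threshold}, [raw])
    store.run("mean", mean, {}, [cleaned])

print([entry["reused"] for entry in store.log])
# [False, False, True, True, False, True]
```

Re-running the first version reuses both steps. Changing the threshold from 50.0 to 20.0 re-executes the despiking step, but because it still removes the same outlier, the averaging step receives identical input and its cached result is reused.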
We are exploring how Trident can address real-world issues that oceanographers encounter when turning a sea of data streaming from ocean sensors into visualizations and data products that support their research. We are also determining exactly how to leverage a commercial workflow enactment engine to support scientific workflows. Highlights of the project are described in the remainder of this section; the features we will demonstrate are not limited to those listed.
Easy and Rapid Ad-Hoc Workflow Design
We will allow scientists to visually author workflows from a catalog of existing activities and complete workflows, using only a web browser. Authoring is an important aspect of any workflow system, as it allows researchers to specify both the steps and the control dependencies in a data analysis pipeline. Easily finding and adapting an existing workflow is key to effective workflow prototyping. Because Trident's end users are not seasoned programmers, it offers a graphical interface that enables visual programming, as well as a web-based portal for authoring and launching workflows from a browser. The portal uses Silverlight, so researchers running Firefox or other Mozilla browsers on Linux or Mac OS can use our system. Trident also provides a tiered library that hides the complexity of the underlying workflow activities and services to make them easier to use.
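To make "steps and control dependencies" concrete, the sketch below models a workflow as named activities plus the activities each one must wait for, and runs them in dependency order using Python's standard `graphlib` module (Python 3.9+). It is an illustrative assumption, not Trident's or Windows Workflow Foundation's programming model; the `Workflow` class, the activity names, and the sample readings are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Any, Callable, Dict, Iterable, Set


class Workflow:
    """Hypothetical minimal workflow: named activities plus control dependencies."""

    def __init__(self) -> None:
        self.activities: Dict[str, Callable[..., Any]] = {}
        self.depends_on: Dict[str, Set[str]] = {}

    def add(self, name: str, activity: Callable[..., Any],
            after: Iterable[str] = ()) -> "Workflow":
        # Register an activity and the activities that must finish before it runs.
        self.activities[name] = activity
        self.depends_on[name] = set(after)
        return self

    def run(self) -> Dict[str, Any]:
        results: Dict[str, Any] = {}
        # static_order() yields every activity after all of its dependencies
        # and raises graphlib.CycleError if the dependencies form a cycle.
        for name in TopologicalSorter(self.depends_on).static_order():
            upstream = {dep: results[dep] for dep in self.depends_on[name]}
            results[name] = self.activities[name](**upstream)
        return results


wf = (
    Workflow()
    .add("fetch", lambda: [4.1, 4.3, 4.2, 12.0])  # stand-in for reading a sensor feed
    .add("clean", lambda fetch: [t for t in fetch if t < 10.0], after=["fetch"])
    .add("summarize", lambda clean: round(sum(clean) / len(clean), 2), after=["clean"])
    .add("report", lambda clean, summarize: f"{len(clean)} samples, mean {summarize}",
         after=["clean", "summarize"])
)
results = wf.run()
print(results["report"])  # 3 samples, mean 4.2
```

Each activity receives its upstream outputs as keyword arguments named after those activities, so `report` can combine the outputs of `clean` and `summarize`. In a visual designer, the same graph would correspond to activities placed on a canvas and connected to one another.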
System-wide Registry for Sensors to Services
We provide a registry that can include every object of interest to Project NEPTUNE, ranging from individual sensors in the ocean and Web services that provide access to data and models, to workflows and even versioned results from running a specific workflow.
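The registry's internal design is not described here; the following Python sketch shows one hypothetical way to hold sensors, services, workflows, and results in a single catalog. Each entry records a kind, tags, a description, and semantic input and output types, and search filters on any combination of these. Entries start in a private area until published, mirroring a curation step. `RegistryEntry`, `Registry`, the entry names, and type labels such as `ctd:Profile` are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class RegistryEntry:
    name: str
    kind: str  # e.g. "sensor", "service", "workflow" or "result"
    tags: Set[str] = field(default_factory=set)
    description: str = ""
    consumes: Set[str] = field(default_factory=set)  # semantic input types
    produces: Set[str] = field(default_factory=set)  # semantic output types
    public: bool = False  # new entries wait in a private area until curated


class Registry:
    def __init__(self) -> None:
        self.entries: List[RegistryEntry] = []

    def register(self, entry: RegistryEntry) -> None:
        self.entries.append(entry)

    def publish(self, name: str) -> None:
        # A curator promotes a reviewed entry to the public area.
        for entry in self.entries:
            if entry.name == name:
                entry.public = True

    def search(self, *, tag: Optional[str] = None, keyword: Optional[str] = None,
               consumes: Optional[str] = None, produces: Optional[str] = None,
               include_private: bool = False) -> List[str]:
        hits = []
        for e in self.entries:
            text = f"{e.name} {e.description}".lower()
            if not (e.public or include_private):
                continue
            if tag is not None and tag not in e.tags:
                continue
            if keyword is not None and keyword.lower() not in text:
                continue
            if consumes is not None and consumes not in e.consumes:
                continue
            if produces is not None and produces not in e.produces:
                continue
            hits.append(e.name)
        return hits


reg = Registry()
reg.register(RegistryEntry("CTD-042", "sensor", {"temperature", "salinity"},
                           "Conductivity-temperature-depth sensor",
                           produces={"ctd:Profile"}))
reg.register(RegistryEntry("ProfileGridder", "service", {"interpolation"},
                           "Grids CTD profiles onto a regular mesh",
                           consumes={"ctd:Profile"}, produces={"grid:Field"}))
reg.register(RegistryEntry("ProfilePlotter", "service", {"visualization"},
                           "Plots CTD profiles", consumes={"ctd:Profile"}))
reg.publish("CTD-042")
reg.publish("ProfileGridder")  # ProfilePlotter is still awaiting curation

print(reg.search(consumes="ctd:Profile"))                        # ['ProfileGridder']
print(reg.search(consumes="ctd:Profile", include_private=True))  # ['ProfileGridder', 'ProfilePlotter']
print(reg.search(keyword="ctd"))                                 # ['CTD-042', 'ProfileGridder']
```

Searching by what a service consumes, rather than by its name, is what lets a researcher find every published service that can accept the output of a given sensor.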
An increasing number of scientific tools and databases are available as Web services. As a result, researchers face not only a data deluge but also a service deluge, and they need a tool to organize, curate, and search for services of value to their research. Trident provides a registry that enables scientists to include services from their particular domains. The registry lets researchers search on tags, keywords, and annotations to see which services are available. Semantic tagging enables researchers to find a service based on what it does, or is meant to do, and on what it consumes as inputs and produces as outputs. Annotations help a researcher understand how to operate a service and configure it correctly, and the registry records when and by whom a service was created and tracks its version history. The Trident registry service also includes a harvester that automatically extracts the WSDL description of a service, so that scientists can use any service as it was published. Users simply provide the URI of the service, and the harvester extracts the WSDL and creates an entry for the service in the registry. Curation tools are available for reviewing and semantically describing the service before it is moved to the public area of the registry.
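The harvester's implementation is not given here. A minimal sketch of the idea, assuming WSDL 1.1 services that publish their description at the service URI with a `?wsdl` suffix (a common convention, not a requirement), fetches the document, reads the service name and the operations declared in its `portType`, and returns a registry entry marked for curation. `fetch_wsdl` and `harvest` are hypothetical names. WSDL 2.0 documents use a different namespace and an `interface` element, and are not handled.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Any, Dict, List, Union

WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}  # WSDL 1.1 namespace


def fetch_wsdl(service_uri: str, timeout: float = 30.0) -> bytes:
    # Assumption: the service publishes its description at "<uri>?wsdl".
    url = service_uri if service_uri.lower().endswith("?wsdl") else service_uri + "?wsdl"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def harvest(service_uri: str, wsdl: Union[str, bytes]) -> Dict[str, Any]:
    """Build a registry entry from a WSDL 1.1 document."""
    root = ET.fromstring(wsdl)  # the <definitions> element
    services = [s.get("name") for s in root.findall("wsdl:service", WSDL_NS)]
    operations: List[str] = sorted({
        op.get("name")
        for op in root.findall("wsdl:portType/wsdl:operation", WSDL_NS)
    })
    return {
        "uri": service_uri,
        "name": services[0] if services else root.get("name", service_uri),
        "target_namespace": root.get("targetNamespace"),
        "operations": operations,
        "status": "pending-curation",  # reviewed before moving to the public area
    }


# Offline example with a tiny hand-written WSDL 1.1 document.
SAMPLE_WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
    name="TideService" targetNamespace="http://example.org/tides">
  <portType name="TidePort">
    <operation name="GetTideHeight"/>
    <operation name="GetStations"/>
  </portType>
  <service name="TideService"/>
</definitions>"""

entry = harvest("http://example.org/TideService.asmx", SAMPLE_WSDL)
print(entry["name"], entry["operations"])
# TideService ['GetStations', 'GetTideHeight']
```

Against a live service the two steps combine as `harvest(uri, fetch_wsdl(uri))`; keeping the parse separate from the network call makes the harvester easy to test offline.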
Visualization of Oceanographic Data
One of the primary goals of Trident is to convert raw sensor data into useful data products, in particular visualizations. COVE is a tool for visualizing ocean data. We enable an oceanographer using COVE to create on-demand visualizations by invoking workflows on Trident. Together, COVE and Trident contribute to Jim Gray’s vision of an “Ocean Scientists’ Workbench” that enables collaborative research.
Over the next several months, our team will extend the functionality of both Trident and COVE to meet the requirements of ocean scientists and to provide general scientific workflow support for any science project. During this time we plan to give regular demonstrations and presentations of our project to share ideas and gather input from researchers and potential end users. In March 2008, we demonstrated Trident and COVE at TechFest 2008, Microsoft Research’s annual showcase of emerging technologies, which unveiled more than 100 innovations. In July 2008, we will demonstrate the latest version of Trident at the 2008 IEEE Second International Workshop on Scientific Workflows to get feedback and input from the broader research community. We will present more detailed information and progress as our collaboration proceeds, including plans to release the Trident software as a research development kit (RDK) that other science projects can use freely.
References
Project NEPTUNE (neptune.washington.edu)
Swiss Experiment (swiss-experiment.ch)
Life Under Your Feet (LifeUnderYourFeet.org)
Microsoft Windows Workflow Foundation (Wikipedia.org)
COVE Oceanographic Visualization Workbench (cs.washington.edu)