Biological pathways represent our current understanding of biological processes. A large amount of pathway data has been accumulated, either by curating the scientific literature or by automated inference from high-throughput laboratory experiments. Over 180 biological pathway databases exist, covering metabolic pathways, signal transduction, protein-protein interactions, and regulatory pathways. The data have been collected by diverse research organizations with particular interests, using various techniques, incompatible schemas, and different access methods. Biologists use pathway data to formulate hypotheses, verify experimental results, and share research outcomes. Because the databases are incompatible and vary in the depth and breadth of their coverage, biologists commonly have to query several datasets to address a biological problem of interest, a time-consuming and error-prone process.
The Gateway to Biological Pathways project leverages the BioPAX standard to store pathway datasets and serve them in a form consumable by Semantic Web applications, and it offers a unified interface for querying biological pathway data. The proposed BioPAX standard provides a common format for exchanging biological pathway datasets. The BioPAX ontology is written in the Web Ontology Language (OWL), a W3C Recommendation, and thus supports the vision of the Semantic Web. In BioPAX, a pathway is composed of a number of entities and the relationships among them.
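To illustrate the entity-and-relationship view of a pathway, the sketch below parses a small, hand-written RDF/XML fragment and recovers the entities and the references between them. It is only an illustration under stated assumptions: the BioPAX namespace URI, the class and property names (`Pathway`, `BiochemicalReaction`, `SmallMolecule`, `pathwayComponent`, `left`, `right`, `displayName`), and the sample data are loosely modeled on BioPAX Level 3 and simplified (for example, the reaction omits cofactors); the fragment has not been validated against the ontology, and real files may identify entities with `rdf:about` rather than `rdf:ID`.

```python
import xml.etree.ElementTree as ET

# Namespace URIs. The BioPAX URI is an assumption (Level 3 style) used only
# for illustration; use the one declared by the data you actually load.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BP = "http://www.biopax.org/release/biopax-level3.owl#"

# Hand-written, simplified fragment; not a complete or validated BioPAX file.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:bp="{BP}">
  <bp:Pathway rdf:ID="glycolysis">
    <bp:displayName>glycolysis</bp:displayName>
    <bp:pathwayComponent rdf:resource="#rxn1"/>
  </bp:Pathway>
  <bp:BiochemicalReaction rdf:ID="rxn1">
    <bp:displayName>hexokinase reaction</bp:displayName>
    <bp:left rdf:resource="#glucose"/>
    <bp:right rdf:resource="#g6p"/>
  </bp:BiochemicalReaction>
  <bp:SmallMolecule rdf:ID="glucose"><bp:displayName>glucose</bp:displayName></bp:SmallMolecule>
  <bp:SmallMolecule rdf:ID="g6p"><bp:displayName>glucose-6-phosphate</bp:displayName></bp:SmallMolecule>
</rdf:RDF>"""


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.split("}", 1)[1] if "}" in tag else tag


def entities_and_relationships(xml_text):
    """Return ({entity id: class name}, [(source id, property, target id), ...])."""
    root = ET.fromstring(xml_text)
    entities, relationships = {}, []
    for elem in root:                       # each top-level child is one entity
        ent_id = elem.get(f"{{{RDF}}}ID")
        entities[ent_id] = local(elem.tag)
        for prop in elem:                   # the properties of that entity
            target = prop.get(f"{{{RDF}}}resource")
            if target is not None:          # a reference to another entity
                relationships.append((ent_id, local(prop.tag), target.lstrip("#")))
    return entities, relationships


entities, relationships = entities_and_relationships(SAMPLE)
print(entities)       # {'glycolysis': 'Pathway', 'rxn1': 'BiochemicalReaction', 'glucose': 'SmallMolecule', 'g6p': 'SmallMolecule'}
print(relationships)  # [('glycolysis', 'pathwayComponent', 'rxn1'), ('rxn1', 'left', 'glucose'), ('rxn1', 'right', 'g6p')]
```

The relationships come out as (source, property, target) triples, which is why the same data can be read either as an XML tree per entity or as a graph of linked entities.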
In the Gateway application, the pathway entity is the basic data unit; each entity is stored in its native XML form in an XML data type column of a SQL Server 2005 database. Native XML support simplifies the database design: fewer tables are needed, yet the relationships among pathway entities are preserved. Storing the data in an XML data type column also provides an efficient way to access and process it, and SQL Server's XQuery support enables diverse search functions over the XML datasets. The Gateway application provides a Web service through which biological pathways can be queried and which returns data in BioPAX format. In addition, HTTP GET and POST methods are implemented for querying the pathway data directly. The E. coli and human pathway datasets from BioCyc are currently available at the Gateway, and more data will be added. A client capable of consuming BioPAX-format data is being developed for visualizing and navigating biological pathways.
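The one-row-per-entity storage idea can be sketched as follows. This is a minimal sketch, not the Gateway's implementation: it uses Python's built-in `sqlite3` with a plain TEXT column as a stand-in for a SQL Server 2005 XML column, and ElementTree's limited XPath as a stand-in for XQuery. The table layout, the helper names (`entity_xml`, `build_store`, `find_by_name`, `neighbors`), the BioPAX namespace URI, and the sample entities are all hypothetical. In SQL Server itself, the search predicate would instead be evaluated inside the database, for example through the xml type's `exist()` or `value()` methods.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Namespace URIs; the BioPAX one is assumed (Level 3 style) for illustration.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BP = "http://www.biopax.org/release/biopax-level3.owl#"
NS = {"rdf": RDF, "bp": BP}
ET.register_namespace("rdf", RDF)
ET.register_namespace("bp", BP)


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.split("}", 1)[1] if "}" in tag else tag


def entity_xml(kind, ent_id, name, refs=()):
    """Hypothetical helper: serialize one entity as a standalone XML document."""
    elem = ET.Element(f"{{{BP}}}{kind}", {f"{{{RDF}}}ID": ent_id})
    ET.SubElement(elem, f"{{{BP}}}displayName").text = name
    for prop, target in refs:
        ET.SubElement(elem, f"{{{BP}}}{prop}", {f"{{{RDF}}}resource": "#" + target})
    return ET.tostring(elem, encoding="unicode")


def build_store():
    """One table, one row per entity; links live inside the XML, not in join tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entity (id TEXT PRIMARY KEY, doc TEXT NOT NULL)")
    rows = [
        ("glycolysis", entity_xml("Pathway", "glycolysis", "glycolysis",
                                  [("pathwayComponent", "rxn1")])),
        ("rxn1", entity_xml("BiochemicalReaction", "rxn1", "hexokinase reaction",
                            [("left", "glucose"), ("right", "g6p")])),
        ("glucose", entity_xml("SmallMolecule", "glucose", "glucose")),
        ("g6p", entity_xml("SmallMolecule", "g6p", "glucose-6-phosphate")),
    ]
    conn.executemany("INSERT INTO entity VALUES (?, ?)", rows)
    return conn


def find_by_name(conn, term):
    """Ids of entities whose displayName contains `term` (case-insensitive).
    Stands in for an XQuery predicate evaluated inside the database."""
    hits = []
    for ent_id, doc in conn.execute("SELECT id, doc FROM entity ORDER BY id"):
        name = ET.fromstring(doc).findtext("bp:displayName", default="", namespaces=NS)
        if term.lower() in name.lower():
            hits.append(ent_id)
    return hits


def neighbors(conn, ent_id):
    """Follow the rdf:resource references stored in one entity's XML."""
    (doc,) = conn.execute("SELECT doc FROM entity WHERE id = ?", (ent_id,)).fetchone()
    out = []
    for child in ET.fromstring(doc):
        target = child.get(f"{{{RDF}}}resource")
        if target is not None:
            out.append((local(child.tag), target.lstrip("#")))
    return out


conn = build_store()
print(find_by_name(conn, "glucose"))  # ['g6p', 'glucose']
print(neighbors(conn, "rxn1"))        # [('left', 'glucose'), ('right', 'g6p')]
```

Because each entity points to its neighbours through `rdf:resource` references inside its own XML, following a relationship is a lookup by id rather than a join across link tables, which is the table-count saving described above. Parsing every row in Python, as `find_by_name` does, would not scale; it is there only to keep the sketch self-contained.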