The Gateway to Biological Pathways: A Platform to Enable Semantic Web-Based Biological Pathway Datasets


October 7, 2005


Keyuan Jiang




Biological pathways represent our current understanding of biological processes. A large amount of biological pathway data has been accumulated, either by curation of the scientific literature or by automatic machine inference from high-throughput laboratory experiments. There exist over 180 biological pathway databases, covering metabolic pathways, signal transduction, protein-protein interactions, and regulatory pathways. The data have been collected by diverse research organizations with particular interests, various techniques, incompatible schemas, and different access methods. Biologists use the pathway data to formulate hypotheses, verify experimental results, and share research outcomes. Because the databases are incompatible and differ in depth and breadth of coverage, biologists often have to query multiple datasets, a time-consuming and error-prone process, to address a biological problem.

The Gateway to Biological Pathways project uses the BioPAX standard to store and serve pathway datasets that are consumable by Semantic Web applications, and offers a unified interface for querying biological pathway data. The proposed BioPAX standard provides a common format for exchanging biological pathway datasets. The BioPAX ontology, written in the W3C-recommended Web Ontology Language (OWL), supports the vision of the Semantic Web. In BioPAX, a pathway is composed of a number of entities and the relationships among those entities.
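The entity-and-relationship structure described above can be illustrated with a minimal sketch. It is not taken from the Gateway project: the sample document, the namespace URI, and the element names (modelled on BioPAX Level 2 naming such as `pathway`, `NAME`, and `PATHWAY-COMPONENTS`) are assumptions made for illustration, and `pathway_components` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumed BioPAX namespace; check it against the BioPAX release actually in use.
BP = "http://www.biopax.org/release/biopax-level2.owl#"

# Illustrative fragment only: one pathway entity that refers to two
# interaction entities through rdf:resource links.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:bp="{BP}">
  <bp:pathway rdf:ID="pathway1">
    <bp:NAME>glycolysis (illustrative)</bp:NAME>
    <bp:PATHWAY-COMPONENTS rdf:resource="#reaction1"/>
    <bp:PATHWAY-COMPONENTS rdf:resource="#reaction2"/>
  </bp:pathway>
  <bp:biochemicalReaction rdf:ID="reaction1">
    <bp:NAME>step one</bp:NAME>
  </bp:biochemicalReaction>
  <bp:biochemicalReaction rdf:ID="reaction2">
    <bp:NAME>step two</bp:NAME>
  </bp:biochemicalReaction>
</rdf:RDF>"""


def pathway_components(rdf_xml):
    """Map each pathway name to the names of the entities it references."""
    root = ET.fromstring(rdf_xml)
    # Index every top-level entity by its rdf:ID so links can be resolved.
    by_id = {}
    for element in root:
        entity_id = element.get(f"{{{RDF}}}ID")
        if entity_id:
            by_id[entity_id] = element
    result = {}
    for pathway in root.findall(f"{{{BP}}}pathway"):
        names = []
        for ref in pathway.findall(f"{{{BP}}}PATHWAY-COMPONENTS"):
            # rdf:resource holds "#<id>"; strip the leading "#" to get the ID.
            target_id = ref.get(f"{{{RDF}}}resource", "").lstrip("#")
            target = by_id.get(target_id)
            if target is not None:
                names.append(target.findtext(f"{{{BP}}}NAME"))
        result[pathway.findtext(f"{{{BP}}}NAME")] = names
    return result
```

Calling `pathway_components(SAMPLE)` resolves each relationship from the pathway entity to the entities it points at. A full Semantic Web client would more likely use an RDF/OWL toolkit than plain XML parsing; the standard-library parser is used here only to keep the sketch self-contained.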

In the Gateway application, the pathway entity is the basic data unit, and each entity is stored in its XML form in a native XML data type column of a SQL Server 2005 database. Native XML support simplifies the database design: the number of tables can be reduced while the relationships among pathway entities are still maintained. Storing XML data in an XML data type column provides an efficient way of accessing and processing the data, and the XQuery support provided by the database enables diverse search functionality over the XML datasets. The Gateway application provides a Web service through which biological pathways can be queried, and the data returned are in BioPAX format. In addition, the HTTP GET and POST methods are implemented for querying the pathway data directly. The pathway datasets of E. coli and human from BioCyc are currently available at the Gateway, and more data are to be added. A client capable of consuming BioPAX-format data is being developed for visualizing and navigating biological pathways.
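The storage idea above, one row per pathway entity with the entity's XML kept intact in a single column, can be sketched as follows. This is a stand-in, not the Gateway implementation: SQLite has no native XML data type or XQuery, so the sketch uses a TEXT column and Python's ElementTree in their place. The table layout, the namespace URI, and the helpers `open_store`, `fragment`, `add_entity`, and `find_by_name` are all assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Assumed BioPAX namespace, used only to build and read the sample fragments.
BP = "http://www.biopax.org/release/biopax-level2.owl#"


def open_store():
    """Create an in-memory store with a single table for all entity types."""
    conn = sqlite3.connect(":memory:")
    # The TEXT column stands in for SQL Server's native XML data type column.
    conn.execute(
        "CREATE TABLE entity (id TEXT PRIMARY KEY, organism TEXT, xml TEXT)"
    )
    return conn


def fragment(kind, name):
    """Build a small, illustrative BioPAX-like XML fragment for one entity."""
    return f'<bp:{kind} xmlns:bp="{BP}"><bp:NAME>{name}</bp:NAME></bp:{kind}>'


def add_entity(conn, entity_id, organism, xml_fragment):
    """Store one entity; malformed XML raises ParseError before the insert."""
    ET.fromstring(xml_fragment)
    conn.execute(
        "INSERT INTO entity VALUES (?, ?, ?)",
        (entity_id, organism, xml_fragment),
    )


def find_by_name(conn, organism, text):
    """Return IDs of entities whose NAME contains text (case-insensitive).

    With a native XML column this filter would be pushed into the database
    as an XQuery expression; here it runs in Python over the stored XML.
    """
    rows = conn.execute(
        "SELECT id, xml FROM entity WHERE organism = ? ORDER BY id",
        (organism,),
    )
    hits = []
    for entity_id, xml_fragment in rows:
        name = ET.fromstring(xml_fragment).findtext(f"{{{BP}}}NAME", default="")
        if text.lower() in name.lower():
            hits.append(entity_id)
    return hits
```

A short usage example under the same assumptions:

```python
conn = open_store()
add_entity(conn, "pw1", "E. coli", fragment("pathway", "glycolysis"))
add_entity(conn, "rx1", "E. coli", fragment("biochemicalReaction", "hexokinase step"))
matches = find_by_name(conn, "E. coli", "glyco")
```

Because every entity type shares one table and keeps its own XML, adding a new kind of entity does not require a new table, which is the design benefit the abstract attributes to native XML storage.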


Keyuan Jiang

Dr. Keyuan Jiang received his Ph.D. in Biomedical Engineering from Vanderbilt University, Nashville, Tennessee, and is Assistant Professor of Computer Information Systems and Information Technology at Purdue University Calumet, Hammond, Indiana. Dr. Jiang has conducted a number of research projects in the area of computer applications in biomedicine, ranging from a knowledge-based system for synthetic gene design and a bedside graphical nursing charting system to a communication log system for clinical studies. His current interests are focused on the Semantic Web in the life sciences and on bioinformatics Web services. Dr. Jiang is a member of the IEEE Engineering in Medicine and Biology Society and serves on the editorial board of IEEE Transactions on Information Technology in Biomedicine. As a faculty member at Purdue University Calumet, he has taught courses in software development and bioinformatics. Prior to his current position, Dr. Jiang was a Technical Advisor at two private companies, delivering e-business solutions using Microsoft technologies.