A Platform for Computational Comparative Genomics on the Web

Date

October 7, 2005

Speaker

Sun Kim

Affiliation

Indiana University

Overview

We have been developing PLATCOM, a Web-based system for comparing multiple genomes, where users can choose genomes and analyze the selected genomes with a suite of computational tools. PLATCOM is built on internal databases such as GenBank, COG, KEGG, and the Pairwise Comparison Database (PCDB), which contains all pairwise comparisons (97,034 entries) of protein sequence files (.faa) and whole genome sequence files (.fna) of 312 replicons. Because the PCDB is pre-computed, genome analysis completes quickly even on the web, so users can choose any combination of genomes and analyze them with data mining tools. Genome comparison requires combining many sequence analysis tools. However, combining them takes a significant amount of programming work and knowledge of each tool, so it is very challenging to provide a web service for comparing genomes using standard sequence analysis tools alone. Well-defined data mining concepts and tools are therefore very important for web-based genome comparison, since they make the comparison much easier. These data mining tools must also be scalable. We have been developing such scalable tools: a sequence clustering algorithm BAG, a metabolic pathway analysis tool MetaPath, a gene fusion event detection tool FuzFinder, a gene neighborhood navigation tool OperonViz, an algorithm for mining correlated gene sets MCGS, a genome sequence alignment tool GAME, a multiple genome sequence alignment algorithm by clustering local matches mgAlign, and a pairwise genome visualization tool COMPAM. The analysis results are summarized with visualization tools. We are currently working on integrating the data mining modules so that users can combine them in a very flexible way. In addition to sequence data, PLATCOM will include more data types such as gene expression data.

Speakers

Sun Kim

Sun Kim is currently Associate Director of the Bioinformatics Program, Assistant Professor in the School of Informatics, and Associate Faculty at the Center for Genomics and Bioinformatics at Indiana University – Bloomington. Prior to IU, he worked at DuPont Central Research as Senior Computer Scientist from 1998 to 2001, and at the University of Illinois at Urbana-Champaign from 1997 to 1998 as Director of Bioinformatics and Postdoctoral Fellow at the Biotechnology Center and a Visiting Assistant Professor of Animal Sciences. Sun Kim received B.S., M.S., and Ph.D. degrees in Computer Science from Seoul National University, Korea Advanced Institute of Science and Technology (KAIST), and the University of Iowa, respectively. Sun Kim is a recipient of the Outstanding Junior Faculty Award at Indiana University for 2004-2005, NSF CAREER Award DBI-0237901 from 2003 to 2008, and an Achievement Award at DuPont Central Research in 2000.