Playing “Hide and Seek” – The Hidden Genome

August 30, 2012
Michal Linial | Dept of Biological Chemistry, The Sudarsky Center for Computational Biology, The Hebrew University of Jerusalem, Israel
Microsoft Research Colloquium

The dramatic growth in sequencing capacity has resulted in the accumulation of millions of DNA sequences. These sequences are collected from thousands of genomes that (ideally) sample the ‘tree of life’. I will briefly discuss the ‘minimal set of instructions’ by which a linear sequence is transformed into a functional protein. What happens when the statistical noise is so high that classical procedures for predicting protein sequences fail? I will focus on the challenge of identifying short proteins that remain buried in the genomic data. For illustration, I will take you on a ‘treasure hunt’ for short proteins.

Many short proteins share fuzzy features that are common to most animal venoms. I will discuss the limitations of classical tools based on string comparison or pattern finding for identifying short proteins. For this task, statistical machine learning methods have proven useful in identifying hidden bioactive sequences in several genomes. Such sequences are attractive candidates for novel therapies. The test case of short proteins illustrates the importance of a cycle that starts with a biological hypothesis, proceeds through a computational formulation, and ends with experimental validation. Finally, I will discuss our genomes with respect to our ‘partners’ (viruses, bacteria). Once the interaction among these genomes is considered, the source of the dynamic nature of human evolution becomes evident.

Related publications:

Rappoport N, Karsenty S, Stern A, Linial N, Linial M. (2012) Nucl. Acids Res. 40:D313-D320.
Rappoport N, Linial M. (2012) PLoS Comput Biol. 8:e1002364.
Naamati G, Askenazi M, Linial M. (2010) Bioinformatics 26:i482-i488.
Naamati G, Askenazi M, Linial M. (2009) Nucl. Acids Res. 37:W363-W368.
Kaplan N, Morpurgo N, Linial M. (2007) J Mol Biol. 369:553-566.

Speaker Details

Michal Linial is a Professor of Biochemistry at The Hebrew University of Jerusalem, Israel, and Director of the SCCB, the Sudarsky Center for Computational Biology.
M. Linial has published over 150 scientific papers and abstracts on diverse topics in molecular biology, cell biology, bioinformatics, neuroscience, and the integration of tools to improve knowledge extraction.
M. Linial runs a combined experimental and computational laboratory. She founded and leads the first educational program in Israel in Computer Science and Life Science for undergraduate and graduate studies (running since 1999).
Her expertise in the synapse led to the study of protein families and protein-protein interactions, with a global view of protein networks and their regulation. Molecular biology, cell biology and biochemical methods are applied in all research initiated in her laboratory. She and her laboratory are developing new computational and technological tools for large-scale cell biological research. M. Linial and her colleagues apply MS-based and genomics (DNA chip) approaches to study changes in neuronal development and in disease-oriented research. She has published over 180 scientific papers, including book chapters and numerous reviews.
Robust informatics approaches are used for large-scale database storage and for the continual updating of several systems for classification, validation and functional prediction. M. Linial and her students have been active participants in NIH structural genomics initiatives, and she took part in the Structural Genomics target-selection effort. She and her colleagues have created several global classification systems that are used by the biomedical and biology communities. The most notable are ProtoNet, EVEREST, ProTarget, PANDORA, mirror and ClanTox. All of these web systems are openly available to investigators.