Determining Fundamental Principles of RNA Structure with Comparative Sequence Analysis
Molecular biology is in the midst of a revolution, the ramifications of which will change our understanding of the regulation of the cell. While the complexity of the organism increases significantly from a simple worm to mammals, it is now recognized that the number of protein-coding genes remains approximately the same, at around 20,000. This observation may run counter to our expectation, but in fact the complexity of an organism scales with the number of RNAs.
Like DNA, RNA can carry information, form a regular helix and form base pairs with other nucleic acid sequences; like protein, RNA can fold into three-dimensional structures capable of catalyzing chemical reactions. Many new types of RNA are now being discovered with a wide range of roles within the cell, and it has been estimated that the number of microRNAs in humans could be approximately double the number of protein-encoding genes. Our original and simpler understanding that a fragment of DNA is transcribed into a segment of RNA, which is then translated into a single protein, is now being replaced with a very complicated RNA and protein assembly and regulatory system.
This major change in our understanding is occurring in tandem with increases in the amount of sequence information and in the number of experimental studies that associate structure and function with RNA. This type of multi-dimensional information has been studied successfully with comparative analysis to identify structural components that are conserved in different RNA families and to decipher structural, functional, and evolutionary relationships.
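To make the idea of comparative analysis concrete, here is a minimal sketch of one widely used form of it: scoring how strongly two columns of a sequence alignment covary, using mutual information. The text above does not say which measure is used, so treat this choice as an assumption made for illustration. The function name `mutual_information` and the toy alignment are hypothetical and are not taken from any existing tool or data set.

```python
from collections import Counter
from math import log2


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    Sequences with a gap ('-') in either column are skipped.
    """
    pairs = [(seq[i], seq[j]) for seq in alignment
             if seq[i] != "-" and seq[j] != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0

    count_i = Counter(a for a, _ in pairs)   # nucleotide counts in column i
    count_j = Counter(b for _, b in pairs)   # nucleotide counts in column j
    count_ij = Counter(pairs)                # joint counts for (i, j)

    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        # p_ab / (p_a * p_b), written with counts to limit rounding error
        mi += p_ab * log2(p_ab * n * n / (count_i[a] * count_j[b]))
    return mi


# Toy alignment, invented for illustration: columns 0 and 4 always form a
# Watson-Crick pair, while column 1 is invariant.
alignment = [
    "GAAAC",
    "CAAAG",
    "AAAAU",
    "UAAAA",
]

paired_score = mutual_information(alignment, 0, 4)     # columns that covary
invariant_score = mutual_information(alignment, 0, 1)  # no covariation
```

Columns that change together while preserving base-pairing potential score high, which is the kind of signal comparative analysis uses to propose a conserved base pair. An invariant column scores zero even if it is paired, which is one reason real analyses combine several measures.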
Taken together, an understanding of these substantial complexities in the cell requires significant improvements in our knowledge about RNA. The tremendous increase in available biological information creates opportunities to increase the resolution and detail of our picture of the structure, function, and evolution of cellular components, while presenting new computational challenges for performance and scalability. To fully utilize this large increase in knowledge, we must organize and integrate it for analysis.
The Gutell Lab, in association with Microsoft Research, has designed and implemented the RNA Comparative Analysis Database (rCAD), which supports comparative analysis of RNA sequence and structure. This innovative system unites, for the first time in a single environment, multiple dimensions of information necessary for alignment viewing, sequence metadata, structural annotations, structure prediction studies, structural statistics of different motifs, and phylogenetic analysis. With this system, the Gutell Lab has begun to move away from the more traditional mixture of file formats, analysis software, and processing scripts to a Microsoft SQL Server-based comparative analysis system able to make discoveries that were not possible before. The transition is shown below:
The ability of the rCAD system to store extremely large amounts of multi-dimensional data and analyze them within a single integrated environment has great potential for important discoveries in the study of RNA. Over time we plan to expand this database system into a fully integrated RNA comparative analysis system that can be used by many scientific laboratories studying the structure, function, and evolution of RNA.
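As a rough illustration of what storing alignment data in a relational database makes possible, the sketch below keeps a toy alignment in two tables and counts nucleotide pairs at two alignment columns with a single SQL query. It is not the rCAD schema. The table names (`sequence`, `alignment_cell`), the column names, and the helper `column_pair_counts` are all hypothetical, and Python's built-in `sqlite3` module stands in for Microsoft SQL Server only so that the example is self-contained.

```python
import sqlite3

# In-memory database as a stand-in for a real server (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sequence (
    seq_id   INTEGER PRIMARY KEY,
    organism TEXT NOT NULL            -- sequence metadata
);
CREATE TABLE alignment_cell (
    seq_id     INTEGER NOT NULL REFERENCES sequence(seq_id),
    column_idx INTEGER NOT NULL,      -- position in the alignment
    nucleotide TEXT NOT NULL,
    PRIMARY KEY (seq_id, column_idx)
);
""")

# Toy aligned sequences, invented for illustration.
toy = {"organism A": "GAAAC", "organism B": "CAAAG", "organism C": "GAAAC"}
for seq_id, (organism, aligned) in enumerate(toy.items(), start=1):
    conn.execute("INSERT INTO sequence VALUES (?, ?)", (seq_id, organism))
    conn.executemany(
        "INSERT INTO alignment_cell VALUES (?, ?, ?)",
        [(seq_id, idx, nt) for idx, nt in enumerate(aligned)],
    )


def column_pair_counts(conn, i, j):
    """Count each (nucleotide at column i, nucleotide at column j) pair."""
    return conn.execute(
        """
        SELECT a.nucleotide, b.nucleotide, COUNT(*)
        FROM alignment_cell AS a
        JOIN alignment_cell AS b ON a.seq_id = b.seq_id
        WHERE a.column_idx = ? AND b.column_idx = ?
        GROUP BY a.nucleotide, b.nucleotide
        ORDER BY a.nucleotide, b.nucleotide
        """,
        (i, j),
    ).fetchall()


pair_counts = column_pair_counts(conn, 0, 4)
```

Because sequences, metadata, and alignment positions live in the same database, a question such as "which pairs occur at these two columns, and in which organisms?" becomes a join rather than a chain of file conversions and scripts.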
For more information: