Literome: extracting knowledge from biomedical publications
As any researcher knows, keeping up with scientific knowledge isn’t easy. This is especially true in the field of medical genetics, where advances in DNA sequencing technology have led to an exponential growth of genomics data. Such data hold the key to identifying disease genes and drug targets, because complex diseases inevitably stem from synergistic perturbations of pathways and other gene networks. Many of these interactions are known, but most of this knowledge resides in academic journals, the number of which has undergone its own exponential growth. It thus has become increasingly difficult for researchers to find relevant knowledge for genomic interpretation and to keep up with new genomics findings. Fortunately, help has arrived with the Literome Project.*
Literome is an automatic curation system that both extracts genomic knowledge from PubMed (one of the world’s largest repositories of medical and life science journal articles) and makes this knowledge available in the cloud, with a website to facilitate browsing, searching, and reasoning. Currently, Literome focuses on the two types of knowledge most pertinent to genomic medicine: directed genic interactions, such as pathways, and genotype-phenotype associations. Users can search for interacting genes and the nature of the interactions, as well as for diseases and drugs associated with a given gene or single nucleotide polymorphism (SNP). Users can also search for indirect connections between two entities; for example, they can look to see if a gene and a disease might be linked by searching for known associations between an interacting gene and a related disease.
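To make the idea of an indirect connection concrete, here is a minimal sketch of a two-hop lookup: given a gene and a disease, it returns the genes that interact with the first gene and are themselves associated with the disease. It is an illustration only, not Literome's implementation or API. The gene and disease names, the two small tables, and the `indirect_links` function are all hypothetical, and the sketch matches the disease exactly rather than also considering related diseases.

```python
from collections import defaultdict

# Hypothetical toy data for illustration; not taken from Literome.
# Each interaction is a (source gene, target gene) pair.
GENE_INTERACTIONS = [("GENE_A", "GENE_B"), ("GENE_C", "GENE_B")]
# Each association is a (gene, disease) pair.
GENE_DISEASE = [("GENE_B", "disease_x"), ("GENE_C", "disease_y")]


def indirect_links(gene, disease, interactions, associations):
    """Return genes that interact with `gene` and are associated with `disease`."""
    # Index interactions in both directions so either partner can be looked up.
    neighbors = defaultdict(set)
    for source, target in interactions:
        neighbors[source].add(target)
        neighbors[target].add(source)

    # Index the diseases associated with each gene.
    diseases_by_gene = defaultdict(set)
    for associated_gene, associated_disease in associations:
        diseases_by_gene[associated_gene].add(associated_disease)

    # Keep only the interacting genes that carry the disease association.
    return sorted(
        partner
        for partner in neighbors.get(gene, set())
        if disease in diseases_by_gene.get(partner, set())
    )


if __name__ == "__main__":
    print(indirect_links("GENE_A", "disease_x", GENE_INTERACTIONS, GENE_DISEASE))
```

In this toy data, GENE_A has no direct association with disease_x, but it interacts with GENE_B, which does, so GENE_B is reported as the connecting gene.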
Literome builds on Microsoft Research natural language processing (NLP) technology, extracting information from PubMed abstracts via our Statistical Parsing and Linguistic Analysis Toolkit (SPLAT), and uses the Microsoft Azure cloud platform to store, analyze, and disseminate the information.
Scientists can use Literome in a number of ways, from exploratory browsing, to corroborating or refuting new discoveries, to programmatically integrating pathways and genotype-phenotype associations for making discoveries from genomics data. Literome is freely available for noncommercial use, either through an online service or as downloadable web services. We hope that Literome will help researchers search for genomic medical findings that can lead to new understanding and treatment of genetically mediated diseases.
—Hoifung Poon, Researcher, Microsoft Research
- Literome: PubMed-Scale Genomic Knowledge Base in the Cloud (white paper)
- The Literome Project
- Microsoft Research SPLAT
- Microsoft Azure
- Health and Wellbeing at Microsoft Research
*The Literome Project is a joint project from Hoifung Poon, Chris Quirk, Charlie DeZiel, and David Heckerman of Microsoft Research.