Biomedical Natural Language Processing



The biomedical sciences are beginning to undergo a major transformation. Precision medicine has the potential to make treatments much more effective through a better understanding of patients, biological mechanisms, and therapeutic effects. However, current approaches reach only a small fraction of the patient population. Consider the molecular tumor board: dozens of highly paid specialists create a custom treatment plan for an individual patient, combing the research literature for advances relevant to that patient’s cancer. Yet 1.6 million people are diagnosed with cancer every year. Patient data is flooding in far faster than doctors can process it.

With intelligent tools, information retrieval for a given patient profile could be largely automated, saving hours of doctor time. Potential diagnoses and treatments could be suggested to the experts, and their feedback on those suggestions could drive continual system improvements. Such a system would require subsystems that understand a broad range of information about the biological and medical phenomena behind diseases and their treatments. Currently, most biomedical knowledge is stored in natural language text, from the scientific literature that explains biological processes and therapeutic mechanisms of action to the electronic health records that document patients’ journeys through our healthcare systems.

Thus, our first goal is to build systems that can read natural language text to extract biomedical facts, finding the latest research on drug-protein interactions and combing through electronic health records to identify lifestyle and environmental factors. Our research directions include advanced techniques for information extraction, such as deep neural networks that take graph-structured inputs. Equally important is making the extracted knowledge accessible to people. To that end, we are building interfaces to browse and curate the resulting knowledge bases.

Our broader goal is to build decision support systems: machine learning systems that leverage large data sources to provide recommendations to experts and learn from the resulting interactions. One such project is a collaboration with the Knight Cancer Institute on combination therapy prediction for patients diagnosed with Acute Myeloid Leukemia. Given patient-specific genomic data and facts from the scientific literature, our system ranks candidates from the space of drug combinations, which grows exponentially with the number of available drugs. The recommended combinations can then be reviewed carefully by an oncologist. This kind of human-machine collaboration has the potential to transform biomedical science.
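
To make the ranking step concrete, here is a minimal sketch in Python of scoring and ranking drug combinations. It is not the project's actual method: the drug names, the per-drug scores, and the additive `toy_score` function are all hypothetical placeholders standing in for a model that would draw on patient genomic data and literature-derived facts.

```python
from itertools import combinations
from math import comb


def rank_combinations(drugs, score, size=2, top_k=3):
    """Return the top_k combinations of `size` drugs, highest score first."""
    candidates = combinations(sorted(drugs), size)
    return sorted(candidates, key=score, reverse=True)[:top_k]


# Hypothetical per-drug scores (illustrative values only, not real data).
toy_scores = {"drug_a": 0.9, "drug_b": 0.4, "drug_c": 0.7, "drug_d": 0.1}


def toy_score(combo):
    # Placeholder scoring: sum of per-drug scores. A real system would
    # model interactions between drugs and patient-specific features.
    return sum(toy_scores[d] for d in combo)


top = rank_combinations(toy_scores, toy_score, size=2, top_k=2)
n_pairs = comb(len(toy_scores), 2)  # number of candidate pairs considered
print(top, n_pairs)
```

Exhaustive enumeration like this is only feasible for small drug sets and small combination sizes, since the number of possible subsets doubles with each added drug; that growth is why a learned ranking over candidates is useful before an oncologist reviews the shortlist.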



Research Team

Microsoft Health Initiative Collaborators