Medicine today is imprecise. For the top 20 prescription drugs in the U.S., 80% of patients are non-responders. Recent disruptions in sensor technology have enabled precise categorization of diseases and treatment effects. For example, sequencing technology has reached the exciting disruption point of a $1000 genome per person. However, progress in precision medicine remains difficult, as genome-scale knowledge and reasoning become the ultimate bottlenecks in deciphering cancer and other complex diseases. Today, it takes hours for a molecular tumor board of 10-20 highly trained specialists to review one patient’s omics data and make treatment decisions. With 1.6 million new cancer cases and 600 thousand deaths in the U.S. each year, this is clearly not scalable.

My research interest can be summed up as developing “The Curing AI for Precision Medicine” to overcome these bottlenecks. For example, machine reading automates the extraction of knowledge from biomedical literature and the conversion of free-text clinical notes into structured databases, while sophisticated machine learning methods can integrate rich prior knowledge with experimental data for personalized cancer treatment and chronic disease management.

I have given invited talks at various places, including UIUC, the J. Craig Venter Institute, the University of Colorado at Denver, the University of Maryland, Johns Hopkins, the University of Massachusetts, MIT, and the University of Washington. Here are the slides for an MIT talk in 2015 (thanks to Regina Barzilay for inviting me), and the video for a talk at NIPS.

In our most recent effort, Project Hanover, we focus on three interwoven agendas:

  • Machine reading: Develop information extraction methods that do not require annotated examples, by leveraging prior knowledge and other available structured resources.
  • Cancer decision support: Develop machine learning methods to integrate genomics knowledge with experimental data, for personalizing drug combinations in Acute Myeloid Leukemia (AML), where treatment hasn’t improved in the past three decades. We are collaborating with the Knight Cancer Institute, a pioneer in cancer precision medicine.
  • Chronic disease management: Develop machine learning methods for modeling chronic disease progression, based on EMRs and other health sensor data.

Results from our machine reading work can be found in Literome, an Azure-based cloud service for knowledge extraction from PubMed. Currently, it focuses on the two types of knowledge most pertinent to genomic medicine: gene-gene interactions (as in biological pathways) and genotype-phenotype associations, such as the link between a single nucleotide polymorphism (SNP) and disease predisposition or drug reaction [Bioinformatics Paper].

I’m excited to participate in the DARPA program on automating the construction of “Big Mechanisms” for cancer systems biology by reading literature, integrating ontologies and knowledge bases, and deciphering experimental data. I am a co-PI on a team led by Andrey Rzhetsky.

My past work has been recognized with Best Paper Awards at top NLP and machine learning conferences such as NAACL, EMNLP, and UAI.

I spent some truly amazing years in the Department of Computer Science and Engineering at the University of Washington.
My Ph.D. advisor is Pedro Domingos.
My dissertation is: Markov Logic for Machine Reading.

For more information, check out my publications and LinkedIn profile.