About

Medicine today is imprecise. For the top 20 prescription drugs in the U.S., 80% of patients are non-responders. Recent disruptions in sensor technology have enabled precise categorization of diseases and treatment effects. For example, sequencing technology has reached the exciting disruption point of a $1,000 genome per person. However, progress in precision medicine is difficult, as genome-scale knowledge and reasoning become the ultimate bottlenecks in deciphering cancer and other complex diseases. Today, it takes hours for a molecular tumor board of many highly trained specialists to review one patient’s omics data and make treatment decisions. With 1.7 million new cancer cases and 600 thousand deaths in the U.S. each year, this is clearly not scalable.

My research interest can be summed up as developing “AI for Precision Medicine” to overcome these bottlenecks. For example, machine reading automates extracting knowledge from biomedical literature and converting free-text clinical notes into structured databases, whereas sophisticated machine learning methods can integrate rich prior knowledge with experimental data for personalized cancer treatment and chronic disease management.

See our recent tutorial on NLP for Precision Medicine at ACL-2017. I have also given invited talks at various places, including UIUC, J. Craig Venter Institute, University of Colorado at Denver, University of Maryland, Johns Hopkins, University of Massachusetts, MIT, and University of Washington. Here are the slides for an MIT talk in 2015 (thanks to Regina Barzilay for inviting me), and the video for a talk at NIPS.

In Project Hanover, our most recent project, we focus on three interwoven agendas:

  • Machine reading: Develop information extraction methods that do not require annotated examples, by leveraging prior knowledge and other available structured resources.
  • Cancer decision support: Develop machine learning methods to integrate genomics knowledge with experimental data, for personalizing drug combinations in Acute Myeloid Leukemia (AML), where treatment hasn’t improved in the past three decades. We are collaborating with the Knight Cancer Institute, a pioneer in cancer precision medicine.
  • Chronic disease management: Develop machine learning methods for modeling chronic disease progression, based on EMRs and other health sensor data.

Results from our machine reading work can be found in Literome, an Azure-based cloud service for knowledge extraction from PubMed. Currently, it focuses on the two types of knowledge most pertinent to genomic medicine: gene-gene interactions (as in biological pathways) and genotype-phenotype associations, such as the link between a single nucleotide polymorphism (SNP) and disease predisposition or drug reaction [Bioinformatics Paper].

I’m excited to participate in the DARPA program on automating the construction of “Big Mechanisms” for cancer systems biology by reading literature, integrating ontologies and knowledge bases, and deciphering experimental data. I am a co-PI on a team led by Andrey Rzhetsky.

My past work has been recognized with Best Paper Awards at top NLP and machine learning conferences such as NAACL, EMNLP, and UAI.

I spent some truly amazing years in the Department of Computer Science and Engineering at the University of Washington.
My Ph.D. advisor is Pedro Domingos.
My dissertation is: Markov Logic for Machine Reading.

For more information, check out my publications and LinkedIn profile.

Selected Press Coverage: Bloomberg Technology, Microsoft News, Verge, ZDNet, eWeek, Puget Sound Business Journal, SWE Magazine cover story, Popular Mechanics.

Publications

  • Classification of common human diseases derived from shared genetic and environmental determinants. [Paper]
    Kanix Wang, Hallie Gaitsch, Hoifung Poon, Nancy J Cox, and Andrey Rzhetsky.
    In Nature Genetics, August 2017.
  • Molecularly targeted drug combinations demonstrate selective effectiveness for myeloid- and lymphoid-derived hematologic malignancies. [Paper]
    Stephen Kurtz et al.
    In Proceedings of the National Academy of Sciences of the United States of America (PNAS), July 2017.
  • Wide-Open: accelerating public data release by automating detection of overdue datasets. [Paper] (Nature News, The Scientist, UW Today)
    Maxim Grechkin, Hoifung Poon, and Bill Howe.
    In PLOS Biology, June 2017.
  • Cross-Sentence N-ary Relation Extraction with Graph LSTMs. [Paper]
    Nanyun Peng, Hoifung Poon, Chris Quirk, Kristina Toutanova, and Scott Yih.
    In Transactions of the Association for Computational Linguistics (TACL), 2017.
  • Distant Supervision for Relation Extraction beyond the Sentence Boundary. [Paper]
    Chris Quirk and Hoifung Poon.
    In Proceedings of the Fifteenth Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2017.
  • Compositional Learning of Embeddings for Relation Paths in Knowledge Bases and Text. [Paper]
    Kristina Toutanova, Xi Victoria Lin, Wen-Tau Yih, Hoifung Poon, and Chris Quirk.
    In Proceedings of the Fifty-Fourth Annual Meeting of the Association for Computational Linguistics (ACL), 2016.
  • Representing Text for Joint Embedding of Text and Knowledge Bases. [Paper]
    Kristina Toutanova, Danqi Chen, Patrick Pantel, Hoifung Poon, Pallavi Choudhury, and Michael Gamon.
    In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2015.
  • Model Selection for Type-Supervised Learning with application to POS Tagging. [Paper]
    Kristina Toutanova, Waleed Ammar, Pallavi Choudhury, and Hoifung Poon.
    In Proceedings of the SIGNLL Conference on Computational Natural Language Learning (CoNLL), 2015.
  • Grounded Semantic Parsing for Complex Knowledge Extraction. [Paper]
    Ankur Parikh, Hoifung Poon, and Kristina Toutanova.
    In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2015.
  • Distant Supervision for Cancer Pathway Extraction from Text. [Paper]
    Hoifung Poon, Kristina Toutanova, and Chris Quirk.
    In Proceedings of the Pacific Symposium on Biocomputing, 2015.
  • Literome: PubMed-Scale Genomic Knowledge Base in the Cloud. [Paper]
    Hoifung Poon, Chris Quirk, Charlie DeZiel, and David Heckerman.
    In Bioinformatics, 2014, doi:10.1093/bioinformatics/btu383.
  • Grounded Unsupervised Semantic Parsing. [Paper]
    Hoifung Poon.
    In Proceedings of the Fifty-First Annual Meeting of the Association for Computational Linguistics (ACL), 2013.
  • Probabilistic Frame Induction. [Paper]
    Jackie Cheung, Hoifung Poon and Lucy Vanderwende.
    In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2013.
  • An Exhaustive Epistatic SNP Association Analysis on Expanded Wellcome Trust Data. [Paper]
    Christoph Lippert, Jennifer Listgarten, Robert Davidson, Scott Baxter, Hoifung Poon, Carl M. Kadie, David Heckerman.
    In Scientific Reports, 2013, doi:10.1038/srep01099.
  • Sum-Product Networks: A New Deep Architecture. [Paper] [Slides] [Download code and results]
    Hoifung Poon and Pedro Domingos.
    In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence (UAI), 2011.
    Best Paper Award
  • Unsupervised Ontology Induction from Text. [Paper]
    Hoifung Poon and Pedro Domingos.
    In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2010.
  • Joint Inference for Knowledge Extraction from Biomedical Literature. [Paper]
    Hoifung Poon and Lucy Vanderwende.
    In Proceedings of the North American Chapter of the Association for Computational Linguistics – Human Language Technologies Conference (NAACL-HLT), 2010.
  • Unsupervised Semantic Parsing. [Paper] [Slides] [Download data and code]
    Hoifung Poon and Pedro Domingos.
    In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2009.
    Best Paper Award
  • Unsupervised Morphological Segmentation with Log-Linear Models. [Paper]
    Hoifung Poon, Colin Cherry, and Kristina Toutanova.
    In Proceedings of the North American Chapter of the Association for Computational Linguistics – Human Language Technologies Conference (NAACL-HLT), 2009.
    Best Paper Award
  • Language ID in the Context of Harvesting Language Data off the Web. [Paper]
    Fei Xia, William Lewis, and Hoifung Poon.
    In Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2009.
  • Joint Unsupervised Coreference Resolution with Markov Logic. [Paper]
    Hoifung Poon and Pedro Domingos.
    In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2008.
  • A General Method for Reducing the Complexity of Relational Inference and its Application to MCMC. [Paper]
    Hoifung Poon, Pedro Domingos, and Marc Sumner.
    In Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence (AAAI), 2008.
  • Markov Logic. [Book Chapter]
    Pedro Domingos, Stanley Kok, Daniel Lowd, Hoifung Poon, Matthew Richardson, Parag Singla.
    In L. De Raedt, P. Frasconi, K. Kersting and S. Muggleton (eds.), Probabilistic Inductive Logic Programming, 2008.
  • Joint Inference in Information Extraction. [Paper] [Online Appendix]
    Hoifung Poon and Pedro Domingos.
    In Proceedings of the Twenty-Second National Conference on Artificial Intelligence (AAAI), 2007.
  • Sound and Efficient Inference with Probabilistic and Deterministic Dependencies. [Paper]
    Hoifung Poon and Pedro Domingos.
    In Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI), 2006.
  • Unifying Logical and Statistical AI. [Paper]
    Pedro Domingos, Stanley Kok, Hoifung Poon, Matthew Richardson, Parag Singla.
    In Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI), 2006.
    Invited paper.

Projects

Biomedical Natural Language Processing

The biomedical sciences are beginning to undergo a major transformation. Precision medicine has the potential to make treatments much more effective by better understanding patients, biological mechanisms, and therapeutic effects. However, current approaches only reach a small fraction of the patient population. Consider the molecular tumor board: dozens of highly paid specialists create a custom treatment plan for an individual patient, combing the research literature for advances that are relevant to the cancer of…

Literome

In the Literome Project, we have developed an automatic curation system to extract genomic knowledge from PubMed articles and make this knowledge available on Azure to facilitate exploration and computation in precision medicine. For more details, check out our Azure site.

Downloads

NCI-PID-PubMed Genomics Knowledge Base Completion Dataset

October 2016

This dataset includes a database of regulation relationships among genes and corresponding textual mentions of pairs of genes in PubMed article abstracts.
