Hoifung Poon

Director, Precision Health NLP

About

Medicine today is imprecise. For the top 20 prescription drugs in the U.S., 80% of patients are non-responders. The advent of big data heralds a new era of precision medicine, where treatments become increasingly effective by tailoring them to individual patients. For example, the rapid advance in sequencing technology has reached the exciting disruption point of the $1,000 personal genome, making it affordable to identify genetic mutations in individual tumors. Unfortunately, big data also leads to information overload, making it hard to separate signal from noise. Today, it takes hours for a molecular tumor board of many highly trained specialists to review a patient’s genomics data and make treatment decisions. With 1.7 million new cancer cases and 600,000 deaths in the U.S. each year, this is clearly not scalable.

My research interests lie in advancing machine learning and NLP to overcome the knowledge and reasoning bottlenecks in precision medicine. In particular, I’m very excited about the emerging area of “Curation-as-a-Service” (CaaS), which, by extracting valuable structured information from text, can empower a broad range of biomedical and healthcare practitioners. Three representative areas stand out:

  • Molecular tumor board: Interpreting tumor mutations requires curating precision cancer knowledge from a vast biomedical literature, which comprises tens of millions of papers and grows by 4,000 papers per day.
  • Real-world evidence: Developing an FDA-approved drug now takes over a decade and costs more than $2 billion. Randomized controlled trials are the gold standard of medicine, but they are expensive and time-consuming to run, and they cover only a tiny fraction of patients. Electronic medical records (EMRs) contain valuable clinical observations that can be used to augment clinical trial data, with potential applications in drug repurposing, synthetic control, post-market surveillance, and pragmatic trials.
  • Clinical trial matching: Over 20% of clinical trials fail due to insufficient patients. Patient recruitment is largely done by word of mouth, relying on physicians and patients to keep track of thousands of open trials and match elaborate eligibility criteria against a patient’s medical records.

Currently, curation of knowledge and patient information is done manually, which is hard to scale. Assisted curation powered by machine reading can drastically improve curation efficiency. However, standard machine reading approaches require painstakingly annotating many labeled examples, which limits their applicability. At Microsoft, I lead Project Hanover, where we overcome the annotation bottleneck by exploiting indirect supervision from readily available resources such as ontologies and existing databases. We developed a general framework for incorporating diverse forms of indirect supervision by combining deep learning with probabilistic logic. We expanded the scope of machine reading from single sentences to cross-sentence and document-level extraction. We proposed novel neural architectures such as graph LSTMs for incorporating and reasoning with linguistic constraints.

Building on past work in Literome, these advances enable us to create literature machine readers for a variety of domains, from fundamental biology (e.g., genetic pathways) to translational medicine (e.g., precision oncology), all without labeled examples. Our latest system reads all publicly available biomedical literature (30 million PubMed abstracts and 5 million PMC full-text articles). In a matter of minutes, Hanover found several times as many facts as a whole year of manual curation at an NCI-designated cancer center. These facts can be quickly validated by expert curators in an assisted curation interface, potentially increasing curation efficiency and coverage by a wide margin.

Our team is now exploring clinical machine reading for harnessing real-world evidence and facilitating clinical trial matching. In the long run, we are also interested in leveraging machine reading results in cancer decision support and chronic disease modeling.

For more information, check out our recent tutorials at AAAI-18 and ACL-17 (Slides). I have given invited talks at various places including UIUC, J. Craig Venter Institute, University of Colorado at Denver, University of Maryland, Johns Hopkins, University of Massachusetts, MIT, and University of Washington. Here are the slides for an MIT talk in 2015 (thanks to Regina Barzilay for inviting me), and the video for a talk at NIPS-14.

I obtained a B.S. with Distinction in Computer Science from Sun Yat-Sen University, and a Ph.D. in Computer Science and Engineering (my dissertation) from the University of Washington, advised by Pedro Domingos. I am an affiliated faculty member at UW Medicine, and serve as co-PI for various academic projects such as DARPA Big Mechanism. My past work spans diverse topics in machine learning and NLP, and has been recognized with Best Paper Awards at top conferences such as NAACL, EMNLP, and UAI.

For more information, check out my publications and LinkedIn profile.

Selected Press Coverage: Bloomberg Technology, Microsoft News, Verge, ZDNet, eWeek, Puget Sound Business Journal, SWE Magazine cover story, Popular Mechanics, Der Spiegel, Medscape.

Publications
Augmenting subnetwork inference with information extracted from the scientific literature. [Paper]
Sid Kiblawi, Deborah Chasman, Amanda Henning, Eunju Park, Hoifung Poon, Michael Gould, Paul Ahlquist, Mark Craven.
In PLOS Computational Biology, June 2019.

Document-Level N-ary Relation Extraction with Multiscale Representation Learning. [Paper, Code]
Robin Jia, Cliff Wong, Hoifung Poon.
In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), June 2019.

Deep Probabilistic Logic: A Unifying Framework for Indirect Supervision. [Paper, Code]
Hai Wang and Hoifung Poon.
In Proceedings of the Annual Conference of Empirical Methods in Natural Language Processing (EMNLP), November 2018.

EZLearn: Exploiting Organic Supervision in Automated Data Annotation. [Paper]
Maxim Grechkin, Hoifung Poon, Bill Howe.
In the 27th International Joint Conference on Artificial Intelligence (IJCAI), July 2018.

Estimating Accuracy from Unlabeled Data: A Probabilistic Logic Approach. [Paper]
Emmanouil A. Platanios, Hoifung Poon, Tom M. Mitchell, Eric Horvitz.
In NIPS, December 2017.

Classification of common human diseases derived from shared genetic and environmental determinants. [Paper]
Kanix Wang, Hallie Gaitsch, Hoifung Poon, Nancy J Cox, and Andrey Rzhetsky.
In Nature Genetics, August 2017.

Molecularly targeted drug combinations demonstrate selective effectiveness for myeloid- and lymphoid-derived hematologic malignancies. [Paper]
Stephen Kurtz et al.
In Proceedings of the National Academy of Sciences of the United States of America (PNAS), July 2017.

Wide-Open: accelerating public data release by automating detection of overdue datasets. [Paper] (Nature News, The Scientist, UW Today)
Maxim Grechkin, Hoifung Poon, and Bill Howe.
In PLOS Biology, June 2017.

Cross-Sentence N-ary Relation Extraction with Graph LSTMs. [Paper, Code]
Nanyun Peng, Hoifung Poon, Chris Quirk, Kristina Toutanova, and Scott Yih.
In Transactions of the Association for Computational Linguistics (TACL), 2017.

Distant Supervision for Relation Extraction beyond the Sentence Boundary. [Paper]
Chris Quirk and Hoifung Poon.
In Proceedings of the Fifteenth Conference of the European Association for Computational Linguistics (EACL), 2017.

Compositional Learning of Embeddings for Relation Paths in Knowledge Bases and Text. [Paper]
Kristina Toutanova, Xi Victoria Lin, Wen-Tau Yih, Hoifung Poon, and Chris Quirk.
In Proceedings of the Fifty-Fourth Annual Meeting of the Association for Computational Linguistics (ACL), 2016.

Representing Text for Joint Embedding of Text and Knowledge Bases. [Paper]
Kristina Toutanova, Danqi Chen, Patrick Pantel, Hoifung Poon, Pallavi Choudhury, and Michael Gamon.
In Proceedings of the Annual Conference of Empirical Methods in Natural Language Processing (EMNLP), 2015.

Model Selection for Type-Supervised Learning with application to POS Tagging. [Paper]
Kristina Toutanova, Waleed Ammar, Pallavi Choudhury, and Hoifung Poon.
In Proceedings of the SIGNLL Conference on Computational Natural Language Learning (CoNLL), 2015.

Grounded Semantic Parsing for Complex Knowledge Extraction. [Paper]
Ankur Parikh, Hoifung Poon, and Kristina Toutanova.
In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2015.

Distant Supervision for Cancer Pathway Extraction from Text. [Paper]
Hoifung Poon, Kristina Toutanova, and Chris Quirk.
In Proceedings of the Pacific Symposium on Biocomputing, 2015.

Literome: PubMed-Scale Genomic Knowledge Base in the Cloud. [Paper]
Hoifung Poon, Chris Quirk, Charlie DeZiel, and David Heckerman.
In Bioinformatics, 2014, doi:10.1093/bioinformatics/btu383.

Grounded Unsupervised Semantic Parsing. [Paper]
Hoifung Poon.
In Proceedings of the Fifty-First Annual Meeting of the Association for Computational Linguistics (ACL), 2013.

Probabilistic Frame Induction. [Paper]
Jackie Cheung, Hoifung Poon and Lucy Vanderwende.
In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2013.

An Exhaustive Epistatic SNP Association Analysis on Expanded Wellcome Trust Data. [Paper]
Christoph Lippert, Jennifer Listgarten, Robert Davidson, Scott Baxter, Hoifung Poon, Carl M. Kadie, David Heckerman.
In Scientific Reports, 2013, doi:10.1038/srep01099.

Sum-Product Networks: A New Deep Architecture. [Paper] [Slides] [Download code and results]
Hoifung Poon and Pedro Domingos.
In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence (UAI), 2011.
Best Paper Award

Unsupervised Ontology Induction from Text. [Paper]
Hoifung Poon and Pedro Domingos.
In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2010.

Joint Inference for Knowledge Extraction from Biomedical Literature. [Paper]
Hoifung Poon and Lucy Vanderwende.
In Proceedings of the North American Chapter of the Association for Computational Linguistics – Human Language Technologies Conference (NAACL-HLT), 2010.

Unsupervised Semantic Parsing. [Paper] [Slides] [Download data and code]
Hoifung Poon and Pedro Domingos.
In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2009.
Best Paper Award

Unsupervised Morphological Segmentation with Log-Linear Models. [Paper]
Hoifung Poon, Colin Cherry, and Kristina Toutanova.
In Proceedings of the North American Chapter of the Association for Computational Linguistics – Human Language Technologies Conference (NAACL-HLT), 2009.
Best Paper Award

Language ID in the Context of Harvesting Language Data off the Web. [Paper]
Fei Xia, William Lewis, and Hoifung Poon.
In Proceedings of the Conference of European Association for Computational Linguistics (EACL), 2009.

Joint Unsupervised Coreference Resolution with Markov Logic. [Paper]
Hoifung Poon and Pedro Domingos.
In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2008.

A General Method for Reducing the Complexity of Relational Inference and its Application to MCMC. [Paper]
Hoifung Poon, Pedro Domingos, and Marc Sumner.
In Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence (AAAI), 2008.

Markov Logic. [Book Chapter]
Pedro Domingos, Stanley Kok, Daniel Lowd, Hoifung Poon, Matthew Richardson, Parag Singla.
In L. De Raedt, P. Frasconi, K. Kersting and S. Muggleton (eds.), Probabilistic Inductive Logic Programming, 2008.

Joint Inference in Information Extraction. [Paper] [Online Appendix]
Hoifung Poon and Pedro Domingos.
In Proceedings of the Twenty-Second National Conference on Artificial Intelligence (AAAI), 2007.

Sound and Efficient Inference with Probabilistic and Deterministic Dependencies. [Paper]
Hoifung Poon and Pedro Domingos.
In Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI), 2006.

Unifying Logical and Statistical AI. [Paper]
Pedro Domingos, Stanley Kok, Hoifung Poon, Matthew Richardson, Parag Singla.
In Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI), 2006.
Invited paper.