{"id":959091,"date":"2023-08-10T09:00:00","date_gmt":"2023-08-10T16:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=959091"},"modified":"2023-12-19T08:30:09","modified_gmt":"2023-12-19T16:30:09","slug":"microsoft-at-kdd-2023-advancing-health-at-the-speed-of-ai","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/microsoft-at-kdd-2023-advancing-health-at-the-speed-of-ai\/","title":{"rendered":"Microsoft at KDD 2023: Advancing health at the speed of AI"},"content":{"rendered":"\n<p><strong><em>This content was given as a keynote at the Workshop of Applied Data Science for Healthcare and covered during a tutorial at the <\/em><\/strong><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/kdd.org\/kdd2023\/\"><strong><em>29<sup>th<\/sup> ACM SIGKDD Conference on Knowledge Discovery and Data Mining<\/em><\/strong><span class=\"sr-only\"> (opens in new tab)<\/span><\/a><strong><em>, a premier forum for advancement, education, and adoption of the discipline of knowledge discovery and data mining.<\/em><\/strong><\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1400\" height=\"788\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-BlogHeroFeature-1400x788-no-logo.jpg\" alt=\"Microsoft at KDD 2023: Advancing health at the speed of AI\" class=\"wp-image-959100\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-BlogHeroFeature-1400x788-no-logo.jpg 1400w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-BlogHeroFeature-1400x788-no-logo-300x169.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-BlogHeroFeature-1400x788-no-logo-1024x576.jpg 1024w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-BlogHeroFeature-1400x788-no-logo-768x432.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-BlogHeroFeature-1400x788-no-logo-1066x600.jpg 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-BlogHeroFeature-1400x788-no-logo-655x368.jpg 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-BlogHeroFeature-1400x788-no-logo-343x193.jpg 343w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-BlogHeroFeature-1400x788-no-logo-240x135.jpg 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-BlogHeroFeature-1400x788-no-logo-640x360.jpg 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-BlogHeroFeature-1400x788-no-logo-960x540.jpg 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-BlogHeroFeature-1400x788-no-logo-1280x720.jpg 1280w\" sizes=\"auto, (max-width: 1400px) 100vw, 1400px\" \/><\/figure>\n\n\n\n<div class=\"annotations \" data-bi-aN=\"margin-callout\">\n\t<article class=\"annotations__list card depth-16 bg-body p-4 annotations__list--right\">\n\t\t<div class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Group<\/span>\n\t\t\t<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/group\/real-world-evidence\/\" data-bi-cN=\"Real-world Evidence\" data-external-link=\"false\" data-bi-aN=\"margin-callout\" data-bi-type=\"annotated-link\" class=\"annotations__link font-weight-semibold text-decoration-none\"><span>Real-world Evidence<\/span>&nbsp;<span class=\"glyph-in-link glyph-append glyph-append-chevron-right\" aria-hidden=\"true\"><\/span><\/a>\t\t\t\t\t<\/div>\n\t<\/article>\n<\/div>\n\n\n\n<p>Recent and noteworthy advancements in 
generative AI and large language models (LLMs) are leading to profound transformations in various domains. This blog explores how these breakthroughs can accelerate progress in precision health. In addition to the keynote I delivered, &#8220;<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/dshealthkdd.github.io\/dshealth-2023\/\" target=\"_blank\" rel=\"noopener noreferrer\">Applications and New Frontiers of Generative Models for Healthcare<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>,&#8221; it includes part of a <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/kdd.org\/kdd2023\/tutorials\/\" target=\"_blank\" rel=\"noopener noreferrer\">tutorial<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> (LS-21) being given at <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/kdd.org\/kdd2023\/\" target=\"_blank\" rel=\"noopener noreferrer\">KDD 2023<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. This tutorial surveys the broader research area of \u201cPrecision Health at the Age of Large Language Models,\u201d delivered by <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/shezhan\/\" target=\"_blank\" rel=\"noreferrer noopener\">Sheng Zhang<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jagonz\/\" target=\"_blank\" rel=\"noreferrer noopener\">Javier Gonz\u00e1lez Hern\u00e1ndez<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/tristan\/\" target=\"_blank\" rel=\"noreferrer noopener\">Tristan Naumann<\/a>, and me.&nbsp;<\/p>\n\n\n\n<p>A longstanding objective within precision health is the development of a continuous learning system capable of seamlessly integrating novel information to enhance healthcare delivery and expedite advancements in biomedicine. 
The National Academy of Medicine has gathered leading experts to explore this key initiative, as documented in its <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/nam.edu\/programs\/value-science-driven-health-care\/learning-health-system-series\/\" target=\"_blank\" rel=\"noopener noreferrer\">Learning Health System<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> series. However, the current state of health systems is far removed from this ideal.&nbsp;The burden of extensive unstructured data and labor-intensive manual processing hinders progress. This is evident, for instance, in the context of cancer treatment, where the traditional standard of care frequently falls short, leaving clinical trials as a last resort. Yet a lack of awareness renders these trials inaccessible, with only 3 percent of US patients finding a suitable trial. This enrollment deficiency contributes to nearly 40 percent of trial failures, as shown in Figure 1. Consequently, the process of drug discovery is exceedingly slow, demanding billions of dollars and a timeline of over a decade.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"410\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-Blog_Fig-1-cancer-trial-termination-reason-1024x410.jpeg\" alt=\"Figure 1: This pie chart shows the reasons for clinical trial termination for cancer treatment. 
Insufficient enrollment accounts for 38.7% of these failures.\" class=\"wp-image-959136\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-Blog_Fig-1-cancer-trial-termination-reason-1024x410.jpeg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-Blog_Fig-1-cancer-trial-termination-reason-300x120.jpeg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-Blog_Fig-1-cancer-trial-termination-reason-768x307.jpeg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-Blog_Fig-1-cancer-trial-termination-reason-240x96.jpeg 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-Blog_Fig-1-cancer-trial-termination-reason.jpeg 1280w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">Figure 1: This pie chart shows the reasons for clinical trial termination for cancer treatment. Insufficient enrollment accounts for 38.7% of these failures.&nbsp;<\/figcaption><\/figure>\n\n\n\n<p>On an encouraging note, advances in generative AI provide unparalleled opportunities in harnessing real-world observational data to improve patient care\u2014a long-standing goal in the realm of real-world evidence (RWE), which the US Food and Drug Administration (FDA) relies on to <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/www.fda.gov\/science-research\/science-and-research-special-topics\/real-world-evidence\" target=\"_blank\" rel=\"noopener noreferrer\">monitor and evaluate post-market drug safety<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>. Large language models (LLMs) like GPT-4 have the capability of \u201cuniversal structuring,\u201d enabling efficient abstraction of patient information from clinical text at a large scale. 
This potential can be likened to the transformative impact LLMs are currently making in other domains, such as software development and productivity tools.<\/p>\n\n\n\n<div style=\"height:10px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n
<h3 class=\"wp-block-heading\" id=\"digital-transformation-leads-to-an-intelligence-revolution\">Digital transformation leads to an intelligence revolution<\/h3>\n\n\n\n<p>The large-scale digitization of human knowledge on the internet has facilitated the pretraining of powerful large language models. As a result, we are witnessing revolutionary changes in general software categories like programming and search. Similarly, the past couple of decades have seen rapid digitization in biomedicine, with advancements like sequencing technologies, electronic medical records (EMRs), and health sensors. 
By unleashing the power of generative AI in the field of biomedicine, we can achieve similarly amazing transformations in precision health, as shown in Figure 2.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"445\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-Blog_Fig-2-digital-transformation-leads-to-intelligence-revolution-1024x445.png\" alt=\"Figure 2: Left shows digital transformation in biomedicine, as signified by genome sequences, electronic medical records, and health sensors. Right shows how LLMs can accelerate progress towards precision health by improving access, safety, and preventative care.\" class=\"wp-image-959139\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-Blog_Fig-2-digital-transformation-leads-to-intelligence-revolution-1024x445.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-Blog_Fig-2-digital-transformation-leads-to-intelligence-revolution-300x131.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-Blog_Fig-2-digital-transformation-leads-to-intelligence-revolution-768x334.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-Blog_Fig-2-digital-transformation-leads-to-intelligence-revolution-240x104.png 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-Blog_Fig-2-digital-transformation-leads-to-intelligence-revolution.png 1400w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">Figure 2: Large-scale digitization of biomedical data, such as genome sequences and electronic medical records, enables accelerated progress towards precision health fueled by generative AI and LLMs.&nbsp;<\/figcaption><\/figure>\n\n\n\n<p>Microsoft is at the forefront of exploring the 
applications of LLMs in the health field, as depicted in Figure 3. Our PubMedBERT models, pretrained on biomedical&nbsp;<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/huggingface.co\/microsoft\/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext\" target=\"_blank\" rel=\"noopener noreferrer\">abstracts<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>&nbsp;and&nbsp;<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/huggingface.co\/microsoft\/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext\" target=\"_blank\" rel=\"noopener noreferrer\">full texts<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, were released three years ago. They have sparked immense interest in biomedical pretraining and continue to receive an overwhelming number of downloads each month, with over one million in July 2023 alone. Numerous recent investigations have followed suit, delving deeper into this promising direction. 
Now, with&nbsp;next-generation models like GPT-4 being widely accessible, progress can be further accelerated.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"535\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-Blog_Fig-3-LLMs-in-Health-1024x535.png\" alt=\"Figure 3: Progress in LLMs for health application, from Microsoft\u2019s PubMedBERT (left, 2020) to an explosion of recent biomedical LLMs (middle, 2022) to the latest GPT-4 (right, 2023)\" class=\"wp-image-959142\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-Blog_Fig-3-LLMs-in-Health-1024x535.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-Blog_Fig-3-LLMs-in-Health-300x157.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-Blog_Fig-3-LLMs-in-Health-768x401.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-Blog_Fig-3-LLMs-in-Health-240x125.png 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-Blog_Fig-3-LLMs-in-Health.png 1400w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">Figure 3: Microsoft is among the first to explore large language models in health applications.<\/figcaption><\/figure>\n\n\n\n<p>Although pretrained on general web content, GPT-4 has demonstrated impressive competence in biomedical tasks straightaway and has the potential to perform previously unseen natural language processing (NLP) tasks in the biomedical domain with exceptional accuracy. 
Notably, research studies show that GPT-4 can achieve <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/capabilities-of-gpt-4-on-medical-challenge-problems\/\">expert-level performance on medical question-answer datasets<\/a>, like MedQA (USMLE exam), without the need for costly task-specific fine-tuning or intricate self-refinement.<\/p>\n\n\n\n<p>Similarly, with simple prompts, GPT-4 can effectively <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/scaling-clinical-trial-matching-using-large-language-models-a-case-study-in-oncology\/\">structure complex clinical trial matching logic from eligibility criteria<\/a>, surpassing prior state-of-the-art systems like Criteria2Query, which were specifically designed for this purpose, as shown in Figure 4.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"317\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-Blog_Fig-4-CTM-structuring-results-1024x317.png\" alt=\"Figure 4: Table showing test results on structuring clinical trial eligibility criteria comparing GPT-4 with prior state-of-the-art systems such as Criteria2Query.\" class=\"wp-image-959145\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-Blog_Fig-4-CTM-structuring-results-1024x317.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-Blog_Fig-4-CTM-structuring-results-300x93.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-Blog_Fig-4-CTM-structuring-results-768x238.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-Blog_Fig-4-CTM-structuring-results-240x74.png 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-Blog_Fig-4-CTM-structuring-results.png 1400w\" sizes=\"auto, (max-width: 1024px) 100vw, 
1024px\" \/><figcaption class=\"wp-element-caption\">Figure 4: Comparison of test results on structuring clinical trial eligibility criteria. GPT-4 outperformed the previous state-of-the-art method without requiring any specialized training.<\/figcaption><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"transforming-real-world-data-into-a-discovery-engine\">Transforming real-world data into a discovery engine<\/h3>\n\n\n\n<p>In the context of clinical trial matching, besides structuring trial eligibility criteria, the bigger challenge lies in structuring patient records at scale. Cancer patients may have hundreds of notes where critical information like histopathology or staging may be scattered across multiple entries, as shown in Figure 5. To tackle this, Microsoft and Providence, a large US-based health system, have developed state-of-the-art self-supervised LLMs like <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/www.cell.com\/patterns\/fulltext\/S2666-3899(23)00066-1\" target=\"_blank\" rel=\"noopener noreferrer\">OncoBERT<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> to extract such details. More recently, preliminary studies have found that GPT-4 can also excel at structuring such vital information. Drawing on these advancements, we developed a research system for clinical trial matching, powered by LLMs. 
This system is now used daily on a molecular tumor board at Providence, as well as in high-profile trials such as this adoptive T-cell trial, as reported by the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/www.nytimes.com\/2022\/06\/01\/health\/pancreatic-cancer-treatment.html\" target=\"_blank\" rel=\"noopener noreferrer\">New York Times<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"466\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-Blog_Fig-5-cancer-patient-has-many-notes-1024x466.png\" alt=\"Figure 5: A graphic illustrating a de-identified example cancer patient with hundreds of clinical notes spanning across many note types.\" class=\"wp-image-959133\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-Blog_Fig-5-cancer-patient-has-many-notes-1024x466.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-Blog_Fig-5-cancer-patient-has-many-notes-300x137.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-Blog_Fig-5-cancer-patient-has-many-notes-768x349.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-Blog_Fig-5-cancer-patient-has-many-notes-240x109.png 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-Blog_Fig-5-cancer-patient-has-many-notes.png 1400w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">Figure 5: Vital information about a cancer patient may be scattered among hundreds of clinical notes, as illustrated by this de-identified example.<\/figcaption><\/figure>\n\n\n\n<p>Clinical trial matching is important in its own right, and the same underlying 
technologies can be used to unlock other beneficial applications. For example, in collaboration with Providence researchers, we demonstrated how real-world data can be harnessed to simulate prominent lung cancer trials under various eligibility settings. By combining the structuring capabilities of LLMs with state-of-the-art causal inference methods, we effectively transform real-world data into a discovery engine. This enables instant evaluation of clinical hypotheses, with applications spanning clinical trial design, synthetic control, post-market surveillance, and comparative effectiveness, among others.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"towards-precision-health-copilots\">Towards precision health copilots<\/h3>\n\n\n\n<p>The significance of generative AI lies not in achieving incremental improvements, but in enabling entirely new possibilities in applications. LLMs\u2019 universal structuring capability allows for the scaling of RWE generation from patient data at the population level. Additionally, LLMs can serve as \u201c<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/distilling-large-language-models-for-biomedical-knowledge-extraction-a-case-study-on-adverse-drug-events\/\">universal annotators<\/a>,\u201d generating examples from unlabeled data to train high-performance student models. Furthermore, LLMs possess <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/causal-reasoning-and-large-language-models-opening-a-new-frontier-for-causality\/\">remarkable reasoning capabilities<\/a>, functioning as \u201cuniversal reasoners\u201d and accelerating causal discovery from real-world data at the population level. 
These models can also <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/self-verification-improves-few-shot-clinical-information-extraction\/\">fact-check their own answers<\/a>, providing easily verifiable rationale to enhance their accuracy and facilitate human-in-the-loop verification and interactive learning.<\/p>\n\n\n\n<p>Beyond textual data, there is immense growth potential for LLMs in health applications, particularly when dealing with multimodal and longitudinal patient data. Crucial patient information may reside in various information-rich modalities, such as imaging and multi-omics. We have explored pretraining large biomedical multimodal models by assembling the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/arxiv.org\/abs\/2303.00915\" target=\"_blank\" rel=\"noopener noreferrer\">largest collection of public biomedical image-text pairs<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> from biomedical research articles, comprising 15 million images and over 30 million image-text pairs. Recently, we investigated using GPT-4 to generate instruction-following data to train a multimodal conversational copilot called <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/llava-med-training-a-large-language-and-vision-assistant-for-biomedicine-in-one-day\/\">LLaVA-Med<\/a>, enabling researchers to interact with biomedical imaging data. Additionally, we are collaborating with clinical stakeholders to train LMMs for precision immuno-oncology, utilizing multimodal fusion to combine EMRs, radiology images, digital pathology, and multi-omics in longitudinal data on cancer patients.<\/p>\n\n\n\n<p>Our ultimate aspiration is to develop precision health copilots that empower all stakeholders in biomedicine and scale real-world evidence generation, optimizing healthcare delivery and accelerating discoveries. 
We envision a future where clinical research and care are seamlessly integrated, where every clinical observation instantly updates a patient\u2019s health status, and decisions are supported by population-level patient-like-me information. Patients in need of advanced intervention are continuously evaluated for just-in-time clinical trial matching. Life sciences researchers have access to a global real-world data dashboard in real time, initiating in silico trials to generate and test counterfactual hypotheses. Payors and regulators base approval and care decisions on the most comprehensive and up-to-date clinical evidence at the finest granular level. This vision embodies the dream of evidence-based precision health. Generative AI, including large language models, will play a pivotal role in propelling us towards this exciting and transformative future.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This content was given as a keynote at the Workshop of Applied Data Science for Healthcare and covered during a tutorial at the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (opens in new tab), a premier forum for advancement, education, and adoption of the discipline of knowledge discovery and data mining. 
Recent [&hellip;]<\/p>\n","protected":false},"author":42735,"featured_media":994005,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[{"type":"user_nicename","value":"Hoifung Poon","user_id":"32016"}],"msr_hide_image_in_river":0,"footnotes":""},"categories":[1],"tags":[],"research-area":[13556,13553],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[261673],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-959091","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research-blog","msr-research-area-artificial-intelligence","msr-research-area-medical-health-genomics","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[849856],"msr_impact_theme":["Health"],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[952050],"related-projects":[],"related-events":[],"related-researchers":[{"type":"user_nicename","value":"Hoifung Poon","user_id":32016,"display_name":"Hoifung Poon","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/hoifung\/\" aria-label=\"Visit the profile page for Hoifung Poon\">Hoifung Poon<\/a>","is_active":false,"last_first":"Poon, Hoifung","people_section":0,"alias":"hoifung"}],"msr_type":"Post","featured_image_thumbnail":"<img width=\"290\" height=\"163\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-MSRNewsletterThumb-290x163-1.png\" class=\"img-object-cover\" alt=\"The KDD2023 logo in white with the dates August 6-10 and the city Long Beach, CA on a green and blue gradient background\" decoding=\"async\" 
loading=\"lazy\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-MSRNewsletterThumb-290x163-1.png 290w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/08\/KDD23-MSRNewsletterThumb-290x163-1-240x135.png 240w\" sizes=\"auto, (max-width: 290px) 100vw, 290px\" \/>","byline":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/hoifung\/\" title=\"Go to researcher profile for Hoifung Poon\" aria-label=\"Go to researcher profile for Hoifung Poon\" data-bi-type=\"byline author\" data-bi-cN=\"Hoifung Poon\">Hoifung Poon<\/a>","formattedDate":"August 10, 2023","formattedExcerpt":"This content was given as a keynote at the Workshop of Applied Data Science for Healthcare and covered during a tutorial at the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (opens in new tab), a premier forum for advancement, education, and adoption&hellip;","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/959091","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/42735"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=959091"}],"version-history":[{"count":32,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/959091\/revisions"}],"predecessor-version":[{"id":994011,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/959091\/revisions\/994011"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/994005"}],"wp:attachment"
:[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=959091"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=959091"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=959091"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=959091"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=959091"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=959091"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=959091"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=959091"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=959091"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=959091"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=959091"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}