{"id":1160723,"date":"2026-01-27T09:00:00","date_gmt":"2026-01-27T17:00:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=1160723"},"modified":"2026-01-26T18:32:05","modified_gmt":"2026-01-27T02:32:05","slug":"unirg-scaling-medical-imaging-report-generation-with-multimodal-reinforcement-learning","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/unirg-scaling-medical-imaging-report-generation-with-multimodal-reinforcement-learning\/","title":{"rendered":"UniRG: Scaling medical imaging report generation with multimodal reinforcement learning"},"content":{"rendered":"\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1400\" height=\"788\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/UniRG-BlogHeroFeature-1400x788-1.jpg\" alt=\"Three white icons on a blue\u2011green gradient: a ribcage scan, a circuit\u2011style document, and a neural network diagram\" class=\"wp-image-1160804\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/UniRG-BlogHeroFeature-1400x788-1.jpg 1400w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/UniRG-BlogHeroFeature-1400x788-1-300x169.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/UniRG-BlogHeroFeature-1400x788-1-1024x576.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/UniRG-BlogHeroFeature-1400x788-1-768x432.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/UniRG-BlogHeroFeature-1400x788-1-1066x600.jpg 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/UniRG-BlogHeroFeature-1400x788-1-655x368.jpg 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/UniRG-BlogHeroFeature-1400x788-1-240x135.jpg 240w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/UniRG-BlogHeroFeature-1400x788-1-640x360.jpg 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/UniRG-BlogHeroFeature-1400x788-1-960x540.jpg 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/UniRG-BlogHeroFeature-1400x788-1-1280x720.jpg 1280w\" sizes=\"auto, (max-width: 1400px) 100vw, 1400px\" \/><\/figure>\n\n\n\n<div style=\"padding-bottom:0; padding-top:0\" class=\"wp-block-msr-immersive-section alignfull row wp-block-msr-immersive-section\">\n\t\n\t<div class=\"container\">\n\t\t<div class=\"wp-block-msr-immersive-section__inner wp-block-msr-immersive-section__inner--narrow\">\n\t\t\t<div class=\"wp-block-columns mb-10 pb-1 pr-1 is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\" style=\"box-shadow:var(--wp--preset--shadow--outlined)\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<h2 class=\"wp-block-heading h3\" id=\"at-a-glance\">At a glance<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>AI-driven medical image report generation can help medical providers become more efficient and productive.<\/li>\n\n\n\n<li>Current models are difficult to train because reporting practices vary widely among providers.<\/li>\n\n\n\n<li>Universal Report Generation (UniRG) uses reinforcement learning to align model training with real-world radiology practice rather than proxy text-generation objectives.<\/li>\n\n\n\n<li>UniRG&nbsp;has&nbsp;achieved&nbsp;state-of-the-art&nbsp;performance across datasets, metrics, diagnostic tasks, longitudinal settings, and demographic subgroups.<\/li>\n\n\n\n<li>Test results show that reinforcement learning, guided by clinically meaningful reward signals, can substantially improve the reliability and generality of medical vision\u2013language models.<\/li>\n<\/ul>\n<\/div>\n<\/div>\t\t<\/div>\n\t<\/div>\n\n\t<\/div>\n\n\n\n<p>AI 
can be used to produce clinically meaningful radiology reports from medical images such as chest x-rays. Medical image report generation can reduce reporting burden while improving workflow efficiency for healthcare professionals. Beyond the real-world benefits, report generation has also become a critical benchmark for evaluating multimodal reasoning in healthcare AI.<\/p>\n\n\n\n<p>Despite recent advances driven by large vision\u2013language models, current systems still face major limitations in real-world clinical settings. One challenge stems from the wide variation in radiology reporting practices across institutions, departments, and patient populations. A model trained with supervised fine-tuning on one set of data may learn its specific phrasing and conventions instead of more general patterns, a problem known as <em>overfitting<\/em>. As a result, the model performs well on that data but delivers poor results when evaluated on data from unseen institutions or external datasets. Moreover, since model training is often aimed at producing text that looks similar to existing reports, the model can produce well-written but clinically inaccurate reports.<\/p>\n\n\n\n<p>In this blog post, we introduce <strong><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"http:\/\/aka.ms\/unirg-paper\">Universal Report Generation (UniRG)<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/strong>, a reinforcement learning\u2013based framework for medical imaging report generation. This work is a research prototype intended to advance medical AI research and is not validated for clinical use. UniRG uses reinforcement learning as a unifying mechanism to directly optimize clinically grounded evaluation signals, aligning model training with real-world radiology practice rather than proxy text-generation objectives. 
Using this framework, we train <strong><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"http:\/\/aka.ms\/unirg-paper\">UniRG-CXR<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/strong>, a state-of-the-art chest x-ray report generation model at scale, spanning over 560,000 studies, 780,000 images, and 226,000 patients from more than 80 medical institutions.<\/p>\n\n\n\n<p>To our knowledge, this is the first report generation model to achieve consistent state-of-the-art performance across report-level metrics, disease-level diagnostic accuracy, cross-institution generalization, longitudinal report generation, and demographic subgroups. These results demonstrate that reinforcement learning, when guided by clinically meaningful reward signals, can substantially improve both the reliability and generality of medical vision\u2013language models.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"a-unified-framework-for-scaling-medical-image-report-generation\">A unified framework for scaling medical image report generation<\/h2>\n\n\n\n<p>UniRG&nbsp;builds&nbsp;state-of-the-art&nbsp;report generation models by combining supervised fine-tuning with reinforcement learning, which&nbsp;optimizes&nbsp;a composite reward that integrates rule-based metrics, model-based semantic metrics, and LLM-based clinical error signals. This approach allows the resulting model&nbsp;UniRG-CXR to learn from diverse data sources, move beyond dataset-specific reporting patterns, and learn representations that generalize across institutions, metrics, and clinical contexts. 
Notably,&nbsp;UniRG-CXR sets a new state of&nbsp;the art on the authoritative&nbsp;<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/rexrank.ai\/\" target=\"_blank\" rel=\"noopener noreferrer\"><strong>ReXrank leaderboard<\/strong><span class=\"sr-only\"> (opens in new tab)<\/span><\/a>,&nbsp;a&nbsp;public leaderboard for chest X-ray image interpretation,&nbsp;as of&nbsp;01\/22\/2026, surpassing&nbsp;previous&nbsp;best models&nbsp;by&nbsp;substantial&nbsp;margins (Figure 1).<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"2141\" height=\"2560\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/fig1_UPDATED-scaled.png\" alt=\"Fig 1: Overview diagram of the UniRG-CXR framework showing training data sources, reinforcement-learning\u2013based training with composite rewards, evaluation on multiple datasets, and a results panel demonstrating state-of-the-art performance across benchmarks.\" class=\"wp-image-1160837\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/fig1_UPDATED-scaled.png 2141w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/fig1_UPDATED-251x300.png 251w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/fig1_UPDATED-857x1024.png 857w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/fig1_UPDATED-768x918.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/fig1_UPDATED-1285x1536.png 1285w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/fig1_UPDATED-1713x2048.png 1713w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/fig1_UPDATED-151x180.png 151w\" sizes=\"auto, (max-width: 2141px) 100vw, 2141px\" \/><figcaption class=\"wp-element-caption\">Figure 1. Overview of UniRG-CXR. 
(a) Training Data: UniRG-CXR is trained on the training splits of MIMIC-CXR, CheXpert Plus, and ReXGradient-160k, covering diverse institutions and patient demographics. (b) Training and Rewards: Taking the current image, clinical context (e.g., indication), and optionally prior studies as input, UniRG-CXR uses GRPO (Group Relative Policy Optimization) reinforcement learning to optimize composite rewards that combine rule-based, model-based, and LLM-based metrics. (c) Evaluation: We assess UniRG-CXR on held-out test sets (MIMIC-CXR, CheXpert Plus, ReXGradient) and unseen datasets (IU-Xray and proprietary data). Report quality is measured using ReXrank metrics and an LLM-based clinical-error metric, while diagnostic ability is evaluated via F1-based disease classification from generated reports. (d) ReXrank Results: UniRG-CXR achieves state-of-the-art performance across four datasets and two generation settings (findings only and findings + impression), showing substantial gains over the previous best systems.<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"universal-improvements-across-metrics-and-clinical-errors\">Universal improvements across metrics and clinical errors<\/h2>\n\n\n\n<p>Rather than excelling on one metric at the expense of others, UniRG-CXR delivers balanced improvements across many different measures of report quality. More importantly, it produces reports with substantially fewer clinically significant errors. This indicates that the model is not just learning how to sound like a radiology report, but is better capturing the underlying clinical facts. 
Explicitly optimizing for clinical correctness helps the model avoid common failure modes where fluent language masks incorrect or missing findings (Figure 2).<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"871\" height=\"835\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/UniRG_fig2.png\" alt=\"Fig 2: Multi-panel figure showing UniRG-CXR\u2019s state-of-the-art performance: leaderboard gains across metrics, ablation studies demonstrating benefits of combined reinforcement-learning rewards, improved training dynamics with fewer clinical errors, qualitative case studies with error-free reports, and a distribution showing fewer high-error reports compared to prior models.\" class=\"wp-image-1160707\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/UniRG_fig2.png 871w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/UniRG_fig2-300x288.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/UniRG_fig2-768x736.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/UniRG_fig2-188x180.png 188w\" sizes=\"auto, (max-width: 871px) 100vw, 871px\" \/><figcaption class=\"wp-element-caption\">Figure 2. UniRG-CXR achieves state-of-the-art performance, with consistent gains across metrics. (a). On the ReXrank leaderboard, UniRG-CXR (green) improves on all evaluation metrics. (b). Starting from the same SFT checkpoint, RL with our combined reward achieves more balanced gains across metrics and the highest RadCliQ-v1 score compared to RL on single metrics. The models in this ablation study are trained and tested on MIMIC. (c). An ablation study of the training dynamics shows that RL full (UniRG-CXR) achieves a significantly better RadCliQ-v1 score than RL on BLEU only. (d). During training, RL full (UniRG-CXR) shows a steady decrease in clinical errors per report, whereas an ablation run without error awareness (i.e., without CheXprompt metric optimization) fluctuates without consistent improvement. Both (c) and (d) show results on a 1024-sample MIMIC validation set from ablations trained on MIMIC. (e). Case studies illustrate that UniRG-CXR can produce error-free reports, unlike MedVersa and MedGemma. (f). UniRG-CXR yields a substantially higher proportion of reports with $\\leq 1$ error and fewer with $\\geq 4$ errors than prior models.<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"strong-performance-in-longitudinal-report-generation\">Strong performance in longitudinal report generation<\/h2>\n\n\n\n<p>In clinical practice, radiologists often compare current images with prior exams to determine whether a condition is improving, worsening, or unchanged. UniRG-CXR incorporates this historical information effectively, generating reports that reflect meaningful changes over time. 
This allows the model to describe new findings, progression, or resolution of disease more accurately, moving closer to how radiologists reason across patient histories rather than treating each exam in isolation (Figure 3).<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"2548\" height=\"2560\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/fig3_UPDATED-scaled.png\" alt=\"Fig 3: Multi-panel results demonstrating UniRG-CXR\u2019s advantages in longitudinal chest X-ray report generation, including superior performance over prior models and a non-longitudinal ablation across encounters, consistent gains at increasing follow-up complexity, improved handling of temporal disease changes, and qualitative examples of accurate longitudinal predictions.\" class=\"wp-image-1160838\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/fig3_UPDATED-scaled.png 2548w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/fig3_UPDATED-300x300.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/fig3_UPDATED-1019x1024.png 1019w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/fig3_UPDATED-150x150.png 150w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/fig3_UPDATED-768x772.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/fig3_UPDATED-1529x1536.png 1529w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/fig3_UPDATED-2039x2048.png 2039w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/fig3_UPDATED-180x180.png 180w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/fig3_UPDATED-179x180.png 179w\" sizes=\"auto, (max-width: 2548px) 100vw, 2548px\" \/><figcaption class=\"wp-element-caption\">Figure 3. 
UniRG-CXR enhances longitudinal report generation. (a). Compared with prior models on longitudinal report generation, UniRG-CXR performs best, and it also outperforms its non-longitudinal ablation, indicating that longitudinal information improves performance. (b). UniRG-CXR achieves the best performance at every longitudinal encounter point, from the first encounter to the more complex 5th+ encounters. In comparison, prior models such as GPT-5, GPT-4o, and MedGemma barely surpass the copy-prior-report baseline (grey lines). (c). Whereas prior models barely improve over the copy-prior-report baseline (dashed line), UniRG-CXR significantly and consistently improves performance across temporal disease change categories: new development, no change, progression, and regression (as categorized by GPT-5 from the ground-truth reports). Qualitative examples are shown for each category, in which UniRG-CXR correctly predicts the temporal change from the input. All results in this figure are on the MIMIC test set, with prior information used where available.<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"robust-generalization-across-institutions-and-populations\">Robust generalization across institutions and populations<\/h2>\n\n\n\n<p>UniRG-CXR maintains strong performance even when applied to data from institutions it has never seen before. This suggests that the model is learning general clinical patterns rather than memorizing institution-specific reporting styles. In addition, its performance remains stable across patient subgroups defined by age, gender, and race. 
This robustness is critical for real-world deployment, where models must perform reliably across diverse populations and healthcare environments (Figure 4).<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"871\" height=\"761\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/UniRG_fig4.png\" alt=\"Fig 4: Multi-panel figure showing UniRG-CXR\u2019s generalization and robustness: zero-shot evaluation with strong performance on unseen datasets, superior condition-level diagnostic F1 scores, and consistent accuracy across gender, age, and race subgroups compared with prior models.\" class=\"wp-image-1160706\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/UniRG_fig4.png 871w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/UniRG_fig4-300x262.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/UniRG_fig4-768x671.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/UniRG_fig4-206x180.png 206w\" sizes=\"auto, (max-width: 871px) 100vw, 871px\" \/><figcaption class=\"wp-element-caption\">Figure 4. Generalization and robustness of UniRG-CXR. (a). We evaluate UniRG-CXR in a zero-shot setting on two datasets from previously unseen institutions: IU-Xray and PD (proprietary data). UniRG-CXR consistently outperforms prior models, maintaining substantial performance gains in this challenging setup. (b) and (c) present condition-level F1 scores on MIMIC-CXR and PD and highlight that UniRG-CXR remains the overall top-performing model in condition-level diagnostic accuracy. (d). 
UniRG-CXR performs stably across gender, age, and race subgroups, exceeding the second-best model (dashed lines) in every subgroup.<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"unirg-is-a-promising-step-toward-scaling-medical-imaging-report-generation\">UniRG is a promising step toward scaling medical imaging report generation<\/h2>\n\n\n\n<p>UniRG introduces a reinforcement learning\u2013based framework that rethinks how medical imaging report generation models are trained and evaluated. By directly optimizing clinically grounded reward signals, UniRG-CXR achieves state-of-the-art performance across datasets, metrics, diagnostic tasks, longitudinal settings, and demographic subgroups, addressing longstanding limitations of supervised-only approaches.<\/p>\n\n\n\n<p>Looking ahead, this framework can be extended to additional imaging modalities and clinical tasks, and combined with richer multimodal patient data such as prior imaging, laboratory results, and clinical notes. 
More broadly, UniRG highlights the promise of reinforcement learning as a core component of next-generation medical foundation models that are robust, generalizable, and clinically aligned.<\/p>\n\n\n\n<p>UniRG reflects Microsoft\u2019s broader commitment to <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/ai.nejm.org\/doi\/full\/10.1056\/AI-S2300233\" target=\"_blank\" rel=\"noopener noreferrer\">advancing multimodal generative AI for precision health<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, alongside other projects such as <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/gigapath-whole-slide-foundation-model-for-digital-pathology\/\">GigaPath<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/biomedclip-a-multimodal-biomedical-foundation-model-pretrained-from-fifteen-million-scientific-image-text-pairs\/\">BiomedCLIP<\/a>, <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/www.nature.com\/articles\/s41467-025-58344-x\" target=\"_blank\" rel=\"noopener noreferrer\">LLaVA-Rad<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/biomedjourney-counterfactual-biomedical-image-generation-by-instruction-learning-from-multimodal-patient-journeys\/\">BiomedJourney<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/biomedparse-a-foundation-model-for-smarter-all-in-one-biomedical-image-analysis\/\">BiomedParse<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/trialscope-a-unifying-causal-framework-for-scaling-real-world-evidence-generation-with-biomedical-language-models\/\">TrialScope<\/a>, and <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/generative-medical-event-models-improve-with-scale\/\">Curiosity<\/a>.<\/p>\n\n\n\n<p>Paper co-authors: <a 
href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/qianchuliu\/\">Qianchu Liu<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/shezhan\/\">Sheng Zhang<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/guanghuiqin\/\">Guanghui Qin<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/yugu1\/\">Yu Gu<\/a>, Ying Jin, Sam Preston, Yanbo Xu, Sid Kiblawi, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/yimwenwai\/\">Wen-wai Yim<\/a>, Tim Ossowski, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/tristan\/\">Tristan Naumann<\/a>, Mu Wei, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/hoifung\/\">Hoifung Poon<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>AI can help generate medical image reports, but today\u2019s models struggle with varying reporting schemes. Learn how UniRG uses reinforcement learning to boost performance of medical vision-language models.<\/p>\n","protected":false},"author":43518,"featured_media":1160804,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_hide_image_in_river":null,"footnotes":""},"categories":[1],"tags":[],"research-area":[13556,13562,13545,13553],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[269148,243984,269142],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-1160723","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research-blog","msr-research-area-artificial-intelligence","msr-research-area-computer-vision","msr-research-area-human-language-technologies","msr-research-area-medical-health-genomics","msr-locale-en_us","msr-post-option-approved-for-river","msr-post-opt
ion-blog-homepage-featured","msr-post-option-include-in-river"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[199565,849856],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[952050],"related-projects":[],"related-events":[],"related-researchers":[{"type":"user_nicename","value":"Sheng Zhang","user_id":39087,"display_name":"Sheng Zhang","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/shezhan\/?lang=fr-ca\" aria-label=\"Visitez la page de profil pour Sheng Zhang\">Sheng Zhang<\/a>","is_active":false,"last_first":"Zhang, Sheng","people_section":0,"alias":"shezhan"},{"type":"user_nicename","value":"Flora Liu","user_id":43488,"display_name":"Flora Liu","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/qianchuliu\/?lang=fr-ca\" aria-label=\"Visitez la page de profil pour Flora Liu\">Flora Liu<\/a>","is_active":false,"last_first":"Liu, Flora","people_section":0,"alias":"qianchuliu"},{"type":"user_nicename","value":"Guanghui Qin","user_id":43527,"display_name":"Guanghui Qin","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/guanghuiqin\/?lang=fr-ca\" aria-label=\"Visitez la page de profil pour Guanghui Qin\">Guanghui Qin<\/a>","is_active":false,"last_first":"Qin, Guanghui","people_section":0,"alias":"guanghuiqin"},{"type":"guest","value":"mu-wei-2","user_id":"1160724","display_name":"Mu Wei","author_link":"<a href=\"https:\/\/www.linkedin.com\/in\/mu-wei-038a3849\/\" aria-label=\"Visitez la page de profil pour Mu Wei\">Mu Wei<\/a>","is_active":true,"last_first":"Wei, Mu","people_section":0,"alias":"mu-wei-2"},{"type":"user_nicename","value":"Hoifung Poon","user_id":32016,"display_name":"Hoifung Poon","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/hoifung\/?lang=fr-ca\" aria-label=\"Visitez la page 
de profil pour Hoifung Poon\">Hoifung Poon<\/a>","is_active":false,"last_first":"Poon, Hoifung","people_section":0,"alias":"hoifung"}],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/UniRG-BlogHeroFeature-1400x788-1-960x540.jpg\" class=\"img-object-cover\" alt=\"Three white icons on a blue\u2011green gradient: a ribcage scan, a circuit\u2011style document, and a neural network diagram\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/UniRG-BlogHeroFeature-1400x788-1-960x540.jpg 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/UniRG-BlogHeroFeature-1400x788-1-300x169.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/UniRG-BlogHeroFeature-1400x788-1-1024x576.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/UniRG-BlogHeroFeature-1400x788-1-768x432.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/UniRG-BlogHeroFeature-1400x788-1-1066x600.jpg 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/UniRG-BlogHeroFeature-1400x788-1-655x368.jpg 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/UniRG-BlogHeroFeature-1400x788-1-240x135.jpg 240w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/UniRG-BlogHeroFeature-1400x788-1-640x360.jpg 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/UniRG-BlogHeroFeature-1400x788-1-1280x720.jpg 1280w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2026\/01\/UniRG-BlogHeroFeature-1400x788-1.jpg 1400w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/>","byline":"","formattedDate":"January 27, 2026","formattedExcerpt":"AI can help generate medical image reports, but 
today\u2019s models struggle with varying reporting schemes. Learn how UniRG uses reinforcement learning to boost performance of medical vision-language models.","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1160723","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/43518"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=1160723"}],"version-history":[{"count":13,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1160723\/revisions"}],"predecessor-version":[{"id":1160915,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/1160723\/revisions\/1160915"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/1160804"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=1160723"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=1160723"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=1160723"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=1160723"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=1160723"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/
wp-json\/wp\/v2\/msr-event-type?post=1160723"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=1160723"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=1160723"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=1160723"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=1160723"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=1160723"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}