{"id":858441,"date":"2022-07-04T11:14:47","date_gmt":"2022-07-04T18:14:47","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/"},"modified":"2023-06-24T07:49:58","modified_gmt":"2023-06-24T14:49:58","slug":"making-the-most-of-text-semantics-to-improve-biomedical-vision-language-processing","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/making-the-most-of-text-semantics-to-improve-biomedical-vision-language-processing\/","title":{"rendered":"Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing"},"content":{"rendered":"<p><strong>ABSTRACT:<\/strong> Multi-modal data abounds in biomedicine, such as radiology images and reports. Interpreting this data at scale is essential for improving clinical care and accelerating clinical research. Biomedical text with its complex semantics poses additional challenges in vision-language modelling compared to the general domain, and previous work has used insufficiently adapted models that lack domain-specific language understanding. In this paper, we show that principled textual semantic modelling can substantially improve contrastive learning in self-supervised vision-language processing. We release a language model that achieves state-of-the-art results in radiology natural language inference through its improved vocabulary and novel language pretraining objective leveraging semantics and discourse characteristics in radiology reports. Further, we propose a self-supervised joint vision&#8211;language approach with a focus on better text modelling. It establishes new state of the art results on a wide range of publicly available benchmarks, in part by leveraging our new domain-specific language model. We release a new dataset with locally aligned phrase grounding annotations by radiologists to facilitate the study of complex semantic modelling in biomedical vision-language processing. A broad evaluation, including on this new dataset, shows that our contrastive learning approach, aided by textual-semantic modelling, outperforms prior methods in segmentation tasks, despite only using a global-alignment objective.<\/p>\n<p><iframe loading=\"lazy\" title=\"Making the Most of Text Semantics to Improve Biomedical Vision-Language Processing\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube-nocookie.com\/embed\/by-JzcUJQpw?feature=oembed&rel=0\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe><\/p>\n<p>&nbsp;<\/p>\n<h3>Motivation<\/h3>\n<p><strong>Clinical motivation:<\/strong> Growing backlogs of medical image reporting puts pressure on radiologists and leads to errors and omissions.<\/p>\n<p><strong>Scalability:<\/strong> ML models require a vast number of manual annotations (clinicians\u2019 time is precious).\u00a0 Existing models are often limited to a fixed set of abnormalities or body-part.<\/p>\n<p><strong>Domain-specific challenges:<\/strong> Lack of foundation models suitable for health data (e.g., image and text), smaller scale datasets, domain specific-language.<\/p>\n<h3>Approach<\/h3>\n<h4>CXR-BERT language model<\/h4>\n<p><strong>CXR-BERT-General <\/strong>is a language encoder model trained specifically with biomedical text data (e.g., PubMed abstracts, and MIMIC clinical notes) to learn domain specific vocabulary and semantics. 
<p>In the proposed framework, this canonical model is continually pretrained on the MIMIC-CXR dataset to specialise it to chest X-ray radiology reports via masked language modelling (MLM), contrastive learning, and text augmentations (sentence shuffling). We have made two models available on Hugging Face at <a href="https://aka.ms/biovil-models">https://aka.ms/biovil-models</a>:</p>
<ul>
<li>The new chest X-ray (CXR) domain-specific language model, <strong><a href="https://huggingface.co/microsoft/BiomedVLP-CXR-BERT-specialized">CXR-BERT-specialized</a></strong> (Fig. 1)</li>
<li>A canonical model that can be used for other applications, <strong><a href="https://huggingface.co/microsoft/BiomedVLP-CXR-BERT-general">CXR-BERT-general</a></strong></li>
</ul>
<p><img src="https://www.microsoft.com/en-us/research/wp-content/uploads/2022/07/CXR-BERT.png" alt="CXR-BERT" width="950" height="354" /></p>
<p><em>Figure 1: The proposed CXR-BERT text encoder has three phases of pretraining: (I) pretraining on biomedical corpora (e.g., PubMed abstracts, MIMIC-III clinical notes), (II) building a biomedical/clinical vocabulary, and (III) further specialising to the chest radiology domain by performing contrastive learning between radiology reports and leveraging text augmentations.</em></p>
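<p>As a quick illustration of how the released checkpoints can be used, the sketch below loads CXR-BERT-specialized from Hugging Face and compares radiology sentences in its text embedding space. It follows the usage pattern shown on the model card; get_projected_text_embeddings is a helper supplied by the model's remote code, and the example assumes the torch and transformers packages are installed.</p>
<pre><code>import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

# trust_remote_code is required because the repository ships a custom model class.
name = "microsoft/BiomedVLP-CXR-BERT-specialized"
tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
model = AutoModel.from_pretrained(name, trust_remote_code=True)

prompts = [
    "There is no pneumothorax or pleural effusion.",
    "No pleural effusion or pneumothorax is seen.",
    "The extent of the pleural effusion is unchanged.",
]
tokens = tokenizer.batch_encode_plus(
    batch_text_or_text_pairs=prompts,
    add_special_tokens=True,
    padding="longest",
    return_tensors="pt",
)
# Project the sentence representations into the joint image-text latent space.
with torch.no_grad():
    emb = model.get_projected_text_embeddings(
        input_ids=tokens.input_ids, attention_mask=tokens.attention_mask
    )
emb = F.normalize(emb, dim=-1)
print(emb @ emb.t())  # pairwise cosine similarities; the first two should agree most
</code></pre>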
<strong><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/aka.ms\/biovil-code\" target=\"_blank\" rel=\"noopener noreferrer\">https:\/\/aka.ms\/biovil-code<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/strong><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-882861\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2022\/07\/BioVIL.png\" alt=\"BioVIL\" width=\"950\" height=\"304\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2022\/07\/BioVIL.png 2232w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2022\/07\/BioVIL-300x96.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2022\/07\/BioVIL-1024x328.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2022\/07\/BioVIL-768x246.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2022\/07\/BioVIL-1536x491.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2022\/07\/BioVIL-2048x655.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2022\/07\/BioVIL-240x77.png 240w\" sizes=\"auto, (max-width: 950px) 100vw, 950px\" \/><\/p>\n<p><em>Figure 2: BioViL leverages our radiology-specific text encoder (CXR-BERT) in a multi-modal contrastive learning framework to train image and text encoders that can be aligned in the joint latent space. The proposed learning framework can be coupled with local-contrastive objectives as well.<\/em><\/p>\n<h4>MS-CXR dataset<\/h4>\n<p>MS-CXR is a <strong>phrase grounding dataset <\/strong>for chest X-ray data. It allows fine-grained evaluation of joint text-image understanding in a biomedical domain. This dataset was manually annotated and curated by two expert radiologist and comprises 1162 image bounding-box & sentence pairs with 8 different high-level clinical findings.<\/p>\n<p>This dataset is released on PhysioNet: <strong><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"https:\/\/aka.ms\/ms-cxr\" target=\"_blank\" rel=\"noopener noreferrer\">https:\/\/aka.ms\/ms-cxr<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/strong><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-882867\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2022\/07\/MS-CXR.png\" alt=\"MS-CXR\" width=\"953\" height=\"219\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2022\/07\/MS-CXR.png 2176w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2022\/07\/MS-CXR-300x69.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2022\/07\/MS-CXR-1024x236.png 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2022\/07\/MS-CXR-768x177.png 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2022\/07\/MS-CXR-1536x354.png 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2022\/07\/MS-CXR-2048x472.png 2048w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2022\/07\/MS-CXR-240x55.png 240w\" sizes=\"auto, (max-width: 953px) 100vw, 953px\" \/><\/p>\n<p><em>Figure 3: Examples from the MS-CXR dataset. 
<h4>MS-CXR dataset</h4>
<p>MS-CXR is a <strong>phrase grounding dataset</strong> for chest X-ray data. It allows fine-grained evaluation of joint text-image understanding in a biomedical domain. The dataset was manually annotated and curated by two expert radiologists and comprises 1,162 image bounding box and sentence pairs covering eight different high-level clinical findings.</p>
<p>The dataset is released on PhysioNet: <strong><a href="https://aka.ms/ms-cxr">https://aka.ms/ms-cxr</a></strong></p>
<p><img src="https://www.microsoft.com/en-us/research/wp-content/uploads/2022/07/MS-CXR.png" alt="MS-CXR" width="953" height="219" /></p>
<p><em>Figure 3: Examples from the MS-CXR dataset. The overlaid colormap shows cosine similarities between embeddings of image patches and radiology sentences; red indicates stronger agreement between the two modalities.</em></p>
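<p>The visualisation in Figure 3 can be reproduced with a few lines once an image has been encoded into a grid of projected patch embeddings and a sentence into a single projected embedding. The sketch below is illustrative; the function name and tensor layout are assumptions, not the toolkit's API.</p>
<pre><code>import torch
import torch.nn.functional as F

def similarity_map(patch_emb, sent_emb):
    """Cosine-similarity grid between image patches and one sentence.

    patch_emb: (H, W, dim) projected embeddings of the image patch grid.
    sent_emb:  (dim,) projected embedding of a radiology sentence.
    Returns an (H, W) map; high values mark regions described by the sentence.
    """
    patch_emb = F.normalize(patch_emb, dim=-1)
    sent_emb = F.normalize(sent_emb, dim=-1)
    return torch.einsum("hwd,d-&gt;hw".replace("&gt;", ">"), patch_emb, sent_emb)

# Upsampling the (H, W) map to the input image resolution and overlaying it
# as a heatmap yields visualisations like those in Figure 3.
</code></pre>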
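<p>For evaluation against the expert annotations, MS-CXR distributes its labels as a CSV of bounding boxes paired with sentences. Below is one plausible way to group the boxes per image-phrase pair; the file name and column names are assumptions and should be checked against the PhysioNet download.</p>
<pre><code>import csv
from collections import defaultdict

# Assumed file and columns (verify against the downloaded release):
# dicom_id, category_name, label_text, x, y, w, h
CSV_PATH = "MS_CXR_Local_Alignment_v1.0.0.csv"

def load_ms_cxr(path):
    """Group bounding boxes by (image, phrase); a single phrase may be
    grounded in several regions of the same chest X-ray."""
    grouped = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            key = (row["dicom_id"], row["label_text"])
            box = tuple(int(float(row[k])) for k in ("x", "y", "w", "h"))
            grouped[key].append((row["category_name"], box))
    return grouped

annotations = load_ms_cxr(CSV_PATH)
print(f"{len(annotations)} image-phrase pairs")
</code></pre>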
<h3>Getting started</h3>
<p>The best way to get started is by running the <strong><a href="https://github.com/microsoft/hi-ml/tree/main/hi-ml-multimodal/notebooks/phrase_grounding.ipynb">phrase grounding notebook</a></strong>. All dependencies are installed upon execution, so Python 3.7 and <a href="https://jupyter.org/">Jupyter</a> are the only requirements.</p>
<p>The notebook can also be run on <a href="https://mybinder.org/v2/gh/microsoft/hi-ml/HEAD?labpath=hi-ml-multimodal%2Fnotebooks%2Fphrase_grounding.ipynb">Binder</a>, without the need to download any code or install any libraries.</p>
<h3>Resources</h3>
<ul>
<li><strong><a href="https://hi-ml.readthedocs.io/en/latest/multimodal.html">HI-ML Multimodal Toolkit</a></strong>, a Python library for multi-modal learning for healthcare and life sciences.
<ul>
<li>The multimodal toolkit is part of the more general <strong><a href="https://www.microsoft.com/en-us/research/project/hi-ml-oss-toolbox/">HI-ML Open-Source Toolkit</a></strong>, which helps to simplify deep learning models for healthcare and life sciences.</li>
</ul>
</li>
<li>We have made two models available on Hugging Face at <strong><a href="https://aka.ms/biovil-models">https://aka.ms/biovil-models</a></strong>:
<ul>
<li>The new chest X-ray (CXR) domain-specific language model, <strong><a href="https://huggingface.co/microsoft/BiomedVLP-CXR-BERT-specialized">CXR-BERT-specialized</a></strong> (Fig. 1)</li>
<li>A canonical model that can be used for other applications, <strong><a href="https://huggingface.co/microsoft/BiomedVLP-CXR-BERT-general">CXR-BERT-general</a></strong></li>
</ul>
</li>
<li>The MS-CXR dataset is released on PhysioNet: <strong><a href="https://aka.ms/ms-cxr">https://aka.ms/ms-cxr</a></strong></li>
<li>Paper: <a href="https://arxiv.org/pdf/2204.09817v4.pdf">https://arxiv.org/pdf/2204.09817v4.pdf</a> | Poster: <a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2022/07/biovil_eccv_2022_poster_final.pdf">ECCV 2022 poster (PDF)</a></li>
</ul>
<h3>Acknowledgements</h3>
<p>We would like to thank <a href="https://www.microsoft.com/en-us/research/people/jagonz/">Dr Javier González</a> and <a href="https://www.microsoft.com/en-us/research/people/fperezgarcia/">Fernando Pérez-García</a> for their valuable feedback and contributions, Hannah Richardson for helping with the compliance review of the datasets, Dr Matthew Lungren for the clinical input and data annotations provided to this study, and lastly <a href="https://www.microsoft.com/en-us/research/people/kenjitak/">Dr Kenji Takeda</a> for preparing the web content, helping with its presentation, and supporting the <a href="https://www.microsoft.com/en-us/research/project/hi-ml-oss-toolbox/">HI-ML OSS</a> program.</p>
(ECCV)","msr_doi":"","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2022-10-1","msr_highlight_text":"","msr_notes":"","msr_longbiography":"","msr_publicationurl":"","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"https:\/\/eccv2022.ecva.net\/","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":0,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":true,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image_in_river":0,"footnotes":""},"msr-research-highlight":[],"research-area":[13556,13562,13553],"msr-publication-type":[193716],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[246694,246691,246688,256045,246808],"msr-conference":[262684],"msr-journal":[],"msr-impact-theme":[261673],"msr-pillar":[],"class_list":["post-858441","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-artificial-intelligence","msr-research-area-computer-vision","msr-research-area-medical-health-genomics","msr-locale-en_us","msr-field-of-study-artificial-intelligence","msr-field-of-study-computer-science","msr-field-of-study-computer-vision","msr-field-of-study-healthcare","msr-field-of-study-natural-language-processing"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2022-10-1","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/arxiv.org\/pdf\/2204.09817v4.pdf","label_id":"252679","label":0}],"msr_related_uploader":[{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/aka.ms\/biovil-code","label_id":"264520","label":0},{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2022\/07\/biovil_eccv_2022_poster_final.pdf","label_id":"265542","label":0},{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/aka.ms\/ms-cxr","label_id":"243118","label":0},{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/aka.ms\/cxr-bert","label_id":"243118","label":0}],"msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[{"id":882669,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2022\/10\/biovil_eccv_2022_poster_final.pdf"}],"msr-author-ordering":[{"type":"guest","value":"benedikt-boecking","user_id":
882687,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=benedikt-boecking"},{"type":"user_nicename","value":"Naoto Usuyama","user_id":38670,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Naoto Usuyama"},{"type":"user_nicename","value":"Shruthi Bannur","user_id":39213,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Shruthi Bannur"},{"type":"user_nicename","value":"Daniel Coelho de Castro","user_id":39811,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Daniel Coelho de Castro"},{"type":"user_nicename","value":"Anton Schwaighofer","user_id":31059,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Anton Schwaighofer"},{"type":"user_nicename","value":"Stephanie Hyland","user_id":38458,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Stephanie Hyland"},{"type":"text","value":"Maria Teodora Wetscherek","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Tristan Naumann","user_id":37929,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Tristan Naumann"},{"type":"user_nicename","value":"Aditya Nori","user_id":30829,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Aditya Nori"},{"type":"user_nicename","value":"Javier Alvarez-Valle","user_id":32137,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Javier Alvarez-Valle"},{"type":"user_nicename","value":"Hoifung Poon","user_id":32016,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Hoifung Poon"},{"type":"user_nicename","value":"Ozan Oktay","user_id":38706,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Ozan Oktay"}],"msr_impact_theme":["Health"],"msr_research_lab":[199561,849856],"msr_event":[],"msr_group":[780706,952050,1143270],"msr_project":[978063],"publication":[],"video":[],"msr-tool":[917604,930696],"msr_publication_type":"inproceedings","related_content":{"projects":[{"ID":978063,"post_title":"Project MAIRA","post_name":"project-maira","post_type":"msr-project","post_date":"2023-11-24 01:00:00","post_modified":"2026-02-03 08:28:34","post_status":"publish","permalink":"https:\/\/www.microsoft.com\/en-us\/research\/project\/project-maira\/","post_excerpt":"Multimodal AI for Radiology Applications Project MAIRA is a research project from Microsoft Health Futures that builds innovative, multimodal AI technology to assist radiologists in delivering effective patient care and to empower them in their work. 