Artificial intelligence has rapidly transformed the medical field, with foundation models driving breakthroughs in natural language processing, computer vision, and multimodal learning. Medical scenarios often face challenges such as data scarcity, long-tailed tasks, and high accuracy requirements, making traditional approaches less effective. Foundation models, through large-scale pretraining and fine-tuning, enable knowledge transfer and generalization under limited labelled data, improving decision-making and patient outcomes.

However, medical data is highly diverse and fragmented, spanning clinical text, imaging, genomics, and physiological signals, while privacy and regulatory constraints hinder integration. This fragmentation limits model performance and cross-institutional collaboration, highlighting the need for secure, scalable frameworks for multimodal fusion.
Emerging agentic systems powered by large language models (LLMs) and vision-language models (VLMs) offer new possibilities by simulating expert collaboration for clinical decision-making and education. These systems enhance interpretability and reliability and provide immersive learning environments for medical students and professionals.
The digital transformation of medical education addresses resource limitations and inconsistent evaluation in traditional training. AI-driven personalized learning, virtual patient simulations, and real-time feedback accelerate skill acquisition, reduce costs, and enable standardized assessment.
Finally, clinical deployment remains critical for real-world impact. Despite research progress, adapting and integrating foundation models into healthcare workflows still faces challenges in validation, privacy, interoperability, and compliance. Close collaboration with healthcare institutions is essential to translate technological advances into improved patient care.
In this project, we aim to explore and innovate across several research themes, including medical LLMs, multi-modality fusion, biomedical data synthesis, agentic medical systems, agent-based medical education, and clinical translation and deployment.
If you are passionate about harnessing cutting-edge multimodal foundation models to accelerate medical discovery and clinical translation, we warmly invite you to join the Microsoft Research Asia StarTrack Scholars Program. Applications are now open for the 2025 program. For more details and to submit your registration, visit our official website: Microsoft Research Asia StarTrack Scholars Program – Microsoft Research.
1) Foundation Model: Foundation models have rapidly evolved from task-specific systems to versatile, general-purpose learners that unify diverse data modalities. Their emergence marks a structural shift in AI, moving from narrow pipelines to pretrain-then-adapt paradigms that scale with data and compute. This shift is especially consequential for healthcare because data are sparse and siloed, tasks are long-tailed, and accuracy is critical. By distilling broad medical knowledge from large, ethically curated datasets, foundation models reduce annotation burdens, improve generalization across institutions and patient populations, and provide a common substrate for downstream clinical tasks.
Our research includes adapting general-purpose models such as LLMs to medical applications as well as developing specialized foundation models for domains like medical imaging and bioinformatics. In medical imaging, we are exploring a disease-centric image foundation model and a versatile dermatology foundation model. Among medical data modalities, time-series data (including ECG, EEG, vital signs, and laboratory results) pose unique challenges due to their irregular sampling, heterogeneous frequencies, and frequent missingness. To address these issues, the team has developed MIRA, a medical time-series foundation model equipped with continuous-time positional encoding, frequency-specific expert routing, and a Neural ODE–based extrapolation module to capture complex temporal dynamics. Pretrained on large-scale clinical corpora covering hundreds of billions of time points, MIRA achieves superior forecasting accuracy and cross-domain generalization, paving the way for robust and adaptive medical AI. Further efforts aim to extend this foundation-modelling paradigm to richer modalities, including medical imaging and clinical narratives.
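To make the idea of continuous-time positional encoding more concrete, here is a minimal Python sketch. It is not MIRA's actual implementation, which is not described here. It assumes the standard sinusoidal encoding used in Transformer models, evaluated at real-valued timestamps rather than integer positions, so that irregularly sampled measurements keep their true time gaps. The function name `continuous_time_encoding`, the parameters `dim` and `max_period`, and the example timestamps are illustrative assumptions.

```python
import math


def continuous_time_encoding(timestamps, dim=8, max_period=10000.0):
    """Sinusoidal encoding evaluated at real-valued timestamps.

    Hypothetical helper for illustration only: each timestamp (for example,
    hours since admission) maps to `dim` values, alternating sine and cosine
    at geometrically spaced frequencies.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    encodings = []
    for t in timestamps:
        row = []
        for i in range(dim // 2):
            # Same frequency schedule as the standard Transformer encoding.
            freq = 1.0 / (max_period ** (2 * i / dim))
            row.append(math.sin(t * freq))
            row.append(math.cos(t * freq))
        encodings.append(row)
    return encodings


# Irregularly sampled readings: the gaps between observations are not uniform.
irregular_times = [0.0, 0.5, 3.25, 9.0]
encoded = continuous_time_encoding(irregular_times, dim=4)
```

Because the encoding depends on the actual timestamp, readings taken half an hour apart and readings taken several hours apart receive different relative encodings, which index-based positions would not capture.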
By advancing efficient training strategies, improving the use of medical data, and rigorously studying how to adopt foundation models in real clinical settings, we aim to deliver tangible benefits in real-world care: higher diagnostic accuracy, faster and more reliable decision-making, reduced clinician burden, and more equitable access to high-quality healthcare, ultimately improving patient outcomes.
2) Multi-Modality Fusion: Healthcare AI research faces significant challenges due to extensive data fragmentation across institutions and regions, as well as the need for customized integration of domain-specific knowledge for different modalities. As a result, constructing unified large-scale datasets for foundation model training remains infeasible. To overcome this challenge, we propose a new research paradigm that pre-trains foundation specialists on independent data sources and adaptively integrates them for specific medical research tasks. Specifically, we introduce a domain knowledge–augmented hybrid pre-training framework that develops foundation specialists tailored to individual medical modalities and domains. Through a modality-anchored alignment mechanism, multiple specialists can be integrated to collaboratively address targeted problems, enabling effective interaction while preserving their pre-trained knowledge. This approach enables scalable and flexible deployment across diverse medical scenarios while preserving modality-specific strengths.
3) Medical Data Synthesis. Medical data synthesis plays an increasingly critical role in advancing healthcare AI, addressing persistent barriers such as data scarcity, privacy constraints, and class imbalance. By generating realistic yet privacy-preserving data, synthetic modelling helps accelerate AI development while ensuring ethical compliance.
For temporal modalities, the team introduced a target-oriented diffusion framework, TarDiff, that moves beyond statistical replication to generate task-beneficial synthetic data. Through influence-guided generation, TarDiff quantifies each sample’s contribution to downstream model performance and steers the diffusion process toward data that enhance clinical prediction and reasoning. This approach improves model generalization and rare-condition representation while maintaining temporal and physiological fidelity, achieving up to 30–50% performance gains in imbalanced and rare-disease prediction tasks by augmenting limited real-world datasets with high-quality synthetic samples.
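The text above does not specify how TarDiff computes a sample's influence, so the following Python sketch should be read only as one generic way to quantify a sample's contribution to downstream performance. It assumes a first-order approximation: a candidate sample is considered beneficial when its loss gradient points in the same direction as the mean validation-loss gradient, meaning one gradient step on it would lower validation loss. The logistic-regression model, the function names, and the toy data are all illustrative assumptions, not part of TarDiff.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_grad(weights, x, y):
    """Gradient of the log loss for one example (x, y) with respect to weights."""
    p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
    return [(p - y) * xi for xi in x]


def influence_score(weights, candidate, validation_set):
    """First-order estimate of a candidate sample's benefit (hypothetical helper).

    Returns the dot product between the candidate's gradient and the mean
    validation gradient. A positive score means a small gradient step on the
    candidate is expected to reduce the mean validation loss.
    """
    cand_grad = logistic_grad(weights, *candidate)
    val_grad = [0.0] * len(weights)
    for x, y in validation_set:
        grad = logistic_grad(weights, x, y)
        for j, g in enumerate(grad):
            val_grad[j] += g / len(validation_set)
    return sum(c * v for c, v in zip(cand_grad, val_grad))


# Toy setup: a rare positive class that the untrained model underpredicts.
weights = [0.0, 0.0]
validation = [([1.0, 0.0], 1)]
helpful = ([1.0, 0.0], 1)   # agrees with the validation example
harmful = ([1.0, 0.0], 0)   # same features, conflicting label
scores = {
    "helpful": influence_score(weights, helpful, validation),
    "harmful": influence_score(weights, harmful, validation),
}
```

In a guided generation setting, a score of this kind could serve as the signal that steers sampling toward task-beneficial data; how that guidance is actually applied inside the diffusion process is outside the scope of this sketch.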
For imaging modalities, the team introduced AURAD, an anatomy–pathology unified radiology synthesis framework designed to augment datasets and strengthen model generalization in data-limited clinical environments. AURAD jointly models anatomical structures and multi-pathology coexistence, achieving clinically aligned and fine-grained control over image synthesis. Through prompt-guided mask generation and expert model filtering, it ensures visual realism, clinical plausibility, and strong utility for downstream imaging tasks. Continuing development focuses on extending AURAD to model disease trajectories and therapeutic responses over time, laying the foundation for a causally coherent generative system.
4) Agentic System: This theme focuses on building a multi-agent system powered by LLMs and VLMs to support doctors, medical educators, and researchers in both their professional development and clinical practice. The system simulates collaborative environments composed of expert agents representing clinicians, researchers, and patients, enabling dynamic and knowledge-grounded interactions. In the context of clinical decision-making, the system instantiates multiple doctor agents that collaboratively engage in the processes of diagnosis and treatment planning. These expert agents integrate diverse reasoning strategies such as planning, reflection, debate, and specialized tool invocation to gather collective medical intelligence and achieve evidence-based consensus. The communication and coordination among agents are optimized for efficiency, interpretability, and clinical effectiveness, ensuring reliable and explainable multi-agent collaboration. Beyond clinical applications, the agentic system extends naturally to medical education and training. It offers a simulation-driven environment where students interact with educator and patient agents to strengthen clinical knowledge, reasoning, empathy, and communication skills.
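As a rough illustration of how several doctor agents might debate toward consensus, the Python sketch below runs rounds in which each agent sees its peers' previous opinions and may revise its own, stopping once a quorum agrees. This is a toy under stated assumptions, not the system described above: the agents here are simple rule-based functions standing in for LLM or VLM calls, and `run_debate`, `confident_agent`, `cautious_agent`, the `quorum` parameter, and the diagnoses are all hypothetical.

```python
from collections import Counter


def run_debate(agents, case, max_rounds=3, quorum=1.0):
    """Run debate rounds until a fraction `quorum` of agents agree.

    Each agent is a callable taking (case, peer_opinions) and returning a
    diagnosis string. Returns (diagnosis, round_number), or (None, max_rounds)
    if no quorum is reached.
    """
    opinions = [None] * len(agents)
    for round_no in range(1, max_rounds + 1):
        # Every agent responds to the opinions from the previous round.
        opinions = [agent(case, opinions) for agent in agents]
        top, votes = Counter(opinions).most_common(1)[0]
        if votes / len(agents) >= quorum:
            return top, round_no
    return None, max_rounds


def confident_agent(case, peer_opinions):
    # Stand-in for a model call; always proposes the same diagnosis.
    return "pneumonia"


def cautious_agent(case, peer_opinions):
    # Starts with its own hypothesis, then adopts the most common peer view.
    seen = [o for o in peer_opinions if o is not None]
    if not seen:
        return "bronchitis"
    return Counter(seen).most_common(1)[0][0]


case = {"symptoms": ["fever", "cough"]}
result = run_debate([confident_agent, confident_agent, cautious_agent], case)
# result == ("pneumonia", 2): the cautious agent revises its view in round 2.
```

A real system would replace majority adoption with evidence-grounded argument and tool use, and would log each round's exchange so that the path to consensus remains interpretable.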

5) AI-Transformed Medical Education. We aim to harness AI to revolutionize medical education for both students and professionals. Building on an agentic system, we are creating an agentic medical skills room: an immersive, outcomes-driven training environment that brings this vision to life. Key features of the skills room:
- Personalized learning pathways powered by adaptive LLMs.
- Immersive simulations through agent-based role-play (educators, patients, peers).
- Real-time feedback and structured assessment to strengthen clinical reasoning and communication skills.
During each encounter, learners receive immediate guidance on missed red flags, empathy phrasing, and teach-back quality, while competency rubrics generate standardized, comparable scores. A dedicated educator agent built on a multimodal foundation model enhances fluency, clarity, and multilingual interactions, reflecting our expertise in speech and multimodal AI. Completed modules produce verifiable evidence and integrate seamlessly with educator dashboards for cohort management, progression tracking, and secure export to institutional systems under robust governance. By combining adaptive tutoring, realistic simulation, and continuous assessment, the skills room delivers measurable skill gains, accelerates time-to-competency, and scales professional development for both students and clinicians.

6) Clinical Translation and Deployment for Improved Patient Care. We focus on bridging the gap between foundational AI research and real-world clinical applications by enabling the translation and deployment of AI systems in clinical settings. This involves close collaboration with healthcare institutions, rigorous adaptation and validation of foundation models, and seamless integration of solutions into clinical workflows.
We are partnering with hospitals and medical institutes across the region to co-develop customised AI solutions that adhere to clinical standards and prioritise patient safety. Our approach centres on adapting foundation models and agentic AI systems to diverse clinical contexts and multi-modality inputs, including electronic health records, medical imaging, and genomics data, to perform critical tasks that improve diagnostic accuracy and treatment outcomes.
This initiative is built on four key pillars: Collaborative Partnerships to ensure close alignment with hospitals, medical institutions, and clinicians; Model Adaptation and Validation to rigorously test performance across modalities while maintaining ethical and regulatory compliance; Workflow Integration to embed AI seamlessly into clinical processes to support decision-making and reduce clinician burden; and Continuous Feedback and Improvement to establish adaptive learning loops with clinicians to refine models for sustained impact. By focusing on these pillars, we aim to transform medical AI research into practical tools that personalise care, enhance operational efficiency, and ultimately elevate patient outcomes.
Through these research themes, our project aims to cultivate future medical talent, foster technological breakthroughs, and accelerate translation to practical applications by developing state-of-the-art (SOTA) models for improved patient care and publishing papers in top-tier conferences and journals.
Microsoft Research Asia StarTrack Scholars advocates an open attitude, encouraging dialogue and joint experimentation with researchers from various disciplines to discover viable solutions. Visit our official website to learn more: Microsoft Research Asia StarTrack Scholars Program – Microsoft Research
Theme Team
- Shujie Liu, Principal Researcher, Microsoft Research Asia
- Xinxing Xu, Principal Research Manager, Microsoft Research Asia
- Zilong Wang, Senior Researcher, Microsoft Research Asia
- Xinyang Jiang, Senior Researcher, Microsoft Research Asia
- Jinglu Wang, Senior Researcher, Microsoft Research Asia
- Jingjing Fu, Senior Researcher, Microsoft Research Asia
- Chang Xu, Senior Researcher, Microsoft Research Asia
If you have any questions, please email Ms. Yanxuan Wu, program manager of the Microsoft Research Asia StarTrack Scholars Program, at v-yanxuanwu@microsoft.com