AI for Science: Learning the language of nature

At Microsoft, we believe that the ability of generative AI to learn the language of humans is equally matched by its ability to learn the language of nature.


Health

Diagnosis and treatment

Generative AI has revolutionized machines’ ability to understand human language and images. In medicine, these advances show promise for improving patient outcomes and the clinician experience.

Virchow

Microsoft Research, in collaboration with Paige, a global leader in clinical AI applications for cancer, is advancing the state of the art in foundation models for computational pathology. The first contribution of this collaboration is a model named Virchow. Virchow is a significant proof point for foundation models in pathology: it demonstrates that a single model can help detect both common and rare cancers, fulfilling the promise of generalizable representations.

GigaPath

GigaPath is a novel vision transformer that achieves whole-slide modeling by using dilated self-attention to keep computation tractable. In joint work with Providence Health System and the University of Washington, we have developed Prov-GigaPath, an open-access whole-slide pathology foundation model pretrained on more than one billion 256 × 256 pathology image tiles from more than 170,000 whole slides of real-world data at Providence. All computation was conducted within Providence’s private tenant and approved by the Providence Institutional Review Board (IRB).
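Dilated self-attention keeps whole-slide modeling tractable by letting each tile token attend to only a sparse subset of the other tokens. The sketch below illustrates the general LongNet-style idea, assuming a slide has already been turned into a sequence of tile embeddings: the sequence is split into segments of `segment_len` tokens, and within each segment only every `dilation`-th token takes part in attention. The function names and parameters are illustrative assumptions, not the Prov-GigaPath implementation.

```python
from typing import List


def dilated_attention_groups(
    n_tokens: int, segment_len: int, dilation: int, offset: int = 0
) -> List[List[int]]:
    """Return, for each segment, the token indices that attend to one another.

    The sequence is split into contiguous segments of `segment_len` tokens;
    inside each segment only every `dilation`-th token (starting at `offset`)
    participates, so the per-segment cost shrinks by roughly dilation**2.
    """
    if segment_len <= 0 or dilation <= 0 or not 0 <= offset < dilation:
        raise ValueError("invalid segment_len, dilation, or offset")
    groups = []
    for start in range(0, n_tokens, segment_len):
        end = min(start + segment_len, n_tokens)
        idx = list(range(start + offset, end, dilation))
        if idx:  # a short final segment may contain no selected tokens
            groups.append(idx)
    return groups


def attention_pairs(groups: List[List[int]]) -> int:
    """Number of query-key pairs scored when attention runs within each group."""
    return sum(len(g) ** 2 for g in groups)


def full_attention_pairs(n_tokens: int) -> int:
    """Query-key pairs for dense self-attention over the whole sequence."""
    return n_tokens * n_tokens


if __name__ == "__main__":
    groups = dilated_attention_groups(16, segment_len=8, dilation=2)
    print(groups)  # [[0, 2, 4, 6], [8, 10, 12, 14]]
    print(attention_pairs(groups), "vs", full_attention_pairs(16))  # 32 vs 256
```

With 16 tokens, `segment_len=8`, and `dilation=2`, each segment keeps 4 tokens, so attention scores 2 × 4² = 32 pairs instead of 16² = 256. Running the same pattern with different offsets (for example, in different heads) ensures every token is covered; LongNet-style models also mix several segment lengths and dilation rates so that distant tiles can still interact.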

BiomedParse

BiomedParse is a new approach to holistic image analysis that treats objects as first-class citizens. By unifying object recognition, detection, and segmentation in a single framework, BiomedParse lets users specify what they are looking for with a simple natural-language prompt. The result is a more cohesive, intelligent way of analyzing medical images that supports faster, more integrated clinical insights.

MAIRA

MAIRA is a Microsoft Research project that builds innovative, multimodal AI technology to assist radiologists in delivering effective patient care and to empower them in their work. The goal of the project is to use rich healthcare data (including medical domain knowledge, temporal sequences of medical images and their corresponding radiology reports, and other clinical context) as inputs for developing multimodal frontier models that can be scaled and fine-tuned for many different radiology applications.

By combining Mayo Clinic’s medical expertise with Microsoft Research’s AI advancements, including the multimodal foundation model MAIRA-2 and the RAD-DINO encoder recently published in Nature Machine Intelligence, we aim to explore and unlock new frontiers in radiology.

Precision health

Precision health focuses on delivering the right treatment to the right patient at the right time. To achieve this, we need to learn from the data of many people in order to treat each individual patient, ensuring personalized and effective care.

Multidisciplinary Tumor Board

Medicine today is imprecise. Cancer is the poster child of this challenge: often the majority of patients don’t respond to their treatments. Multidisciplinary tumor boards are key to advancing precision oncology because they assimilate diverse expertise, such as radiology, pathology, and genomics, to identify precision treatment options. However, first-generation tumor boards operate manually and are hard to scale. For example, when the standard of care fails, the last hope often lies in clinical trials, but triaging a single patient can take hours. Consequently, only 3% of US cancer patients were able to find a matching trial, whereas 40% of cancer trial failures stem from insufficient enrollment.

GenAI could help scale “universal abstraction”, structuring all medical data for patients and trials, thus facilitating just-in-time clinical trial matching and democratizing tumor boards. Microsoft Health Futures has pioneered this frontier in close collaboration with large health systems and life sciences companies. A recent joint study by Providence, Microsoft, and Illumina shows that with AI powering Providence’s tumor board for genomics interpretation and clinical trial matching, Providence researchers were able to identify actionable biomarkers for 67% of late-stage cancer patients in the study, leading to precision treatment for 52% of patients and a 47% increase in overall survival. Progress in this direction also opens new possibilities for rapidly assessing how therapies perform in the real world and identifying which subpopulations benefit most from different interventions, with myriad applications such as clinical trial design and simulation for unlocking population-scale real-world evidence.
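Once universal abstraction has turned free-text records and trial protocols into structured fields, trial matching becomes largely a filtering problem. The sketch below assumes that structured data already exists; the `PatientRecord` and `TrialCriteria` schemas, their fields, and the trial IDs are hypothetical and much simpler than real eligibility criteria, which are mostly free text and are exactly what the GenAI abstraction step is meant to structure.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class PatientRecord:
    """Hypothetical structured patient summary produced by abstraction."""
    patient_id: str
    cancer_type: str
    stage: int                  # e.g. 1-4
    biomarkers: FrozenSet[str]  # actionable alterations found by genomic testing
    ecog: int                   # ECOG performance status (0 = fully active)


@dataclass(frozen=True)
class TrialCriteria:
    """Hypothetical structured eligibility criteria for one clinical trial."""
    trial_id: str
    cancer_types: FrozenSet[str]
    min_stage: int
    required_biomarkers: FrozenSet[str] = frozenset()
    max_ecog: int = 1


def is_eligible(patient: PatientRecord, trial: TrialCriteria) -> bool:
    """A patient matches when every structured criterion is satisfied."""
    return (
        patient.cancer_type in trial.cancer_types
        and patient.stage >= trial.min_stage
        and trial.required_biomarkers <= patient.biomarkers  # subset test
        and patient.ecog <= trial.max_ecog
    )


def match_trials(patient: PatientRecord, trials: List[TrialCriteria]) -> List[str]:
    """Return the IDs of all trials the patient appears eligible for."""
    return [t.trial_id for t in trials if is_eligible(patient, t)]


if __name__ == "__main__":
    patient = PatientRecord("P001", "NSCLC", 4, frozenset({"EGFR L858R"}), ecog=1)
    trials = [
        TrialCriteria("TRIAL-A", frozenset({"NSCLC"}), 3, frozenset({"EGFR L858R"})),
        TrialCriteria("TRIAL-B", frozenset({"NSCLC"}), 3, frozenset({"KRAS G12C"})),
        TrialCriteria("TRIAL-C", frozenset({"breast"}), 2),
    ]
    print(match_trials(patient, trials))  # ['TRIAL-A']
```

Real matching must also handle exclusion criteria, lab values, prior therapies, and ambiguous documentation, which is why the abstraction step, rather than the final filter, carries most of the difficulty.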



Discovery

Drug discovery

AI shows promise for drastically speeding up drug discovery, making previously undruggable targets druggable, and vastly improving our efficiency in addressing new diseases, combating drug resistance, and advancing medical knowledge.

AI2BMD

AI2BMD, short for “AI-powered ab initio biomolecular dynamics,” is a groundbreaking AI framework developed by Microsoft Research AI for Science. It uses AI to simulate protein movements with unprecedented accuracy and speed, with the potential to transform drug discovery and protein design.

TamGen

TamGen, short for “target-aware molecule generation,” is a state-of-the-art AI framework developed by Microsoft Research to accelerate drug design by overcoming the limitations of traditional methods. TamGen uses advanced AI techniques to generate novel drug molecules tailored to a specific protein target, with significantly improved binding affinities.

In collaboration with the Global Health Drug Discovery Institute (GHDDI), TamGen has successfully generated small-molecule inhibitors for Mycobacterium tuberculosis. Notably, one molecule inhibited the TB Clp protease 125 times more effectively than the starting molecule. TamGen has also been used to design novel compounds targeting SARS-CoV-2. These compounds are structurally distinct from existing ones and exhibit an eightfold improvement in bioactivity.

Materials discovery

New technologies are being developed to make materials discovery and design thousands of times faster, paving the way for finding new materials with desired properties in weeks rather than years.

MatterGen

MatterGen is a diffusion model designed specifically for generating inorganic materials. Crucially, the model can generate materials that satisfy a broad range of design requirements, such as target chemistry, symmetry, and properties.
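A diffusion model learns to reverse a gradual noising process. The sketch below illustrates only the forward (noising) half for one ingredient of a crystal, the fractional atomic coordinates, assuming a simple variance-exploding Gaussian schedule; because crystals are periodic, noisy coordinates are wrapped back into the unit cell [0, 1). It is a generic illustration, not MatterGen’s code: the function names and the `sigmas` schedule are assumptions, and a complete model of this kind would also need to handle atom types and the lattice and train a network to do the denoising.

```python
import math
import random
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]  # fractional coordinates in the unit cell


def wrap01(x: float) -> float:
    """Map x into [0, 1) to respect periodic boundary conditions."""
    y = x % 1.0
    # For tiny negative x, float rounding can make x % 1.0 == 1.0; fold it to 0.
    return 0.0 if y >= 1.0 else y


def add_wrapped_noise(coords: Sequence[Coord], std: float, rng: random.Random) -> List[Coord]:
    """Add isotropic Gaussian noise to every coordinate, then wrap into the cell."""
    return [
        (wrap01(x + rng.gauss(0.0, std)),
         wrap01(y + rng.gauss(0.0, std)),
         wrap01(z + rng.gauss(0.0, std)))
        for x, y, z in coords
    ]


def forward_trajectory(
    coords: Sequence[Coord], sigmas: Sequence[float], seed: int = 0
) -> List[List[Coord]]:
    """Noise coords step by step so step t has total noise std sigmas[t] (before wrapping)."""
    if (sigmas and sigmas[0] <= 0) or any(b <= a for a, b in zip(sigmas, sigmas[1:])):
        raise ValueError("sigmas must be positive and strictly increasing")
    rng = random.Random(seed)
    trajectory = [list(coords)]
    prev = 0.0
    for sigma in sigmas:
        # Independent Gaussian increments add in variance: sigma_t^2 = sigma_{t-1}^2 + step^2.
        step = math.sqrt(sigma * sigma - prev * prev)
        trajectory.append(add_wrapped_noise(trajectory[-1], step, rng))
        prev = sigma
    return trajectory


if __name__ == "__main__":
    # Two atoms of a toy structure; the schedule values are arbitrary.
    atoms = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)]
    for t, frame in enumerate(forward_trajectory(atoms, [0.01, 0.05, 0.2, 0.5])):
        print(t, [tuple(round(c, 3) for c in pos) for pos in frame])
```

Because wrapping commutes with adding noise modulo 1, each frame follows a wrapped normal distribution centered on the original positions; generation runs this process in reverse, starting from near-uniform coordinates and denoising toward a plausible structure.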

MatterGen achieves state-of-the-art performance in the de novo generation of materials and outperforms traditional computational methods such as screening.

Additionally, thanks to MatterGen, researchers were able for the first time to experimentally synthesize a novel material proposed by a generative model; its measured properties were within 20% of the design targets, which is quite close from an experimental point of view.

The code is available on GitHub and is coming to Azure AI Foundry soon.

MatterSim

Microsoft Research developed MatterSim, a deep-learning model for accurate and efficient materials simulation and property prediction across a broad range of elements, temperatures, and pressures, to enable in silico materials design. MatterSim uses deep learning to model atomic interactions grounded in the fundamental principles of quantum mechanics, across a comprehensive spectrum of elements and conditions: from 0 to 5,000 kelvin (K), and from standard atmospheric pressure to 10,000,000 atmospheres (about 1,000 GPa). In our experiments, MatterSim efficiently handles simulations of a variety of materials, including metals, oxides, sulfides, and halides, in various states such as crystals, amorphous solids, and liquids. It also supports customization for complex prediction tasks by incorporating user-provided data.


Earth

Atmosphere prediction

We are advancing Earth system modelling with a single AI model that can predict not only weather but also tropical cyclones, air pollution, and ocean waves.

Aurora

Aurora is a pioneering AI model developed by Microsoft Research, designed to revolutionise environmental prediction such as weather forecasting. By leveraging deep-learning-based AI similar to large language and vision models, Aurora provides highly accurate and efficient predictions of atmospheric and oceanic conditions, outperforming traditional models in both speed and accuracy.

ECMWF is the European Centre for Medium-Range Weather Forecasts. It designs and runs the Integrated Forecasting System (IFS), which is widely considered the best traditional forecasting system. More recently, it has created AIFS, an AI-based forecasting system. ECMWF also evaluates leading AI models, such as Aurora, AIFS, and GraphCast, in house, and in these evaluations it found that Aurora outperformed all the others on key metrics.