Scalable emulation of protein equilibrium ensembles with BioEmu

  • Maya Murad, Microsoft; Frank Noé, Microsoft

Frank Noé, Partner Research Manager at Microsoft Research AI for Science, shares a major milestone in biomolecular simulation: BioEmu, an emulator that predicts protein shape changes and stabilities with near-experimental accuracy, running up to 100,000x faster than traditional simulations. This opens new possibilities for drug discovery and molecular biology.

Explore more

Scalable emulation of protein equilibrium ensembles with generative deep learning
Science | July 2025 

BioEmu on GitHub | Azure AI Foundry | ColabFold

Transcript

[MUSIC] 

MAYA MURAD: One of the most exciting frontiers for AI is in science, where it’s starting to accelerate discovery itself. Take proteins. Understanding how they move and change shape is essential for drug discovery, but traditionally, simulating those dynamics is a slow and incredibly compute-intensive process.  

To share a breakthrough in this space, we’ll hear from Frank Noé, partner research manager in Microsoft Research AI for Science, located in Berlin. Frank is here to share BioEmu, a biomolecular emulator that can model proteins with near-experimental accuracy at speeds up to 100,000 times faster than traditional simulation.

Let’s check it out. 

[MUSIC] 

FRANK NOÉ: Hello, my name is Frank Noé. I’m a partner research manager at Microsoft Research AI for Science. At AI for Science, we firmly believe that deep learning will lead to breakthroughs in scientific discovery in the coming years, so we are developing deep learning models for the sciences to pursue this mission. Today, I’ll introduce you to BioEmu, a biomolecular emulator, which is one of our recent models towards this goal.

Our bodies are built up of tissues and cells, and on the size of a nanometer—that’s a billionth of a meter—you’ll find biomolecules, such as DNA and proteins, and these are really nanomachines that make life work.  

We characterize the study of proteins and other biomolecules in three aspects: sequence, structure, and function. The Human Genome Project gave us the ability to sequence the DNA. DNA has segments called genes, and each gene can be transcribed and translated into a chain of amino acids. This is a protein.

Depending on the amino acid code, the protein will fold up into three-dimensional structures. But experimentally determining those 3D structures is very time consuming and difficult. With the AlphaFold breakthrough, these protein structures can now be accurately predicted. So we have scalable ways to determine protein sequence and structure. But understanding how they work—understanding their function—that still remains a challenge.  

What is protein function, and how does it relate to structure? Let’s look at an example. 

This is actin, a protein that plays a key role in the formation of muscle fibers. Like most proteins, actin does not have a single structure. It can open and close. It prefers to be in the closed state when a small molecule called ATP is bound to it. Actins with ATP are shown in blue. When actin is closed, it also likes to bind to other actins and form filaments. These filaments are a building block of our muscles. In the filament, a reaction can be triggered that converts ATP into ADP. Actins with ADP are shown in green. Actins with ADP prefer to be open, so they bind less readily and dissociate from the filament, where they can open. ADP can be exchanged with ATP. Actin closes and binds again. The cycle repeats.

This example shows that the biological function of proteins depends on their ability to change conformations and on the fact that different conformations change the way proteins bind to other proteins. These protein conformations and the transitions between them can be revealed with experimental measurements and with molecular dynamics simulations. But these techniques are very time consuming and expensive.

For example, this is a tiny protein simulated with molecular dynamics for one-millionth of a second, a microsecond. This simulation took two entire days on a modern GPU, and it shows very little motion. Only when simulating much longer, like milliseconds, will you see functionally relevant events such as this. This is protein unfolding. Protein folding, binding, and conformational changes are likewise rare events that take much longer to simulate. But this takes years of compute time, making this approach really impractical to scale to many proteins.
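As a rough check on the timescales just quoted, the short sketch below scales the "two days per microsecond" figure up to a millisecond. It assumes cost grows linearly with simulated time on a single GPU, which is a simplification; the function and constant names are illustrative only.

```python
# Back-of-the-envelope scaling of the figures quoted in the talk.
# Assumption: cost grows linearly with simulated time on one GPU.
DAYS_PER_MICROSECOND = 2.0            # from the talk: 1 microsecond took two days
MICROSECONDS_PER_MILLISECOND = 1_000
DAYS_PER_YEAR = 365.25


def gpu_days(simulated_microseconds: float) -> float:
    """GPU-days needed to simulate the given span of molecular time."""
    return simulated_microseconds * DAYS_PER_MICROSECOND


def gpu_years(simulated_microseconds: float) -> float:
    """Same estimate expressed in GPU-years."""
    return gpu_days(simulated_microseconds) / DAYS_PER_YEAR


one_ms_years = gpu_years(MICROSECONDS_PER_MILLISECOND)
print(f"1 ms of simulated time needs roughly {one_ms_years:.1f} GPU-years")
```

Under this assumption a single millisecond already costs several GPU-years for one small protein, which is consistent with the "years of compute time" described above.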

At Microsoft Research AI for Science, we develop AI emulators. These are deep learning models that behave like simulators, but they are much, much faster. BioEmu is a biomolecular emulator. It’s been trained on a huge set of high-quality data: the AlphaFold database, which contains AlphaFold’s structure predictions for over 100 million proteins; a vast set of molecular dynamics simulations totaling 200 milliseconds in length; and half a million experimental measurements of protein stabilities.

At inference time, we query BioEmu with a protein sequence. BioEmu will then generate a large ensemble of protein structures from which various properties of the protein can be computed. And these property predictions are very cheap and fast compared to MD simulation or experiments. Therefore, BioEmu amortizes the upfront cost of generating its training data. 

We’ve tested BioEmu in three different ways. First of all, BioEmu predicts experimentally known conformational changes such as this one. So this is a receptor domain, and you’ll see an interpolation of BioEmu samples between two known experimental structures that are shown in gray. BioEmu predicts large-scale domain motions such as this one, also local unfolding transitions and the formation of hidden binding pockets, which are relevant for drug molecules, and does so with high probability. 

We’ve also asked whether BioEmu can quantitatively emulate MD simulations, these very expensive simulations. If you run simulations like this one for several GPU years, you can compute a statistic from them that is called an energy landscape. These energy landscapes are basically maps. Each point of the map corresponds to a set of similar protein structures, and the color tells you how much probability is in each point, that is, how probable it is to find the protein in these structures. Now, BioEmu will sample these protein structures very quickly, and in less than one GPU hour, you can compute a very similar energy landscape. So it emulates molecular dynamics simulation, in this case, 100,000 times faster.
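To make the idea of an energy landscape concrete, here is a minimal sketch of how such a map can be estimated from any set of sampled structures. It assumes each structure has already been projected onto two collective coordinates, and it uses the standard relation F = -kT ln p between free energy and probability. The synthetic two-state samples stand in for real data; nothing here is BioEmu's own code or API.

```python
import numpy as np

KT_KCAL_PER_MOL = 0.593  # approximate k_B * T at about 298 K


def free_energy_landscape(x, y, bins=30, kT=KT_KCAL_PER_MOL):
    """Estimate a 2D free-energy map from sampled coordinates.

    Returns (F, x_edges, y_edges). F is shifted so its minimum is 0;
    bins that received no samples are set to +inf.
    """
    counts, x_edges, y_edges = np.histogram2d(x, y, bins=bins)
    prob = counts / counts.sum()
    with np.errstate(divide="ignore"):   # log(0) -> -inf for empty bins
        F = -kT * np.log(prob)
    F -= F[np.isfinite(F)].min()
    return F, x_edges, y_edges


# Synthetic stand-in for projected samples: two states, 80% / 20% populated.
rng = np.random.default_rng(0)
state_a = rng.normal(loc=[-1.0, 0.0], scale=0.3, size=(8000, 2))
state_b = rng.normal(loc=[1.0, 0.5], scale=0.3, size=(2000, 2))
samples = np.vstack([state_a, state_b])

F, x_edges, y_edges = free_energy_landscape(samples[:, 0], samples[:, 1])
print(F.shape)
```

The same function applies whether the samples come from a long MD trajectory or from an emulator, which is what makes a side-by-side comparison of the two landscapes possible.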

BioEmu also predicts protein stabilities. Protein stability is basically the probability of finding the protein in its folded state versus its unfolded state. And this is something that can be measured quite well experimentally. So we can compare BioEmu’s predictions with experiments, and we find that we can predict the folding stability with less than one kilocalorie per mole error. This is called experimental accuracy, and by analyzing the ensemble of structures that we are generating, we can also ask the question whether a particular change in the protein sequence, such as a mutation, leads to destabilization or stabilization and why.
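The link between folded and unfolded populations and a stability in kilocalories per mole can be sketched with the standard two-state relation ΔG = -kT ln(p_folded / p_unfolded). The counts below are made-up illustrative numbers, and the sketch assumes each sampled structure has already been classified as folded or unfolded; how that classification is done is not covered here.

```python
import math

KT_KCAL_PER_MOL = 0.593  # approximate k_B * T at about 298 K


def folding_free_energy(n_folded: int, n_unfolded: int,
                        kT: float = KT_KCAL_PER_MOL) -> float:
    """Two-state folding free energy in kcal/mol.

    Negative values mean the folded state is more populated.
    """
    if n_folded <= 0 or n_unfolded <= 0:
        raise ValueError("both states need at least one sample")
    return -kT * math.log(n_folded / n_unfolded)


# Hypothetical ensembles of 10,000 structures each (illustrative counts).
dg_wild_type = folding_free_energy(n_folded=9500, n_unfolded=500)
dg_mutant = folding_free_energy(n_folded=8000, n_unfolded=2000)

# Positive difference: the mutation shifts population toward the unfolded state.
ddg = dg_mutant - dg_wild_type
print(f"wild type: {dg_wild_type:.2f}, mutant: {dg_mutant:.2f}, change: {ddg:.2f} kcal/mol")
```

Because the relation is logarithmic, an error of one kilocalorie per mole corresponds to a factor of roughly five in the folded-to-unfolded population ratio at room temperature.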

Many experts agree that after the protein sequence and structure revolutions, understanding protein dynamics and function is the next frontier and is key to understanding how biology works, how diseases work, and how to develop drugs to treat them. Also, many experts agree that current tools such as MD simulations or experiments are not scalable enough to get us there. BioEmu is one step in this direction.  

It helps us to understand how single proteins change their shape and how stable they are, and these are key aspects of protein function. More work is needed to integrate how these protein dynamics interact with the binding of other proteins and drug molecules and how this leads to biological function. We think this is really important work that can help us advance science and medicine, and we’re very excited that Microsoft Research gives us the resources to be able to do it.