Agent Lightning: One learning system that makes all agents evolve
- Anson Ho, Microsoft; Luna K. Qiu, Microsoft
- Microsoft Research Forum | Season 2, Episode 3
Agent Lightning is an agent optimization framework that enables agents to learn from their experiences through reinforcement learning and other methods. By treating agents as first-class citizens, optimization becomes automatic for any agent with minimal code changes.
Explore more
- Agent Lightning on GitHub (opens in new tab)
- Agent Lightning: Train ANY AI Agents with Reinforcement Learning (opens in new tab)
- Agent Lightning documentation (opens in new tab)
Transcript
Agent Lightning: One learning system that makes all agents evolve
Learning from experience is something humans do naturally, but for AI agents, it’s often bolted on as an afterthought. Joining us from MSR Asia in Shanghai, Luna will introduce Agent Lightning, an open source framework that makes learning a first-class capability for agents.
With minimal code changes, agents can improve over time using reinforcement learning and related methods. This project has taken off quickly in the community and is a great example of MSR research scaling through open source and shaping how people build agentic systems in practice. Luna, over to you.
Hello from Shanghai. I’m Luna Qiu, technical program manager at Microsoft Research Asia, and today let me introduce Agent Lightning, one learning system that makes all agents evolve. Our vision is straightforward: we want to build one system for every person and every organization to evolve their own agents using their unique experience data.
For individuals, this means agents that genuinely understand your preferences and your needs through continuous learning. For enterprises, it means distilling your specific workflows, your customers, and your edge cases into an intelligence layer, and that can be a compounding moat that competitors cannot replicate.
Agents exist on a wide spectrum: they vary in domains, implementation approaches, complexity, and more. While they all generate experience data, turning that data into improvements can be extremely difficult. That’s why, at Microsoft Research, we open-sourced Agent Lightning to solve this: it connects any agent with any optimization method, making learning from experience practical with almost no code change.
We achieve this compatibility by going back to first principles, specifically Reinforcement Learning 101. Every RL setup can be formulated as a partially observable Markov decision process, or POMDP. The RL agent takes actions and receives states and rewards from the RL environment without knowing the details of how the environment works, and that is exactly how Agent Lightning operates.
It abstracts such a POMDP from any AI execution: it treats only the part that needs to learn as the RL agent, and everything else becomes the RL environment. This design hides all the differences between AI agents inside the RL environment, so it is possible to optimize any agent using any learning method.
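The POMDP framing above can be sketched as a minimal rollout loop. This is an illustrative toy, not Agent Lightning's actual API: the class and function names are hypothetical, and a real setup would put an LLM where `policy` is and tools, users, or other agents where `AgentEnv` is.

```python
import random

class AgentEnv:
    """Everything outside the learnable part: tools, retrieval, other
    agents, user simulators. The policy only ever sees observations and
    rewards, never the environment's internals (partial observability)."""
    def __init__(self):
        self.secret = random.choice(["left", "right"])

    def reset(self):
        return "which way?"            # initial observation (e.g. a prompt)

    def step(self, action):
        reward = 1.0 if action == self.secret else 0.0
        done = True
        return "done", reward, done    # next observation, reward, episode end

def policy(observation, params):
    """The part that learns. In Agent Lightning terms this would be the
    LLM whose weights or prompt are being optimized."""
    return "left" if params["p_left"] > random.random() else "right"

def rollout(env, params):
    """Run one episode and return the total reward."""
    obs, total, done = env.reset(), 0.0, False
    while not done:
        action = policy(obs, params)
        obs, reward, done = env.step(action)
        total += reward
    return total

params = {"p_left": 0.5}
returns = [rollout(AgentEnv(), params) for _ in range(100)]
print(sum(returns) / len(returns))  # average return of the untrained 50/50 policy
```

Because the agent and environment only meet through `reset` and `step`, the same rollout loop works no matter what is hidden inside the environment, which is the point of the abstraction.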
And there are three key benefits. One: a unified data pipeline that captures all agent activities using observability tools and is non-intrusive to the original agent code. Two: algorithm flexibility that allows teams to plug in different methods without rebuilding everything; they can choose anything from classical RL algorithms to novel methods tailored for agents.
This is important because the field is changing very fast and the best approach varies by use case. Three, and finally: infrastructure disaggregation that separates agents and optimizers cleanly, providing modular components and a clear interface, allowing each part to scale independently as you need.
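The first benefit, a non-intrusive data pipeline, can be sketched with a simple tracing decorator: the agent's own code is untouched, and every model call is captured as a transition for a downstream optimizer. This is a hypothetical sketch of the idea; `traced`, `call_llm`, and `TRACE` are made-up names, not Agent Lightning's observability interface.

```python
import functools

TRACE = []  # unified data store: transitions collected from any agent

def traced(fn):
    """Wrap an existing LLM-call function so every invocation is logged
    as a (prompt, response) transition. The wrapped function's behavior
    is unchanged, which is what makes the capture non-intrusive."""
    @functools.wraps(fn)
    def wrapper(prompt, *args, **kwargs):
        response = fn(prompt, *args, **kwargs)
        TRACE.append({"prompt": prompt, "response": response})
        return response
    return wrapper

@traced
def call_llm(prompt):
    # stand-in for a real model call
    return f"echo: {prompt}"

call_llm("summarize the ticket")
call_llm("draft a reply")
print(len(TRACE))  # 2 transitions captured, ready for any optimizer
```

Because the trace is just data, any optimizer, classical RL or otherwise, can consume it, which is how the data pipeline and the algorithm flexibility stay decoupled.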
Here are some use cases, from us and from the community, showing how diverse the applications are. Let’s start with a classic multi-agent customer service setting. For this task, we used Agent Lightning to train a very small model: in just eight epochs, an agent with a 1.5B model achieves performance comparable to an agent using the GPT-4 series. We also enabled a new algorithm called EMPO2.
It is the first RL algorithm that can train agents with memory, which significantly improves exploration in new, out-of-distribution environments. And optimization is not limited to model parameters: one team at Microsoft is building a specialized coding agent for writing formal verification in Rust, and with Agent Lightning prompt tuning, at less than $6 per task, they improved the average success rate by over ten points.
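Prompt tuning of the kind just described can be sketched as a greedy search over candidate prompt edits, keeping an edit only when a scoring function improves. This is a minimal illustration, not the algorithm that team used: `success_rate` here is a stand-in that counts required instructions, where a real setup would run the agent on held-out tasks and check verification outcomes.

```python
import random

random.seed(0)

REQUIRED = ["cite sources", "check types", "handle errors"]

def success_rate(prompt):
    """Stand-in scorer: fraction of required instructions present in the
    prompt. In practice this would execute the agent and measure actual
    task success, which is where the per-task cost comes from."""
    return sum(kw in prompt for kw in REQUIRED) / len(REQUIRED)

def tune_prompt(base, edits, rounds=20):
    """Greedy hill climbing over prompt edits: try a random edit each
    round and keep it only if the measured score improves."""
    best, best_score = base, success_rate(base)
    for _ in range(rounds):
        candidate = best + " " + random.choice(edits)
        score = success_rate(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

edits = REQUIRED + ["be brief", "use markdown"]
prompt, score = tune_prompt("You are a coding agent.", edits)
print(round(score, 2))
```

The appeal of this family of methods is that the model's weights never change, so each evaluation only costs inference, which is why per-task optimization can stay cheap.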
Multimodal robotic agents also use it to automatically tune their prompts. This improves reasoning-action coordination, doubling the task success rate and significantly reducing completion time.
The community response has been extraordinary. Since open sourcing six months ago, Agent Lightning has earned more than 14,000 GitHub stars, ranking among Microsoft’s top 50 most-starred projects. It was featured as the number one trending project on GitHub and as a trending research paper on Hugging Face.
This work reflects what we aspire to do at Microsoft Research: tackle fundamental problems with open tools that serve the community and push the frontiers of what’s possible. Our goal remains simple: build one learning system that makes all agents evolve, empowering every person and every organization to build intelligence that is truly their own.
Agent Lightning is fully open source, so we welcome you to contribute and help shape what comes next. That’s all from me today. Thank you.
Anson Ho
Program Manager
Luna K. Qiu
Sr Technical Program Manager