Microsoft Research Blog

VITRA Redefines VLA Pre-training Paradigms via Human Video Reconstruction

May 29, 2026

When you see robots participating in running races or performing folk dances on stage, you might envision a future where a simple natural language command is all it takes for a robot to tidy up a desk, clean a room, or even serve tea. For…

Recent Posts

Fara1.5 – A family of frontier computer use agent models

May 21, 2026

By: Ahmed Awadallah, Sahil Gupta, Yash Lara, Yadong Lu, Hussein Mozannar, Akshay Nambi, Zach Nussbaum, Yash Pandya, Aravind Rajeswaran, Corby Rosset, Alexey Taymanov, Luiz do Valle, Vibhav Vineet, Spencer Whitehead, Andrew Zhao

We are excited to introduce the Fara1.5 family of computer use agent (CUA)…
Whimsical Strategies Break AI Agents: Generating Out-of-Distribution Adversarial Strategies at Scale

May 6, 2026

By Zachary Huang, Tyler Payne, Gagan Bansal, Will Epperson, Wenyue Hua, Adam Fourney, Amanda Swearngin, Maya Murad, Ece Kamar, Saleema Amershi

As AI agents are increasingly deployed to handle real transactions and negotiations, they can exhibit vulnerabilities that traditional safety testing struggles to fully capture. Our prior work on Magentic Marketplace found significant vulnerability for smaller…
Webwright: A Terminal Is All You Need For Web Agents

May 4, 2026

By Yadong Lu (Microsoft Research), Lingrui Xu (The University of Hong Kong), Chao Huang (The University of Hong Kong), Ahmed Awadallah (Microsoft Research)

Instead of solving web tasks by predicting where to click one at a time, we only give the model a terminal where it has the full freedom to spawn browser…
Evaluating Proactive AI Mediators in Multi-Party Conversation with ProMediate

April 21, 2026

By Ziyi Liu, Bahar Sarrafzadeh, Pei Zhou, Longqi Yang, Ashish Sharma

Imagine you are in a high-stakes group discussion, stuck in a circular argument with no consensus in sight. Now, imagine an AI agent sitting at that table. Unlike traditional tools that…
The Art of Building Verifiers for Computer Use Agents

April 21, 2026

By Corby Rosset, Pratyusha Sharma, Andrew Zhao, Miguel Gonzalez-Fernandez, Ahmed Awadallah

We share lessons learned from building a best-in-class verifier for computer use agent trajectories on the web, called the Universal Verifier. False positive rates drop to near zero (vs. ≥45% for WebVoyager, ≥22% for…
Memento: Teaching LLMs to Manage Their Own Context

April 8, 2026

By Vasilis Kontonis, Yuchen Zeng, Shivam Garg, Lingjiao Chen, Hao Tang, Ziyan Wang, Ahmed Awadallah, Eric Horvitz, John Langford, Dimitris Papailiopoulos

We taught models to compress their own chain-of-thought mid-generation. Peak KV cache drops 2–3x, throughput nearly doubles, and the erased reasoning blocks leave traces…
Actions Speak Louder Than Prompts: Rethinking How LLMs Reason Over Graph Data

March 3, 2026

By Ben Finkelshtein (University of Oxford), Silviu Cucerzan, Sujay Kumar Jauhar, and Ryen W. White (Microsoft)

Think about the last time you opened a shared document at work. Behind that simple action lies a complex network of relationships: the colleagues who edited the file before you, the team site on…
Experiential Reinforcement Learning

February 20, 2026

By Taiwei Shi, Sihao Chen, Longqi Yang, Jaime Teevan

Reinforcement Learning is at the core of building and improving frontier AI models and products. Yet most state-of-the-art RL methods learn primarily from outcomes: a scalar reward signal that says whether an attempt worked, not why…
From One to Many

February 9, 2026

By Jaime Teevan, Chief Scientist & Technical Fellow

In recent years we’ve all lived through the transition to cloud computing, a sudden shift to remote work, and now the rapid rise of AI. Each individually has felt like a seismic event, but in reality they…
Phi-Ground: Improving how AI agents navigate screen interfaces

January 18, 2026

Imagine an AI assistant that can navigate a computer the same way humans do—clicking buttons, filling out forms, and moving between applications—all by simply interpreting what's on the screen. This vision is becoming a reality through computer use agents—AI systems designed to operate software interfaces…
Deep Video Discovery: Using agentic search to analyze long-form video

December 19, 2025

Extracting useful information from long videos, whether meeting recordings, experimental data, or lecture content, requires painstaking manual review. AI tools offer some help: language-vision models can summarize short clips or answer questions when videos are divided into clear scenes or chapters. But for hours‑long recordings…
