Microsoft Research Blog

Multimodal reinforcement learning with agentic verifier for AI agents

January 20, 2026

Argos improves multimodal RL by evaluating whether an agent’s reasoning aligns with what it observes over time. The approach reduces visual hallucinations and produces more reliable, data-efficient agents for real-world applications.

Recent Posts

OptiMind: A small language model with optimization expertise

January 15, 2026

OptiMind is a small language model that converts business operation challenges, described in natural language, into mathematical formulations that optimization software can solve. It reduces formulation time and errors, and it enables fast, privacy-preserving local use.
Agent Lightning: Adding reinforcement learning to AI agents without code rewrites

December 11, 2025

By decoupling how agents work from how they’re trained, Agent Lightning turns each step an agent takes into data for reinforcement learning. This makes it easy for developers to improve agent performance with almost zero code changes.
Promptions helps make AI prompting more precise with dynamic UI controls

December 10, 2025

Promptions helps developers add dynamic, context-aware controls to chat interfaces so users can guide generative AI responses. It lets users shape outputs quickly without writing long instructions.
GigaTIME: Scaling tumor microenvironment modeling using virtual population generated by multimodal AI

December 9, 2025 | Hoifung Poon, Jeya Maria Jose Valanarasu, Naoto Usuyama, and Sheng Wang

Using AI-generated virtual populations, Microsoft researchers uncovered hidden cellular patterns that could reshape how we understand and treat cancer.
Reducing Privacy leaks in AI: Two approaches to contextual integrity

November 25, 2025

New research explores two ways to give AI agents stronger privacy safeguards grounded in contextual integrity. One adds lightweight, inference-time checks; the other builds contextual awareness directly into models through reasoning and RL.
Fara-7B: An Efficient Agentic Model for Computer Use

November 24, 2025

Fara-7B is our first agentic small language model for computer use. This experimental model includes robust safety measures to aid responsible deployment. Despite its size, Fara-7B holds its own against larger, more resource-intensive agentic systems.
MMCTAgent: Enabling multimodal reasoning over large video and image collections

November 12, 2025 | Akshay Nambi, Kavyansh Chourasia, and Tanuja Ganu

MMCTAgent enables dynamic multimodal reasoning with iterative planning and reflection. Built on Microsoft’s AutoGen framework, it integrates language, vision, and temporal understanding for complex tasks like long video and image analysis.
BlueCodeAgent: A blue teaming agent enabled by automated red teaming for CodeGen AI

November 11, 2025

BlueCodeAgent is an end-to-end blue-teaming framework built to boost code security using automated red-teaming processes, data, and safety rules to guide LLMs’ defensive decisions. Dynamic testing reduces false positives in vulnerability detection.
When industry knowledge meets PIKE-RAG: The innovation behind Signify’s customer service boost

November 6, 2025 | Industry Innovation Center

A collaboration between Signify and Microsoft Research shows how PIKE-RAG improves enterprise knowledge systems, delivering a 12% increase in accuracy and faster, more reliable answers.
Magentic Marketplace: an open-source simulation environment for studying agentic markets

November 5, 2025

AI agents are poised to transform digital marketplaces. To explore what can happen when AI agents interact and transact at scale, we built Magentic Marketplace, an open-source simulation environment for studying agentic market designs.
RedCodeAgent: Automatic red-teaming agent against diverse code agents

November 4, 2025

Code agents help streamline software development workflows, but may also introduce critical security risks. Learn how RedCodeAgent automates and improves “red-teaming” attack simulations to help uncover real-world threats that other methods overlook.
