Microsoft Research Blog

Artificial intelligence

  1. Improving Language Agents Through BREW 

    September 29, 2025

Large Language Model (LLM)-based agents are increasingly applied to tasks requiring structured reasoning, tool use, and environmental adaptation, such as data manipulation, multi-step planning, and computer-use automation. However, despite their versatility, current training paradigms built on model weight optimization methods, such as PPO and GRPO, remain relatively…
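The GRPO mentioned above refers to group-relative policy optimization, in which advantages are computed relative to a group of sampled rollouts rather than from a learned value function. A minimal sketch of that advantage computation (illustrating GRPO's group baseline only, not the BREW method, which the teaser does not detail):

```python
# Sketch: group-relative advantage computation in the style of GRPO.
# For each prompt, several rollouts are sampled; each rollout's advantage
# is its reward standardized against the group's mean and std.

def group_relative_advantages(rewards, eps=1e-8):
    """rewards: scalar rewards for the sampled rollouts of one prompt."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Rollouts with higher-than-average reward get positive advantages.
advs = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

These advantages then weight the policy-gradient update in place of a critic's value estimates.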

  2. STACKFEED: Structured Textual Actor-Critic Knowledge Base Editing with Feedback 

    September 22, 2025

Large Language Models (LLMs) are increasingly used for complex software engineering tasks but often generate incorrect or outdated code. Retrieval-Augmented Generation systems attempt to solve this by using external knowledge bases (KBs) such as API documentation, but in the fast-paced world of software development, this documentation…
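Retrieval-Augmented Generation, as described here, grounds a model's answer in passages retrieved from the KB. A minimal sketch of that retrieval step over a toy API-documentation KB, using keyword overlap as a stand-in relevance score (the KB entries and scoring are illustrative, not STACKFEED's actual pipeline, which edits the KB itself):

```python
# Sketch: keyword-overlap retrieval over a toy API-documentation KB.
# Real RAG systems typically score relevance with dense embeddings;
# word overlap stands in here to keep the example self-contained.

KB = [
    "requests.get(url, params=None) sends an HTTP GET request",
    "json.loads(s) parses a JSON string into Python objects",
    "pathlib.Path.read_text() reads a file's contents as a string",
]

def retrieve(query, kb, k=1):
    # Score each doc by how many query words it shares, keep the top k.
    q_terms = set(query.lower().split())
    scored = [(len(q_terms & set(doc.lower().split())), doc) for doc in kb]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]

def build_prompt(query, kb):
    # Prepend the retrieved passages so the model answers from the KB.
    context = "\n".join(retrieve(query, kb))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

If the KB entry is outdated, the generated answer inherits that staleness, which is the failure mode the post goes on to address.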

  3. Learning from other domains to advance AI evaluation and testing 

    August 11, 2025 | Office of Responsible AI

Drawing on our analysis of eight case studies prepared by independent academic and industry experts, this white paper proposes next steps for addressing AI evaluation and testing challenges and opportunities by: synthesizing insights from the eight case studies, also published separately, and extracting lessons relevant…

  4. Closed-loop optimization using machine learning for the accelerated design of sustainable cements incorporating algal biomatter 

    July 7, 2025 | Meng-Yen Lin, Kristen Severson, Paul Grandgeorge, and Eleftheria Roumeli

The substantial embodied carbon of cement, coupled with the ever-increasing demand for construction materials, motivates the development of more sustainable cementitious materials. An emerging strategy for mitigating CO2 emissions involves incorporating carbon-negative biomatter; however, this introduces new challenges due to complex hydration-strength relationships and the combinatorial…

  5. Scaling Textual Gradients via Sampling-Based Momentum 

    June 1, 2025

As prompts play an increasingly critical role in large language models (LLMs), optimizing textual prompts has become a central challenge. The Textual Gradient Descent (TGD) framework has emerged as a promising data-driven approach that iteratively refines textual prompts using LLM-suggested updates (or textual…
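A TGD-style loop treats an LLM critique as the "gradient": the critic reads the current prompt and its failures, proposes a revision, and the revision is kept only if it scores better. A minimal sketch of that loop structure, with the scorer and critic replaced by stand-ins so it runs self-contained (this illustrates plain TGD, not the sampling-based momentum the post adds):

```python
# Sketch: a Textual-Gradient-Descent-style refinement loop. In a real
# system `critic` would be an LLM call that reads failure cases and
# rewrites the prompt; both scorer and critic are stand-ins here.

def score(prompt, requirements):
    # Stand-in metric: fraction of required instructions the prompt covers.
    return sum(req in prompt for req in requirements) / len(requirements)

def critic(prompt, requirements):
    # Stand-in "textual gradient": add one missing requirement, if any.
    missing = [req for req in requirements if req not in prompt]
    return prompt + " " + missing[0] if missing else prompt

def tgd(prompt, requirements, steps=10):
    best, best_score = prompt, score(prompt, requirements)
    for _ in range(steps):
        candidate = critic(best, requirements)
        s = score(candidate, requirements)
        if s > best_score:           # accept only improving updates
            best, best_score = candidate, s
        if best_score == 1.0:
            break
    return best, best_score
```

The accept-if-better step is what makes the procedure a descent on the evaluation metric rather than an open-ended rewrite.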