Microsoft Research Blog

Matching Features, Not Tokens: Energy-Based Fine-Tuning of Language Models

March 12, 2026

Cross-entropy (CE) training provides dense and scalable supervision for language models, but it optimizes next-token prediction under teacher forcing rather than sequence-level behavior under model rollouts. We introduce a feature-matching objective for language-model fine-tuning that targets sequence-level statistics of the completion distribution, providing dense semantic…
FlashMotion: Few-Step Controllable Video Generation with Trajectory Guidance

March 12, 2026

Recent advances in trajectory-controllable video generation have achieved remarkable progress. Previous methods mainly use adapter-based architectures for precise motion control along predefined trajectories. However, all these methods rely on a multi-step denoising process, leading to substantial time redundancy and computational overhead. While existing video distillation…
Does LLM Alignment Really Need Diversity? An Empirical Study of Adapting RLVR Methods for Moral Reasoning

March 11, 2026

Reinforcement learning with verifiable rewards (RLVR) has achieved remarkable success in logical reasoning tasks, yet whether large language model (LLM) alignment requires fundamentally different approaches remains unclear. Given the apparent tolerance for multiple valid responses in moral reasoning, a natural hypothesis is that alignment tasks…
PlugMem: Transforming raw agent interactions into reusable knowledge

March 10, 2026

It seems counterintuitive: giving AI agents more memory can make them less effective. As interaction logs accumulate, they grow large, fill with irrelevant content, and become increasingly difficult to use. More memory means that agents must search through larger volumes of past interactions to find information…
How people use Copilot for Health

March 10, 2026

We analyze over 500,000 de-identified health-related conversations with Microsoft Copilot from January 2026 to characterize what people ask conversational AI about health. We develop a hierarchical intent taxonomy of 12 primary categories using privacy-preserving LLM-based classification validated against expert human annotation, and apply LLM-driven topic-clustering…
Evaluating the Practical Effectiveness of LLM-Driven Index Tuning with Microsoft Database Tuning Advisor

March 10, 2026 | Xiaoying Wang, Wentao Wu, Vivek Narasayya, and Surajit Chaudhuri

Index tuning is critical for the performance of modern database systems. Industrial index tuners, such as the Database Tuning Advisor (DTA) developed for Microsoft SQL Server, rely on the "what-if" API provided by the query optimizer to estimate the cost of a query given an index configuration,…
Social-R1: Towards Human-like Social Reasoning in LLMs

March 10, 2026

While large language models demonstrate remarkable capabilities across numerous domains, social intelligence (the capacity to perceive social cues, infer mental states, and generate appropriate responses) remains a critical challenge, particularly for enabling effective human-AI collaboration and developing AI that truly serves human needs.…
SynthCraft: an AI partner for synthetic data generation to support data access and augmentation in healthcare

March 9, 2026

Access to high-quality data provides the foundation for biomedical research. But data access is often limited or challenging due to privacy constraints, whilst the data themselves may be unrepresentative or sparse. Synthetic data can support both privacy-preserving data access and advanced analytical workflows, including data…
Reject, Resample, Repeat: Understanding Parallel Reasoning in Language Model Inference

March 9, 2026

Inference-time methods that aggregate and prune multiple samples have emerged as a powerful paradigm for steering large language models, yet we lack any principled understanding of their accuracy-cost tradeoffs. In this paper, we introduce a route to rigorously study such approaches using the lens of…
StreamReady: Learning What to Answer and When in Long Streaming Videos

March 9, 2026 | Shehreen Azad, Vibhav Vineet, and Y. Rawat

Streaming video understanding often involves time-sensitive scenarios where models need to answer exactly when the supporting visual evidence appears: answering before the evidence reflects speculation, while answering after it has passed reduces real-time utility. To capture this behavior, we introduce a readiness-aware formulation of streaming video…
Scaling Data Difficulty: Improving Coding Models via Reinforcement Learning on Fresh and Challenging Problems

March 8, 2026

Training next-generation code generation models requires high-quality datasets, yet existing datasets face difficulty imbalance, format inconsistency, and data quality problems. We address these challenges through systematic data processing and difficulty scaling. We introduce a four-stage Data Processing Framework encompassing collection, processing, filtering, and verification, incorporating…
Probabilistic Inference and Learning with Stein’s Method

March 8, 2026 | Qiang Liu, Lester Mackey, and C. Oates

This monograph provides a rigorous overview of theoretical and methodological aspects of probabilistic inference and learning with Stein's method. Recipes are provided for constructing Stein discrepancies from Stein operators and Stein sets, and properties of these discrepancies such as computability, separation, convergence detection, and convergence…
